Hi Rakhi,

No, that's not what I meant.
You can run HDFS daemons on one set of instances, and TaskTrackers on
another set of instances. No need to write up access lists. Yes, the
mapreduce subsystem will also use the HDFS volume along with HBase;
this is not a problem. The potential benefit of splitting function this
way is that mapreduce tasks would not contend with the various
functions of the storage cluster.

However, I think you have more basic problems at this time. As Amandeep
has followed up with you already, your instances do NOT have enough
RAM. You must use large or x-large instances. Start there. Another
benefit of larger instances is more local instance storage in addition
to the additional RAM.

Also, please follow Amandeep's advice to raise the maximum number of
allowable open files at the OS level and the maximum number of xceivers
allowable at the HDFS level (see the sample snippet at the end of this
message).

Have you looked at the troubleshooting page on the wiki?
http://wiki.apache.org/hadoop/Hbase/Troubleshooting

Given your current small data set you do not need 20 nodes for it.
Also, try standing up a smaller cluster of large or x-large instances.

   - Andy

> From: Rakhi Khatwani <[email protected]>
> Subject: Re: Region Servers going down frequently
> To: [email protected], [email protected]
> Cc: "Ninad" <[email protected]>
> Date: Tuesday, April 7, 2009, 11:04 PM
>
> Hi Andy,
> I think i figured it out.
> We will have to set the mapred.hosts and dfs.hosts properties in
> hadoop-site.xml as follows:
>
> <property>
>   <name>dfs.hosts</name>
>   <value>filename1</value>
>   <description>Names a file that contains a list of hosts that are
>   permitted to connect to the namenode. The full pathname of the file
>   must be specified. If the value is empty, all hosts are
>   permitted.</description>
> </property>
>
> [where filename1 will contain the list of instances to be considered
> for storage]
>
> <property>
>   <name>mapred.hosts</name>
>   <value>filename2</value>
>   <description>Names a file that contains the list of nodes that may
>   connect to the jobtracker. If the value is empty, all hosts are
>   permitted.</description>
> </property>
>
> [where filename2 will contain the list of instances which will carry
> out computation tasks]
>
> Correct me if i am wrong.
>
> Thanks once again,
> Rakhi.
>
> On Wed, Apr 8, 2009 at 10:45 AM, Rakhi Khatwani
> <[email protected]> wrote:
>
> > Hi Andy,
> > Thanks for your suggestion.
> > But i was wondering how we could separate HDFS storage from mapred
> > computations, as mapred uses the same master/slave configuration as
> > HDFS.
> >
> > Did you mean using a set of instances as slaves and another set of
> > instances as regionservers?
> >
> > Thanks in advance,
> > Rakhi
> >
> >
> > On Tue, Apr 7, 2009 at 11:06 PM, Andrew Purtell
> > <[email protected]> wrote:
> >
> >> Hi Rakhi,
> >>
> >> The "cannot obtain block" error is actually an HDFS problem. Most
> >> likely this block was lost by HDFS during a period of excessive
> >> load. Usually the first sign you are using insufficient
> >> resources for your load is filesystem issues such as these. To
> >> address the problems I recommend you do two things at once.
> >>
> >> 1) The minimum usable instance type for HBase (and Hadoop) is
> >> large in my opinion. The basic rule of thumb for HBase and
> >> Hadoop daemons is you must allocate 1GB of heap/RAM and one
> >> CPU (or vcpu) thread for each daemon. You can search the
> >> hbase-user@ archives for previous discussion on this topic.
> >>
> >> 2) Allocate more instances to spread the load on DFS.
> >>
> >> On EC2 I recommend running storage such as HDFS/HBase on one set
> >> of instances and mapreduce computations on another set. Hadoop
> >> and HBase daemons are sensitive to thread starvation problems.
> >>
> >> Hope this helps,
> >>
> >>    - Andy
> >>
> >> > From: Rakhi Khatwani
> >> > Subject: Region Servers going down frequently
> >> > Date: Tuesday, April 7, 2009, 2:45 AM
> >> > Hi,
> >> > I have a 20 node cluster on ec2 (small instances). I have a set
> >> > of tables which store huge amounts of data (tried with 10,000
> >> > rows, more to be added), but during my map reduce jobs some of
> >> > the region servers shut down, thereby causing data loss and
> >> > stopping my program execution; in fact one of my tables got
> >> > damaged. Whenever i scan the table, i get the "could not obtain
> >> > block" error.
> >>
> >
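P.S. Regarding the open file and xceiver limits mentioned above: the
right values depend on your workload, so treat the numbers here as
illustrative starting points only. At the OS level, raise the nofile
limit for the user running the Hadoop/HBase daemons (for example via
/etc/security/limits.conf, to something like 32768). At the HDFS level,
raise the datanode xceiver ceiling in hadoop-site.xml on each datanode,
for example:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <!-- yes, the property name really is spelled "xcievers";
         2048 is only a starting point, raise it further if the
         datanodes still complain about exceeding the xceiver limit -->
    <value>2048</value>
    <description>Upper bound on the number of threads a datanode will
    use to serve concurrent block read/write requests.</description>
  </property>

Restart the datanodes after changing this setting.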
