Re: HBase cluster design
Setting cache blocks to false while running an MR job serves 2 purposes: it helps real-time requests by not wiping out the caches for MR jobs, which don't really need/use them, and it helps prevent churn/fragmentation in the block cache, which helps GC.

There are properties you need to set in hbase-site.xml for that.

Sent from Yahoo Mail on Android
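In client code, a minimal sketch of a 0.94-era MR job scan with block caching disabled (the table name, mapper class, and job object below are placeholders, not from this thread):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

// Configure the Scan that feeds the MapReduce job over an HBase table.
Scan scan = new Scan();
scan.setCacheBlocks(false); // don't churn the region servers' block cache
scan.setCaching(500);       // override a too-small cluster-wide scanner caching

// "MyTable", MyMapper and job are hypothetical; wire in your own job setup.
TableMapReduceUtil.initTableMapperJob(
    "MyTable", scan, MyMapper.class,
    NullWritable.class, NullWritable.class, job);
```

The per-Scan setting only affects this job; real-time Gets and Scans elsewhere keep using the block cache as before.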
Re: HBase cluster design
Thanks again for the tips! And what about cache blocks? Why should I avoid it?
Re: HBase cluster design
Hi Flavio,

I suppose you are using Cloudera Manager for HBase management. To change RAM usage, Cloudera has a few steps:

1. Go to the HBase service
2. Click on Configurations
3. Click on View and Edit
4. Search for "Environment safety valve"
5. Make the following entry there: HBASE_HEAPSIZE=3072
6. Save -> deploy client configurations (under the Actions tab) -> restart HBase
Re: HBase cluster design
For #2, see http://hbase.apache.org/book/perf.configurations.html . The relevant config parameters start at section 12.4.3.

Cheers
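For reference, the block cache / memstore split that section describes is controlled by hbase-site.xml properties along these lines. The values shown are, to the best of my recollection, the 0.94-era defaults; treat them as illustrative rather than recommendations:

```xml
<!-- Fraction of region server heap given to the block cache. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.25</value>
</property>
<!-- Upper bound on the fraction of heap used by all memstores combined;
     writes block until usage falls back below the lower limit. -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.35</value>
</property>
```

Raising one fraction generally means lowering the other, since both come out of the same region server heap.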
Re: HBase cluster design
Thank you all for the great suggestions, I'll try to test them ASAP. Just 2 questions:
- Why should I set setCacheBlocks to false?
- How can I increase/decrease the amount of RAM given to block caches and memstores?

Best,
Flavio
RE: HBase cluster design
@Flavio:
One thing you should consider: is the CPU oversaturated?

For instance, if one server has 24 cores and the task tracker is configured to also execute 24 MR tasks simultaneously, then there is no core left for HBase to do GC, and it crashes.

A few weeks ago my region servers would sometimes crash while MR was running, so I decided to move the MR cluster to other machines, leaving only DataNode + RegionServer running. After the change, my region servers keep running without crashes.

My suggestion is to try decreasing the MR task count to around 12 or fewer and see if the crash frequency goes down.

Best regards,
Henry Hung

The privileged confidential information contained in this email is intended for use only by the addressees as indicated by the original sender of this email. If you are not the addressee indicated in this email or are not responsible for delivery of the email to such a person, please kindly reply to the sender indicating this fact and delete all copies of it from your computer and network server immediately. Your cooperation is highly appreciated. It is advised that any unauthorized use of confidential information of Winbond is strictly prohibited; and any information in this email irrelevant to the official business of Winbond shall be deemed as neither given nor endorsed by Winbond.
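On an MRv1 cluster, capping concurrent tasks per TaskTracker is a mapred-site.xml change roughly like the following (a sketch; 12 maps on a 24-core box follows the suggestion above, the reduce-slot count is an illustrative guess):

```xml
<!-- mapred-site.xml: limit task slots per TaskTracker so cores remain
     free for the co-located RegionServer and DataNode. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```

The TaskTrackers must be restarted for the new slot counts to take effect.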
Re: HBase cluster design
A few things pop out to me on a cursory glance:
- You are using CMSIncrementalMode, which after a long chain of events has a tendency to result in the famous Juliet pause of death. Can you try the ParNew GC instead and see if that helps?
- You should try to reduce CMSInitiatingOccupancyFraction to avoid a full GC.
- Your hbase-env.sh is not setting Xmx at all. Do you know how much RAM you are giving to your region servers? It may be too small or too large given your use case and machine size.
- Your client scanner caching is 1, which may be too small depending on your row sizes. You can also override that setting in your scan for the MR job.
- You only have 2 zookeeper instances, which is not at all recommended. Zookeeper needs a quorum to operate and generally works best with an odd number of zookeeper servers. This probably isn't related to your crashes, but it would help stability if you had 1 or 3 zookeepers.
- I am not 100% sure if the version of hbase you are using has MSLAB enabled. If not, you should enable it.
- You can try increasing/decreasing the amount of RAM you provide to block caches and memstores to suit your use case. I see that you are using the defaults here.

On top of these, when you kick off your MR job to scan HBase you should setCacheBlocks to false.

Regards,
Dhaval
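Sketched as an hbase-env.sh fragment, the heap and GC points above might look like this (the heap size reuses the 3072 MB value suggested elsewhere in the thread, and the occupancy threshold is illustrative; both need tuning per cluster):

```sh
# Illustrative hbase-env.sh fragment; values are examples, not recommendations.

# Give the region server an explicit heap instead of relying on the default.
export HBASE_HEAPSIZE=3072   # MB

# Drop CMSIncrementalMode in favor of ParNew + CMS, and trigger the
# concurrent cycle deterministically at a lower heap occupancy.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
```

Region servers pick these flags up on restart.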
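The zookeeper point is just majority arithmetic: a quorum is a strict majority of the ensemble, so two servers tolerate exactly as many failures as one server (zero), while three tolerate one. A quick sketch of the math:

```java
// Why 2 ZooKeeper servers are no better than 1: availability requires a
// strict majority, so tolerable failures = n - (n/2 + 1).
public class ZkQuorumMath {
    // Smallest number of servers that must agree (strict majority of n).
    public static int quorumSize(int servers) {
        return servers / 2 + 1;
    }

    // How many servers can fail while the ensemble stays available.
    public static int tolerableFailures(int servers) {
        return servers - quorumSize(servers);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " server(s): quorum " + quorumSize(n)
                + ", tolerates " + tolerableFailures(n) + " failure(s)");
        }
    }
}
```

With 2 servers the quorum is 2, so losing either one makes the ensemble unavailable, which is why odd ensemble sizes are the norm.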
Re: HBase cluster design
The hardware specs are: 4 nodes with 48g RAM, 24 cores and 1 TB disk each server.
Attached my hbase config files.

Thanks,
Flavio

Attachment: hbase-env.sh (Bourne shell script)
Attachment: hbase-site.xml

  hbase.rootdir                          hdfs://server4:8020/hbase
  hbase.client.write.buffer              2097152
  hbase.client.pause                     1000
  hbase.client.retries.number            10
  hbase.client.scanner.caching           1
  hbase.client.keyvalue.maxsize          10485760
  hbase.security.authentication          simple
  zookeeper.session.timeout              6
  zookeeper.znode.parent                 /hbase
  zookeeper.znode.rootserver             root-region-server
  hbase.zookeeper.quorum                 server1,server2
  hbase.zookeeper.property.clientPort    2181
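Two values in the attached hbase-site.xml match issues flagged earlier in the thread: hbase.client.scanner.caching is 1 and hbase.zookeeper.quorum lists only two servers. An illustrative correction might look like this (the caching value is an example to tune to row size, and server3 is a hypothetical third host):

```xml
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value> <!-- example; pick based on typical row size -->
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>server1,server2,server3</value> <!-- odd ensemble; server3 is hypothetical -->
</property>
```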
Re: HBase cluster design
Can you share your hbase-env.sh and hbase-site.xml? And hardware specs of your cluster?

Regards,
Dhaval
Re: HBase cluster design
Could you tell me please, in detail, the parameters you'd like to see, so I can look for them and learn the important ones? I'm using Cloudera: CDH4 in one cluster and CDH5 in the other.

Best,
Flavio
Re: HBase cluster design
Can you describe your setup in more detail? Specifically the amount of heap the hbase region servers have, and your GC settings. Is your server swapping when your MR jobs are running? Also, do your regions go down, or your region servers?

We run many MR jobs simultaneously on our hbase tables (size is in TBs) along with serving real-time requests at the same time, so I can vouch for the fact that a well-tuned hbase cluster definitely supports this use case (well-tuned is the key word here).

Sent from Yahoo Mail on Android
Re: HBase cluster design
Thanks for the response. Actually I'm still trying to understand why some of the regions of my HBase go down from time to time during my mapred job when table updates occur, because there's nothing interesting in the logs. The updates usually happen in bursts of 10/100 sequential puts per second. Is there any rule of thumb for those scenarios to avoid problems, or some fundamental tuning to check? I have a lot of RAM (48g) and 24 processors per server (for a total of 4 servers), and I don't have that much data (20g), so I don't understand why the region servers go down (usually after a couple of mapred jobs). In general, though, speaking also with other people using HBase, it seems that it is not very safe to run mapred jobs while updating the table. Are we wrong?

Best,
Flavio
Re: HBase cluster design
On Tue, May 13, 2014 at 3:14 AM, Flavio Pompermaier wrote:

> So just to summarize the result of this discussion..
> do you confirm that the latest version of HBase should (in theory) support
> mapreduce jobs on tables that in the meantime could be updated by external
> processes (i.e. not by the mapred job)?
> One of the answers about this said: "Poorly tuned HBase clusters can
> fail easily under heavy load"..
> Could you suggest some tuning to avoid crashing HBase in such situations?

Run fewer mappers/reducers. Start with one only and move up from there. Ditto for other processes updating hbase.

Do you have monitoring going on this cluster? What is it telling you about the loadings?

St.Ack
Re: HBase cluster design
Of course I did.. but that's not very helpful, it's too broad! I need something more specific about tuning HBase to support running multiple mapred jobs while the tables are being updated..

Best,
Flavio

On Tue, May 13, 2014 at 3:20 PM, Ted Yu wrote:
> Have you looked at http://hbase.apache.org/book/performance.html ?
>
> Cheers
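(As a concrete starting point for the kind of tuning being asked about here: the properties below are standard hbase-site.xml knobs from the 0.92/0.94 line discussed in this thread. The values are illustrative examples only, not recommendations, and should be sized against the actual heap and workload.)

```xml
<!-- hbase-site.xml: illustrative values only; size these for your workload -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>30</value> <!-- more RPC handlers for concurrent MR + client load -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value> <!-- 128 MB per-region memstore flush threshold -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value> <!-- cap total memstore usage at 40% of the heap -->
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.25</value> <!-- fraction of the heap reserved for the block cache -->
</property>
```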
Re: HBase cluster design
Have you looked at http://hbase.apache.org/book/performance.html ?

Cheers

On May 13, 2014, at 3:14 AM, Flavio Pompermaier wrote:
> So just to summarize the result of this discussion..
> do you confirm that the latest version of HBase should (in theory) support
> mapreduce jobs on tables that could meanwhile be updated by external
> processes (i.e. not by the mapred job)?
> One of the answers said: "Poorly tuned HBase clusters can
> fail easily under heavy load"..
> Could you suggest some tuning to avoid crashing HBase in such
> situations?
>
> Best,
> Flavio
Re: HBase cluster design
So, just to summarize the result of this discussion: do you confirm that the latest version of HBase should (in theory) support mapreduce jobs on tables that could meanwhile be updated by external processes (i.e. not by the mapred job)? One of the answers said: "Poorly tuned HBase clusters can fail easily under heavy load". Could you suggest some tuning to avoid crashing HBase in such situations?

Best,
Flavio

On Fri, Apr 11, 2014 at 12:06 PM, Flavio Pompermaier wrote:
> Today I was able to catch an error during a mapreduce job that roughly
> mimics the rowCount. The error I saw is:
>
> Could not sync. Requesting close of hlog
> java.io.IOException: Reflection
> ...
> Caused by:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> No lease on
> /hbase/.logs/host4,60020,1395928532020/host4%2C60020%2C1395928532020.1397205288300
> File does not exist. Holder DFSClient_NONMAPREDUCE_-1746149332_40 does not
> have any open files.
Re: HBase cluster design
Today I was able to catch an error during a mapreduce job that roughly mimics the rowCount. The error I saw is:

Could not sync. Requesting close of hlog
java.io.IOException: Reflection
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1141)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1245)
    at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1100)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:228)
    ... 4 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /hbase/.logs/host4,60020,1395928532020/host4%2C60020%2C1395928532020.1397205288300 File does not exist. Holder DFSClient_NONMAPREDUCE_-1746149332_40 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2308)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2299)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2095)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

    at org.apache.hadoop.ipc.Client.call(Client.java:1160)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy14.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at $Proxy14.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)

What can be the cause of this error?

On Sat, Apr 5, 2014 at 2:25 PM, Michael Segel wrote:
> You have one other thing to consider.
> Did you oversubscribe on the m/r tuning side of things?
Re: HBase cluster design
You have one other thing to consider: did you oversubscribe on the m/r tuning side of things?

Many people want to segment their HBase to a portion of the cluster. This should be the exception to the design, not the primary cluster design.

If you oversubscribe your cluster, you will run out of memory, then you need to swap, and boom, bad things happen.

Also, while many suggest not reserving room for swap... I suggest that you do leave some room.

While this doesn't address the issues in your question directly, these are things that you need to consider.

More to your point... poorly tuned HBase clusters can fail easily under heavy load. While Ted doesn't address this consideration, it can become an issue. YMMV of course.

On Apr 4, 2014, at 9:43 AM, Ted Yu wrote:
> The 'Connection refused' message was logged at WARN level.
>
> If you can pastebin more of the region server log before its crash, I would
> take a deeper look.

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com
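(The oversubscription warning above can be made concrete. Assuming MRv1, as on the CDH4-era deployments in this thread, the per-node task slots are capped in mapred-site.xml; the values below only illustrate leaving cores free for the RegionServer and DataNode, and are not a recommendation for any particular hardware.)

```xml
<!-- mapred-site.xml (MRv1): illustrative slot caps for a node co-hosting a RegionServer -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value> <!-- well below the core count, leaving headroom for HBase GC -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```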
Re: HBase cluster design
The 'Connection refused' message was logged at WARN level.

If you can pastebin more of the region server log before its crash, I would take a deeper look.

BTW, I assume your zookeeper quorum was healthy during that period of time.

On Fri, Apr 4, 2014 at 7:29 AM, Flavio Pompermaier wrote:
> Yes, I know I should update HBase, this is something I'm going to do really
> soon. Bad me..
> I just wanted to know whether adding/updating rows in HBase while
> running a mapred job could be problematic or not.
Re: HBase cluster design
Yes, I know I should update HBase, this is something I'm going to do really soon. Bad me..

I just wanted to know whether adding/updating rows in HBase while running a mapred job could be problematic or not. From what you told me it's not, so the problem could be caused by the old version of HBase or some other OS configuration. The update was performed by an application accessing HBase directly, adding and updating rows of the table. Once in a while some region servers go down and are marked as "bad state" by Cloudera, so I have to restart them.

The error I usually see is:

2012-11-23 12:41:00,468 WARN org.apache.zookeeper.ClientCnxn: Session 0x13b2cf447fd for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1047

Best,
Flavio

On Fri, Apr 4, 2014 at 2:35 PM, Ted Yu wrote:
> Was the updating performed by one of the mapreduce jobs?
> HBase should be able to serve multiple mapreduce jobs in the same cluster.
>
> Can you provide more detail on the crash?
>
> BTW, there are 3 major releases after 0.92.
> Please consider upgrading your cluster to a newer release.
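(One stopgap related to the 'Connection refused' / "bad state" region-server symptom in this exchange: if long GC pauses under MR load are making region servers lose their ZooKeeper session, the session timeout can be raised in hbase-site.xml while the real memory pressure is fixed. The value below is illustrative only; the timeout actually granted is bounded by the ZooKeeper server's own tick-time settings.)

```xml
<!-- hbase-site.xml: illustrative; the granted timeout is bounded by the ZK server's ticktime -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value> <!-- 2 minutes, to ride out long GC pauses -->
</property>
```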
Re: HBase cluster design
Was the updating performed by one of the mapreduce jobs? HBase should be able to serve multiple mapreduce jobs in the same cluster.

Can you provide more detail on the crash?

BTW, there are 3 major releases after 0.92. Please consider upgrading your cluster to a newer release.

Cheers

On Apr 4, 2014, at 3:08 AM, Flavio Pompermaier wrote:
> Hi to everybody,
>
> I have a probably stupid question: is it a problem to run many mapreduce
> jobs on the same HBase table at the same time? And multiple jobs on
> different tables on the same cluster?
> Should I use Hoya to have better cluster usage?
>
> In my current cluster I noticed that the region servers tend to go down if
> I run a mapreduce job while updating (maybe it could be related to the old
> version of HBase I'm currently running: 0.92.1-cdh4.1.2).
>
> Best,
> Flavio