Re: HBase cluster design
Setting cache blocks to false while running an MR job serves 2 purposes: it helps real-time requests by not wiping out the caches for MR jobs, which don't really need/use them, and it helps prevent churn/fragmentation in the block cache, which helps GC.

There are properties you need to set in hbase-site.xml for that.

Sent from Yahoo Mail on Android
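In client code, a minimal sketch of a 0.94-era MR job scan with block caching disabled (the table name, mapper class, and job object below are placeholders, not from this thread):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

// Configure the Scan that feeds the MapReduce job over an HBase table.
Scan scan = new Scan();
scan.setCacheBlocks(false); // don't churn the region servers' block cache
scan.setCaching(500);       // override a too-small cluster-wide scanner caching

// "MyTable", MyMapper and job are hypothetical; wire in your own job setup.
TableMapReduceUtil.initTableMapperJob(
    "MyTable", scan, MyMapper.class,
    NullWritable.class, NullWritable.class, job);
```

The per-Scan setting only affects this job; real-time Gets and Scans elsewhere keep using the block cache as before.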
Re: HBase cluster design
Thanks again for the tips! And what about cache blocks? Why should I avoid it?
Re: HBase cluster design
Hi Flavio,

I suppose you are using Cloudera Manager for HBase management. To change RAM usage, Cloudera has a few steps:

1. Go to the HBase service
2. Click on Configurations
3. Click on View and Edit
4. Search for "Environment safety valve"
5. Make the following entry there: HBASE_HEAPSIZE=3072
6. Save -> deploy client configurations (under the Actions tab) -> restart HBase
Re: HBase cluster design
For #2, see http://hbase.apache.org/book/perf.configurations.html . The relevant config parameters start at section 12.4.3.

Cheers
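For reference, the block cache / memstore split that section describes is controlled by hbase-site.xml properties along these lines. The values shown are, to the best of my recollection, the 0.94-era defaults; treat them as illustrative rather than recommendations:

```xml
<!-- Fraction of region server heap given to the block cache. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.25</value>
</property>
<!-- Upper bound on the fraction of heap used by all memstores combined;
     writes block until usage falls back below the lower limit. -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.35</value>
</property>
```

Raising one fraction generally means lowering the other, since both come out of the same region server heap.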
Re: HBase cluster design
Thank you all for the great suggestions, I'll try to test them ASAP. Just 2 questions:
- Why should I set setCacheBlocks to false?
- How can I increase/decrease the amount of RAM given to block caches and memstores?

Best,
Flavio
RE: HBase cluster design
@Flavio:
One thing you should consider: is the CPU oversaturated?

For instance, if one server has 24 cores and the task tracker is configured to also execute 24 MR tasks simultaneously, then there is no core left for HBase to do GC, and it crashes.

A few weeks ago my region servers would sometimes crash while MR was running, so I decided to move the MR cluster to other machines, leaving only DataNode + RegionServer running. After the change, my region servers keep running without crashes.

My suggestion is to try decreasing the MR task count to around 12 or fewer and see if the crash frequency goes down.

Best regards,
Henry Hung

The privileged confidential information contained in this email is intended for use only by the addressees as indicated by the original sender of this email. If you are not the addressee indicated in this email or are not responsible for delivery of the email to such a person, please kindly reply to the sender indicating this fact and delete all copies of it from your computer and network server immediately. Your cooperation is highly appreciated. It is advised that any unauthorized use of confidential information of Winbond is strictly prohibited; and any information in this email irrelevant to the official business of Winbond shall be deemed as neither given nor endorsed by Winbond.
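On an MRv1 cluster, capping concurrent tasks per TaskTracker is a mapred-site.xml change roughly like the following (a sketch; 12 maps on a 24-core box follows the suggestion above, the reduce-slot count is an illustrative guess):

```xml
<!-- mapred-site.xml: limit task slots per TaskTracker so cores remain
     free for the co-located RegionServer and DataNode. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```

The TaskTrackers must be restarted for the new slot counts to take effect.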
Re: HBase cluster design
A few things pop out to me on a cursory glance:
- You are using CMSIncrementalMode, which after a long chain of events has a tendency to result in the famous Juliet pause of death. Can you try the ParNew GC instead and see if that helps?
- You should try to reduce CMSInitiatingOccupancyFraction to avoid a full GC.
- Your hbase-env.sh is not setting Xmx at all. Do you know how much RAM you are giving to your region servers? It may be too small or too large given your use case and machine size.
- Your client scanner caching is 1, which may be too small depending on your row sizes. You can also override that setting in your scan for the MR job.
- You only have 2 zookeeper instances, which is not at all recommended. Zookeeper needs a quorum to operate and generally works best with an odd number of zookeeper servers. This probably isn't related to your crashes, but it would help stability if you had 1 or 3 zookeepers.
- I am not 100% sure if the version of hbase you are using has MSLAB enabled. If not, you should enable it.
- You can try increasing/decreasing the amount of RAM you provide to block caches and memstores to suit your use case. I see that you are using the defaults here.

On top of these, when you kick off your MR job to scan HBase you should setCacheBlocks to false.

Regards,
Dhaval
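Sketched as an hbase-env.sh fragment, the heap and GC points above might look like this (the heap size reuses the 3072 MB value suggested elsewhere in the thread, and the occupancy threshold is illustrative; both need tuning per cluster):

```sh
# Illustrative hbase-env.sh fragment; values are examples, not recommendations.

# Give the region server an explicit heap instead of relying on the default.
export HBASE_HEAPSIZE=3072   # MB

# Drop CMSIncrementalMode in favor of ParNew + CMS, and trigger the
# concurrent cycle deterministically at a lower heap occupancy.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
```

Region servers pick these flags up on restart.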
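The zookeeper point is just majority arithmetic: a quorum is a strict majority of the ensemble, so two servers tolerate exactly as many failures as one server (zero), while three tolerate one. A quick sketch of the math:

```java
// Why 2 ZooKeeper servers are no better than 1: availability requires a
// strict majority, so tolerable failures = n - (n/2 + 1).
public class ZkQuorumMath {
    // Smallest number of servers that must agree (strict majority of n).
    public static int quorumSize(int servers) {
        return servers / 2 + 1;
    }

    // How many servers can fail while the ensemble stays available.
    public static int tolerableFailures(int servers) {
        return servers - quorumSize(servers);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " server(s): quorum " + quorumSize(n)
                + ", tolerates " + tolerableFailures(n) + " failure(s)");
        }
    }
}
```

With 2 servers the quorum is 2, so losing either one makes the ensemble unavailable, which is why odd ensemble sizes are the norm.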
Re: HBase cluster design
The hardware specs are: 4 nodes with 48g RAM, 24 cores and 1 TB disk each server.
Attached my hbase config files.

Thanks,
Flavio

Attachment: hbase-env.sh (Bourne shell script)
Attachment: hbase-site.xml

  hbase.rootdir                          hdfs://server4:8020/hbase
  hbase.client.write.buffer              2097152
  hbase.client.pause                     1000
  hbase.client.retries.number            10
  hbase.client.scanner.caching           1
  hbase.client.keyvalue.maxsize          10485760
  hbase.security.authentication          simple
  zookeeper.session.timeout              6
  zookeeper.znode.parent                 /hbase
  zookeeper.znode.rootserver             root-region-server
  hbase.zookeeper.quorum                 server1,server2
  hbase.zookeeper.property.clientPort    2181
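Two values in the attached hbase-site.xml match issues flagged earlier in the thread: hbase.client.scanner.caching is 1 and hbase.zookeeper.quorum lists only two servers. An illustrative correction might look like this (the caching value is an example to tune to row size, and server3 is a hypothetical third host):

```xml
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value> <!-- example; pick based on typical row size -->
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>server1,server2,server3</value> <!-- odd ensemble; server3 is hypothetical -->
</property>
```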
Re: HBase cluster design
Can you share your hbase-env.sh and hbase-site.xml? And hardware specs of your cluster?

Regards,
Dhaval
Re: HBase cluster design
Could you tell me please, in detail, the parameters you'd like to see, so I can look for them and learn the important ones? I'm using Cloudera: CDH4 in one cluster and CDH5 in the other.

Best,
Flavio
Re: HBase cluster design
Can you describe your setup in more detail? Specifically the amount of heap the hbase region servers have, and your GC settings. Is your server swapping when your MR jobs are running? Also, do your regions go down, or your region servers?

We run many MR jobs simultaneously on our hbase tables (size is in TBs) along with serving real-time requests at the same time, so I can vouch for the fact that a well-tuned hbase cluster definitely supports this use case (well-tuned is the key word here).

Sent from Yahoo Mail on Android
Re: HBase cluster design
Thanks for the response. Actually I'm still trying to understand why some of the regions of my HBase go down from time to time during my mapred job when table updates occur, because there's nothing interesting in the logs. The updates usually happen in bursts of 10/100 sequential puts per second. Is there any rule of thumb for those scenarios to avoid problems, or some fundamental tuning to check? I have a lot of RAM (48g) and 24 processors per server (for a total of 4 servers), and I don't have that much data (20g), so I don't understand why the region servers go down (usually after a couple of mapred jobs). In general, though, speaking also with other people using HBase, it seems that it is not very safe to run mapred jobs while updating the table. Are we wrong?

Best,
Flavio
Re: HBase cluster design
On Tue, May 13, 2014 at 3:14 AM, Flavio Pompermaier wrote:

> So just to summarize the result of this discussion..
> do you confirm that the latest version of HBase should (in theory) support
> mapreduce jobs on tables that in the meantime could be updated by external
> processes (i.e. not by the mapred job)?
> One of the answers about this said: "Poorly tuned HBase clusters can
> fail easily under heavy load"..
> Could you suggest some tuning to avoid crashing HBase in such situations?

Run fewer mappers/reducers. Start with one only and move up from there. Ditto for other processes updating hbase.

Do you have monitoring going on this cluster? What is it telling you about the loadings?

St.Ack
Re: HBase cluster design
Of course I did.. but that's not very helpful, it's too broad! I need something more specific about tuning HBase to support running multiple mapred jobs while the tables are being updated..

Best,
Flavio

On Tue, May 13, 2014 at 3:20 PM, Ted Yu wrote:
> Have you looked at http://hbase.apache.org/book/performance.html ?
>
> Cheers
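(As a concrete starting point for the kind of tuning being asked about here: the properties below are standard hbase-site.xml knobs from the 0.92/0.94 line discussed in this thread. The values are illustrative examples only, not recommendations, and should be sized against the actual heap and workload.)

```xml
<!-- hbase-site.xml: illustrative values only; size these for your workload -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>30</value> <!-- more RPC handlers for concurrent MR + client load -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value> <!-- 128 MB per-region memstore flush threshold -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value> <!-- cap total memstore usage at 40% of the heap -->
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.25</value> <!-- fraction of the heap reserved for the block cache -->
</property>
```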
Re: HBase cluster design
Have you looked at http://hbase.apache.org/book/performance.html ?

Cheers

On May 13, 2014, at 3:14 AM, Flavio Pompermaier wrote:
> So just to summarize the result of this discussion..
> do you confirm that the latest version of HBase should (in theory) support
> mapreduce jobs on tables that could meanwhile be updated by external
> processes (i.e. not by the mapred job)?
> One of the answers said: "Poorly tuned HBase clusters can
> fail easily under heavy load"..
> Could you suggest some tuning to avoid crashing HBase in such
> situations?
>
> Best,
> Flavio
Re: HBase cluster design
So, just to summarize the result of this discussion: do you confirm that the latest version of HBase should (in theory) support mapreduce jobs on tables that could meanwhile be updated by external processes (i.e. not by the mapred job)? One of the answers said: "Poorly tuned HBase clusters can fail easily under heavy load". Could you suggest some tuning to avoid crashing HBase in such situations?

Best,
Flavio

On Fri, Apr 11, 2014 at 12:06 PM, Flavio Pompermaier wrote:
> Today I was able to catch an error during a mapreduce job that roughly
> mimics the rowCount. The error I saw is:
>
> Could not sync. Requesting close of hlog
> java.io.IOException: Reflection
> ...
> Caused by:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> No lease on
> /hbase/.logs/host4,60020,1395928532020/host4%2C60020%2C1395928532020.1397205288300
> File does not exist. Holder DFSClient_NONMAPREDUCE_-1746149332_40 does not
> have any open files.
Re: HBase cluster design
Today I was able to catch an error during a mapreduce job that roughly mimics the rowCount. The error I saw is:

Could not sync. Requesting close of hlog
java.io.IOException: Reflection
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1141)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1245)
    at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1100)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:228)
    ... 4 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /hbase/.logs/host4,60020,1395928532020/host4%2C60020%2C1395928532020.1397205288300 File does not exist. Holder DFSClient_NONMAPREDUCE_-1746149332_40 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2308)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2299)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2095)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

    at org.apache.hadoop.ipc.Client.call(Client.java:1160)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy14.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at $Proxy14.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)

What can be the cause of this error?

On Sat, Apr 5, 2014 at 2:25 PM, Michael Segel wrote:
> You have one other thing to consider.
> Did you oversubscribe on the m/r tuning side of things?
Re: HBase cluster design
You have one other thing to consider: did you oversubscribe on the m/r tuning side of things?

Many people want to segment their HBase to a portion of the cluster. This should be the exception to the design, not the primary cluster design.

If you oversubscribe your cluster, you will run out of memory, then you need to swap, and boom, bad things happen.

Also, while many suggest not reserving room for swap... I suggest that you do leave some room.

While this doesn't address the issues in your question directly, these are things that you need to consider.

More to your point... poorly tuned HBase clusters can fail easily under heavy load. While Ted doesn't address this consideration, it can become an issue. YMMV of course.

On Apr 4, 2014, at 9:43 AM, Ted Yu wrote:
> The 'Connection refused' message was logged at WARN level.
>
> If you can pastebin more of the region server log before its crash, I would
> take a deeper look.

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com
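(The oversubscription warning above can be made concrete. Assuming MRv1, as on the CDH4-era deployments in this thread, the per-node task slots are capped in mapred-site.xml; the values below only illustrate leaving cores free for the RegionServer and DataNode, and are not a recommendation for any particular hardware.)

```xml
<!-- mapred-site.xml (MRv1): illustrative slot caps for a node co-hosting a RegionServer -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value> <!-- well below the core count, leaving headroom for HBase GC -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```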
Re: HBase cluster design
The 'Connection refused' message was logged at WARN level.

If you can pastebin more of the region server log before its crash, I would take a deeper look.

BTW, I assume your zookeeper quorum was healthy during that period of time.

On Fri, Apr 4, 2014 at 7:29 AM, Flavio Pompermaier wrote:
> Yes, I know I should update HBase, this is something I'm going to do really
> soon. Bad me..
> I just wanted to know whether adding/updating rows in HBase while
> running a mapred job could be problematic or not.
Re: HBase cluster design
Yes, I know I should update HBase, this is something I'm going to do really soon. Bad me..

I just wanted to know whether adding/updating rows in HBase while running a mapred job could be problematic or not. From what you told me it's not, so the problem could be caused by the old version of HBase or some other OS configuration. The update was performed by an application accessing HBase directly, adding and updating rows of the table. Once in a while some region servers go down and are marked as "bad state" by Cloudera, so I have to restart them.

The error I usually see is:

2012-11-23 12:41:00,468 WARN org.apache.zookeeper.ClientCnxn: Session 0x13b2cf447fd for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1047

Best,
Flavio

On Fri, Apr 4, 2014 at 2:35 PM, Ted Yu wrote:
> Was the updating performed by one of the mapreduce jobs?
> HBase should be able to serve multiple mapreduce jobs in the same cluster.
>
> Can you provide more detail on the crash?
>
> BTW, there are 3 major releases after 0.92.
> Please consider upgrading your cluster to a newer release.
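(One stopgap related to the 'Connection refused' / "bad state" region-server symptom in this exchange: if long GC pauses under MR load are making region servers lose their ZooKeeper session, the session timeout can be raised in hbase-site.xml while the real memory pressure is fixed. The value below is illustrative only; the timeout actually granted is bounded by the ZooKeeper server's own tick-time settings.)

```xml
<!-- hbase-site.xml: illustrative; the granted timeout is bounded by the ZK server's ticktime -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value> <!-- 2 minutes, to ride out long GC pauses -->
</property>
```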
Re: HBase cluster design
Was the updating performed by one of the mapreduce jobs? HBase should be able to serve multiple mapreduce jobs in the same cluster.

Can you provide more detail on the crash?

BTW, there are 3 major releases after 0.92. Please consider upgrading your cluster to a newer release.

Cheers

On Apr 4, 2014, at 3:08 AM, Flavio Pompermaier wrote:
> Hi to everybody,
>
> I have a probably stupid question: is it a problem to run many mapreduce
> jobs on the same HBase table at the same time? And multiple jobs on
> different tables on the same cluster?
> Should I use Hoya to have better cluster usage?
>
> In my current cluster I noticed that the region servers tend to go down if
> I run a mapreduce job while updating (maybe it could be related to the old
> version of HBase I'm currently running: 0.92.1-cdh4.1.2).
>
> Best,
> Flavio