Re: HBase for Small Key Value Tables

2016-08-30 Thread Ted Yu
You don't need to rebuild hbase. 

Just add an entry in hbase-site.xml for the following config:
  hbase.master.balancer.stochastic.tableSkewCost

Restart master after the addition. 
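
A minimal sketch of why no rebuild is needed, assuming the balancer looks the key up in the loaded configuration and falls back to the compiled-in constant only when the key is absent. The class below is a hypothetical stand-in that uses a plain Map in place of the HBase configuration object; it is not HBase code. The key name and the default of 35 are the ones quoted from StochasticLoadBalancer.java in this thread, and 500 is the value suggested earlier. In hbase-site.xml the override would be a property whose name is the key and whose value is the new cost.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the balancer's config lookup; not HBase code.
final class TableSkewCostLookup {
    // Key and default as quoted from StochasticLoadBalancer.java in this thread.
    static final String TABLE_SKEW_COST_KEY =
        "hbase.master.balancer.stochastic.tableSkewCost";
    static final float DEFAULT_TABLE_SKEW_COST = 35f;

    // Returns the configured value when the key is present,
    // otherwise the compiled-in default.
    static float resolve(Map<String, String> siteConfig) {
        String raw = siteConfig.get(TABLE_SKEW_COST_KEY);
        return raw == null ? DEFAULT_TABLE_SKEW_COST : Float.parseFloat(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        System.out.println(resolve(site)); // no entry: the default applies

        site.put(TABLE_SKEW_COST_KEY, "500");
        System.out.println(resolve(site)); // entry present: the override applies
    }
}
```

The constant in the source only matters while the key is missing from the site configuration, which is why editing hbase-site.xml and restarting the master is enough.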

Cheers

> On Aug 30, 2016, at 12:10 AM, Manish Maheshwari  wrote:
> 
> Hi Ted,
> 
> Where do we set this value  DEFAULT_TABLE_SKEW_COST = 35. I see it in only
> in StochasticLoadBalancer.java
> We don't find this in any of the HBase Config files. Do we need to re-build
> HBase from code for this?
> 
> Thanks,
> Manish
> 
>> On Tue, Aug 30, 2016 at 6:44 AM, Ted Yu  wrote:
>> 
>> StochasticLoadBalancer by default would balance regions evenly across the
>> cluster.
>> 
>> Regions of particular table may not be evenly distributed.
>> 
>> Increase the value for the following config:
>> 
>> private static final String TABLE_SKEW_COST_KEY =
>>     "hbase.master.balancer.stochastic.tableSkewCost";
>> 
>> private static final float DEFAULT_TABLE_SKEW_COST = 35;
>> 
>> You can set 500 or higher.
>> 
>> FYI
>> 
>> On Mon, Aug 29, 2016 at 3:22 PM, Manish Maheshwari 
>> wrote:
>> 
>>> Thanks Ted for the maxregionsize per table idea. We will try to keep it
>>> around 1-2 Gigs and see how it goes. Will this also make sure that the
>>> region migrates to another region server? Or do we still need to do that
>>> manually?
>>> 
>>> On JMX, Since the environment is production, we are yet unable to use jmx
>>> for stats collection. But in dev we are trying it out.
>>> 
>>>> On Aug 30, 2016 1:01 AM, "Ted Yu" wrote:
>>>> 
>>>> bq. We cannot change the maxregionsize parameter
>>>> 
>>>> The region size can be changed on per table basis:
>>>> 
>>>>   hbase> alter 't1', MAX_FILESIZE => '134217728'
>>>> 
>>>> See the beginning of hbase-shell/src/main/ruby/shell/commands/alter.rb for
>>>> more details.
>>>> 
>>>> FYI
>>>> 
>>>> On Sun, Aug 28, 2016 at 10:44 PM, Manish Maheshwari <mylogi...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> We have a scenario where HBase is used like a Key Value Database to map
>>>>> Keys to Regions. We have over 5 Million Keys, but the table size is less
>>>>> than 7 GB. The read volume is pretty high - About 50x of the put/delete
>>>>> volume. This causes hot spotting on the Data Node and the region is not
>>>>> split. We cannot change the maxregionsize parameter as that will impact
>>>>> other tables too.
>>>>> 
>>>>> Our idea is to manually inspect the row key ranges and then split the
>>>>> region manually and assign them to different region servers. We will
>>>>> continue to then monitor the rows in one region to see if needs to be
>>>>> split.
>>>>> 
>>>>> Any experience of doing this on HBase. Is this a recommended approach?
>>>>> 
>>>>> Thanks,
>>>>> Manish
>> 


Re: HBase for Small Key Value Tables

2016-08-30 Thread Manish Maheshwari
Hi Ted,

Where do we set this value, DEFAULT_TABLE_SKEW_COST = 35? I see it only
in StochasticLoadBalancer.java.
We don't find it in any of the HBase config files. Do we need to rebuild
HBase from source for this?

Thanks,
Manish

On Tue, Aug 30, 2016 at 6:44 AM, Ted Yu  wrote:

> StochasticLoadBalancer by default would balance regions evenly across the
> cluster.
>
> Regions of particular table may not be evenly distributed.
>
> Increase the value for the following config:
>
> private static final String TABLE_SKEW_COST_KEY =
>     "hbase.master.balancer.stochastic.tableSkewCost";
>
> private static final float DEFAULT_TABLE_SKEW_COST = 35;
>
> You can set 500 or higher.
>
> FYI
>
> On Mon, Aug 29, 2016 at 3:22 PM, Manish Maheshwari 
> wrote:
>
> > Thanks Ted for the maxregionsize per table idea. We will try to keep it
> > around 1-2 Gigs and see how it goes. Will this also make sure that the
> > region migrates to another region server? Or do we still need to do that
> > manually?
> >
> > On JMX, Since the environment is production, we are yet unable to use jmx
> > for stats collection. But in dev we are trying it out.
> >
> > On Aug 30, 2016 1:01 AM, "Ted Yu"  wrote:
> >
> > > bq. We cannot change the maxregionsize parameter
> > >
> > > The region size can be changed on per table basis:
> > >
> > >   hbase> alter 't1', MAX_FILESIZE => '134217728'
> > >
> > > See the beginning of hbase-shell/src/main/ruby/shell/commands/alter.rb for
> > > more details.
> > >
> > > FYI
> > >
> > > On Sun, Aug 28, 2016 at 10:44 PM, Manish Maheshwari <mylogi...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > We have a scenario where HBase is used like a Key Value Database to map
> > > > Keys to Regions. We have over 5 Million Keys, but the table size is less
> > > > than 7 GB. The read volume is pretty high - About 50x of the put/delete
> > > > volume. This causes hot spotting on the Data Node and the region is not
> > > > split. We cannot change the maxregionsize parameter as that will impact
> > > > other tables too.
> > > >
> > > > Our idea is to manually inspect the row key ranges and then split the
> > > > region manually and assign them to different region servers. We will
> > > > continue to then monitor the rows in one region to see if needs to be
> > > > split.
> > > >
> > > > Any experience of doing this on HBase. Is this a recommended approach?
> > > >
> > > > Thanks,
> > > > Manish
> > > >
> > >
> >
>


Re: HBase for Small Key Value Tables

2016-08-29 Thread Ted Yu
By default, StochasticLoadBalancer balances regions evenly across the
cluster as a whole, so the regions of a particular table may still not be
evenly distributed.

Increase the value for the following config:

private static final String TABLE_SKEW_COST_KEY =
    "hbase.master.balancer.stochastic.tableSkewCost";

private static final float DEFAULT_TABLE_SKEW_COST = 35;

You can set it to 500 or higher.

FYI

On Mon, Aug 29, 2016 at 3:22 PM, Manish Maheshwari 
wrote:

> Thanks Ted for the maxregionsize per table idea. We will try to keep it
> around 1-2 Gigs and see how it goes. Will this also make sure that the
> region migrates to another region server? Or do we still need to do that
> manually?
>
> On JMX, Since the environment is production, we are yet unable to use jmx
> for stats collection. But in dev we are trying it out.
>
> On Aug 30, 2016 1:01 AM, "Ted Yu"  wrote:
>
> > bq. We cannot change the maxregionsize parameter
> >
> > The region size can be changed on per table basis:
> >
> >   hbase> alter 't1', MAX_FILESIZE => '134217728'
> >
> > See the beginning of hbase-shell/src/main/ruby/shell/commands/alter.rb for
> > more details.
> >
> > FYI
> >
> > On Sun, Aug 28, 2016 at 10:44 PM, Manish Maheshwari
> > wrote:
> >
> > > Hi,
> > >
> > > We have a scenario where HBase is used like a Key Value Database to map
> > > Keys to Regions. We have over 5 Million Keys, but the table size is less
> > > than 7 GB. The read volume is pretty high - About 50x of the put/delete
> > > volume. This causes hot spotting on the Data Node and the region is not
> > > split. We cannot change the maxregionsize parameter as that will impact
> > > other tables too.
> > >
> > > Our idea is to manually inspect the row key ranges and then split the
> > > region manually and assign them to different region servers. We will
> > > continue to then monitor the rows in one region to see if needs to be
> > > split.
> > >
> > > Any experience of doing this on HBase. Is this a recommended approach?
> > >
> > > Thanks,
> > > Manish
> > >
> >
>


Re: HBase for Small Key Value Tables

2016-08-29 Thread Manish Maheshwari
Thanks, Ted, for the per-table maxregionsize idea. We will try to keep it
around 1-2 GB and see how it goes. Will this also make sure that the
region migrates to another region server, or do we still need to do that
manually?

On JMX: since the environment is production, we are not yet able to use JMX
for stats collection, but we are trying it out in dev.

On Aug 30, 2016 1:01 AM, "Ted Yu"  wrote:

> bq. We cannot change the maxregionsize parameter
>
> The region size can be changed on per table basis:
>
>   hbase> alter 't1', MAX_FILESIZE => '134217728'
>
> See the beginning of hbase-shell/src/main/ruby/shell/commands/alter.rb for
> more details.
>
> FYI
>
> On Sun, Aug 28, 2016 at 10:44 PM, Manish Maheshwari 
> wrote:
>
> > Hi,
> >
> > We have a scenario where HBase is used like a Key Value Database to map
> > Keys to Regions. We have over 5 Million Keys, but the table size is less
> > than 7 GB. The read volume is pretty high - About 50x of the put/delete
> > volume. This causes hot spotting on the Data Node and the region is not
> > split. We cannot change the maxregionsize parameter as that will impact
> > other tables too.
> >
> > Our idea is to manually inspect the row key ranges and then split the
> > region manually and assign them to different region servers. We will
> > continue to then monitor the rows in one region to see if needs to be
> > split.
> >
> > Any experience of doing this on HBase. Is this a recommended approach?
> >
> > Thanks,
> > Manish
> >
>


Re: HBase for Small Key Value Tables

2016-08-29 Thread Ted Yu
bq. We cannot change the maxregionsize parameter

The region size can be changed on a per-table basis:

  hbase> alter 't1', MAX_FILESIZE => '134217728'

See the beginning of hbase-shell/src/main/ruby/shell/commands/alter.rb for
more details.
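
For reference, MAX_FILESIZE is given in bytes, and 134217728 is 128 MiB. A small sketch of the arithmetic, assuming binary units, for the 1-2 GB range mentioned elsewhere in this thread. The class and method names are illustrative only, and 't1' is the placeholder table name from the example above.

```java
// Sketch: convert a target region size into the byte value MAX_FILESIZE expects.
final class MaxFileSizeBytes {
    static long mib(long n) { return n * 1024L * 1024L; }
    static long gib(long n) { return n * 1024L * 1024L * 1024L; }

    // Builds the shell command text; the table name is a placeholder.
    static String alterCommand(String table, long bytes) {
        return "alter '" + table + "', MAX_FILESIZE => '" + bytes + "'";
    }

    public static void main(String[] args) {
        System.out.println(alterCommand("t1", mib(128))); // the value used above
        System.out.println(alterCommand("t1", gib(1)));
        System.out.println(alterCommand("t1", gib(2)));
    }
}
```

The printed lines can be pasted into the hbase shell after replacing 't1' with the real table name.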

FYI

On Sun, Aug 28, 2016 at 10:44 PM, Manish Maheshwari 
wrote:

> Hi,
>
> We have a scenario where HBase is used like a Key Value Database to map
> Keys to Regions. We have over 5 Million Keys, but the table size is less
> than 7 GB. The read volume is pretty high - About 50x of the put/delete
> volume. This causes hot spotting on the Data Node and the region is not
> split. We cannot change the maxregionsize parameter as that will impact
> other tables too.
>
> Our idea is to manually inspect the row key ranges and then split the
> region manually and assign them to different region servers. We will
> continue to then monitor the rows in one region to see if needs to be
> split.
>
> Any experience of doing this on HBase. Is this a recommended approach?
>
> Thanks,
> Manish
>


Re: HBase for Small Key Value Tables

2016-08-29 Thread Ted Yu
Cycling old bits:
http://search-hadoop.com/m/YGbb3E2a71UVLBK&subj=Re+HBase+Count+Rows+in+Regions+and+Region+Servers

You can use /jmx to inspect regions and find the hotspot.
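
A rough sketch of how the /jmx output could be scanned for the hottest region, assuming per-region read counters appear as JSON keys of the form ..._region_<encodedName>_metric_readRequestCount. That key layout is an assumption and should be checked against the actual /jmx output of your HBase version. The class name and the sample text are made up for illustration; the sketch takes text already saved from a region server's /jmx page and does not fetch it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the metric key layout below is an assumption.
final class HotRegionFinder {
    // Assumed key layout: "..._region_<encodedName>_metric_readRequestCount" : <number>
    private static final Pattern READ_COUNT = Pattern.compile(
        "\"[^\"]*_region_([0-9a-f]+)_metric_readRequestCount\"\\s*:\\s*(\\d+)");

    // Maps encoded region name to its read request count.
    static Map<String, Long> readCounts(String jmxJson) {
        Map<String, Long> counts = new LinkedHashMap<>();
        Matcher m = READ_COUNT.matcher(jmxJson);
        while (m.find()) {
            counts.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return counts;
    }

    // Returns the region with the highest count, or null if there are none.
    static String hottest(Map<String, Long> counts) {
        String best = null;
        long max = -1;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample text, not real cluster output.
        String sample =
            "{ \"Namespace_default_table_t1_region_aaa111_metric_readRequestCount\" : 120,"
            + " \"Namespace_default_table_t1_region_bbb222_metric_readRequestCount\" : 98000 }";
        System.out.println(hottest(readCounts(sample)));
    }
}
```

The counters are cumulative, so comparing two snapshots taken some minutes apart gives a better picture of current read load than a single reading.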

On Mon, Aug 29, 2016 at 7:29 AM, Manish Maheshwari 
wrote:

> Hi Dima,
>
> Thanks for the suggestion. We can load the data in heap, but Hbase makes it
> easier for one to write and another to read. With heap we need to build a
> process to handle both processes and also write to log so as to not lose
> the updates in case of process failure.
>
> Thanks
> Manish
>
> On Aug 29, 2016 2:18 PM, "Dima Spivak"  wrote:
>
> > (Though if it is only 7 GB, why not just store it in memory?)
> >
> > On Sunday, August 28, 2016, Dima Spivak  wrote:
> >
> > > If your data can all fit on one machine, HBase is not the best choice. I
> > > think you'd be better off using a simpler solution for small data and leave
> > > HBase for use cases that require proper clusters.
> > >
> > > On Sunday, August 28, 2016, Manish Maheshwari
> > > wrote:
> > >
> > >> We dont want to invest into another DB like Dynamo, Cassandra and Already
> > >> are in the Hadoop Stack. Managing another DB would be a pain. Why HBase
> > >> over RDMS, is because we call HBase via Spark Streaming to lookup the
> > >> keys.
> > >>
> > >> Manish
> > >>
> > >> On Mon, Aug 29, 2016 at 1:47 PM, Dima Spivak 
> > >> wrote:
> > >>
> > >> > Hey Manish,
> > >> >
> > >> > Just to ask the naive question, why use HBase if the data fits into such a
> > >> > small table?
> > >> >
> > >> > On Sunday, August 28, 2016, Manish Maheshwari wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > We have a scenario where HBase is used like a Key Value Database to map
> > >> > > Keys to Regions. We have over 5 Million Keys, but the table size is less
> > >> > > than 7 GB. The read volume is pretty high - About 50x of the put/delete
> > >> > > volume. This causes hot spotting on the Data Node and the region is not
> > >> > > split. We cannot change the maxregionsize parameter as that will impact
> > >> > > other tables too.
> > >> > >
> > >> > > Our idea is to manually inspect the row key ranges and then split the
> > >> > > region manually and assign them to different region servers. We will
> > >> > > continue to then monitor the rows in one region to see if needs to be
> > >> > > split.
> > >> > >
> > >> > > Any experience of doing this on HBase. Is this a recommended approach?
> > >> > >
> > >> > > Thanks,
> > >> > > Manish
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > -Dima
> > >> >
> > >>
> > >
> > >
> > > --
> > > -Dima
> > >
> > >
> >
> > --
> > -Dima
> >
>


Re: HBase for Small Key Value Tables

2016-08-29 Thread Manish Maheshwari
Hi Dima,

Thanks for the suggestion. We can load the data into heap, but HBase makes it
easier for one process to write and another to read. With heap, we would need
to build a process that handles both and also writes to a log, so that updates
are not lost if the process fails.

Thanks
Manish

On Aug 29, 2016 2:18 PM, "Dima Spivak"  wrote:

> (Though if it is only 7 GB, why not just store it in memory?)
>
> On Sunday, August 28, 2016, Dima Spivak  wrote:
>
> > If your data can all fit on one machine, HBase is not the best choice. I
> > think you'd be better off using a simpler solution for small data and leave
> > HBase for use cases that require proper clusters.
> >
> > On Sunday, August 28, 2016, Manish Maheshwari
> > wrote:
> >
> >> We dont want to invest into another DB like Dynamo, Cassandra and Already
> >> are in the Hadoop Stack. Managing another DB would be a pain. Why HBase
> >> over RDMS, is because we call HBase via Spark Streaming to lookup the
> >> keys.
> >>
> >> Manish
> >>
> >> On Mon, Aug 29, 2016 at 1:47 PM, Dima Spivak 
> >> wrote:
> >>
> >> > Hey Manish,
> >> >
> >> > Just to ask the naive question, why use HBase if the data fits into such a
> >> > small table?
> >> >
> >> > On Sunday, August 28, 2016, Manish Maheshwari wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > We have a scenario where HBase is used like a Key Value Database to map
> >> > > Keys to Regions. We have over 5 Million Keys, but the table size is less
> >> > > than 7 GB. The read volume is pretty high - About 50x of the put/delete
> >> > > volume. This causes hot spotting on the Data Node and the region is not
> >> > > split. We cannot change the maxregionsize parameter as that will impact
> >> > > other tables too.
> >> > >
> >> > > Our idea is to manually inspect the row key ranges and then split the
> >> > > region manually and assign them to different region servers. We will
> >> > > continue to then monitor the rows in one region to see if needs to be
> >> > > split.
> >> > >
> >> > > Any experience of doing this on HBase. Is this a recommended approach?
> >> > >
> >> > > Thanks,
> >> > > Manish
> >> > >
> >> >
> >> >
> >> > --
> >> > -Dima
> >> >
> >>
> >
> >
> > --
> > -Dima
> >
> >
>
> --
> -Dima
>


Re: HBase for Small Key Value Tables

2016-08-28 Thread Dima Spivak
(Though if it is only 7 GB, why not just store it in memory?)

On Sunday, August 28, 2016, Dima Spivak  wrote:

> If your data can all fit on one machine, HBase is not the best choice. I
> think you'd be better off using a simpler solution for small data and leave
> HBase for use cases that require proper clusters.
>
> On Sunday, August 28, 2016, Manish Maheshwari wrote:
>
>> We dont want to invest into another DB like Dynamo, Cassandra and Already
>> are in the Hadoop Stack. Managing another DB would be a pain. Why HBase
>> over RDMS, is because we call HBase via Spark Streaming to lookup the
>> keys.
>>
>> Manish
>>
>> On Mon, Aug 29, 2016 at 1:47 PM, Dima Spivak 
>> wrote:
>>
>> > Hey Manish,
>> >
>> > Just to ask the naive question, why use HBase if the data fits into such a
>> > small table?
>> >
>> > On Sunday, August 28, 2016, Manish Maheshwari wrote:
>> >
>> > > Hi,
>> > >
>> > > We have a scenario where HBase is used like a Key Value Database to map
>> > > Keys to Regions. We have over 5 Million Keys, but the table size is less
>> > > than 7 GB. The read volume is pretty high - About 50x of the put/delete
>> > > volume. This causes hot spotting on the Data Node and the region is not
>> > > split. We cannot change the maxregionsize parameter as that will impact
>> > > other tables too.
>> > >
>> > > Our idea is to manually inspect the row key ranges and then split the
>> > > region manually and assign them to different region servers. We will
>> > > continue to then monitor the rows in one region to see if needs to be
>> > > split.
>> > >
>> > > Any experience of doing this on HBase. Is this a recommended approach?
>> > >
>> > > Thanks,
>> > > Manish
>> > >
>> >
>> >
>> > --
>> > -Dima
>> >
>>
>
>
> --
> -Dima
>
>

-- 
-Dima


Re: HBase for Small Key Value Tables

2016-08-28 Thread Dima Spivak
If your data can all fit on one machine, HBase is not the best choice. I
think you'd be better off using a simpler solution for small data and leave
HBase for use cases that require proper clusters.

On Sunday, August 28, 2016, Manish Maheshwari  wrote:

> We dont want to invest into another DB like Dynamo, Cassandra and Already
> are in the Hadoop Stack. Managing another DB would be a pain. Why HBase
> over RDMS, is because we call HBase via Spark Streaming to lookup the keys.
>
> Manish
>
> On Mon, Aug 29, 2016 at 1:47 PM, Dima Spivak wrote:
>
> > Hey Manish,
> >
> > Just to ask the naive question, why use HBase if the data fits into such a
> > small table?
> >
> > On Sunday, August 28, 2016, Manish Maheshwari wrote:
> >
> > > Hi,
> > >
> > > We have a scenario where HBase is used like a Key Value Database to map
> > > Keys to Regions. We have over 5 Million Keys, but the table size is less
> > > than 7 GB. The read volume is pretty high - About 50x of the put/delete
> > > volume. This causes hot spotting on the Data Node and the region is not
> > > split. We cannot change the maxregionsize parameter as that will impact
> > > other tables too.
> > >
> > > Our idea is to manually inspect the row key ranges and then split the
> > > region manually and assign them to different region servers. We will
> > > continue to then monitor the rows in one region to see if needs to be
> > > split.
> > >
> > > Any experience of doing this on HBase. Is this a recommended approach?
> > >
> > > Thanks,
> > > Manish
> > >
> >
> >
> > --
> > -Dima
> >
>


-- 
-Dima


Re: HBase for Small Key Value Tables

2016-08-28 Thread Manish Maheshwari
We don't want to invest in another DB like Dynamo or Cassandra, and we are
already on the Hadoop stack. Managing another DB would be a pain. We chose
HBase over an RDBMS because we call HBase from Spark Streaming to look up the
keys.

Manish

On Mon, Aug 29, 2016 at 1:47 PM, Dima Spivak  wrote:

> Hey Manish,
>
> Just to ask the naive question, why use HBase if the data fits into such a
> small table?
>
> On Sunday, August 28, 2016, Manish Maheshwari  wrote:
>
> > Hi,
> >
> > We have a scenario where HBase is used like a Key Value Database to map
> > Keys to Regions. We have over 5 Million Keys, but the table size is less
> > than 7 GB. The read volume is pretty high - About 50x of the put/delete
> > volume. This causes hot spotting on the Data Node and the region is not
> > split. We cannot change the maxregionsize parameter as that will impact
> > other tables too.
> >
> > Our idea is to manually inspect the row key ranges and then split the
> > region manually and assign them to different region servers. We will
> > continue to then monitor the rows in one region to see if needs to be
> > split.
> >
> > Any experience of doing this on HBase. Is this a recommended approach?
> >
> > Thanks,
> > Manish
> >
>
>
> --
> -Dima
>


Re: HBase for Small Key Value Tables

2016-08-28 Thread Dima Spivak
Hey Manish,

Just to ask the naive question, why use HBase if the data fits into such a
small table?

On Sunday, August 28, 2016, Manish Maheshwari  wrote:

> Hi,
>
> We have a scenario where HBase is used like a key-value database to map
> keys to regions. We have over 5 million keys, but the table size is less
> than 7 GB. The read volume is high, about 50x the put/delete
> volume. This causes hot spotting on the data node, and the region is not
> split. We cannot change the maxregionsize parameter, as that would impact
> other tables too.
>
> Our idea is to inspect the row key ranges, split the region manually,
> and assign the resulting regions to different region servers. We will
> then keep monitoring the rows in a region to see if it needs to be
> split.
>
> Does anyone have experience doing this on HBase? Is this a recommended approach?
>
> Thanks,
> Manish
>


-- 
-Dima