Re: HBase for Small Key Value Tables
You don't need to rebuild HBase. Just add an entry in hbase-site.xml for the following config:

  hbase.master.balancer.stochastic.tableSkewCost

Restart the master after the addition.

Cheers

On Aug 30, 2016, at 12:10 AM, Manish Maheshwari wrote:
> Where do we set this value DEFAULT_TABLE_SKEW_COST = 35? I see it only in
> StochasticLoadBalancer.java.
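For reference, the hbase-site.xml addition on the master would look something like the fragment below. This is a sketch of the change Ted describes; the value 500 follows his suggestion upthread, and the right value for a given cluster is a tuning decision.

```xml
<!-- Raise the table-skew penalty so the StochasticLoadBalancer works
     harder to spread each table's regions across region servers.
     The compiled-in default is 35; 500 is the value suggested in
     this thread. Restart the HMaster after editing. -->
<property>
  <name>hbase.master.balancer.stochastic.tableSkewCost</name>
  <value>500</value>
</property>
```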
Re: HBase for Small Key Value Tables
Hi Ted,

Where do we set this value DEFAULT_TABLE_SKEW_COST = 35? We see it only in
StochasticLoadBalancer.java and don't find it in any of the HBase config
files. Do we need to re-build HBase from source for this?

Thanks,
Manish

On Tue, Aug 30, 2016 at 6:44 AM, Ted Yu wrote:
> StochasticLoadBalancer by default would balance regions evenly across the
> cluster.
Re: HBase for Small Key Value Tables
StochasticLoadBalancer by default balances regions evenly across the
cluster, but the regions of a particular table may not be evenly
distributed.

Increase the value for the following config:

  private static final String TABLE_SKEW_COST_KEY =
      "hbase.master.balancer.stochastic.tableSkewCost";

  private static final float DEFAULT_TABLE_SKEW_COST = 35;

You can set it to 500 or higher.

FYI

On Mon, Aug 29, 2016 at 3:22 PM, Manish Maheshwari wrote:
> Thanks Ted for the maxregionsize per table idea. We will try to keep it
> around 1-2 Gigs and see how it goes.
Re: HBase for Small Key Value Tables
Thanks Ted for the maxregionsize per table idea. We will try to keep it
around 1-2 GB and see how it goes. Will this also make sure that the region
migrates to another region server, or do we still need to do that manually?

On JMX: since the environment is production, we are as yet unable to use
JMX for stats collection there, but we are trying it out in dev.

On Aug 30, 2016 1:01 AM, "Ted Yu" wrote:
> bq. We cannot change the maxregionsize parameter
>
> The region size can be changed on a per table basis.
Re: HBase for Small Key Value Tables
bq. We cannot change the maxregionsize parameter

The region size can be changed on a per table basis:

  hbase> alter 't1', MAX_FILESIZE => '134217728'

See the beginning of hbase-shell/src/main/ruby/shell/commands/alter.rb for
more details.

FYI

On Sun, Aug 28, 2016 at 10:44 PM, Manish Maheshwari wrote:
> We have a scenario where HBase is used like a Key Value Database to map
> keys to regions.
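A minimal shell session for the scenario in this thread might look like the following. The table name 't1' comes from Ted's example; the 1 GB value (1073741824 bytes) matches Manish's stated 1-2 GB target and is illustrative, not taken from a real cluster.

```
hbase> # Lower the split threshold for this table only; other tables
hbase> # keep the cluster-wide hbase.hregion.max.filesize default.
hbase> alter 't1', MAX_FILESIZE => '1073741824'
hbase> # Confirm the table-level override took effect.
hbase> describe 't1'
```

Because MAX_FILESIZE is a table attribute, this avoids the concern upthread about changing the global maxregionsize parameter and impacting other tables.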
Re: HBase for Small Key Value Tables
Cycling old bits:
http://search-hadoop.com/m/YGbb3E2a71UVLBK&subj=Re+HBase+Count+Rows+in+Regions+and+Region+Servers

You can use /jmx to inspect regions and find the hotspot.

On Mon, Aug 29, 2016 at 7:29 AM, Manish Maheshwari wrote:
> Thanks for the suggestion. We can load the data in heap, but HBase makes
> it easier for one process to write and another to read.
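To make the /jmx suggestion concrete, here is a small Python sketch that pulls a region server's /jmx payload and ranks regions by read request count. The port 16030 (the default region server info port in recent HBase releases) and the `Namespace_..._metric_readRequestCount` key pattern on the `sub=Regions` bean are assumptions about the metrics layout; check them against your HBase version before relying on this.

```python
import json
from urllib.request import urlopen

READ_SUFFIX = "_metric_readRequestCount"

def hottest_regions(jmx_payload, top=5):
    """Return (region_key, read_count) pairs sorted by read count,
    extracted from a region server's parsed /jmx JSON payload."""
    counts = {}
    for bean in jmx_payload.get("beans", []):
        # Per-region metrics are exposed on the sub=Regions bean; key
        # names follow the Namespace_<ns>_table_<t>_region_<enc>_metric_*
        # pattern (assumed here, verify on your cluster).
        if bean.get("name", "").endswith("sub=Regions"):
            for key, value in bean.items():
                if key.endswith(READ_SUFFIX):
                    counts[key[:-len(READ_SUFFIX)]] = value
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top]

def fetch_jmx(host, port=16030):
    # 16030 is the default region server info port in recent releases
    # (older 0.9x clusters used 60030).
    with urlopen("http://%s:%d/jmx" % (host, port)) as resp:
        return json.load(resp)
```

Running `hottest_regions(fetch_jmx("rs-host"))` against each region server, as Ted suggests, should surface the hot region behind the Data Node hotspot.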
Re: HBase for Small Key Value Tables
Hi Dima,

Thanks for the suggestion. We could load the data into heap, but HBase
makes it easy for one process to write and another to read. With an
in-heap store we would need to build something to handle both, and also
write to a log so as to not lose updates in case of process failure.

Thanks,
Manish

On Aug 29, 2016 2:18 PM, "Dima Spivak" wrote:
> (Though if it is only 7 GB, why not just store it in memory?)
Re: HBase for Small Key Value Tables
(Though if it is only 7 GB, why not just store it in memory?)

On Sunday, August 28, 2016, Dima Spivak wrote:
> If your data can all fit on one machine, HBase is not the best choice.

-- 
-Dima
Re: HBase for Small Key Value Tables
If your data can all fit on one machine, HBase is not the best choice. I
think you'd be better off using a simpler solution for small data and leave
HBase for use cases that require proper clusters.

On Sunday, August 28, 2016, Manish Maheshwari wrote:
> We don't want to invest in another DB like Dynamo or Cassandra, and we
> are already on the Hadoop stack.

-- 
-Dima
Re: HBase for Small Key Value Tables
We don't want to invest in another DB like Dynamo or Cassandra, and we are
already on the Hadoop stack; managing another DB would be a pain. As for
HBase over an RDBMS: we call HBase from Spark Streaming to look up the
keys.

Manish

On Mon, Aug 29, 2016 at 1:47 PM, Dima Spivak wrote:
> Just to ask the naive question, why use HBase if the data fits into such
> a small table?
Re: HBase for Small Key Value Tables
Hey Manish,

Just to ask the naive question, why use HBase if the data fits into such a
small table?

On Sunday, August 28, 2016, Manish Maheshwari wrote:
> Hi,
>
> We have a scenario where HBase is used like a Key Value Database to map
> keys to regions. We have over 5 million keys, but the table size is less
> than 7 GB. The read volume is pretty high - about 50x the put/delete
> volume. This causes hot spotting on the Data Node, and the region is not
> split. We cannot change the maxregionsize parameter, as that will impact
> other tables too.
>
> Our idea is to manually inspect the row key ranges, then split the
> region manually and assign the pieces to different region servers. We
> will then continue to monitor the rows in each region to see if it needs
> to be split again.
>
> Any experience of doing this on HBase? Is this a recommended approach?
>
> Thanks,
> Manish

-- 
-Dima
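The manual-split plan quoted above can start from a sampled scan of the row keys. A minimal sketch of picking even-quantile split points from such a sample (pure Python; the key format and region count are illustrative, not from the thread):

```python
def split_points(sampled_keys, n_regions):
    """Pick n_regions - 1 split keys at even quantiles of a sorted key
    sample, so each resulting region holds roughly the same row count."""
    keys = sorted(sampled_keys)
    step = len(keys) / n_regions
    # Take the key at each internal quantile boundary as a split point.
    return [keys[int(step * i)] for i in range(1, n_regions)]

# Toy example: 1000 evenly named keys split into 4 ranges.
print(split_points(["user%05d" % i for i in range(1000)], 4))
# -> ['user00250', 'user00500', 'user00750']
```

Each returned key could then be handed to the shell's `split 'tablename', 'splitkey'` command, with the balancer (or manual `move`) handling reassignment across region servers.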