Re: HBase mention in VLDB keynote
Right, the point I was making is not about absolute numbers but the scale of the test and the successful results at that scale. I would think that is on par with the (failed) experimentation at Yahoo, but I have yet to see the evaluation materials posted anywhere.

- Andy

From: Jonathan Gray jl...@streamy.com
To: hbase-user@hadoop.apache.org
Sent: Tuesday, August 25, 2009 11:08:17 PM
Subject: Re: HBase mention in VLDB keynote

If you are just looking for numbers, they can vary quite drastically depending on the cluster configuration, cluster hardware, JVM/GC configuration, dataset properties, read patterns, and load patterns. The ones I provided in that presentation are from a very small cluster, with simple data and low load - my attempt at getting some base numbers. You really need to load up some of your own data and see how it behaves on your own cluster. And tuning is increasingly important now, as we are limited by Java GC quite a bit.

JG

Schubert Zhang wrote:

@stack We know HIVE-705 and already have good communication with the contributor, since we are all Chinese. :-) In fact, some code from the patch is used and tested in our project. But we need a more flexible data store schema to resolve engineering problems, especially performance and practicability.

@andy Does Ryan's result differ from JG's?

On Wed, Aug 26, 2009 at 2:50 AM, Andrew Purtell apurt...@apache.org wrote:

Hi Schubert,

> Regarding "...and JG's/Ryan's performance test results for 0.20 stand as a contradiction": can you provide more references, such as a URL/link for these contradictions?

For JG: http://www.docstoc.com/docs/7493304/HBase-Goes-Realtime
I'm sure you have seen this already. Ryan has posted some information on the list now and again.

Also, I think your work with performance evaluation is very important feedback and data points. Thanks for that.

> We are doing an interesting thing to let Hive use HBase as its data store. Now we can use Hive's SQL to query/mapreduce data stored in HBase, and we can also directly query/scan data from HBase.

That sounds REALLY interesting!

- Andy

From: Schubert Zhang zson...@gmail.com
To: hbase-user@hadoop.apache.org
Sent: Tuesday, August 25, 2009 8:26:50 PM
Subject: Re: HBase mention in VLDB keynote

Hi Andy,

Even though current HBase is not yet ready for production, we know its data model and architecture are really testable and evaluable.

Regarding "...and JG's/Ryan's performance test results for 0.20 stand as a contradiction": can you provide more references, such as a URL/link for these contradictions?

Regarding Hive, it's really a good design, especially its abstraction of MapReduce workflows matched to SQL. Hive has been a great success inside Facebook; the report says 29% of Facebook employees use Hive, and 51% of those users are from outside engineering. That is likely because SQL is more easily learned than other languages such as Pig Latin, etc. In fact, Pig is now adding features for metadata and SQL, which are already provided in Hive. But Hive is still not very flexible about using an alternate data store other than HDFS files. We are doing an interesting thing to let Hive use HBase as its data store. Now we can use Hive's SQL to query/mapreduce data stored in HBase, and we can also directly query/scan data from HBase.

I believe HBase can be a data store that works as a storage adapter layer above HDFS. It is not a database; it is just a data storage adapter system above HDFS, with a distributed b-tree clustered index.
BigTable is designed to provide easier ways to store small data objects and to provide random access, since GFS is designed for sequential-access/batch-processing/large-data storage and is not appropriate for storing small data objects or for random access. I also believe HBase can be a data store that makes MapReduce over HBase possible. If we review the Bigtable paper, especially section 8, we can see it is widely used to do MapReduce analysis/summaries in many Google applications. In the recent ACM Queue interview with Sean Quinlan, Google's GFS lead, we find that Google's new GFS has integrated some data models from Bigtable. http://queue.acm.org/detail.cfm?id=1594206

Schubert

On Wed, Aug 26, 2009 at 12:36 AM, Bradford Stephens bradfordsteph...@gmail.com wrote:

Interesting. I need to see what sort of eval was going on for that presentation... He probably forgot to tweak GC :)

On Tue, Aug 25, 2009 at 9:32 AM, Andrew Purtell apurt...@apache.org wrote:

> Can we write him to find out more about how the evaluation was done?

This was one interaction with that group, maybe the only other one aside from a question about sizing memstore: http://osdir.com/ml/hbase-user-hadoop-apache/2009-07/msg00552.html

Now I wonder if the eval was done via the REST gateway... A followup might be useful. If I run into someone
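[For readers curious what such a Hive-over-HBase integration can look like: the storage-handler DDL that HIVE-705 eventually converged on is roughly as below. The table name, column family, and mapping are hypothetical placeholders, and Schubert's in-house integration may well differ from this.]

    CREATE TABLE hive_on_hbase (key INT, value STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
    TBLPROPERTIES ("hbase.table.name" = "my_hbase_table");

Queries against hive_on_hbase then run as ordinary Hive MapReduce jobs, with HBase acting as the storage layer Schubert describes.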
Re: hbase/jython outdated
I have fixed the code samples and opened a feature request on JIRA for the jython command: https://issues.apache.org/jira/browse/HBASE-1796

Until recently I have used the Python Thrift interface, but it has some serious issues with Unicode. Currently I am searching for alternatives. Is there any Python library for the REST interface? How stable is the REST interface?

On Tue, Aug 25, 2009 at 4:18 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:

I can edit this page just fine, but you have to be logged in to do that; anyone can sign in. Thx!

J-D

On Tue, Aug 25, 2009 at 7:02 AM, Andrei Savu savu.and...@gmail.com wrote:

Hi,

The Hbase/Jython wiki page ( http://wiki.apache.org/hadoop/Hbase/Jython ) is outdated. I want to edit it, but the page is marked as immutable. I have attached a working sample and a patched version of bin/hbase with the jython command added.

--
Savu Andrei
Website: http://www.andreisavu.ro/
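[On the question of a Python library for the REST interface: the gateway speaks plain HTTP, so the standard library is enough for a first experiment. A minimal sketch, assuming a Stargate-style REST gateway on localhost:8080; the table, row, and column names are hypothetical, and the exact URL layout differs between the old REST servlet and the Stargate contrib, so check your version's docs.]

    # Minimal sketch: fetch one cell as JSON through the REST gateway.
    # Assumes the gateway runs on localhost:8080; 'mytable', 'row1' and
    # 'cf:col1' are made-up names for illustration.
    import urllib2

    url = 'http://localhost:8080/mytable/row1/cf:col1'
    req = urllib2.Request(url, headers={'Accept': 'application/json'})
    print urllib2.urlopen(req).read()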
Settings
Hi,

It seems over the years I have tried various settings in both Hadoop and HBase, and when redoing a cluster it is always a question whether we should keep a given setting or not, since the issue it suppressed may have been fixed already. Maybe we should have a wiki page with the current settings, and the more advanced ones, and when and how to use them. I often find that the descriptions in the various default files are as ambiguous as the setting key itself. Here is a list of the not-so-obvious settings and what I set them to - please help me identify which are useful and which are actually obsolete.

HBase:
------

- fs.default.name = hdfs://master-hostname:9000/
  This is usually in core-site.xml in Hadoop. Does the client or server need this key at all? Did I copy it into the hbase site file by mistake?

- hbase.cluster.distributed = true
  For true replication and stand-alone ZK installations.

- dfs.datanode.socket.write.timeout = 0
  This is used in the DataNode but, more importantly here, in DFSClient. Its default is apparently fixed at 8 minutes; no default file (I would have assumed hdfs-default.xml) has it listed. We set it to 0 to avoid the socket timing out on low use etc., because the DFSClient reconnect is not handled gracefully. I trust setting it to 0 is what we recommend for HBase and is still valid?

- hbase.regionserver.lease.period = 60
  Default was changed from 60 to 120 seconds. Over time I had issues and have set it to 10 mins. Good or bad?

- hbase.hregion.memstore.block.multiplier = 4
  This is up from the default 2. Good or bad?

- hbase.hregion.max.filesize = 536870912
  Again, twice the default. Opinions?

- hbase.regions.nobalancing.count = 20
  This seems to be missing from hbase-default.xml but is set to 4 in the code if not specified. The above I got from Ryan to improve startup of HBase. It means that while a RS is still opening regions, up to 20 of them, the master can start rebalancing regions. Handled by the ServerManager during message processing. Opinions?

- hbase.regions.percheckin = 20
  This is the count of regions assigned in one go. Handled in RegionManager; the default is 10. Here we tell it to assign regions in larger batches to speed up cluster start. Opinions?

- hbase.regionserver.handler.count = 30
  Up from 10, as I often had the problem that the UI was not responsive while an import MR job was running - all handlers were busy doing the inserts. JD mentioned it may be set to a higher default value?

Hadoop:
-------

- dfs.block.size = 134217728
  Up from the default 64MB. I have done this in the past as my data size per cell is larger than the usual few bytes. I can have a few KB up to just above 1 MB per value. Still making sense?

- dfs.namenode.handler.count = 20
  This was upped from the default 10 quite some time ago (more than a year). So is this still required?

- dfs.datanode.socket.write.timeout = 0
  This is the matching entry to the above, I suppose - this time for the DataNode. Still required?

- dfs.datanode.max.xcievers = 4096
  Default is 256 and often way too low. What is a good value you would use? What is the drawback of setting it high?

Thanks,
Lars
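[For anyone collecting these: overrides like the above go into conf/hbase-site.xml (or hdfs-site.xml/core-site.xml on the Hadoop side) in the standard Hadoop property format. The two properties below are just examples taken from Lars's list, not recommended values.]

    <?xml version="1.0"?>
    <configuration>
      <!-- Example overrides from the list above; values are Lars's, not defaults. -->
      <property>
        <name>hbase.regionserver.handler.count</name>
        <value>30</value>
      </property>
      <property>
        <name>hbase.hregion.max.filesize</name>
        <value>536870912</value>
      </property>
    </configuration>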
Re: Hbase 0.20 example\manual
Gents,

It appears that the mapred example in the HBase 0.20 RC1 source uses a lot of deprecated classes. Is it safe to assume that it is out of date? If so, could anyone point me to a mapred example for 0.20?

Thanks,
Alex

On Tue, Aug 18, 2009 at 2:26 AM, Alex Spodinets spodin...@gmail.com wrote:

Exciting, thanks.

On Tue, Aug 18, 2009 at 2:05 AM, Jonathan Gray jl...@streamy.com wrote:

Look at the overview/summary in the javadocs. I'm not sure if an official one has been posted yet, but you can check out the Getting Started guide here: http://jgray.la/javadoc/hbase-0.20.0/overview-summary.html

And API examples here: http://jgray.la/javadoc/hbase-0.20.0/org/apache/hadoop/hbase/client/package-summary.html

JG

Alex Spodinets wrote:

Hello,

Could someone kindly point me to an example of HBase 0.20 API usage? All I was able to find so far is a Map/Reduce example in the 0.20 SVN source. It would also be good to have some info on how 0.20 should be installed, especially the ZooKeeper part.

Thanks.
Re: Hbase 0.20 example\manual
See under http://people.apache.org/~stack/hbase-0.20.0-candidate-2/docs/. The client code is linked from the 'Getting Started' section. Here is a direct link: http://su.pr/Anqe9D

St.Ack

On Wed, Aug 26, 2009 at 9:10 AM, Alex Spodinets spodin...@gmail.com wrote:

Gents,

It appears that the mapred example in the HBase 0.20 RC1 source uses a lot of deprecated classes. Is it safe to assume that it is out of date? If so, could anyone point me to a mapred example for 0.20?

Thanks,
Alex

On Tue, Aug 18, 2009 at 2:26 AM, Alex Spodinets spodin...@gmail.com wrote:

Exciting, thanks.

On Tue, Aug 18, 2009 at 2:05 AM, Jonathan Gray jl...@streamy.com wrote:

Look at the overview/summary in the javadocs. I'm not sure if an official one has been posted yet, but you can check out the Getting Started guide here: http://jgray.la/javadoc/hbase-0.20.0/overview-summary.html

And API examples here: http://jgray.la/javadoc/hbase-0.20.0/org/apache/hadoop/hbase/client/package-summary.html

JG

Alex Spodinets wrote:

Hello,

Could someone kindly point me to an example of HBase 0.20 API usage? All I was able to find so far is a Map/Reduce example in the 0.20 SVN source. It would also be good to have some info on how 0.20 should be installed, especially the ZooKeeper part.

Thanks.
Re: Hbase 0.20 example\manual
Hi,

I saw the tableindexed package here: http://people.apache.org/~stack/hbase-0.20.0-candidate-1/docs/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html

I have a doubt... Suppose I have the following table:

    rowkey  col
    1       a
    2       a
    3       b
    4       a
    5       c
    6       b

Suppose I have to index on col, so my secondary index should be somewhat as follows:

    key  val(s)
    a    1,2,4
    b    3,6
    c    5

Does this new tableindexed package allow such repetitions in the column to be indexed? Some people have said that in 0.19.x the values of the column on which we are indexing always had to be distinct. Does 0.20 add any support for the above kind?

On Tue, Aug 18, 2009 at 4:36 AM, stack st...@duboce.net wrote:

Does this help? http://people.apache.org/~stack/hbase-0.20.0-candidate-1/docs/api/overview-summary.html#overview_description

Includes sample client usage and all about ZK + HBase.

St.Ack

On Mon, Aug 17, 2009 at 3:57 PM, Alex Spodinets spodin...@gmail.com wrote:

Hello,

Could someone kindly point me to an example of HBase 0.20 API usage? All I was able to find so far is a Map/Reduce example in the 0.20 SVN source. It would also be good to have some info on how 0.20 should be installed, especially the ZooKeeper part.

Thanks.
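[Independent of the tableindexed contrib, duplicate indexed values are easy to support if you roll the index by hand: make the index table's row key the indexed value plus the original row key, so entries for the same value stay distinct. A minimal sketch against the plain 0.20 client API - the table and column names are hypothetical, and this is not how the tableindexed contrib itself is implemented.]

    // Hand-rolled secondary index sketch using the plain 0.20 client API.
    // The index row key "value/rowKey" lets "a/1", "a/2", "a/4" coexist,
    // so repeated values in the indexed column are no problem.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ManualIndexSketch {
      public static void index(String rowKey, String value) throws Exception {
        // In real code you would reuse the configuration and HTable instance.
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable indexTable = new HTable(conf, "mytable-col-index");
        Put put = new Put(Bytes.toBytes(value + "/" + rowKey));
        // Store a back-reference to the indexed row.
        put.add(Bytes.toBytes("ref"), Bytes.toBytes("row"), Bytes.toBytes(rowKey));
        indexTable.put(put);
      }
    }

A prefix scan over the index table starting at "a/" then yields all rows whose col value is "a".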
Re: Hbase 0.20 example\manual
St.Ack,

That is a client example. I'm hoping to get a Map/Reduce example - have one handy?

Thanks,
Alex

On Wed, Aug 26, 2009 at 7:27 PM, stack st...@duboce.net wrote:

See under http://people.apache.org/~stack/hbase-0.20.0-candidate-2/docs/. The client code is linked from the 'Getting Started' section. Here is a direct link: http://su.pr/Anqe9D

St.Ack

On Wed, Aug 26, 2009 at 9:10 AM, Alex Spodinets spodin...@gmail.com wrote:

Gents,

It appears that the mapred example in the HBase 0.20 RC1 source uses a lot of deprecated classes. Is it safe to assume that it is out of date? If so, could anyone point me to a mapred example for 0.20?

Thanks,
Alex

On Tue, Aug 18, 2009 at 2:26 AM, Alex Spodinets spodin...@gmail.com wrote:

Exciting, thanks.

On Tue, Aug 18, 2009 at 2:05 AM, Jonathan Gray jl...@streamy.com wrote:

Look at the overview/summary in the javadocs. I'm not sure if an official one has been posted yet, but you can check out the Getting Started guide here: http://jgray.la/javadoc/hbase-0.20.0/overview-summary.html

And API examples here: http://jgray.la/javadoc/hbase-0.20.0/org/apache/hadoop/hbase/client/package-summary.html

JG

Alex Spodinets wrote:

Hello,

Could someone kindly point me to an example of HBase 0.20 API usage? All I was able to find so far is a Map/Reduce example in the 0.20 SVN source. It would also be good to have some info on how 0.20 should be installed, especially the ZooKeeper part.

Thanks.
Re: Hbase 0.20 example\manual
Alex,

Check the org.apache.hadoop.hbase.mapreduce package. It has the updated API and new classes. The legacy mapred package is deprecated. If you'd like to see an example, check out the RowCounter class.

Lars

Alex Spodinets wrote:

St.Ack,

That is a client example. I'm hoping to get a Map/Reduce example - have one handy?

Thanks,
Alex

On Wed, Aug 26, 2009 at 7:27 PM, stack st...@duboce.net wrote:

See under http://people.apache.org/~stack/hbase-0.20.0-candidate-2/docs/. The client code is linked from the 'Getting Started' section. Here is a direct link: http://su.pr/Anqe9D

St.Ack

On Wed, Aug 26, 2009 at 9:10 AM, Alex Spodinets spodin...@gmail.com wrote:

Gents,

It appears that the mapred example in the HBase 0.20 RC1 source uses a lot of deprecated classes. Is it safe to assume that it is out of date? If so, could anyone point me to a mapred example for 0.20?

Thanks,
Alex

On Tue, Aug 18, 2009 at 2:26 AM, Alex Spodinets spodin...@gmail.com wrote:

Exciting, thanks.

On Tue, Aug 18, 2009 at 2:05 AM, Jonathan Gray jl...@streamy.com wrote:

Look at the overview/summary in the javadocs. I'm not sure if an official one has been posted yet, but you can check out the Getting Started guide here: http://jgray.la/javadoc/hbase-0.20.0/overview-summary.html

And API examples here: http://jgray.la/javadoc/hbase-0.20.0/org/apache/hadoop/hbase/client/package-summary.html

JG

Alex Spodinets wrote:

Hello,

Could someone kindly point me to an example of HBase 0.20 API usage? All I was able to find so far is a Map/Reduce example in the 0.20 SVN source. It would also be good to have some info on how 0.20 should be installed, especially the ZooKeeper part.

Thanks.
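[For the archive, here is a minimal sketch in the spirit of that RowCounter class, using the new org.apache.hadoop.hbase.mapreduce API. The table name "mytable" is a placeholder; RowCounter in the HBase source remains the authoritative example.]

    // Count rows in an HBase table with the new (0.20) mapreduce API.
    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class RowCountSketch {
      static class RowCountMapper extends TableMapper<ImmutableBytesWritable, Result> {
        public static enum Counters { ROWS }
        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context)
            throws IOException {
          // One counter tick per row handed to the mapper.
          context.getCounter(Counters.ROWS).increment(1);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new HBaseConfiguration(), "rowcount");
        job.setJarByClass(RowCountSketch.class);
        // Scan every row of the (hypothetical) table "mytable".
        TableMapReduceUtil.initTableMapperJob("mytable", new Scan(),
            RowCountMapper.class, ImmutableBytesWritable.class, Result.class, job);
        // Map-only job; the counter is the output.
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }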
Re: HBase mention in VLDB keynote
On Tue, Aug 25, 2009 at 7:05 PM, Schubert Zhang zson...@gmail.com wrote:

> Thanks JG. We are trying to load up our datasets now. But one thing is for sure: the cluster becomes slow as the dataset grows larger and larger. It is distinct on writes and random reads.

What kind of sizes are you talking about, Schubert, and can you figure out where the slowdown is?

St.Ack
Re: Hbase 0.20 example\manual
On Wed, Aug 26, 2009 at 9:35 AM, Alex Spodinets spodin...@gmail.com wrote:

> St.Ack, That is a client example. I'm hoping to get a Map/Reduce example - have one handy?

Sorry about that. Yeah, what Lars said (I just committed a patch that clears out the old example with its deprecated code and instead points you to RowCounter as an example of how to use the new API).

St.Ack
Re: Hbase 0.20 example\manual
Got it, thanks.

On Wed, Aug 26, 2009 at 9:16 PM, stack st...@duboce.net wrote:

On Wed, Aug 26, 2009 at 9:35 AM, Alex Spodinets spodin...@gmail.com wrote:

> St.Ack, That is a client example. I'm hoping to get a Map/Reduce example - have one handy?

Sorry about that. Yeah, what Lars said (I just committed a patch that clears out the old example with its deprecated code and instead points you to RowCounter as an example of how to use the new API).

St.Ack
Will ROOT region be a bottleneck?
Hi,

The HBaseMaster is responsible for assigning regions to HRegionServers. The first region to be assigned is the ROOT region. The ROOT region is served by a region server, right? Will it be a bottleneck when many clients make requests at the same time?

Thanks,
Fleming
Re: Will ROOT region be a bottleneck?
While it seems like ROOT might be a bottleneck, with aggressive client caching it ends up not being an issue. Clients cache the location of ROOT, then they cache the location of META and the locations of the user-table regions. All is well.

-ryan

On Wed, Aug 26, 2009 at 5:43 PM, y_823...@tsmc.com wrote:

Hi,

The HBaseMaster is responsible for assigning regions to HRegionServers. The first region to be assigned is the ROOT region. The ROOT region is served by a region server, right? Will it be a bottleneck when many clients make requests at the same time?

Thanks,
Fleming
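[To make Ryan's point concrete, here is a toy model - invented names, not actual HBase client code - of why ROOT drops out of the hot path: after the first lookup, reads hit local caches and never go back to ROOT. The real client caches locations per region key range rather than per exact row; this sketch simplifies that away.]

    import java.util.HashMap;
    import java.util.Map;

    public class LookupModel {
      private String rootLoc;                                    // fetched once, ever
      private final Map<String, String> metaLoc = new HashMap<String, String>();
      private final Map<String, String> regionLoc = new HashMap<String, String>();

      public String locate(String table, String row) {
        String key = table + "/" + row;
        if (regionLoc.containsKey(key)) return regionLoc.get(key); // common case: no RPC
        if (rootLoc == null) rootLoc = findRoot();                 // first call only
        if (!metaLoc.containsKey(table)) metaLoc.put(table, askRoot(table));
        regionLoc.put(key, askMeta(metaLoc.get(table), row));
        return regionLoc.get(key);
      }

      // Stand-ins for the real RPCs; addresses are made up.
      private String findRoot() { return "rs1:60020"; }
      private String askRoot(String table) { return "rs2:60020"; }
      private String askMeta(String meta, String row) { return "rs3:60020"; }
    }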
Re: Seattle / NW Hadoop, HBase Lucene, etc. Meetup , Wed August 26th, 6:45pm
Hello,

My apologies, but there was a mix-up reserving our meeting location, and we don't have access to it. I'm very sorry, and beer is on me next month. Promise :)

Sent from my Internets

On Aug 25, 2009, at 4:21 PM, Bradford Stephens bradfordsteph...@gmail.com wrote:

Hey there,

Apologies for this not going out sooner - apparently it was sitting as a draft in my inbox. A few of you have pinged me, so thanks for your vigilance.

It's time for another Hadoop/Lucene/Apache Stack meetup! We've had great attendance in the past few months; let's keep it up! I'm always amazed by the things I learn from everyone.

We're back at the University of Washington, Allen Computer Science Center (not Computer Engineering).
Map: http://www.washington.edu/home/maps/?CSE
Room: 303 -or- the entry level. If there are changes, signs will be posted.

More info: The meetup is about 2 hours. We'll have two in-depth talks of 15-20 minutes each, and then several lightning talks of 5 minutes. We'll then have discussion and 'social time'. If no one offers to speak, we'll just have general discussion. Let me know if you're interested in speaking or attending.

We'd like to focus on education, so every presentation *needs* to ask some questions at the end. We can talk about these after the presentations, and I'll record what we've learned in a wiki and share it with the rest of us.

Contact: Bradford Stephens, 904-415-3009, bradfordsteph...@gmail.com

--
http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: Settings
> HBase:
> - fs.default.name = hdfs://master-hostname:9000/
>   This is usually in core-site.xml in Hadoop. Does the client or server need this key at all? Did I copy it into the hbase site file by mistake?

[schubert] I think it's better not to copy it into the HBase conf file. I suggest you modify hbase-env.sh to add the Hadoop conf path to your HBASE_CLASSPATH, e.g. export HBASE_CLASSPATH=${HBASE_HOME}/../hadoop-0.20.0/conf. Besides that, we should also configure GC options there.

> - hbase.cluster.distributed = true
>   For true replication and stand-alone ZK installations.

[schubert] You should also export HBASE_MANAGES_ZK=false in hbase-env.sh to be consistent.

> - dfs.datanode.socket.write.timeout = 0
>   This is used in the DataNode but, more importantly here, in DFSClient. Its default is apparently fixed at 8 minutes; no default file (I would have assumed hdfs-default.xml) has it listed. We set it to 0 to avoid the socket timing out on low use etc., because the DFSClient reconnect is not handled gracefully. I trust setting it to 0 is what we recommend for HBase and is still valid?

[schubert] This parameter is for Hadoop/HDFS. It should be in hadoop-0.20.0/conf/hdfs-site.xml. But I think it should not be needed now.

> - hbase.regionserver.lease.period = 60
>   Default was changed from 60 to 120 seconds. Over time I had issues and have set it to 10 mins. Good or bad?

[schubert] I think if you select the right JVM GC options, the default is OK.

> - hbase.hregion.memstore.block.multiplier = 4
>   This is up from the default 2. Good or bad?

[schubert] I do not think it is necessary; could you describe your reason?

> - hbase.hregion.max.filesize = 536870912
>   Again, twice the default. Opinions?

[schubert] If you want a bigger region size, I think it's fine. We have even tried 1GB in some tests.

> - hbase.regions.nobalancing.count = 20
>   This seems to be missing from hbase-default.xml but is set to 4 in the code if not specified. The above I got from Ryan to improve startup of HBase. It means that while a RS is still opening regions, up to 20 of them, the master can start rebalancing regions. Handled by the ServerManager during message processing. Opinions?

[schubert] I think it makes sense.

> - hbase.regions.percheckin = 20
>   This is the count of regions assigned in one go. Handled in RegionManager; the default is 10. Here we tell it to assign regions in larger batches to speed up cluster start. Opinions?

[schubert] I have no idea about it. I think region assignment will incur some CPU and memory overhead on the regionserver if there are too many HLogs to be processed.

> - hbase.regionserver.handler.count = 30
>   Up from 10, as I often had the problem that the UI was not responsive while an import MR job was running - all handlers were busy doing the inserts. JD mentioned it may be set to a higher default value?

[schubert] It makes sense. In my small 5-node cluster, I set it to 20.

> Hadoop:
> - dfs.block.size = 134217728
>   Up from the default 64MB. I have done this in the past as my data size per cell is larger than the usual few bytes. I can have a few KB up to just above 1 MB per value. Still making sense?

[schubert] I think your reasoning makes sense.

> - dfs.namenode.handler.count = 20
>   This was upped from the default 10 quite some time ago (more than a year). So is this still required?

[schubert] I also set it to 20.

> - dfs.datanode.socket.write.timeout = 0
>   This is the matching entry to the above, I suppose - this time for the DataNode. Still required?

[schubert] I think it is not necessary now.

> - dfs.datanode.max.xcievers = 4096
>   Default is 256 and often way too low. What is a good value you would use? What is the drawback of setting it high?

[schubert] It should make sense. I use 3072 in my small cluster.

> Thanks,
> Lars
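[Pulling Schubert's hbase-env.sh suggestions together, a minimal sketch of the relevant lines; the Hadoop path and the GC flags are illustrative assumptions to be tuned for your own heap and workload, not recommendations.]

    # conf/hbase-env.sh sketch following the suggestions above.
    # Put the Hadoop conf dir on the classpath instead of copying keys
    # into hbase-site.xml (path is an example):
    export HBASE_CLASSPATH=${HBASE_HOME}/../hadoop-0.20.0/conf
    # Consistent with hbase.cluster.distributed=true and an external
    # ZooKeeper quorum:
    export HBASE_MANAGES_ZK=false
    # Example GC options (assumed flags; tune for your heap):
    export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"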
Re: Settings
On Wed, Aug 26, 2009 at 7:40 AM, Lars George l...@worldlingo.com wrote:

> Hi, It seems over the years I have tried various settings in both Hadoop and HBase, and when redoing a cluster it is always a question whether we should keep a given setting or not, since the issue it suppressed may have been fixed already. Maybe we should have a wiki page with the current settings, and the more advanced ones, and when and how to use them. I often find that the descriptions in the various default files are as ambiguous as the setting key itself.

I'd rather fix the descriptions so they are clear than add extra info out in a wiki; wiki pages tend to rot.

> - fs.default.name = hdfs://master-hostname:9000/
>   This is usually in core-site.xml in Hadoop. Does the client or server need this key at all? Did I copy it into the hbase site file by mistake?

There probably was a reason long ago but, yeah, you shouldn't need this (as Schubert says).

> - hbase.cluster.distributed = true
>   For true replication and stand-alone ZK installations.
> - dfs.datanode.socket.write.timeout = 0
>   This is used in the DataNode but, more importantly here, in DFSClient. Its default is apparently fixed at 8 minutes; no default file (I would have assumed hdfs-default.xml) has it listed. We set it to 0 to avoid the socket timing out on low use etc., because the DFSClient reconnect is not handled gracefully. I trust setting it to 0 is what we recommend for HBase and is still valid?

For background on this, see http://wiki.apache.org/hadoop/Hbase/Troubleshooting#6. It shouldn't be needed anymore, especially with HADOOP-4681 in place, but IIRC apurtell had trouble bringing up a cluster one time when it shouldn't have been needed, and the only way to get it up was to set this to zero. We should test. BTW, this is a client-side config. You have it below in Hadoop too; it shouldn't be needed there, not by HBase at least (maybe you have other clients of HDFS that had this issue?).

> - hbase.regionserver.lease.period = 60
>   Default was changed from 60 to 120 seconds. Over time I had issues and have set it to 10 mins. Good or bad?

There is an issue to check whether this is even used any more. The lease is in ZK now. I don't think this has an effect any more.

> - hbase.hregion.memstore.block.multiplier = 4
>   This is up from the default 2. Good or bad?

Means that we'll fill more RAM before we bring down the writes gate, up to 2x the flush size (so if 64MB is the default flush size, we'll keep taking on writes till we get to 2x64MB). 2x is good for the 64M default, I'd say - especially during virulent upload with lots of Stores.

> - hbase.hregion.max.filesize = 536870912
>   Again, twice the default. Opinions?

Means you should have fewer regions overall, for perhaps some small compromise in performance (TBD). I think that in 0.21 we'll likely up the default region size to this or larger. Needs testing. Leave it, I'd say, if performance is OK for you and you have lots of regions.

> - hbase.regions.nobalancing.count = 20
>   This seems to be missing from hbase-default.xml but is set to 4 in the code if not specified. The above I got from Ryan to improve startup of HBase. It means that while a RS is still opening regions, up to 20 of them, the master can start rebalancing regions. Handled by the ServerManager during message processing. Opinions?

If it works for you, keep it. This whole startup and region reassignment is going to be redone in 0.21. These configurations will likely change at that time.

> - hbase.regions.percheckin = 20
>   This is the count of regions assigned in one go. Handled in RegionManager; the default is 10. Here we tell it to assign regions in larger batches to speed up cluster start. Opinions?

See previous note.

> - hbase.regionserver.handler.count = 30
>   Up from 10, as I often had the problem that the UI was not responsive while an import MR job was running - all handlers were busy doing the inserts. JD mentioned it may be set to a higher default value?

No harm here. Do the math: is it likely that you'll have 30 clients concurrently trying to get stuff out of a regionserver? If so, keep it, I'd say.

> Hadoop:
> - dfs.block.size = 134217728
>   Up from the default 64MB. I have done this in the past as my data size per cell is larger than the usual few bytes. I can have a few KB up to just above 1 MB per value. Still making sense?

No opinion. Whatever works for you.

> - dfs.namenode.handler.count = 20
>   This was upped from the default 10 quite some time ago (more than a year). So is this still required?

Probably. Check it during a time of high load. Are all in use?

> - dfs.datanode.socket.write.timeout = 0
>   This is the matching entry to the above, I suppose - this time for the DataNode. Still required?

See comment near top.

> - dfs.datanode.max.xcievers = 4096
>   Default is 256 and often way too low. What is a good value you would use? What is the drawback of setting it high?