Dear All,

I am a newbie in HBase/Hadoop and have recently set up a small-scale cluster in a research cloud:

------------------------------------------
1 Master Server (also Hadoop NameNode)
3 Region Servers (also Hadoop DataNodes)
1 Ganglia Monitoring Server
1 YCSB Workload Generation Server
------------------------------------------
HBase Version: 0.94.7, r1471806
Hadoop Version: 1.0.4, r1393290
Ganglia Version: gmond/gmetad - 3.6.0, gweb - 3.5.8
YCSB Version: 0.1.4
------------------------------------------

I have only one table in HBase, 'usertable', with a single column family 'cf1' holding 1,000,000 key-value records. The row keys are in monotonically increasing order, and currently I have 6 regions distributed across the 3 region servers, each holding 2 of the regions.

*Objective:* create region hotspots for some research experiments.

*Observation:* After running a workload consisting of a total of 10,000,000 operations (50% read, 50% write), I observed the statistics below in the Web UI of the master server. They suggest potential hotspots in the 3rd region (not sure why !!) and the 6th region (possibly because it was receiving a large number of write requests).

Table Regions

Name:          usertable,,1369584948241.3061b90ff519c1bce5b3d867690a2b4a.
Region Server: hdb1-02:60030
Start Key:     (empty)
End Key:       user2035146605813492656
Requests:      127946

Name:          usertable,user2035146605813492656,1369584948241.00f8a51bab6d98ebd7c4db582579c3e7.
Region Server: hdb1-03:60030
Start Key:     user2035146605813492656
End Key:       user30679275375621809
Requests:      126700

Name:          usertable,user30679275375621809,1369584813037.d704a50802ec39982884e394d4ef05b7.
Region Server: hdb1-04:60030
Start Key:     user30679275375621809
End Key:       user5136356049533495298
Requests:      *284828*

Name:          usertable,user5136356049533495298,1369584928780.999b987d646462e21b8916a737619b39.
Region Server: hdb1-02:60030
Start Key:     user5136356049533495298
End Key:       user617761656465008158
Requests:      133108

Name:          usertable,user617761656465008158,1369584928780.9cfe288f48f987de7f93b800dcd4c964.
Region Server: hdb1-04:60030
Start Key:     user617761656465008158
End Key:       user7218407885253116621
Requests:      119008

Name:          usertable,user7218407885253116621,1369584832152.e3a9c4d35c91f06c18ed346886ff3306.
Region Server: hdb1-03:60030
Start Key:     user7218407885253116621
End Key:       (empty)
Requests:      *363234*

*Questions:*

1. Can the HBase developer community guide me on how to collect the *raw logs* (directly from the master/region servers) behind the table above, which I retrieved from the master server?

2. How does the master server get these logs from the region servers?
   As far as I understand the architecture, the client communicates directly with the region servers to read/write data, bypassing the master server (except on the first request or when a region server is not responding).

3. How frequently does the master collect these logs? Is it real-time (within a 1 sec interval !!)?

4. Which HBase metrics will be most helpful for noticing region hotspots from Ganglia?

From the raw log dumps, I would like to know which request (read/write) is going to which region server, for example:

No:12345 ---- Type:Write ---- Query ---- Region06

and so on ...

Many thanks again...

Regards,
Joarder Kamal
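
P.S. To make the per-request view I am after more concrete, here is a minimal Python sketch. It assumes only the start keys and request counts from the Master UI table above, and it relies on the fact that HBase places a row in the region whose [start key, end key) range contains the row key, compared as bytes. The function names and the 1.25 threshold are my own illustrative choices, not anything provided by HBase, and the counts are a one-off snapshot rather than a live feed.

```python
from bisect import bisect_right

# (start key, region server, request count) copied from the Master UI table,
# listed in start-key order; the first region has an empty start key.
REGIONS = [
    ("", "hdb1-02", 127946),
    ("user2035146605813492656", "hdb1-03", 126700),
    ("user30679275375621809", "hdb1-04", 284828),
    ("user5136356049533495298", "hdb1-02", 133108),
    ("user617761656465008158", "hdb1-04", 119008),
    ("user7218407885253116621", "hdb1-03", 363234),
]

# Row keys are compared as raw bytes, so encode the start keys once.
START_KEYS = [start.encode() for start, _, _ in REGIONS]


def region_for_key(row_key: str) -> int:
    """Return the 1-based number of the region whose [start, end) range holds row_key."""
    return bisect_right(START_KEYS, row_key.encode())


def request_shares() -> list:
    """Fraction of all counted requests that each region received."""
    total = sum(count for _, _, count in REGIONS)
    return [count / total for _, _, count in REGIONS]


def hot_regions(factor: float = 1.25) -> list:
    """Regions whose share exceeds `factor` times an even split (illustrative threshold)."""
    even_share = 1 / len(REGIONS)
    return [i + 1 for i, share in enumerate(request_shares()) if share > factor * even_share]


if __name__ == "__main__":
    for number, share in enumerate(request_shares(), start=1):
        print(f"Region{number:02d} on {REGIONS[number - 1][1]}: {share:.1%}")
    print("Hot regions:", hot_regions())
```

With something like region_for_key() applied to every key the YCSB client issues, I could produce lines of the form shown above on the client side, but I would still prefer to get them from the servers' own logs if that is possible.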