Hi,

We are running HBase on a cluster of 7 nodes plus 1 master. Every machine, the master included, has 8GB of memory and a 4-core CPU. The master machine serves as both the Hadoop master and the HBase master, and the 7 nodes are shared between Hadoop and HBase. In the configuration files we set 2GB of memory for HBase and an additional 2GB for Hadoop. HDFS has 1.6TB of free space.

We are now trying to import 50 million rows of data. Each row has 100 columns (in reality the table will be sparsely populated, but for now we are testing the worst-case scenario). The 50 million records are stored in about 100 CSV files in HDFS. The import process is very simple: a small MapReduce program reads each CSV file, splits the lines and inserts them into the table (map only, no reduce phase). We are using the default Hadoop configuration (on 7 nodes we can run 14 maps). On the HBase side we set writeBufferSize to 32MB and setWriteToWAL to false.

At the beginning everything looks fine, but after ~33 million records we encounter strange behavior from HBase.

First, the node where the META table resides is under high load. The status web page shows ~1700 requests on that node even when we are not running any MapReduce job (0 requests on the other nodes). Also, I do not see any activity in the log files on that node. Here are the last lines from the log on that node:

2010-01-18 14:46:26,666 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: profiles2,1a2e1b7a-a43e-4e4f-9f84-40b4662cc4e0,1263825424277
2010-01-18 14:46:26,667 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE: profiles2,1a2e1b7a-a43e-4e4f-9f84-40b4662cc4e0,1263825424277
2010-01-18 14:46:27,441 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed profiles2,1a2e1b7a-a43e-4e4f-9f84-40b4662cc4e0,1263825424277
2010-01-18 14:47:38,773 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profiles,0d9deec6-b6df-43a3-ab94-685dade5af61,1263825533141 in 2mins, 18sec
2010-01-18 14:47:38,773 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region profiles,0dbd64eb-3e59-4b35-af4a-92a83a1e1858,1263825533141
2010-01-18 14:49:01,881 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profiles,0dbd64eb-3e59-4b35-af4a-92a83a1e1858,1263825533141 in 1mins, 23sec
2010-01-18 14:49:01,883 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region profiles,3f726ebf-2ec8-43a0-bd50-d40bec1776d4,1263825595669
2010-01-18 14:49:52,186 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profiles,3f726ebf-2ec8-43a0-bd50-d40bec1776d4,1263825595669 in 50sec
2010-01-18 14:49:52,186 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region profiles,3f5303ee-4729-4ab9-bfd6-3c319d429c4f,1263825595669
2010-01-18 14:50:57,328 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profiles,3f5303ee-4729-4ab9-bfd6-3c319d429c4f,1263825595669 in 1mins, 5sec
2010-01-18 14:50:57,328 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region profiles,8f50a54d-e8d5-4dec-84a4-05a468fbf8e1,1263825624515
2010-01-18 14:51:24,508 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profiles,8f50a54d-e8d5-4dec-84a4-05a468fbf8e1,1263825624515 in 27sec
2010-01-18 14:51:24,508 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region profiles,8f309cdb-eb70-49e0-90d4-d2510e38ae51,1263825624515
2010-01-18 14:52:19,736 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profiles,8f309cdb-eb70-49e0-90d4-d2510e38ae51,1263825624515 in 55sec
2010-01-18 14:52:19,736 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region profiles,84bd729a-c64b-4d75-8189-e828dbf06797,1263825639973
2010-01-18 14:53:44,053 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profiles,84bd729a-c64b-4d75-8189-e828dbf06797,1263825639973 in 1mins, 24sec
2010-01-18 14:53:44,053 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region profiles,84dcfe35-e488-4eec-99d8-83be178f1b22,1263825639973
2010-01-18 14:55:09,999 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profiles,84dcfe35-e488-4eec-99d8-83be178f1b22,1263825639973 in 1mins, 25sec
2010-01-18 14:55:09,999 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region profiles,a6252b0c-b2b1-4bd2-acf4-522065a2a3be,1263825653683
2010-01-18 14:56:22,364 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profiles,a6252b0c-b2b1-4bd2-acf4-522065a2a3be,1263825653683 in 1mins, 12sec
2010-01-18 14:56:22,364 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region profiles,a644b61a-f2c0-4855-ad99-1e6ab2d82e61,1263825653683
2010-01-18 14:57:41,518 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profiles,a644b61a-f2c0-4855-ad99-1e6ab2d82e61,1263825653683 in 1mins, 19sec

Second, I can create a new empty table and start importing data into it normally, but if I try to import more data into the same table (which now holds ~33 million records), performance is really bad and the HBase status page does not work at all (it will not load in the browser).

Currently the ~33 million records use 800GB of disk, and I have 1.1TB of free HDFS storage.

So my questions are: what am I doing wrong? Is the current cluster good enough to support 50 million records, or is 33 million the limit for this configuration? Also, I am getting about 800 inserts per second; is this slow?

Any hint is appreciated.

Best,
Zaharije
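
For reference, the figures quoted above can be turned into a rough back-of-the-envelope estimate. The sketch below is only a linear extrapolation under stated assumptions: decimal units (1GB = 10**9 bytes), "inserts per second" taken to mean whole rows per second, and the 800GB taken as the total HDFS space the table occupies (whether that figure already includes HDFS replication is not stated). It ignores temporary space used by compactions, and the function names are made up for illustration.

```python
# Back-of-the-envelope estimate using only figures quoted in the post.
# Assumptions (not from the post): decimal units, and "inserts per second"
# meaning whole rows per second. This is a linear extrapolation only.

GB = 10**9

rows_imported = 33_000_000      # rows loaded when the problems started
rows_target = 50_000_000        # rows we want to load in total
disk_used_bytes = 800 * GB      # disk used by the ~33 million rows
free_bytes = 1100 * GB          # free HDFS space reported (1.1TB)
inserts_per_second = 800        # observed import rate


def bytes_per_row(used_bytes, rows):
    """Average on-disk footprint of one row."""
    return used_bytes / rows


def extra_bytes_needed(used_bytes, rows_done, rows_total):
    """Additional disk needed for the remaining rows at the same footprint."""
    return bytes_per_row(used_bytes, rows_done) * (rows_total - rows_done)


def import_hours(rows, rate_per_second):
    """Hours needed to import `rows` at a constant rate."""
    return rows / rate_per_second / 3600


per_row = bytes_per_row(disk_used_bytes, rows_imported)
extra = extra_bytes_needed(disk_used_bytes, rows_imported, rows_target)
hours_total = import_hours(rows_target, inserts_per_second)

print(f"average footprint per row: {per_row / 1000:.1f} KB")
print(f"extra space for remaining rows: {extra / GB:.0f} GB")
print(f"fits in free space: {extra < free_bytes}")
print(f"time for all rows at current rate: {hours_total:.1f} hours")
```

Under these assumptions each row takes roughly 24KB on disk, the remaining 17 million rows would need about 412GB more (which is less than the 1.1TB still free), and a full 50 million row import at 800 rows per second would take a little over 17 hours.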
