Bonjour Guillaume,

Your issue #2 looks like two separate issues:

2a) Memcache flusher gating. This is better in 0.20.0. I encourage you to upgrade for this and any number of other reasons.

2b) HDFS-127. See https://issues.apache.org/jira/browse/HDFS-127. Upgrade to HBase 0.20.0, or patch the Hadoop 0.19.1 jar with a fix for this issue and deploy it into hbase/lib/.

Your issue #3 has also been fixed in release 0.20.0: the client will retain the edits that were not committed in the write buffer. I encourage you to upgrade.
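For example, with autoflush off the 0.20.0 client keeps uncommitted edits in its write buffer, so a failed flush can simply be retried. A minimal sketch against the 0.20.0 client API (the table, family, and qualifier names here are stand-ins for yours):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RetainedEditsExample {
  public static void main(String[] args) throws IOException {
    // Stand-in table/family/qualifier names; substitute your own.
    HTable table = new HTable(new HBaseConfiguration(), "urlsdata-validation");
    table.setAutoFlush(false); // buffer edits on the client side

    Put put = new Put(Bytes.toBytes("www.example.com"));
    put.add(Bytes.toBytes("info"), Bytes.toBytes("text"), Bytes.toBytes("..."));
    table.put(put);

    try {
      table.flushCommits();
    } catch (IOException e) {
      // The uncommitted edits are still in the write buffer, so after
      // backing off you can call table.flushCommits() again instead of
      // rebuilding and resubmitting the batch yourself.
    }
  }
}

The backoff and retry policy around the flush is up to your application.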
This will also have an impact on your issue #1. Essentially the entire I/O subsystem of the region server was rewritten, so 0.20.0 has a completely different performance profile than 0.19. We can revisit your issue #1 under the circumstances of 0.20.0 if you still have problems or concerns.

Best regards,

   - Andy

________________________________
From: "guillaume.vil...@orange-ftgroup.com" <guillaume.vil...@orange-ftgroup.com>
To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
Sent: Wednesday, September 16, 2009 8:35:26 AM
Subject: Issues/Problems concerning hbase data insertion

Hi all,

We are in the process of evaluating HBase for managing a "bigtable" (to give an idea, ~1G entries of 500 bytes). We are now facing some issues and I would like to have comments on what I have noticed.

Our configuration is Hadoop 0.19.1 and HBase 0.19.3; both hadoop-default/site.xml and hbase-default/site.xml are attached. We run 15 nodes (16 or 8 GB of RAM and 1.3 TB of disk each, Linux kernel 2.6.24-standard, java version "1.6.0_12").

For now the test case runs on one IndexedTable (without using the index column for the moment) with 25M entries/rows: a map formats the data and 15 reduces BatchUpdate the textual data (URLs and simple text fields < 500 bytes). All processes (Hadoop/HBase) are started with -Xmx1000m, and the IndexedTable is configured with AutoCommit set to false.

ISSUE 1: we need one indexed column to get "fast" UI queries (for instance, as an answer to a Web form we could expect to wait at most 30 seconds). The only documentation I found concerning indexed columns comes from http://rajeev1982.blogspot.com/2009/06/secondary-indexes-in-hbase.html. Instead of putting the indextable properties in hbase-site.xml (which I have tested, but which gives very poor performance and also loses entries...), I pass the properties to the job through a -conf indextable_properties.xml (file attached). I suppose that putting the indextable properties into hbase-site.xml applies them to the whole HBase cluster, which makes overall performance decrease significantly? The best performance was reached by passing them through the -conf option of the Tool.run method.
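To be concrete, the plumbing I mean is just the standard Hadoop Tool/ToolRunner pattern; a minimal sketch (InsertJob is a placeholder name, not our actual code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch: ToolRunner parses "-conf indextable_properties.xml" with
// GenericOptionsParser and merges those properties into the Configuration
// returned by getConf() inside run(), so they apply only to this job
// rather than to the whole cluster.
public class InsertJob extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    Configuration conf = getConf(); // already holds the indextable properties
    // ... configure and submit the map/reduce insertion job here ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // e.g. hadoop jar insertjob.jar InsertJob -conf indextable_properties.xml
    System.exit(ToolRunner.run(new Configuration(), new InsertJob(), args));
  }
}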
ISSUE 2: we are facing serious region server problems, often leading to region server shutdown, such as:

2009-09-16 10:21:15,887 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region urlsdata-validation,forum.telecharger.01net.com/index.php?page=01net_voter&forum=microhebdo&category=5&topic=344142&post=5653085,1253089082422: 23, waiting

or:

2009-09-14 16:39:24,611 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region urlsdata-validation,www.abovetopsecret.com/forum/thread119/pg1&title=Underground+Communities,1252939031807: Memcache size 128.0m is >= than blocking 128.0m size
2009-09-14 16:39:24,942 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
2009-09-14 16:39:24,942 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-873614322830930554_111500
2009-09-14 16:39:31,180 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-873614322830930554_111500 bad datanode[0] nodes == null
2009-09-14 16:39:31,181 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/urlsdata-validation/1733902030/info/mapfiles/2690714750206504745/data" - Aborting...
2009-09-14 16:39:31,241 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown

I have read some HBase JIRA issues (HBASE-1415, HBASE-1058, HBASE-1084, ...) concerning similar problems, but I cannot get a clear idea of what kind of fix is proposed.

ISSUE 3: these problems cause table.commit() to throw an IOException, losing all the buffered entries:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 192.168.255.8:60020 for region urlsdata-validation,twitter.com/statuses/434272962,1253089707924, row 'www.harmonicasurcher.com', but failed after 10 attempts.
Exceptions:
java.io.IOException: Call to /192.168.255.8:60020 failed on local exception: java.io.EOFException
java.net.ConnectException: Call to /192.168.255.8:60020 failed on connection exception: java.net.ConnectException: Connection refused

Is there a way to get back the uncommitted entries (there are many of them because we run with AutoCommit false) so we can resubmit them later? To give an idea, we sometimes lose about 170,000 entries out of 25M due to this commit exception.

Guillaume Viland (guillaume.vil...@orange-ftgroup.com)
FT/TGPF/OPF/PORTAIL/DOP
Sophia Antipolis