On Wed, Sep 16, 2009 at 8:35 AM, <[email protected]> wrote: ...
> Our configuration is hadoop 0.19.1 and hbase 0.19.3; both
> hadoop-default/site.xml and hbase-default/site.xml are attached. 15 nodes
> (16 or 8 GB RAM and 1.3 TB disk, linux kernel 2.6.24-standard, java version
> "1.6.0_12").

As per Jon, please use hbase 0.20.x when evaluating us. There has been an
ocean of improvement since 0.19.x: http://su.pr/2vXp1v

> ISSUE 1: We need one column index to get "fast" UI queries (for instance,
> as an answer to a Web form we could expect waiting at most 30 seconds). The
> only documentation I found concerning indexed columns comes from
> http://rajeev1982.blogspot.com/2009/06/secondary-indexes-in-hbase.html

(That's a nice little article.)

> Instead of using the indextable properties in hbase-site.xml (which I have
> tested, but which gives very poor performance and also loses entries...), I
> pass the properties to the job through a -conf indextable_properties.xml
> (file is attached). I suppose that putting the indextable properties into
> hbase-site.xml applies them to the whole hbase cluster, making overall
> performance decrease significantly?
> The best performance was reached by passing them through the -conf option
> of the Tool.run method.

If I were to guess, using -conf, you are not using IndexedTable over your
cluster?

> ISSUE 2: We are facing serious regionserver problems, often leading to
> regionserver shutdown, like:
>
> 2009-09-16 10:21:15,887 INFO
> org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files
> for region urlsdata-validation,
> forum.telecharger.01net.com/index.php?page=01net_voter&forum=microhebdo&category=5&topic=344142&post=5653085,1253089082422:
> 23, waiting

This is not a 'bug'; it's just the memcache flusher holding up flushing for a
while until compactions run (though yes, as Jon says, it's usually an
indication that compactions are overrunning the system).
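For reference on the ISSUE 1 setup: the indexed-table wiring is usually just a
pair of properties, and passing a file like this via -conf scopes them to one
job instead of the whole cluster. This is a sketch only; the class names below
are the ones the contrib tableindexed docs cite for 0.20 and should be checked
against your actual release.

```xml
<?xml version="1.0"?>
<!-- indextable_properties.xml (sketch): pass with
     hadoop jar myjob.jar MyTool -conf indextable_properties.xml -->
<configuration>
  <property>
    <name>hbase.regionserver.class</name>
    <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
  </property>
  <property>
    <name>hbase.regionserver.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
  </property>
</configuration>
```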
> or
>
> 2009-09-14 16:39:24,611 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 1 on 60020' on region
> urlsdata-validation,
> www.abovetopsecret.com/forum/thread119/pg1&title=Underground+Communities,1252939031807:
> Memcache size 128.0m is >= than blocking 128.0m size

This is not a bug either. It's just HBase putting up a temporary block on
writes until it catches its breath (flushes and compacts). You are running
with 1G of heap, so you will see more of these than you would if you gave
hbase more RAM.

> 2009-09-14 16:39:24,942 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Could not read from stream
> 2009-09-14 16:39:24,942 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-873614322830930554_111500
> 2009-09-14 16:39:31,180 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_-873614322830930554_111500 bad datanode[0] nodes ==
> null
> 2009-09-14 16:39:31,181 WARN org.apache.hadoop.hdfs.DFSClient: Could not
> get block locations. Source file
> "/hbase/urlsdata-validation/1733902030/info/mapfiles/2690714750206504745/data"
> - Aborting...
> 2009-09-14 16:39:31,241 FATAL
> org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog
> required. Forcing server shutdown

This is a problem. It's probably the absence of HDFS-127 on your hadoop
cluster (see the hbase 0.20.0 'Getting Started' notes).

> ISSUE 3: These problems are causing table.commit() IOExceptions, losing all
> the entries:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server 192.168.255.8:60020 for region urlsdata-validation,
> twitter.com/statuses/434272962,1253089707924, row
> 'www.harmonicasurcher.com', but failed after 10 attempts.
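On the heap point above: the regionserver heap is set in conf/hbase-env.sh
(value in MB). The 4000 here is only an illustrative figure; size it to what
your 8/16 GB boxes can spare alongside the datanode and tasktracker.

```shell
# conf/hbase-env.sh -- give the HBase daemons more heap than the 1000 MB default
export HBASE_HEAPSIZE=4000
```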
> Exceptions:
> java.io.IOException: Call to /192.168.255.8:60020 failed on local
> exception: java.io.EOFException
> java.net.ConnectException: Call to /192.168.255.8:60020 failed on
> connection exception: java.net.ConnectException: Connection refused
>
> Is there a way to get back the uncommitted entries (there are many of them
> because we are in AutoCommit false) to resubmit them later?

In 0.20.0, on exception, the list you passed is modified so it contains only
the uncommitted writes (see HTable.put(Put[] put) in the 0.20.0 API).

> To give an idea, we sometimes lose about 170,000 entries out of 25M entries
> due to this commit exception.

A gentleman on IRC has been reliably putting close to 1B entries into hbase.
There is an issue where we lose one entry because of a race between a close
and a read on a StoreFile that we are trying to track down, but it should be
fixed before we ship 0.20.1 in the next week or so.

Thanks for writing the list (and trying hbase).
St.Ack

> Guillaume Viland ([email protected])
> FT/TGPF/OPF/PORTAIL/DOP Sophia Antipolis
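The "list is pruned to the uncommitted writes" behavior described above lends
itself to a retry-the-remainder loop. The sketch below shows the generic
pattern only; FlakySink is a hypothetical stand-in invented to keep the
example self-contained, and is NOT the HBase API (with a real 0.20 client you
would call HTable.put with your List of Puts in its place).

```java
import java.util.ArrayList;
import java.util.List;

public class RetryRemainder {

    // Hypothetical stand-in for a sink that, like HTable.put(List<Put>) in
    // 0.20, removes the edits it committed from the caller's list and throws
    // when some remain uncommitted. Here it refuses rows ending in "3" on
    // the first attempt only.
    static class FlakySink {
        private boolean firstAttempt = true;
        final List<String> committed = new ArrayList<>();

        void put(List<String> batch) throws Exception {
            List<String> done = new ArrayList<>();
            for (String row : batch) {
                if (firstAttempt && row.endsWith("3")) {
                    continue; // simulate a region server refusing this write
                }
                committed.add(row);
                done.add(row);
            }
            firstAttempt = false;
            batch.removeAll(done); // caller's list now holds only failures
            if (!batch.isEmpty()) {
                throw new Exception(batch.size() + " writes uncommitted");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FlakySink sink = new FlakySink();
        List<String> batch = new ArrayList<>(
                List.of("row1", "row2", "row3", "row4"));
        // Keep resubmitting whatever is left in the (pruned) batch.
        while (!batch.isEmpty()) {
            try {
                sink.put(batch);
            } catch (Exception e) {
                // batch now contains only the uncommitted rows; back off
                Thread.sleep(10);
            }
        }
        System.out.println("committed=" + sink.committed.size());
    }
}
```

With a real client you would also cap the number of retry rounds rather than
loop forever, since a dead region server can leave the batch non-empty
indefinitely.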
