Bonjour Guillaume,

Your issue #2 looks like two separate issues:

2a) Memcache flusher gating. This is better in 0.20.0. I encourage you to upgrade for this and any number of other reasons.

2b) HDFS-127. See https://issues.apache.org/jira/browse/HDFS-127. Upgrade to HBase 0.20.0, or patch the Hadoop 0.19.1 jar with a fix for this issue and deploy it into hbase/lib/.

Your issue #3 has also been fixed in release 0.20.0: the client will retain the edits that were not committed in the write buffer. I encourage you to upgrade.
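For example, with autoflush off the 0.20.0 client keeps uncommitted edits in its write buffer, so a failed flush can simply be retried. A minimal sketch against the 0.20.0 client API (the table, family, and qualifier names here are stand-ins for yours):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RetainedEditsExample {
  public static void main(String[] args) throws IOException {
    // Stand-in table/family/qualifier names; substitute your own.
    HTable table = new HTable(new HBaseConfiguration(), "urlsdata-validation");
    table.setAutoFlush(false); // buffer edits on the client side

    Put put = new Put(Bytes.toBytes("www.example.com"));
    put.add(Bytes.toBytes("info"), Bytes.toBytes("text"), Bytes.toBytes("..."));
    table.put(put);

    try {
      table.flushCommits();
    } catch (IOException e) {
      // The uncommitted edits are still in the write buffer, so after
      // backing off you can call table.flushCommits() again instead of
      // rebuilding and resubmitting the batch yourself.
    }
  }
}

The backoff and retry policy around the flush is up to your application.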
This will also have an impact on your issue #1. Essentially the entire I/O subsystem of the region server was rewritten, so 0.20.0 has a completely different performance profile than 0.19. We can revisit your issue #1 under the circumstances of 0.20.0 if you still have problems or concerns.

Best regards,

   - Andy

________________________________
From: "guillaume.vil...@orange-ftgroup.com" <guillaume.vil...@orange-ftgroup.com>
To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
Sent: Wednesday, September 16, 2009 8:35:26 AM
Subject: Issues/Problems concerning hbase data insertion

Hi all,

We are in the process of evaluating HBase for managing a "bigtable" (to give an idea, ~1G entries of 500 bytes). We are now facing some issues and I would like to have comments on what I have noticed.

Our configuration is Hadoop 0.19.1 and HBase 0.19.3; both hadoop-default/site.xml and hbase-default/site.xml are attached. We run 15 nodes (16 or 8 GB of RAM and 1.3 TB of disk each, Linux kernel 2.6.24-standard, java version "1.6.0_12").

For now the test case runs on one IndexedTable (without using the index column for the moment) with 25M entries/rows: a map formats the data and 15 reduces BatchUpdate the textual data (URLs and simple text fields < 500 bytes). All processes (Hadoop/HBase) are started with -Xmx1000m, and the IndexedTable is configured with AutoCommit set to false.

ISSUE 1: we need one indexed column to get "fast" UI queries (for instance, as an answer to a Web form we could expect to wait at most 30 seconds). The only documentation I found concerning indexed columns comes from http://rajeev1982.blogspot.com/2009/06/secondary-indexes-in-hbase.html. Instead of putting the indextable properties in hbase-site.xml (which I have tested, but which gives very poor performance and also loses entries...), I pass the properties to the job through a -conf indextable_properties.xml (file attached). I suppose that putting the indextable properties into hbase-site.xml applies them to the whole HBase cluster, which makes overall performance decrease significantly? The best performance was reached by passing them through the -conf option of the Tool.run method.
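To be concrete, the plumbing I mean is just the standard Hadoop Tool/ToolRunner pattern; a minimal sketch (InsertJob is a placeholder name, not our actual code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch: ToolRunner parses "-conf indextable_properties.xml" with
// GenericOptionsParser and merges those properties into the Configuration
// returned by getConf() inside run(), so they apply only to this job
// rather than to the whole cluster.
public class InsertJob extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    Configuration conf = getConf(); // already holds the indextable properties
    // ... configure and submit the map/reduce insertion job here ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // e.g. hadoop jar insertjob.jar InsertJob -conf indextable_properties.xml
    System.exit(ToolRunner.run(new Configuration(), new InsertJob(), args));
  }
}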
ISSUE 2: we are facing serious region server problems, often leading to region server shutdown, such as:

2009-09-16 10:21:15,887 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region urlsdata-validation,forum.telecharger.01net.com/index.php?page=01net_voter&forum=microhebdo&category=5&topic=344142&post=5653085,1253089082422: 23, waiting

or:

2009-09-14 16:39:24,611 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region urlsdata-validation,www.abovetopsecret.com/forum/thread119/pg1&title=Underground+Communities,1252939031807: Memcache size 128.0m is >= than blocking 128.0m size
2009-09-14 16:39:24,942 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
2009-09-14 16:39:24,942 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-873614322830930554_111500
2009-09-14 16:39:31,180 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-873614322830930554_111500 bad datanode[0] nodes == null
2009-09-14 16:39:31,181 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/urlsdata-validation/1733902030/info/mapfiles/2690714750206504745/data" - Aborting...
2009-09-14 16:39:31,241 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown

I have read some HBase JIRA issues (HBASE-1415, HBASE-1058, HBASE-1084, ...) concerning similar problems, but I cannot get a clear idea of what kind of fix is proposed.

ISSUE 3: these problems cause table.commit() to throw an IOException, losing all the buffered entries:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 192.168.255.8:60020 for region urlsdata-validation,twitter.com/statuses/434272962,1253089707924, row 'www.harmonicasurcher.com', but failed after 10 attempts.
Exceptions:
java.io.IOException: Call to /192.168.255.8:60020 failed on local exception: java.io.EOFException
java.net.ConnectException: Call to /192.168.255.8:60020 failed on connection exception: java.net.ConnectException: Connection refused

Is there a way to get back the uncommitted entries (there are many of them because we run with AutoCommit false) so we can resubmit them later? To give an idea, we sometimes lose about 170,000 entries out of 25M due to this commit exception.

Guillaume Viland (guillaume.vil...@orange-ftgroup.com)
FT/TGPF/OPF/PORTAIL/DOP
Sophia Antipolis