On Wed, Nov 2, 2011 at 1:24 PM, Vladimir Rodionov <[email protected]> wrote:
> We can confirm that by running our own internal tool.

What's this tool doing, Vladimir? Is it running against the HBase API?

> It seems that we lose only during the first restart
>
How are you doing the restart? Are you killing regionservers? On restart, are we
splitting WAL logs, or is it a 'clean' restart where no WALs are split? Can you
figure out what the missing data is? Is it all from the same one or two regions?

> Table's TTL = 1 year. There is a slim chance that we load data with
> timestamps more than one year behind, but it does not explain the difference
> between the total number of rows before and after the cluster's restart.
>
I'd think that if your 'internal tool' found the stuff before the restart, then
this is probably OK (a year could have elapsed over the restart, I suppose, but
that'd be odd...).

> All RS are time synched.
>
> In the Master log I do not see any WARNs or ERRORs during the cluster restart. In RS
> logs I see a lot of:
>
> 2011-11-02 00:16:07,620 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll /hbase/.logs/us01-ciqps1-grid01.carrieriq.com,60020,1320187507171/us01-ciqps1-grid01.carrieriq.com%3A60020.1320192949451,
> entries=76, filesize=68053806. New hlog
> /hbase/.logs/us01-ciqps1-grid01.carrieriq.com,60020,1320187507171/us01-ciqps1-grid01.carrieriq.com%3A60020.1320192967380
> 2011-11-02 00:16:07,621 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new compressor
> 2011-11-02 00:16:07,621 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog:
> Could not append. Requesting close of hlog
> java.io.IOException: Reflection
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1002)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:955)
>     at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1483)
>     at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1392)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2591)
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>     ... 10 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeThreads(DFSClient.java:3306)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3216)
>     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>     at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>     ... 14 more
>
Perhaps a newer CDH would have a fix for this?

> This is probably not all of the ERRORs and FATALs. I am continuing the investigation
> and will post my other findings later.
>
Let us know.
St.Ack
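
For reference, on the "what is the internal tool doing" question above: if it counts rows
through the HBase client API, a minimal sketch would look something like the code below.
This is only an illustration, not Vladimir's actual tool; the table name "my_table" is made
up, and the old 0.90-era HTable API is assumed (matching the CDH generation discussed here).
Comparing counts per region or per row-key prefix before and after the restart is one way to
narrow down where the missing rows came from.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

// Rough row-count sketch against the HBase client API (hypothetical table name).
public class RowCountSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");   // hypothetical table name
    Scan scan = new Scan();
    scan.setCaching(1000);                         // fewer RPCs per scanner.next()
    scan.setFilter(new FirstKeyOnlyFilter());      // return only the first KV of each row
    ResultScanner scanner = table.getScanner(scan);
    long rows = 0;
    for (Result r : scanner) {
      rows++;
    }
    scanner.close();
    table.close();
    System.out.println("row count = " + rows);
  }
}

The bundled MapReduce job does essentially the same thing at scale:
hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>.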
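
On the clean-vs-not restart question: whether the master had to split WALs on startup can be
checked by looking at what sits under /hbase/.logs around the restart (the master also logs
its splitting work). A rough sketch using the plain Hadoop FileSystem API, assuming the
/hbase/.logs path shown in the log lines above and that core-site.xml on the classpath points
at the cluster's HDFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// List per-regionserver WAL directories and their hlog files under /hbase/.logs
// (path taken from the log lines above; adjust if hbase.rootdir differs).
public class ListWals {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();      // reads core-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    Path logsDir = new Path("/hbase/.logs");
    for (FileStatus serverDir : fs.listStatus(logsDir)) {
      long bytes = 0;
      int files = 0;
      for (FileStatus wal : fs.listStatus(serverDir.getPath())) {
        bytes += wal.getLen();
        files++;
      }
      System.out.println(serverDir.getPath().getName() + ": " + files + " hlogs, " + bytes + " bytes");
    }
  }
}

A quick "hadoop fs -ls /hbase/.logs" before shutdown and after startup gives the same picture
from the command line.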
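
On the TTL point: with a one-year TTL, cells whose timestamps are more than a year old are
eligible to be dropped, so one sanity check is to scan only for cells whose timestamps fall
near the TTL boundary and see whether the "missing" rows are in that band. A rough sketch,
reusing the same hypothetical table name and an illustrative one-week window; the thresholds
are assumptions, not anything from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

// Count rows whose cells sit just inside a one-year TTL, i.e. data about to age out.
public class TtlBoundaryCheck {
  public static void main(String[] args) throws Exception {
    long now = System.currentTimeMillis();
    long oneYearMs = 365L * 24 * 60 * 60 * 1000;
    long slackMs = 7L * 24 * 60 * 60 * 1000;       // look at cells within a week of expiring

    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");    // hypothetical table name
    Scan scan = new Scan();
    scan.setCaching(1000);
    // Return only cells stamped between (now - 1 year) and (now - 1 year + 1 week).
    scan.setTimeRange(now - oneYearMs, now - oneYearMs + slackMs);

    long atRisk = 0;
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      atRisk++;
    }
    scanner.close();
    table.close();
    System.out.println("rows with cells near the TTL boundary = " + atRisk);
  }
}

If rows counted before the restart turn out to have timestamps older than a year afterwards,
TTL rather than the restart would account for at least part of the difference.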
