On Wed, Nov 2, 2011 at 1:24 PM, Vladimir Rodionov <[email protected]> wrote:
> We can confirm that by running our own internal tool.

What's this tool doing, Vladimir? Is it running against the HBase API?

> It seems that we lose only during the first restart
>
How are you doing the restart? Are you killing regionservers? On restart, are we
splitting WAL logs, or is it a 'clean' restart where no WALs are split? Can you
figure out what the missing data is? Is it all from the same one or two regions?

> Table's TTL = 1 year. There is a slim chance that we load data with
> timestamps more than one year behind, but it does not explain the difference
> between the total number of rows before and after the cluster's restart.
>
I'd think that if your 'internal tool' found the stuff before the restart, then
this is probably OK (a year could have elapsed over the restart, I suppose, but
that'd be odd...).

> All RS are time synched.
>
> In the Master log I do not see any WARNs or ERRORs during the cluster restart. In RS
> logs I see a lot of:
>
> 2011-11-02 00:16:07,620 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll /hbase/.logs/us01-ciqps1-grid01.carrieriq.com,60020,1320187507171/us01-ciqps1-grid01.carrieriq.com%3A60020.1320192949451,
> entries=76, filesize=68053806. New hlog
> /hbase/.logs/us01-ciqps1-grid01.carrieriq.com,60020,1320187507171/us01-ciqps1-grid01.carrieriq.com%3A60020.1320192967380
> 2011-11-02 00:16:07,621 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new compressor
> 2011-11-02 00:16:07,621 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog:
> Could not append. Requesting close of hlog
> java.io.IOException: Reflection
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1002)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:955)
>     at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1483)
>     at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1392)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2591)
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>     ... 10 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeThreads(DFSClient.java:3306)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3216)
>     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>     at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>     ... 14 more
>
Perhaps a newer CDH would have a fix for this?

> This is probably not all of the ERRORs and FATALs. I am continuing the investigation
> and will post my other findings later.
>
Let us know.
St.Ack
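
For reference, on the "what is the internal tool doing" question above: if it counts rows
through the HBase client API, a minimal sketch would look something like the code below.
This is only an illustration, not Vladimir's actual tool; the table name "my_table" is made
up, and the old 0.90-era HTable API is assumed (matching the CDH generation discussed here).
Comparing counts per region or per row-key prefix before and after the restart is one way to
narrow down where the missing rows came from.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

// Rough row-count sketch against the HBase client API (hypothetical table name).
public class RowCountSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");   // hypothetical table name
    Scan scan = new Scan();
    scan.setCaching(1000);                         // fewer RPCs per scanner.next()
    scan.setFilter(new FirstKeyOnlyFilter());      // return only the first KV of each row
    ResultScanner scanner = table.getScanner(scan);
    long rows = 0;
    for (Result r : scanner) {
      rows++;
    }
    scanner.close();
    table.close();
    System.out.println("row count = " + rows);
  }
}

The bundled MapReduce job does essentially the same thing at scale:
hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>.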
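
On the clean-vs-not restart question: whether the master had to split WALs on startup can be
checked by looking at what sits under /hbase/.logs around the restart (the master also logs
its splitting work). A rough sketch using the plain Hadoop FileSystem API, assuming the
/hbase/.logs path shown in the log lines above and that core-site.xml on the classpath points
at the cluster's HDFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// List per-regionserver WAL directories and their hlog files under /hbase/.logs
// (path taken from the log lines above; adjust if hbase.rootdir differs).
public class ListWals {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();      // reads core-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    Path logsDir = new Path("/hbase/.logs");
    for (FileStatus serverDir : fs.listStatus(logsDir)) {
      long bytes = 0;
      int files = 0;
      for (FileStatus wal : fs.listStatus(serverDir.getPath())) {
        bytes += wal.getLen();
        files++;
      }
      System.out.println(serverDir.getPath().getName() + ": " + files + " hlogs, " + bytes + " bytes");
    }
  }
}

A quick "hadoop fs -ls /hbase/.logs" before shutdown and after startup gives the same picture
from the command line.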
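
On the TTL point: with a one-year TTL, cells whose timestamps are more than a year old are
eligible to be dropped, so one sanity check is to scan only for cells whose timestamps fall
near the TTL boundary and see whether the "missing" rows are in that band. A rough sketch,
reusing the same hypothetical table name and an illustrative one-week window; the thresholds
are assumptions, not anything from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

// Count rows whose cells sit just inside a one-year TTL, i.e. data about to age out.
public class TtlBoundaryCheck {
  public static void main(String[] args) throws Exception {
    long now = System.currentTimeMillis();
    long oneYearMs = 365L * 24 * 60 * 60 * 1000;
    long slackMs = 7L * 24 * 60 * 60 * 1000;       // look at cells within a week of expiring

    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");    // hypothetical table name
    Scan scan = new Scan();
    scan.setCaching(1000);
    // Return only cells stamped between (now - 1 year) and (now - 1 year + 1 week).
    scan.setTimeRange(now - oneYearMs, now - oneYearMs + slackMs);

    long atRisk = 0;
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      atRisk++;
    }
    scanner.close();
    table.close();
    System.out.println("rows with cells near the TTL boundary = " + atRisk);
  }
}

If rows counted before the restart turn out to have timestamps older than a year afterwards,
TTL rather than the restart would account for at least part of the difference.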
