Hello Stack,

After we got these exceptions, we restarted the cluster and restarted the job that had failed, and the job succeeded. But now when we try to access /hbase/summary/1491233486/metrics/5046821377427277894, we get "Cannot access /hbase/summary/1491233486/metrics/5046821377427277894: No such file or directory."
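In case it helps anyone reproduce the check, commands along these lines show what HDFS thinks of the file (the path is the one from the exception; fs -ls and fsck are standard hadoop commands, though their output format differs a little between versions):

  # list the region's column family directory to see whether the store file is still present
  ./bin/hadoop fs -ls /hbase/summary/1491233486/metrics/

  # ask the namenode what it knows about the file and its blocks
  ./bin/hadoop fsck /hbase/summary/1491233486/metrics/5046821377427277894 -files -blocks -locations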
The messages about this file in the namenode logs are here:
http://rapidshare.com/files/363938595/log.txt.html
The failed job started at about 17:00. By the way, the Hadoop version we are using is 0.20.1 and the HBase version is 0.20.3. (A sketch of the checks you suggested, as concrete commands, is at the bottom of this mail.)
Regards,
  LvZheng

2010/3/16 Stack <st...@duboce.net>

> Can you get that file from hdfs?
>
>  ./bin/hadoop fs -get /hbase/summary/1491233486/metrics/5046821377427277894
>
> Does it look wholesome? Is it empty?
>
> What if you trace the life of that file in regionserver logs or,
> probably better, over in the namenode log? If you move this file aside,
> the region deploys?
>
> St.Ack
>
> On Mon, Mar 15, 2010 at 3:40 AM, Zheng Lv <lvzheng19800...@gmail.com> wrote:
> > Hello Everyone,
> > Recently we often got these in our client logs:
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> > contact region server 172.16.1.208:60020 for region
> > summary,SITE_0000000032\x01pt\x0120100314000000\x01\x25E7\x258C\x25AE\x25E5\x258E\x25BF\x25E5\x2586\x2580\x25E9\x25B9\x25B0\x25E6\x2591\x25A9\x25E6\x2593\x25A6\x25E6\x259D\x2590\x25E6\x2596\x2599\x25E5\x258E\x2582\x2B\x25E6\x25B1\x25BD\x25E8\x25BD\x25A6\x25E9\x2585\x258D\x25E4\x25BB\x25B6\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581,1268640385017,
> > row
> > 'SITE_0000000032\x01pt\x0120100315000000\x01\x2521\x25EF\x25BC\x2581\x25E9\x2594\x2580\x25E5\x2594\x25AE\x252F\x25E6\x2594\x25B6\x25E8\x25B4\x25AD\x25EF\x25BC\x2581VM700T\x2BVM700T\x2B\x25E5\x259B\x25BE\x25E5\x2583\x258F\x25E4\x25BF\x25A1\x25E5\x258F\x25B7\x25E4\x25BA\x25A7\x25E7\x2594\x259F\x25E5\x2599\x25A8\x2B\x25E7\x2594\x25B5\x25E5\x25AD\x2590\x25E6\x25B5\x258B\x25E9\x2587\x258F\x25E4\x25BB\x25AA\x25E5\x2599\x25A8\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581',
> > but failed after 10 attempts.
> > Exceptions:
> > java.io.IOException: java.io.IOException: Cannot open filename
> > /hbase/summary/1491233486/metrics/5046821377427277894
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1800)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
> >     at java.io.DataInputStream.read(DataInputStream.java:132)
> >     at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:99)
> >     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
> >     at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1020)
> >     at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:971)
> >     at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.loadBlock(HFile.java:1304)
> >     at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1186)
> >     at org.apache.hadoop.hbase.io.HalfHFileReader$1.seekTo(HalfHFileReader.java:207)
> >     at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.getStoreFile(StoreFileGetScan.java:80)
> >     at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.get(StoreFileGetScan.java:65)
> >     at org.apache.hadoop.hbase.regionserver.Store.get(Store.java:1461)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2396)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2385)
> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1731)
> >     at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> > Is there any way to fix this problem? Or is there anything we can do,
> > even manually, to relieve it?
> > Any suggestions?
> > Thank you.
> >   LvZheng
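PS: For the archives, here is a rough sketch of your suggestions as concrete commands. The hadoop invocations themselves (fs -get, fs -mkdir, fs -mv) and grepping the namenode log are standard usage; the local target path, the log file location, and the /hbase/moved holding directory are just our assumptions, not anything prescribed:

  # pull the store file out of HDFS to see whether it is wholesome or empty
  ./bin/hadoop fs -get /hbase/summary/1491233486/metrics/5046821377427277894 /tmp/5046821377427277894
  ls -l /tmp/5046821377427277894

  # trace the life of the file in the namenode log
  # (log name/location is an assumption; ours live under $HADOOP_HOME/logs)
  grep 5046821377427277894 $HADOOP_HOME/logs/hadoop-*-namenode-*.log

  # move the file aside into a holding directory and see whether the region deploys
  ./bin/hadoop fs -mkdir /hbase/moved
  ./bin/hadoop fs -mv /hbase/summary/1491233486/metrics/5046821377427277894 /hbase/moved/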