Hey Zheng:

On Mon, Mar 15, 2010 at 8:16 PM, Zheng Lv <lvzheng19800...@gmail.com> wrote:
> Hello Stack,
>   After we got these exceptions, we restarted the cluster and restarted
> the job that failed, and the job succeeded.
>   Now when we access /hbase/summary/1491233486/metrics/5046821377427277894,
> we get "Cannot access
> /hbase/summary/1491233486/metrics/5046821377427277894: No such file or
> directory.".
>

So, that would seem to indicate that the reference was in memory only;
that file was not in the filesystem.  You could have tried closing that
region.  It would also have been interesting to find the history of that
region, to try to figure out how it came to hold in memory a reference
to a file that had since been removed.
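(A minimal sketch of closing a region from the hbase shell; the region
name below is a placeholder, use the full name exactly as it appears in
the master UI or in the client exception, and check "help 'close_region'"
in your own 0.20.x shell for the exact argument forms:)

  hbase> # Closing the region makes the master reassign it; when it
  hbase> # reopens, the regionserver rebuilds its store file list from
  hbase> # what is actually present in HDFS, dropping any stale
  hbase> # in-memory reference to a since-removed file.
  hbase> close_region 'summary,<FULL_START_KEY>,1268640385017'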
> The messages about this file in the namenode logs are here:
> http://rapidshare.com/files/363938595/log.txt.html

This is interesting.  Do you have regionserver logs from 209, 208, and
210 for the corresponding times?

Thanks,
St.Ack

> The job that failed started at about 17 o'clock.
> By the way, the hadoop version we are using is 0.20.1, and the hbase
> version we are using is 0.20.3.
>
> Regards,
> LvZheng
> 2010/3/16 Stack <st...@duboce.net>
>
>> Can you get that file from hdfs?
>>
>>   ./bin/hadoop fs -get /hbase/summary/1491233486/metrics/5046821377427277894
>>
>> Does it look wholesome?  Is it empty?
>>
>> What if you trace the life of that file in the regionserver logs or,
>> probably better, over in the namenode log?  If you move this file
>> aside, does the region deploy?
>>
>> St.Ack
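(A sketch of the tracing steps suggested above; the log locations are
illustrative and depend on your install:)

  # Follow the file's life in the namenode log: look for its creation,
  # its deletion, and any block-level complaints in between.
  grep 5046821377427277894 /path/to/logs/hadoop-*-namenode-*.log

  # Fetch a local copy to inspect; failure here means the namenode no
  # longer knows the file at all.
  ./bin/hadoop fs -get /hbase/summary/1491233486/metrics/5046821377427277894 /tmp/

  # Sideline the file rather than deleting it, then watch whether the
  # region deploys cleanly without it.
  ./bin/hadoop fs -mkdir /sidelined
  ./bin/hadoop fs -mv /hbase/summary/1491233486/metrics/5046821377427277894 /sidelined/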
>>
>> On Mon, Mar 15, 2010 at 3:40 AM, Zheng Lv <lvzheng19800...@gmail.com>
>> wrote:
>> > Hello Everyone,
>> >   Recently we often got these in our client logs:
>> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> > contact region server 172.16.1.208:60020 for region
>> > summary,SITE_0000000032\x01pt\x0120100314000000\x01\x25E7\x258C\x25AE\x25E5\x258E\x25BF\x25E5\x2586\x2580\x25E9\x25B9\x25B0\x25E6\x2591\x25A9\x25E6\x2593\x25A6\x25E6\x259D\x2590\x25E6\x2596\x2599\x25E5\x258E\x2582\x2B\x25E6\x25B1\x25BD\x25E8\x25BD\x25A6\x25E9\x2585\x258D\x25E4\x25BB\x25B6\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581,1268640385017,
>> > row
>> > 'SITE_0000000032\x01pt\x0120100315000000\x01\x2521\x25EF\x25BC\x2581\x25E9\x2594\x2580\x25E5\x2594\x25AE\x252F\x25E6\x2594\x25B6\x25E8\x25B4\x25AD\x25EF\x25BC\x2581VM700T\x2BVM700T\x2B\x25E5\x259B\x25BE\x25E5\x2583\x258F\x25E4\x25BF\x25A1\x25E5\x258F\x25B7\x25E4\x25BA\x25A7\x25E7\x2594\x259F\x25E5\x2599\x25A8\x2B\x25E7\x2594\x25B5\x25E5\x25AD\x2590\x25E6\x25B5\x258B\x25E9\x2587\x258F\x25E4\x25BB\x25AA\x25E5\x2599\x25A8\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581',
>> > but failed after 10 attempts.
>> > Exceptions:
>> > java.io.IOException: java.io.IOException: Cannot open filename
>> > /hbase/summary/1491233486/metrics/5046821377427277894
>> >   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
>> >   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1800)
>> >   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
>> >   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
>> >   at java.io.DataInputStream.read(DataInputStream.java:132)
>> >   at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:99)
>> >   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
>> >   at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1020)
>> >   at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:971)
>> >   at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.loadBlock(HFile.java:1304)
>> >   at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1186)
>> >   at org.apache.hadoop.hbase.io.HalfHFileReader$1.seekTo(HalfHFileReader.java:207)
>> >   at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.getStoreFile(StoreFileGetScan.java:80)
>> >   at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.get(StoreFileGetScan.java:65)
>> >   at org.apache.hadoop.hbase.regionserver.Store.get(Store.java:1461)
>> >   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2396)
>> >   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2385)
>> >   at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1731)
>> >   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >   at java.lang.reflect.Method.invoke(Method.java:597)
>> >   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>> >   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>> > Is there any way to fix this problem?  Or is there anything we can
>> > do, even manually, to relieve it?
>> > Any suggestion?
>> > Thank you.
>> > LvZheng
>> >
>>
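(To dig up the region history mentioned above, one approach is to compare
what the region directory actually holds against the file the regionserver
is trying to open, then grep the regionserver logs for the missing file's
name around flush and compaction times; log paths here are illustrative:)

  # Which store files does the metrics family really have on disk?
  ./bin/hadoop fs -ls /hbase/summary/1491233486/metrics/

  # When was 5046821377427277894 created, and when did it disappear?
  grep 5046821377427277894 /path/to/logs/hbase-*-regionserver-*.log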