Sorry, it's taken me a while. Let me try to get to this this evening.

Thank you for your patience



On Mar 17, 2010, at 2:29 AM, Zheng Lv <lvzheng19800...@gmail.com> wrote:

Hello Stack,
 Did you receive my mail? It looks like you didn't.
   LvZheng

2010/3/16 Zheng Lv <lvzheng19800...@gmail.com>

Hello Stack,
I have uploaded some parts of the logs on master, regionserver208 and
regionserver210 to:
 http://rapidshare.com/files/363988384/master_207_log.txt.html
 http://rapidshare.com/files/363988673/regionserver_208_log.txt.html
 http://rapidshare.com/files/363988819/regionserver_210_log.txt.html
 I noticed that there are some LeaseExpiredExceptions and "2010-03-15
16:06:32,864 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region ..." before 17 o'clock. Did these lead to
the error? Why did they happen? How can we avoid them?
 Thanks.
   LvZheng
2010/3/16 Stack <st...@duboce.net>

Maybe just the master log from around this time would be sufficient to
figure out the story.
St.Ack

On Mon, Mar 15, 2010 at 10:04 PM, Stack <st...@duboce.net> wrote:
Hey Zheng:

On Mon, Mar 15, 2010 at 8:16 PM, Zheng Lv <lvzheng19800...@gmail.com >
wrote:
Hello Stack,
After we got these exceptions, we restarted the cluster and reran the
job that failed, and the job succeeded.
Now when we access
/hbase/summary/1491233486/metrics/5046821377427277894,
we get "Cannot access
/hbase/summary/1491233486/metrics/5046821377427277894: No such file or
directory".


So, that would seem to indicate that the reference was in memory
only; the file was not in the filesystem. You could have tried closing
that region. It would also have been interesting to find the history of
that region, to try to figure out how it came to hold in memory a
reference to a file since removed.

The messages about this file in namenode logs are in here:
http://rapidshare.com/files/363938595/log.txt.html

This is interesting. Do you have regionserver logs from 209, 208, and
210 for corresponding times?

Thanks,
St.Ack

The job that failed started at about 17 o'clock.
By the way, the hadoop version we are using is 0.20.1 and the hbase
version we are using is 0.20.3.

Regards,
LvZheng
2010/3/16 Stack <st...@duboce.net>

Can you get that file from hdfs?

./bin/hadoop fs -get /hbase/summary/1491233486/metrics/5046821377427277894

Does it look wholesome?  Is it empty?

What if you trace the life of that file in regionserver logs or
probably better, over in namenode log? If you move this file aside,
the region deploys?

St.Ack
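
The suggestion above to trace the life of the file through the regionserver and namenode logs can be sketched as a small script. This is only a minimal sketch, assuming the logs are plain text files on local disk and that each line starts with a timestamp in the "2010-03-15 16:06:32,864" form seen in the error line quoted earlier in this thread. The function names trace_file and trace_logs are hypothetical helpers, not part of Hadoop or HBase.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumed log line prefix, e.g. "2010-03-15 16:06:32,864 ERROR ..."
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def trace_file(lines: Iterable[str], file_id: str) -> List[Tuple[str, str]]:
    """Return (timestamp, line) pairs for every line that mentions file_id.

    Lines without a leading timestamp get an empty string as timestamp.
    """
    hits = []
    for line in lines:
        if file_id in line:
            match = TIMESTAMP.match(line)
            hits.append((match.group(1) if match else "", line.rstrip("\n")))
    return hits


def trace_logs(paths: Iterable[str], file_id: str) -> List[Tuple[str, str, str]]:
    """Collect (timestamp, path, line) hits from several logs, oldest first."""
    hits = []
    for path in paths:
        # errors="replace" so odd bytes in row keys do not abort the scan
        with open(path, errors="replace") as handle:
            hits.extend((ts, path, line) for ts, line in trace_file(handle, file_id))
    # The assumed timestamp format sorts correctly as a plain string
    hits.sort(key=lambda hit: hit[0])
    return hits


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python trace.py <file id> <log file> [<log file> ...]
    for _ts, log_path, log_line in trace_logs(sys.argv[2:], sys.argv[1]):
        print(f"{log_path}: {log_line}")
```

Run with the store file name (here 5046821377427277894) and the master, regionserver and namenode logs as arguments, it prints every mention in time order, which should show when the file was created, compacted or deleted.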

On Mon, Mar 15, 2010 at 3:40 AM, Zheng Lv <lvzheng19800...@gmail.com >
wrote:
Hello Everyone,
  Recently we often got these in our client logs:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
contact region server 172.16.1.208:60020 for region


summary,SITE_0000000032\x01pt\x0120100314000000\x01\x25E7\x258C \x25AE\x25E5\x258E\x25BF \x25E5\ x2586\ x2580\ x25E9\x25B9\x25B0\x25E6\x2591\x25A9\x25E6\x2593\x25A6\x25E6\x259D \x2590\x25E6\x2596\x2599\x25E5\x258E\x2582\x2B\x25E6\x25B1\x25BD \x25E8\x25BD\x25A6\x25E9\x2585\x258D\x25E4\x25BB\x25B6\x25EF\x25BC \x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583-- \x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE \x2589\x25E5\x2585\x25A8\x25E7\x259A \x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D \x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA \x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B \x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC \x2581,1268640385017,
row


'SITE_0000000032\x01pt\x0120100315000000\x01\x2521\x25EF\x25BC \x2581\x25E9\x2594\x2580\x25E5\x2594\x25AE\x252F \x25E6\x2594\x25B6\x25E8\x25B4\x25AD\x25EF\x25BC\x2581VM700T \x2BVM700T\x2B\x25E5\x259B\x25BE\x25E5\x2583\x258F\x25E4\x25BF \x25A1\x25E5\x258F\x25B7\x25E4\x25BA\x25A7\x25E7\x2594\x259F \x25E5\x2599\x25A8\x2B\x25E7\x2594\x25B5\x25E5\x25AD \x2590\x25E6\x25B5\x258B\x25E9\x2587\x258F\x25E4\x25BB\x25AA \x25E5\x2599\x25A8\x25EF\x25BC\x258C \x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583-- \x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE \x2589\x25E5\x2585\x25A8\x25E7\x259A \x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D \x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA \x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B \x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581',
but failed after 10 attempts.
Exceptions:
java.io.IOException: java.io.IOException: Cannot open filename /hbase/summary/1491233486/metrics/5046821377427277894
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1800)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
    at java.io.DataInputStream.read(DataInputStream.java:132)
    at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:99)
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1020)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:971)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.loadBlock(HFile.java:1304)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1186)
    at org.apache.hadoop.hbase.io.HalfHFileReader$1.seekTo(HalfHFileReader.java:207)
    at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.getStoreFile(StoreFileGetScan.java:80)
    at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.get(StoreFileGetScan.java:65)
    at org.apache.hadoop.hbase.regionserver.Store.get(Store.java:1461)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2396)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2385)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1731)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
Is there any way to fix this problem? Or is there anything we can do,
even manually, to relieve it?
  Any suggestions?
  Thank you.
  LvZheng
