Sorry, it's taken me a while. Let me try to get to this this evening.
Thank you for your patience.
On Mar 17, 2010, at 2:29 AM, Zheng Lv <lvzheng19800...@gmail.com> wrote:
Hello Stack,
Did you receive my mail? It looks like you didn't.
LvZheng
2010/3/16 Zheng Lv <lvzheng19800...@gmail.com>
Hello Stack,
I have uploaded some parts of the logs from the master, regionserver208
and regionserver210 to:
http://rapidshare.com/files/363988384/master_207_log.txt.html
http://rapidshare.com/files/363988673/regionserver_208_log.txt.html
http://rapidshare.com/files/363988819/regionserver_210_log.txt.html
I noticed that there are some LeaseExpiredException entries and
"2010-03-15 16:06:32,864 ERROR
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
Compaction/Split failed for region ..." before 17 o'clock. Did these
lead to the error? Why did they happen? How can we avoid them?
Thanks.
LvZheng
2010/3/16 Stack <st...@duboce.net>
Maybe just the master log from around this time would be sufficient to
figure out the story.
St.Ack
On Mon, Mar 15, 2010 at 10:04 PM, Stack <st...@duboce.net> wrote:
Hey Zheng:
On Mon, Mar 15, 2010 at 8:16 PM, Zheng Lv <lvzheng19800...@gmail.com> wrote:
Hello Stack,
After we got these exceptions, we restarted the cluster and reran the
job that failed, and the job succeeded.
Now when we access
/hbase/summary/1491233486/metrics/5046821377427277894,
we get "Cannot access
/hbase/summary/1491233486/metrics/5046821377427277894: No such file or
directory."
So, that would seem to indicate that the reference was in memory
only; that file was not in the filesystem. You could have tried closing
that region. It would also have been interesting to find the history of
that region, to try to figure out how it came to hold in memory a
reference to a file that had since been removed.
The messages about this file in namenode logs are in here:
http://rapidshare.com/files/363938595/log.txt.html
This is interesting. Do you have regionserver logs from 209, 208, and
210 for the corresponding times?
Thanks,
St.Ack
The job that failed started at about 17 o'clock.
By the way, the Hadoop version we are using is 0.20.1 and the HBase
version we are using is 0.20.3.
Regards,
LvZheng
2010/3/16 Stack <st...@duboce.net>
Can you get that file from hdfs?
./bin/hadoop fs -get /hbase/summary/1491233486/metrics/5046821377427277894
Does it look wholesome? Is it empty?
What if you trace the life of that file in the regionserver logs or,
probably better, over in the namenode log? If you move this file aside,
does the region deploy?
St.Ack
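One way to follow the suggestion above about tracing the life of the file is to pull every line that mentions the path out of the namenode and regionserver logs and merge them by time. The sketch below is only an illustration: the function names `trace_path` and `merge_traces` and the source labels are made up for this example, and it assumes the logs use the usual log4j timestamp prefix seen elsewhere in this thread ("2010-03-15 16:06:32,864"). Continuation lines without a timestamp (such as stack frames) inherit the timestamp of the preceding line.

```python
import re
from typing import Iterable, List, Tuple

# Assumed log4j prefix, e.g. "2010-03-15 16:06:32,864 ERROR ..."
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")

Hit = Tuple[str, str, str]  # (timestamp, source label, log line)


def trace_path(lines: Iterable[str], needle: str, source: str) -> List[Hit]:
    """Return every line mentioning `needle`, tagged with time and source."""
    hits: List[Hit] = []
    last_ts = ""  # carried forward for lines that have no timestamp of their own
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            last_ts = match.group(1)
        if needle in line:
            hits.append((last_ts, source, line.rstrip("\n")))
    return hits


def merge_traces(*traces: List[Hit]) -> List[Hit]:
    """Merge hits from several logs into one timeline.

    The timestamp format sorts correctly as plain text, and sorted() is
    stable, so lines sharing a timestamp keep their original order.
    """
    return sorted((hit for trace in traces for hit in trace), key=lambda h: h[0])
```

To use it, open each log file, pass the file object to `trace_path` together with the store file path and a label such as "namenode" or "rs208", then print the result of `merge_traces`. The merged timeline should show when the file was created, which server opened it, and when it was deleted, provided the logs cover that period.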
On Mon, Mar 15, 2010 at 3:40 AM, Zheng Lv <lvzheng19800...@gmail.com> wrote:
Hello Everyone,
Recently we often got these in our client logs:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 172.16.1.208:60020 for region
summary,SITE_0000000032\x01pt\x0120100314000000\x01\x25E7\x258C\x25AE\x25E5\x258E\x25BF\x25E5\x2586\x2580\x25E9\x25B9\x25B0\x25E6\x2591\x25A9\x25E6\x2593\x25A6\x25E6\x259D\x2590\x25E6\x2596\x2599\x25E5\x258E\x2582\x2B\x25E6\x25B1\x25BD\x25E8\x25BD\x25A6\x25E9\x2585\x258D\x25E4\x25BB\x25B6\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581,1268640385017,
row
'SITE_0000000032\x01pt\x0120100315000000\x01\x2521\x25EF\x25BC\x2581\x25E9\x2594\x2580\x25E5\x2594\x25AE\x252F\x25E6\x2594\x25B6\x25E8\x25B4\x25AD\x25EF\x25BC\x2581VM700T\x2BVM700T\x2B\x25E5\x259B\x25BE\x25E5\x2583\x258F\x25E4\x25BF\x25A1\x25E5\x258F\x25B7\x25E4\x25BA\x25A7\x25E7\x2594\x259F\x25E5\x2599\x25A8\x2B\x25E7\x2594\x25B5\x25E5\x25AD\x2590\x25E6\x25B5\x258B\x25E9\x2587\x258F\x25E4\x25BB\x25AA\x25E5\x2599\x25A8\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581',
but failed after 10 attempts.
Exceptions:
java.io.IOException: java.io.IOException: Cannot open filename /hbase/summary/1491233486/metrics/5046821377427277894
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1800)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:99)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1020)
at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:971)
at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.loadBlock(HFile.java:1304)
at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1186)
at org.apache.hadoop.hbase.io.HalfHFileReader$1.seekTo(HalfHFileReader.java:207)
at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.getStoreFile(StoreFileGetScan.java:80)
at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.get(StoreFileGetScan.java:65)
at org.apache.hadoop.hbase.regionserver.Store.get(Store.java:1461)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2396)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2385)
at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1731)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
Is there any way to fix this problem? Or is there anything we can do,
even manually, to relieve it?
Any suggestions?
Thank you.
LvZheng
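
The region name and row key in the client error above are hard to read because they appear to be escaped twice: the key itself looks like percent-encoded UTF-8 text, and the log output then prints each "%" (byte 0x25) as "\x25". The sketch below, with a made-up function name `decode_escaped_key`, reverses both layers so the key can be compared against application data. It assumes that reading of the format is right; if a key contains bytes that are not valid UTF-8 after decoding, they are shown as replacement characters.

```python
import re
from urllib.parse import unquote_to_bytes

# Matches the "\xHH" escapes used in the log output.
_HEX_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def decode_escaped_key(escaped: str) -> str:
    """Undo the "\\xHH" escapes, then undo percent-encoding (assumed UTF-8)."""
    # Layer 1: turn each "\xHH" back into the single byte it stands for.
    raw = _HEX_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 16)]),
        escaped.encode("utf-8"),
    )
    # Layer 2: "%E7%9C%9F"-style sequences become raw UTF-8 bytes.
    # A "+" is left as-is; whether it stands for a space is up to the application.
    return unquote_to_bytes(raw).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # A short fragment copied from the row key in the error above.
    print(decode_escaped_key(r"\x25E7\x259C\x259F\x25E5\x25AE\x259E"))
```

Control bytes such as the "\x01" field separators survive decoding unchanged, so splitting the result on "\x01" should recover the individual key fields.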