Hello Stack,
  Is it taking you too long to download the logs? Shall I upload them to
another site that is easier for you to download from?
    LvZheng

2010/3/19 Zheng Lv <lvzheng19800...@gmail.com>

> Hello Stack,
>   I must say thank you, for your patience too.
>   I'm sorry that after so many tries the logs you got were not that
> useful. I have now turned logging up to DEBUG level (see the note after
> the links), so if we get these exceptions again, I will send you DEBUG
> logs. In the meantime, I have still uploaded the logs you asked for to
> rapidshare, although they are not at DEBUG level. The urls:
>
> http://rapidshare.com/files/365292889/hadoop-root-namenode-cactus207.log.2010-03-15.html
>
> http://rapidshare.com/files/365293127/hbase-root-master-cactus207.log.2010-03-15.html
>
> http://rapidshare.com/files/365293238/hbase-root-regionserver-cactus208.log.2010-03-15.html
>
> http://rapidshare.com/files/365293391/hbase-root-regionserver-cactus209.log.2010-03-15.html
>
> http://rapidshare.com/files/365293488/hbase-root-regionserver-cactus210.log.2010-03-15.html
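>
>   (For reference, here is roughly how we turned DEBUG on; a minimal
> sketch assuming the stock conf/log4j.properties that ships with hbase:
>
>   # conf/log4j.properties -- raise hbase logging from INFO to DEBUG
>   log4j.logger.org.apache.hadoop.hbase=DEBUG
>
> The daemons pick it up after a restart.)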
>
>   >For sure you've upped xceivers on your hdfs cluster and you've upped
> >the file descriptors as per the 'Getting Started'? (Sorry, have to
> >ask).
>   Before I got your mail we had not set the properties you mentioned,
> because we never saw "too many open files" or the other symptoms
> mentioned in the 'Getting Started' docs. But now I have upped both
> settings, as sketched below. We'll see what happens.
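>
>   (Concretely, this is roughly what we changed; the values are just the
> commonly recommended ones from the docs, not tuned for our cluster:
>
>   <!-- conf/hdfs-site.xml on every datanode; note the property name's
>        historical misspelling -->
>   <property>
>     <name>dfs.datanode.max.xcievers</name>
>     <value>4096</value>
>   </property>
>
>   # /etc/security/limits.conf -- up the open-file limit for the user
>   # the daemons run as (root in our case)
>   root  -  nofile  32768
>
> Both need a datanode restart / fresh login to take effect.)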
>
>   If you need more information, just tell me.
>
>   Thanks again,
>   LvZheng.
>
>
> 2010/3/19 Stack <st...@duboce.net>
>
>> Yeah, I had to retry a couple of times ("Too busy; try back later --
>> or sign up premium service!").
>>
>> It would have been nice to have wider log snippets.  I'd like to have
>> seen whether the issue was double assignment.  The master log snippet
>> only shows the split.  Regionserver 209's log is the one where the
>> interesting stuff is going on around this time, 2010-03-15
>> 16:06:51,150, but it's not in the provided set.  You are not running
>> at DEBUG level either, so it would be hard to see what is up even if
>> you had provided it.
>>
>> Looking in 208, I see a few exceptions beyond the one you pasted below.
>>  For sure you've upped xceivers on your hdfs cluster and you've upped
>> the file descriptors as per the 'Getting Started'? (Sorry, have to
>> ask).
>>
>> Can I have more of the logs?  Can I have all of the namenode log, all
>> of the master log and 209's log?  This rapidshare thing is fine with
>> me.  I don't mind retrying.
>>
>> Sorry it took me a while to get to this.
>> St.Ack
>>
>> On Wed, Mar 17, 2010 at 8:32 PM, Zheng Lv <lvzheng19800...@gmail.com>
>> wrote:
>> > Hello Stack,
>> >    >Sorry it's taken me a while.  Let me try to get to this this evening.
>> >    Is it downloading the log files that takes you a while? I'm sorry,
>> > I used to upload files to SkyDrive, but now we can't access that
>> > website. Is there a net disk or similar service you can download from
>> > quickly? I can upload to it.
>> >    LvZheng
>> > 2010/3/18 Stack <saint....@gmail.com>
>> >
>> >> Sorry it's taken me a while.  Let me try to get to this this evening.
>> >>
>> >> Thank you for your patience
>> >>
>> >>
>> >> On Mar 17, 2010, at 2:29 AM, Zheng Lv <lvzheng19800...@gmail.com>
>> wrote:
>> >>
>> >> Hello Stack,
>> >>>  Did you receive my mail? It looks like you didn't.
>> >>>   LvZheng
>> >>>
>> >>> 2010/3/16 Zheng Lv <lvzheng19800...@gmail.com>
>> >>>
>> >>> Hello Stack,
>> >>>>  I have uploaded some parts of the logs on the master,
>> >>>> regionserver208 and regionserver210 to:
>> >>>>  http://rapidshare.com/files/363988384/master_207_log.txt.html
>> >>>>  http://rapidshare.com/files/363988673/regionserver_208_log.txt.html
>> >>>>  http://rapidshare.com/files/363988819/regionserver_210_log.txt.html
>> >>>>  I noticed that there are some LeaseExpiredExceptions and "2010-03-15
>> >>>> 16:06:32,864 ERROR
>> >>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>> >>>> Compaction/Split failed for region ..." before 17 o'clock. Did these
>> >>>> lead to the error? Why did they happen? How can we avoid them?
>> >>>>  Thanks.
>> >>>>   LvZheng
>> >>>> 2010/3/16 Stack <st...@duboce.net>
>> >>>>
>> >>>>> Maybe just the master log from around this time would be
>> >>>>> sufficient to figure out the story.
>> >>>>> St.Ack
>> >>>>>
>> >>>>> On Mon, Mar 15, 2010 at 10:04 PM, Stack <st...@duboce.net> wrote:
>> >>>>>
>> >>>>>> Hey Zheng:
>> >>>>>>
>> >>>>>> On Mon, Mar 15, 2010 at 8:16 PM, Zheng Lv <lvzheng19800...@gmail.com>
>> >>>>>> wrote:
>> >>>>>> Hello Stack,
>> >>>>>>> After we got these exceptions, we restarted the cluster and
>> >>>>>>> restarted the job that had failed, and the job succeeded.
>> >>>>>>> Now when we access
>> >>>>>>> /hbase/summary/1491233486/metrics/5046821377427277894, we get
>> >>>>>>> "Cannot access
>> >>>>>>> /hbase/summary/1491233486/metrics/5046821377427277894: No such
>> >>>>>>> file or directory.".
>> >>>>>>>
>> >>>>>> So, that would seem to indicate that the reference was in memory
>> >>>>>> only... that file was not in the filesystem.  You could have tried
>> >>>>>> closing that region.  It would also have been interesting to find
>> >>>>>> the history of that region, to try and figure out how it came to
>> >>>>>> hold in memory a reference to a file since removed.
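>> >>>>>>
>> >>>>>> (For example, if your shell has the close_region command, you can
>> >>>>>> ask that the region be closed and redeployed; on reopen the
>> >>>>>> regionserver re-reads its store files, so a stale in-memory
>> >>>>>> reference would be dropped.  The region name below is only a
>> >>>>>> placeholder, use the full name shown in the master UI:
>> >>>>>>
>> >>>>>>   hbase> close_region 'summary,SITE_...,1268640385017'
>> >>>>>> )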
>> >>>>>>
>> >>>>>>> The messages about this file in the namenode logs are here:
>> >>>>>>> http://rapidshare.com/files/363938595/log.txt.html
>> >>>>>>>
>> >>>>>>
>> >>>>>> This is interesting.  Do you have the regionserver logs from 209,
>> >>>>>> 208, and 210 for the corresponding times?
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> St.Ack
>> >>>>>>
>> >>>>>>> The job that failed started at about 17 o'clock.
>> >>>>>>> By the way, the hadoop version we are using is 0.20.1, and the
>> >>>>>>> hbase version we are using is 0.20.3.
>> >>>>>>>
>> >>>>>>> Regards,
>> >>>>>>> LvZheng
>> >>>>>>> 2010/3/16 Stack <st...@duboce.net>
>> >>>>>>>
>> >>>>>>>> Can you get that file from hdfs?
>> >>>>>>>>
>> >>>>>>>>   ./bin/hadoop fs -get /hbase/summary/1491233486/metrics/5046821377427277894 .
>> >>>>>>>>
>> >>>>>>>> Does it look wholesome?  Is it empty?
>> >>>>>>>>
>> >>>>>>>> What if you trace the life of that file in the regionserver logs
>> >>>>>>>> or, probably better, over in the namenode log?  If you move this
>> >>>>>>>> file aside, does the region deploy?
>> >>>>>>>>
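>> >>>>>>>>
>> >>>>>>>> (Something along these lines; the paths assume a stock layout
>> >>>>>>>> and the log names from your earlier uploads:
>> >>>>>>>>
>> >>>>>>>>   # follow the file through the namenode log
>> >>>>>>>>   grep 5046821377427277894 logs/hadoop-root-namenode-cactus207.log*
>> >>>>>>>>   # move the file aside rather than deleting it outright
>> >>>>>>>>   ./bin/hadoop fs -mv /hbase/summary/1491233486/metrics/5046821377427277894 /tmp/5046821377427277894.aside
>> >>>>>>>> )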
>> >>>>>>>> St.Ack
>> >>>>>>>>
>> >>>>>>>> On Mon, Mar 15, 2010 at 3:40 AM, Zheng Lv <lvzheng19800...@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hello Everyone,
>> >>>>>>>>>  Recently we often got these in our client logs:
>> >>>>>>>>>
>> >>>>>>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> >>>>>>>>> contact region server 172.16.1.208:60020 for region
>> >>>>>>>>> summary,SITE_0000000032\x01pt\x0120100314000000\x01\x25E7\x258C\x25AE\x25E5\x258E\x25BF\x25E5\x2586\x2580\x25E9\x25B9\x25B0\x25E6\x2591\x25A9\x25E6\x2593\x25A6\x25E6\x259D\x2590\x25E6\x2596\x2599\x25E5\x258E\x2582\x2B\x25E6\x25B1\x25BD\x25E8\x25BD\x25A6\x25E9\x2585\x258D\x25E4\x25BB\x25B6\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581,1268640385017,
>> >>>>>>>>> row
>> >>>>>>>>> 'SITE_0000000032\x01pt\x0120100315000000\x01\x2521\x25EF\x25BC\x2581\x25E9\x2594\x2580\x25E5\x2594\x25AE\x252F\x25E6\x2594\x25B6\x25E8\x25B4\x25AD\x25EF\x25BC\x2581VM700T\x2BVM700T\x2B\x25E5\x259B\x25BE\x25E5\x2583\x258F\x25E4\x25BF\x25A1\x25E5\x258F\x25B7\x25E4\x25BA\x25A7\x25E7\x2594\x259F\x25E5\x2599\x25A8\x2B\x25E7\x2594\x25B5\x25E5\x25AD\x2590\x25E6\x25B5\x258B\x25E9\x2587\x258F\x25E4\x25BB\x25AA\x25E5\x2599\x25A8\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581',
>> >>>>>>>>> but failed after 10 attempts.
>> >>>>>>>>> Exceptions:
>> >>>>>>>>> java.io.IOException: java.io.IOException: Cannot open filename
>> >>>>>>>>> /hbase/summary/1491233486/metrics/5046821377427277894
>> >>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
>> >>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1800)
>> >>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
>> >>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
>> >>>>>>>>>   at java.io.DataInputStream.read(DataInputStream.java:132)
>> >>>>>>>>>   at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:99)
>> >>>>>>>>>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
>> >>>>>>>>>   at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1020)
>> >>>>>>>>>   at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:971)
>> >>>>>>>>>   at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.loadBlock(HFile.java:1304)
>> >>>>>>>>>   at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1186)
>> >>>>>>>>>   at org.apache.hadoop.hbase.io.HalfHFileReader$1.seekTo(HalfHFileReader.java:207)
>> >>>>>>>>>   at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.getStoreFile(StoreFileGetScan.java:80)
>> >>>>>>>>>   at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.get(StoreFileGetScan.java:65)
>> >>>>>>>>>   at org.apache.hadoop.hbase.regionserver.Store.get(Store.java:1461)
>> >>>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2396)
>> >>>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2385)
>> >>>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1731)
>> >>>>>>>>>   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>> >>>>>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >>>>>>>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>> >>>>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>> >>>>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>> >>>>>>>>>
>> >>>>>>>>>  Is there any way to fix this problem? Or is there anything we
>> >>>>>>>>> can do, even manually, to relieve it?
>> >>>>>>>>>  Any suggestions?
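>> >>>>>>>>>  (I assume the "10 attempts" comes from the client retry
>> >>>>>>>>> setting, hbase.client.retries.number in hbase-site.xml, which
>> >>>>>>>>> we have left at its default:
>> >>>>>>>>>
>> >>>>>>>>>   <property>
>> >>>>>>>>>     <name>hbase.client.retries.number</name>
>> >>>>>>>>>     <value>10</value>
>> >>>>>>>>>   </property>
>> >>>>>>>>>
>> >>>>>>>>> Raising it would presumably only mask the problem.)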
>> >>>>>>>>>  Thank you.
>> >>>>>>>>>  LvZheng
