Hi.
DEBUG wasn't enabled, because it decreases performance and increases log size. Regarding the ulimit: yes, it was raised to 32K. You remember correctly: I ran the balancer during the massive load, and from that point everything started to behave strangely.
Currently, I can't tell you the regions that were in the table, because I re-formatted HDFS (this was the only way I could get my cluster back to work). I have 7 datanodes; 6 of them are running a region server and one is the HMaster.

Best Regards.

On Tue, Nov 11, 2008 at 1:08 AM, stack <[EMAIL PROTECTED]> wrote:

> I took a look.
>
> First, enable DEBUG. See the hbase FAQ for how.
>
> Looking, I see that all was running fine till:
>
> 2008-11-03 14:10:08,261 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.X.X.Y:60020. Already tried 0 time(s).
>
> ...in the middle of an attempt at scanning the .META. region.
>
> Looking through regionserver logs, they are all fine till about that above
> time when I start to see variations on:
>
> 2008-11-03 14:08:46,440 INFO org.apache.hadoop.dfs.DFSClient: Could not
> obtain block blk_1223341017118968735_305051 from any node:
> java.io.IOException: No live nodes contain current block
>
> ....and
>
> 2008-11-03 14:08:43,660 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink 10.X.X.Y:50010
> 2008-11-03 14:08:43,660 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_6726606309673852040_314096
>
> Your hdfs went bad for some reason around above time. I don't see any
> obvious explanation for why it went bad. You were running balancer at the
> time IIRC?
>
> Could you netstat your running datanodes and see how many concurrent
> connections you had running? Was 1024 enough? You had configured a max of
> 1024? I don't see the ulimit print out in these logs so presume its
> 1024.
>
> How many regions do you have in your table when it starts to go wonky? You
> have 6 datanodes running beside your 6 regionservers?
>
> St.Ack
>
>
> Slava Gorelik wrote:
>
>> Hi Michael.
>> I'm sending logs, in 2 parts (2 messages)
>> Part 1
>>
>> On Tue, Nov 4, 2008 at 11:44 PM, Slava Gorelik <[EMAIL PROTECTED]> wrote:
>>
>> Thank You. Now it's clear.
>>
>> On Tue, Nov 4, 2008 at 11:31 PM, stack <[EMAIL PROTECTED]> wrote:
>>
>> Slava Gorelik wrote:
>>
>> One more regarding the blockCache, how changes in store files (as i
>> understand those are MapFiles) are reflected on client side cache.
>> If we are talking about more than one client that doing a changes ?
>> If each client has different part of the MapFile ? or something else ?
>>
>> The block cache cache is over in the server. Its a cache for store
>> files which never change once written. Did I say client-side cache?
>> I should have been more clear. The client in this case is the
>> regionserver itself. The cache is so the regionserver saves on its
>> trips over the network visiting datanodes.
>> St.Ack
>>
>> Best Regards.
>>
>> On Tue, Nov 4, 2008 at 11:10 PM, Slava Gorelik <[EMAIL PROTECTED]> wrote:
>>
>> I can try to reproduce it again, but before this i would like to
>> send you a logs.
>> Best Regards.
>>
>> On Tue, Nov 4, 2008 at 10:05 PM, stack <[EMAIL PROTECTED]> wrote:
>>
>> Then we should try and figure if there is an issue in the balancer,
>> or maybe there is something missing if we are not doing a big upload
>> in a manner that balances the upload across HDFS?
>> St.Ack
>>
>> Slava Gorelik wrote:
>>
>> Sure, i'll arrange logs tomorrow.About balancer, to wait when the
>> massive work is finished is good in testing environment but in
>> production it's not relevant :-)
>>
>> Best Regards.
>>
>> On Tue, Nov 4, 2008 at 9:48 PM, stack <[EMAIL PROTECTED]> wrote:
>>
>> Slava Gorelik wrote:
>>
>> Hi.Regarding the failure of new block creation - i failed to run
>> hbase till i reformatted HDFS again.
>>
>> I'd be interested in the logs.
>>
>> I just wandering if hadoop re balancing is necessary? Will it balance
>> itself ? As i understand hadoop balancer is moving data between data
>> nodes, but in my case this is during massive (8 clients just adding
>> a records - about 400 requests for all region servers - 6). So, is it
>> good idea to run balancer during heavy load ?
>>
>> I don't have sufficient experience running the balancer. Perhaps wait
>> till upload is done, then run it?
>>
>> St.Ack
