I took a look.
First, enable DEBUG. See the hbase FAQ for how.
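(For reference, and assuming the stock conf layout of this era - verify against the FAQ for your version - turning on DEBUG is a one-line change in conf/log4j.properties on each node:)

```
# conf/log4j.properties -- logger name assumed from the stock
# Hadoop/HBase layout of this era; check the FAQ for your version.
log4j.logger.org.apache.hadoop.hbase=DEBUG
```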
Looking, I see that all was running fine till:
2008-11-03 14:10:08,261 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.X.X.Y:60020. Already tried 0 time(s).
...in the middle of an attempt at scanning the .META. region.
Looking through the regionserver logs, they are all fine until about
the above time, when I start to see variations on:
2008-11-03 14:08:46,440 INFO org.apache.hadoop.dfs.DFSClient: Could not
obtain block blk_1223341017118968735_305051 from any node:
java.io.IOException: No live nodes contain current block
....and
2008-11-03 14:08:43,660 INFO org.apache.hadoop.dfs.DFSClient: Exception
in createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 10.X.X.Y:50010
2008-11-03 14:08:43,660 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_6726606309673852040_314096
Your HDFS went bad for some reason around that time. I don't see any
obvious explanation for why it went bad. You were running the balancer
at the time, IIRC?
Could you netstat your running datanodes and see how many concurrent
connections you had open? Was 1024 enough? You had configured a max
of 1024? I don't see the ulimit printout in these logs, so I presume
it's > 1024.
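In case it helps, a rough sketch of the check I mean (50010 is the stock dfs.datanode.address default port; adjust if your site overrides it):

```shell
#!/bin/sh
# Count established connections on the datanode data-transfer port
# (50010 is the stock dfs.datanode.address default). Run this on each
# datanode host while the upload is in progress.
if command -v netstat >/dev/null 2>&1; then
  established=$(netstat -an | grep ':50010 ' | grep -c ESTABLISHED)
  echo "established datanode connections: $established"
fi

# The open-file limit the datanode process runs under; if the count
# above approaches this number, raise it (ulimit -n / limits.conf).
limit=$(ulimit -n)
echo "open-file limit: $limit"
```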
How many regions do you have in your table when it starts to go wonky?
You have 6 datanodes running alongside your 6 regionservers?
St.Ack
Slava Gorelik wrote:
Hi Michael.
I'm sending logs, in 2 parts (2 messages)
Part 1
On Tue, Nov 4, 2008 at 11:44 PM, Slava Gorelik
<[EMAIL PROTECTED]> wrote:
Thank You. Now it's clear.
On Tue, Nov 4, 2008 at 11:31 PM, stack <[EMAIL PROTECTED]> wrote:
Slava Gorelik wrote:
One more question regarding the block cache: how are changes in store
files (as I understand it, those are MapFiles) reflected in the
client-side cache? What if more than one client is making changes?
If each client has a different part of the MapFile? Or something else?
The block cache is over in the server. It's a cache for store files,
which never change once written. Did I say client-side cache? I
should have been clearer. The client in this case is the regionserver
itself. The cache is there so the regionserver saves on its trips
over the network visiting datanodes.
St.Ack
Best Regards.
On Tue, Nov 4, 2008 at 11:10 PM, Slava Gorelik
<[EMAIL PROTECTED]> wrote:
I can try to reproduce it again, but before that I would like to send
you the logs.
Best Regards.
On Tue, Nov 4, 2008 at 10:05 PM, stack <[EMAIL PROTECTED]> wrote:
Then we should try to figure out whether there is an issue in the
balancer, or whether something is missing when a big upload is not
done in a manner that balances it across HDFS?
St.Ack
Slava Gorelik wrote:
Sure, I'll arrange the logs tomorrow. About the balancer: waiting
until the massive work is finished is fine in a testing environment,
but in production it's not practical :-)
Best Regards.
On Tue, Nov 4, 2008 at 9:48 PM, stack <[EMAIL PROTECTED]> wrote:
Slava Gorelik wrote:
Hi. Regarding the failure of new block creation: I was unable to run
HBase until I reformatted HDFS again.
I'd be interested in the logs.
I'm just wondering whether Hadoop rebalancing is necessary. Will it
balance itself? As I understand it, the Hadoop balancer moves data
between datanodes, but in my case this is happening during a massive
upload (8 clients just adding records - about 400 requests across all
6 regionservers). So, is it a good idea to run the balancer during
heavy load?
I don't have sufficient experience running the balancer. Perhaps wait
till the upload is done, then run it?
St.Ack
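A sketch of that post-upload invocation (the install path under $HADOOP_HOME and the -threshold value are assumptions; `hadoop balancer` itself is the stock CLI of this era):

```shell
#!/bin/sh
# Kick off the HDFS balancer once the bulk upload has finished.
# -threshold is the disk-usage deviation (in percent) the balancer
# tolerates between datanodes before moving blocks; 10 is the default.
HADOOP_BIN="${HADOOP_HOME:-/usr/local/hadoop}/bin/hadoop"
if [ -x "$HADOOP_BIN" ]; then
  "$HADOOP_BIN" balancer -threshold 10
  status="balancer run"
else
  # Assumed install path above; point HADOOP_HOME at your install.
  status="hadoop binary not found at $HADOOP_BIN"
fi
echo "$status"
```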