Jean-Adrien wrote:
...
Stack, you asked me if my hard disks were full. I said one is. Why did you
link the above problem with that? Because of the du problem noticed in
HADOOP-3232? I don't think I'm affected by that problem; my BlockReport
process duration is less than a second.
We were seeing HADOOP-3831 on our cluster (Hadoop 0.18.0 and HBase 0.18.1RC1). After a rebalance of the HDFS content, prompted by the observation that loading was lopsided, the issue went away. The thought -- not proven -- is that the lopsidedness was causing disks to fill, which eventually led to HADOOP-3831.

...
Another question by the way:
We saw that hadoop-default.xml is used by the hbase client, and it overrides
the replication factor; ok. But could it override the dfs.datanode.du.reserved /
dfs.datanode.pct properties? (These sound like datanode policy rather
than client policy.) I said that my settings don't seem to affect the
behaviour of the datanodes.
I could be wrong, but I don't see how. You are running start-dfs.sh over in HADOOP_HOME, not in HBASE_HOME. Unless you somehow have CLASSPATHs intermingled, datanode startup should not be picking up content of HBASE_HOME/conf.
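For what it's worth, a way to sanity-check which config the datanodes actually see: datanode-side properties belong in HADOOP_HOME/conf/hadoop-site.xml, which start-dfs.sh picks up at daemon startup. A minimal sketch of such an override (the 1 GB value here is purely illustrative, not a recommendation from this thread):

```xml
<!-- HADOOP_HOME/conf/hadoop-site.xml: read by the datanode at startup,
     not by hbase clients -->
<configuration>
  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- bytes of disk space to leave free per volume; 1 GB as an example -->
    <value>1073741824</value>
  </property>
</configuration>
```

If a setting like this has no visible effect, it is worth confirming that the datanodes were restarted after the edit and that no other hadoop-site.xml is earlier on their CLASSPATH.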

I owe you other answers/support. In particular, I need to try running with dfs.datanode.socket.write.timeout = 0 to see if I get the same problem as you. Let me know if there's anything else you'd have me try.
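For reference, the experiment above would be expressed as a property override in hadoop-site.xml; a sketch of just that fragment (0 disables the write timeout, as discussed, and is an experiment here rather than a recommended setting):

```xml
<!-- hadoop-site.xml fragment: disable the datanode socket write timeout -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
```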

Thanks for all the excellent diagnosis.
St.Ack
