Hi Andrew,
Andrew Purtell wrote:
Thanks. Your configuration looks fine. (The defaults point to somewhere in
/tmp, which is bad for data longevity, so they must always be changed.)
Yes, I recognise that /tmp is not the right place for the DFS.
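For anyone following along, here is a minimal sketch of the properties involved. The host, port, and paths below are placeholders for illustration (and the exact value format for hbase.rootdir can differ between HBase versions); the point is only that hadoop.tmp.dir defaults to a location under /tmp, and the DFS and HBase directories derive from it, so these are the settings to override:

  <!-- hadoop-site.xml: keep Hadoop's working and DFS directories off /tmp -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-${user.name}</value>  <!-- placeholder path -->
  </property>

  <!-- hbase-site.xml: store HBase tables in HDFS rather than the local default -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.com:9000/hbase</value>  <!-- placeholder host/port -->
  </property>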
Did I understand you correctly that the corruption appeared to happen during
the period of time when your cluster and DFS were unstable due to excessive load?
Yes, that is correct. These error messages appeared after "hard reboots" of
HBase (by "hard reboot" I mean a kill signal, since the HRegionServer
process was stuck), when the cluster was not stable.
These errors do not seem to interfere with the "normal operations" of
HBase; we are still able to query and upload data. The only thing is
that HBase seems to be stuck in a loop, trying to read these regions
and filling the log with this error message.
There is, alas, no "hbasefsck" yet, so if the mapfiles become corrupted in DFS there is
little that can be done except to drop and then recreate the table and start over. (For this reason,
right now my application treats HBase as an enormous but temporary workspace, where any critical
data is replicated to different storage media and any data loss is only a setback in terms of
needing to recompute what was lost.) "hbasefsck" is on the road map, as are additional
data integrity improvements that become possible once HADOOP-1700 is ready.
OK, good to know.
--
Renaud Delbru