On 1/12/11 1:05 PM, Friso van Vollenhoven wrote:
If I am correct, your proposed solution would set you back to an image
from about 15-30 minutes before the crash. I think it depends on what
you do with your HDFS (HBase, append only things, ?), whether that will
work out. In our case we are running HBase and going back in time with
the NN image is not very helpful then, because of splits and compactions
removing and adding files all the time. On append-only workloads where
you have the option of redoing whatever it is that you did just before
the time of the crash, this could work. But, please verify with someone
with a better understanding of HDFS internals.

We do run HBase. It's our desire to avoid trashing the intervening data; however, ditching the particular MR output files that show up in the error would be fine.

Also, there apparently is a way of healing a corrupt edits file using
your favorite hex editor. There is a thread here:
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201010.mbox/%3caanlktinbhmn1x8dlir-c4ibhja9nh46tns588cqcn...@mail.gmail.com%3e

Thanks for the link. Manually editing the edits file is our current plan; a little understanding of the format should save us some pain.

There is a thread about this (our) problem on the cdh-user Google group.
You could also try to post there.

Thanks, I'll go take a look there.

- Adam
