On 1/12/11 1:05 PM, Friso van Vollenhoven wrote:
If I am correct, your proposed solution would set you back to an image from about 15-30 minutes before the crash. Whether that works out depends, I think, on what you do with your HDFS (HBase, append-only workloads, something else). In our case we are running HBase, and going back in time with the NN image is not very helpful then, because splits and compactions remove and add files all the time. On append-only workloads, where you have the option of redoing whatever you did just before the crash, this could work. But please verify with someone who has a better understanding of HDFS internals.
We do run HBase. It's our desire to avoid trashing the intervening data; however, ditching the particular MR output files that show up in the error would be fine.
Also, there apparently is a way of healing a corrupt edits file using your favorite hex editor. There is a thread here: http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201010.mbox/%3caanlktinbhmn1x8dlir-c4ibhja9nh46tns588cqcn...@mail.gmail.com%3e
Thanks for the link. Manually editing the edits file is our current thought; a little understanding of the format should save us some pain.
There is a thread about this (our) problem on the cdh-user Google group. You could also try to post there.
Thanks, I'll go take a look there. - Adam