Hi Stack,
> I've not seen it before. Exception should note the
> file it was trying to read from I'd say at a minimum.
> Looks like failure trying to read in MapFile(SequenceFile)
> content. And you've not seen it since the restart?
> (Would be odd that a problematic file would heal itself).
It is odd that the file problem was "healed" automatically.
I'm not sure what to think exactly. Maybe it was a log
file, and so the damaged portion was simply skipped during
recovery/restart? Or maybe it was not truly a file problem
at all. I concur that the exception should include the file
name, so that better failure analysis is possible.
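To illustrate, something like the sketch below in the read
path would do. The class and method names here are made up
for illustration (not actual HBase code); the point is just
to wrap the low-level failure and carry the file path along:

    import java.io.IOException;

    // Illustrative only: readValue/doRead are hypothetical
    // stand-ins, not actual HBase methods.
    public class ReadWithFileContext {
      static byte[] readValue(String storeFilePath) throws IOException {
        try {
          // stand-in for the MapFile/SequenceFile read
          return doRead(storeFilePath);
        } catch (IOException e) {
          // carry the file name so we know which file to inspect
          IOException wrapped = new IOException(
              "Failed reading " + storeFilePath + ": " + e.getMessage());
          wrapped.initCause(e);
          throw wrapped;
        }
      }

      private static byte[] doRead(String path) throws IOException {
        // placeholder failure so the example runs end to end
        throw new IOException("simulated read failure");
      }

      public static void main(String[] args) {
        try {
          readValue("/data/hbase/example/store/file"); // made-up path
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    }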
> What about the files you made when crawler had no
> upper-bound on sizes pulled down? Are they still in your
> hbase?
>
> Disabling compression brought on a bunch of splits but
> otherwise, it seems to be working?
What I did was run 'hadoop fs -rmr /data/hbase' and start
over, without compression or blockcache in the schema. :-)
At least right now, data loss like that is only a temporary
inconvenience. That won't be the case much longer.
Also, now I have a file size limit in place on the crawler.
(Re: hbase-writer patch #6.)
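For the record, recreating the table with compression off
and the block cache disabled is only a few lines with the
HBase Java client. Here is a rough sketch; the table and
family names are placeholders, and the exact class and
method names differ between client versions, so take it as
the shape of the thing rather than something to paste in
verbatim:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.hfile.Compression;

    // Written against a later client API than the one in this
    // thread; adjust names for the version actually deployed.
    public class CreatePlainTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // "content" is a placeholder family name, not the real schema
        HColumnDescriptor family = new HColumnDescriptor("content");
        family.setCompressionType(Compression.Algorithm.NONE); // no compression
        family.setBlockCacheEnabled(false);                    // block cache off

        HTableDescriptor table = new HTableDescriptor("crawl"); // placeholder name
        table.addFamily(family);
        admin.createTable(table);
      }
    }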
I am still seeing OOME take down region servers. Last night
there were 5 failures in an 8-hour window. With the
exception of the IndexOutOfBounds incident, none of the
failures have needed manual intervention for recovery, but
I suspect that is only by luck. With 2GB of heap the region
servers have at least been kind enough to go down on OOME.
:-) I'm going to start collecting logs and heap dumps and
see if I can find something in common therein.
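For the heap dumps I will most likely just restart the
region servers with the Sun JVM's dump-on-OOME options,
something like

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hbase

added to the JVM options in hbase-env.sh (assuming our
scripts pass extra options through), and then pick over the
.hprof files after each crash. The dump path above is just
an example.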
- Andy