Yes. Accumulo fully recovered when I restarted the loggers.
On Wed, Jan 30, 2013 at 11:30 AM, Keith Turner <[email protected]> wrote:
> Was this resolved?
>
> On Mon, Jan 28, 2013 at 8:28 AM, David Medinets
> <[email protected]> wrote:
>> I had a plain Java program, single-threaded, that read an HDFS
>> SequenceFile with fairly small Sqoop records (probably under 200
>> bytes each). As each record was read, a Mutation was created and then
>> written via a BatchWriter to Accumulo. This program was as simple as
>> it gets: read a record, write a mutation. The Row Id used YYYYMMDD (a
>> date), so the ingest targeted one tablet. The ingest rate was over
>> 150 million entries per hour for about 19 hours. Everything seemed
>> fine. Over 3.5 billion entries were written. Then the nodes ran out
>> of memory and the Accumulo nodes went dead. 90% of the servers were
>> lost, and data poofed out of existence. Only 800M entries are visible
>> now.
>>
>> We restarted the data node processes and the cluster has been running
>> garbage collection for over 2 days.
>>
>> I did not expect this simple approach to cause an issue. From looking
>> at the log files, I think that at least two compactions were running
>> while still ingesting those 176 million entries per hour. The hold
>> times started rising and eventually the system simply ran out of
>> memory. I have no certainty about this explanation, though.
>>
>> My current thinking is to re-initialize Accumulo and find some way to
>> programmatically monitor the hold time, then add a delay to the
>> ingest process whenever the hold time rises above 30 seconds. Does
>> that sound feasible?
>>
>> I know there are other approaches to ingest, and I might give up this
>> method and use another. I was trying to get some kind of baseline for
>> analysis purposes with this approach.
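
For reference, the ingest loop described above is roughly the following shape. This is a minimal sketch, not the original program: the instance name, zookeepers, table, column family, and the fixed example row id are placeholders, the createBatchWriter call uses the 1.4-style signature (buffer size, max latency, write threads), and currentMaxHoldTimeMillis() is only a stub standing in for whatever hold-time monitoring hook gets wired in (e.g. the same stats the monitor page shows).

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SequenceFileIngest {

    // Hypothetical hook: worst hold time across tablet servers, however you
    // choose to obtain it (monitor stats, JMX, etc.). Stubbed out here.
    static long currentMaxHoldTimeMillis() {
        return 0L;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path input = new Path(args[0]); // HDFS SequenceFile of Sqoop records

        // Placeholder connection details.
        ZooKeeperInstance instance = new ZooKeeperInstance("myInstance", "zk1,zk2,zk3");
        Connector connector = instance.getConnector("user", "password".getBytes());

        // 1.4-style BatchWriter: 50 MB buffer, 60 s max latency, 4 write threads.
        BatchWriter writer = connector.createBatchWriter("mytable", 50000000L, 60000L, 4);

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, input, conf);
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

        try {
            while (reader.next(key, value)) {
                // Date-based row id, as in the original program: every mutation
                // for a given day lands on the same tablet.
                Mutation m = new Mutation(new Text("20130128"));
                m.put(new Text("record"), new Text(""), new Value(value.toString().getBytes()));
                writer.addMutation(m);

                // Back off while any tablet server is holding commits too long.
                while (currentMaxHoldTimeMillis() > 30000L) {
                    Thread.sleep(5000L);
                }
            }
        } finally {
            reader.close();
            writer.close();
        }
    }
}

With the back-off loop in place, the client stops adding mutations whenever the reported hold time stays above 30 seconds, which is the throttling idea proposed in the quoted message.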
