[ 
https://issues.apache.org/jira/browse/ACCUMULO-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157353#comment-14157353
 ] 

Josh Elser commented on ACCUMULO-3182:
--------------------------------------

It appears that I incorrectly assumed that 1.5 would also have this issue in 
the first place. Best as I can tell it does not. It seems like this was 
introduced with some of the crypto-related changes.

In 1.5, if we can't read the header or there is no header (was this previously 
the case for WALs?), we'll return an InputStream seek'ed to '0'. Then, the 
LogSorter will attempt to read pairs of LogFileKeys and LogFileValues which 
handles an EOFException. In the catch, it will also create the empty MapFile so 
that recovery will gracefully complete as well.

> Empty or partial WAL header blocks successful recovery
> ------------------------------------------------------
>
>                 Key: ACCUMULO-3182
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3182
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.2, 1.7.0
>
>         Attachments: 
> 0001-ACCUMULO-3182-Gracefully-handles-incomplete-missing-.patch
>
>
> Haven't ever seen this one before. A replication IT failed -- looking into 
> it, it was because the tserver that came up (after killing the original) 
> failed to complete recovery. The below happened a few times before the test 
> ultimately timed out.
> {noformat}
> 2014-09-29 04:46:10,259 [zookeeper.DistributedWorkQueue] DEBUG: Looking for 
> work in /accumulo/f98e79c4-9dcd-4fb0-8ec9-5804f0818839/recovery
> 2014-09-29 04:46:10,340 [zookeeper.DistributedWorkQueue] DEBUG: got lock for 
> af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,341 [log.LogSorter] DEBUG: Sorting 
> file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/wal/juno+49195/af53bf1e-c293-463b-b4de-5efdb8b34962
>  to 
> file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/recovery/af53bf1e-c293-463b-b4de-5efdb8b34962
>  using sortId af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,341 [log.LogSorter] INFO : Copying 
> file:/var/lib/jenkins/home/jobs/Accumulo-Master-Integration-Tests/workspace/test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/wal/juno+49195/af53bf1e-c293-463b-b4de-5efdb8b34962
>  to 
> file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/recovery/af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,345 [log.LogSorter] ERROR: java.io.EOFException
> java.io.EOFException
>       at java.io.DataInputStream.readFully(DataInputStream.java:197)
>       at java.io.DataInputStream.readFully(DataInputStream.java:169)
>       at 
> org.apache.accumulo.tserver.log.DfsLogger.readHeaderAndReturnStream(DfsLogger.java:282)
>       at 
> org.apache.accumulo.tserver.log.LogSorter$LogProcessor.sort(LogSorter.java:113)
>       at 
> org.apache.accumulo.tserver.log.LogSorter$LogProcessor.process(LogSorter.java:93)
>       at 
> org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:105)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at 
> org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>       at 
> org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>       at java.lang.Thread.run(Thread.java:745)
> 2014-09-29 04:46:10,346 [log.LogSorter] ERROR: Error during cleanup sort/copy 
> af53bf1e-c293-463b-b4de-5efdb8b34962
> java.lang.NullPointerException
>       at 
> org.apache.accumulo.tserver.log.LogSorter$LogProcessor.close(LogSorter.java:183)
>       at 
> org.apache.accumulo.tserver.log.LogSorter$LogProcessor.sort(LogSorter.java:151)
>       at 
> org.apache.accumulo.tserver.log.LogSorter$LogProcessor.process(LogSorter.java:93)
>       at 
> org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:105)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at 
> org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>       at 
> org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to