[ https://issues.apache.org/jira/browse/ACCUMULO-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158026#comment-14158026 ]

Christopher Tubbs commented on ACCUMULO-3182:
---------------------------------------------

The failure in master predates my last commit 
(dc0d01ce8ca5a9f7642ec53017476db2c01d91b4). The replication code had a missing 
import and an unassigned-variable error, both of which broke compilation. My 
patch fixes the compilation errors, and I believe it is correct, but I'd like 
you to review it in case that merge introduced any other issues. The changes 
to the replication code appear to have been made in the merge commit 
(089408d596941e3c621037d35288bdd87deca5b7) during merge conflict resolution.

> Empty or partial WAL header blocks successful recovery
> ------------------------------------------------------
>
>                 Key: ACCUMULO-3182
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3182
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.2, 1.7.0
>
>         Attachments: 
> 0001-ACCUMULO-3182-Gracefully-handles-incomplete-missing-.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Haven't ever seen this one before. A replication IT failed -- looking into 
> it, it was because the tserver that came up (after killing the original) 
> failed to complete recovery. The below happened a few times before the test 
> ultimately timed out.
> {noformat}
> 2014-09-29 04:46:10,259 [zookeeper.DistributedWorkQueue] DEBUG: Looking for 
> work in /accumulo/f98e79c4-9dcd-4fb0-8ec9-5804f0818839/recovery
> 2014-09-29 04:46:10,340 [zookeeper.DistributedWorkQueue] DEBUG: got lock for 
> af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,341 [log.LogSorter] DEBUG: Sorting 
> file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/wal/juno+49195/af53bf1e-c293-463b-b4de-5efdb8b34962
>  to 
> file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/recovery/af53bf1e-c293-463b-b4de-5efdb8b34962
>  using sortId af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,341 [log.LogSorter] INFO : Copying 
> file:/var/lib/jenkins/home/jobs/Accumulo-Master-Integration-Tests/workspace/test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/wal/juno+49195/af53bf1e-c293-463b-b4de-5efdb8b34962
>  to 
> file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/recovery/af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,345 [log.LogSorter] ERROR: java.io.EOFException
> java.io.EOFException
>       at java.io.DataInputStream.readFully(DataInputStream.java:197)
>       at java.io.DataInputStream.readFully(DataInputStream.java:169)
>       at 
> org.apache.accumulo.tserver.log.DfsLogger.readHeaderAndReturnStream(DfsLogger.java:282)
>       at 
> org.apache.accumulo.tserver.log.LogSorter$LogProcessor.sort(LogSorter.java:113)
>       at 
> org.apache.accumulo.tserver.log.LogSorter$LogProcessor.process(LogSorter.java:93)
>       at 
> org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:105)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at 
> org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>       at 
> org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>       at java.lang.Thread.run(Thread.java:745)
> 2014-09-29 04:46:10,346 [log.LogSorter] ERROR: Error during cleanup sort/copy 
> af53bf1e-c293-463b-b4de-5efdb8b34962
> java.lang.NullPointerException
>       at 
> org.apache.accumulo.tserver.log.LogSorter$LogProcessor.close(LogSorter.java:183)
>       at 
> org.apache.accumulo.tserver.log.LogSorter$LogProcessor.sort(LogSorter.java:151)
>       at 
> org.apache.accumulo.tserver.log.LogSorter$LogProcessor.process(LogSorter.java:93)
>       at 
> org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:105)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at 
> org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>       at 
> org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
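
For illustration only, the two failures in the trace above (an EOFException 
from readFully on a short header, then a NullPointerException while closing 
a stream that was never opened) could be guarded against roughly as follows. 
This is a minimal sketch, not the attached patch and not the DfsLogger or 
LogSorter code: the class name, method names, and header bytes are all 
hypothetical, and it assumes an empty or truncated header can safely be 
treated as a log with no data.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WalHeaderSketch {
  // Hypothetical magic bytes for illustration; NOT the real Accumulo WAL header.
  static final byte[] MAGIC =
      "--- hypothetical WAL header ---".getBytes(StandardCharsets.UTF_8);

  // Returns true if the stream starts with a complete header,
  // false if the stream is empty or ends partway through the header.
  static boolean readHeader(DataInputStream in) throws IOException {
    byte[] buf = new byte[MAGIC.length];
    try {
      // readFully throws EOFException when fewer than buf.length bytes remain.
      in.readFully(buf);
    } catch (EOFException e) {
      // Assumption: an empty or partial header means nothing was written.
      return false;
    }
    if (!Arrays.equals(buf, MAGIC)) {
      throw new IOException("Unrecognized header");
    }
    return true;
  }

  // Null-safe close, so cleanup cannot throw a NullPointerException
  // when the resource was never assigned.
  static void closeIfOpen(Closeable c) throws IOException {
    if (c != null) {
      c.close();
    }
  }

  // Convenience wrapper over an in-memory "file" so the sketch is runnable.
  static boolean hasCompleteHeader(byte[] fileContents) {
    DataInputStream in = null;
    try {
      in = new DataInputStream(new ByteArrayInputStream(fileContents));
      return readHeader(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      try {
        closeIfOpen(in);
      } catch (IOException e) {
        // Nothing useful to do for an in-memory stream.
      }
    }
  }

  public static void main(String[] args) {
    byte[] partial = Arrays.copyOf(MAGIC, 5);
    System.out.println(hasCompleteHeader(new byte[0])); // empty file
    System.out.println(hasCompleteHeader(partial));     // truncated header
    System.out.println(hasCompleteHeader(MAGIC));       // complete header
  }
}
```

Whether skipping such a file is actually safe depends on guarantees about 
when the header is synced relative to the first mutation, which this sketch 
does not model.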



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
