Keith Turner created ACCUMULO-1831:
--------------------------------------

             Summary: Write ahead logs from upgrade prematurely GCed
                 Key: ACCUMULO-1831
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1831
             Project: Accumulo
          Issue Type: Sub-task
            Reporter: Keith Turner
            Priority: Blocker


I was running {{test/system/upgrade_test.sh dirty}} and the test hung. On
inspection, the WALs from 1.5 had been deleted before all tablets were recovered.

Some tablets from 1.5 recovered fine.

{noformat}
2013-10-29 20:29:26,475 [log.SortedLogRecovery] INFO : Recovery complete for !!R<< using hdfs://nnhost:6093/rktl/accumulo-upt/recovery/754f171b-c260-42dd-b17e-bd15064608c7
{noformat}

Then the GC kicked in and deleted the files before the remaining tablets had finished recovering.

{noformat}
2013-10-29 20:29:30,421 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing WAL for offline server hdfs://nnhost:6093/rktl/accumulo-upt/wal/127.0.0.1+9997/754f171b-c260-42dd-b17e-bd15064608c7
2013-10-29 20:29:30,428 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing sorted WAL hdfs://nnhost:6093/rktl/accumulo-upt/recovery/754f171b-c260-42dd-b17e-bd15064608c7
{noformat}

The tablet then failed to recover.

{noformat}
2013-10-29 20:29:30,858 [tabletserver.TabletServer] WARN : exception trying to assign tablet 1<;row_0000180000 /default_tablet
java.lang.RuntimeException: java.io.IOException: Unable to find recovery files for extent 1<;row_0000180000 logEntry: 1<; 754f171b-c260-42dd-b17e-bd15064608c7 (19)
        at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1398)
        at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1233)
        at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1088)
        at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1076)
{noformat}


I had set the GC delay to 30 seconds while testing another issue, which is why I
ran into this one.

Looking at the code, I do not think it is properly converting the relative paths
from 1.5 to absolute paths. I think the code should convert everything to relative
paths (just UUIDs) to avoid problems caused by differing configurations.
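A minimal sketch of the UUID-only comparison proposed above, not Accumulo's actual GC code. {{WalRefNormalizer}}, {{toUuid}} and {{safeToDelete}} are hypothetical names, and it assumes a 1.5 log entry is either a bare UUID or a relative {{host+port/uuid}} path while the GC sees absolute {{hdfs://}} paths. It shows why a plain string match treats a referenced WAL as unreferenced, and how comparing only the final path component avoids that.

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WalRefNormalizer {

    // Hypothetical helper: reduce any WAL reference (bare UUID, relative
    // "host+port/uuid", or absolute "hdfs://.../wal/host+port/uuid") to
    // its last path component, assumed to be the log's UUID.
    static String toUuid(String walRef) {
        String trimmed = walRef.endsWith("/")
                ? walRef.substring(0, walRef.length() - 1)
                : walRef;
        int slash = trimmed.lastIndexOf('/');
        return slash >= 0 ? trimmed.substring(slash + 1) : trimmed;
    }

    // Hypothetical check: a WAL may be deleted only if no tablet log entry
    // names the same UUID, however either side's path was written.
    static boolean safeToDelete(String walFile, List<String> tabletLogRefs) {
        Set<String> inUse = new HashSet<>();
        for (String ref : tabletLogRefs) {
            inUse.add(toUuid(ref));
        }
        return !inUse.contains(toUuid(walFile));
    }

    public static void main(String[] args) {
        String walFile = "hdfs://nnhost:6093/rktl/accumulo-upt/wal/127.0.0.1+9997/"
                + "754f171b-c260-42dd-b17e-bd15064608c7";
        // Assumed 1.5-style relative reference still held by an unrecovered tablet.
        List<String> refs = List.of("127.0.0.1+9997/754f171b-c260-42dd-b17e-bd15064608c7");

        // Exact string comparison misses the reference: prints false.
        System.out.println("naive string match: " + refs.contains(walFile));
        // UUID comparison finds it, so the WAL is kept: prints false.
        System.out.println("safe to delete (UUID compare): " + safeToDelete(walFile, refs));
    }
}
{code}

Because only the UUID is compared, the check does not depend on the HDFS URI or base directory that each version was configured with.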




--
This message was sent by Atlassian JIRA
(v6.1#6144)
