keith-turner opened a new issue, #3006:
URL: https://github.com/apache/accumulo/issues/3006
Saw the following while running bulk import RW test.
The test added splits to the table
```
2022-10-07T07:39:40,584 [randomwalk.bulk.Split] INFO : split for [r008c7,
r00ee2, r011a5, r0531f, r0929e, r0942e, r0b580, r0d6d4, r10134, r13a86, r15eb5,
r16920] finished
```
Then later a bulk import failed complaining about one of the splits created
above. In the test logs below it was also running a concurrent merge, so I
suspect that caused the problem.
```
2022-10-07T07:40:56,728 [randomwalk.bulk.BulkPlusOne] DEBUG: preparing bulk
files with start rows [r00000, r05042, r09cea, r13ea4, r180f5] last row r1869f
marker 0000153
2022-10-07T07:40:56,728 [randomwalk.bulk.BulkPlusOne] DEBUG: Creating
hdfs://localhost:8020/tmp/bulk_b66dbfcc-6eea-44b0-a4e7-2b36c2a9babe/part_0.rf
2022-10-07T07:40:56,902 [randomwalk.bulk.BulkPlusOne] DEBUG: Creating
hdfs://localhost:8020/tmp/bulk_b66dbfcc-6eea-44b0-a4e7-2b36c2a9babe/part_1.rf
2022-10-07T07:40:56,982 [randomwalk.bulk.BulkMinusOne] DEBUG: Creating
hdfs://localhost:8020/tmp/bulk_766fe741-1437-4136-a3fa-135b1c503aa3/part_3.rf
2022-10-07T07:40:57,079 [randomwalk.bulk.BulkMinusOne] DEBUG: Creating
hdfs://localhost:8020/tmp/bulk_766fe741-1437-4136-a3fa-135b1c503aa3/part_4.rf
2022-10-07T07:40:57,086 [randomwalk.bulk.BulkPlusOne] DEBUG: Creating
hdfs://localhost:8020/tmp/bulk_b66dbfcc-6eea-44b0-a4e7-2b36c2a9babe/part_2.rf
2022-10-07T07:40:57,173 [randomwalk.bulk.BulkMinusOne] DEBUG: Creating
hdfs://localhost:8020/tmp/bulk_766fe741-1437-4136-a3fa-135b1c503aa3/part_5.rf
2022-10-07T07:40:57,237 [randomwalk.bulk.BulkMinusOne] DEBUG: Creating
hdfs://localhost:8020/tmp/bulk_766fe741-1437-4136-a3fa-135b1c503aa3/part_6.rf
2022-10-07T07:40:57,370 [randomwalk.bulk.BulkPlusOne] DEBUG: Creating
hdfs://localhost:8020/tmp/bulk_b66dbfcc-6eea-44b0-a4e7-2b36c2a9babe/part_3.rf
2022-10-07T07:40:57,498 [randomwalk.bulk.BulkPlusOne] DEBUG: Creating
hdfs://localhost:8020/tmp/bulk_b66dbfcc-6eea-44b0-a4e7-2b36c2a9babe/part_4.rf
2022-10-07T07:40:57,570 [randomwalk.bulk.Verify] INFO : Waiting for 103
nodes to complete
2022-10-07T07:40:57,752 [randomwalk.bulk.Merge] INFO : merging (-inf ->
r05e6e] complete
2022-10-07T07:40:57,752 [randomwalk.bulk.ConsistencyCheck] INFO : Checking
r06864
2022-10-07T07:40:57,794 [randomwalk.bulk.BulkPlusOne] ERROR:
java.lang.RuntimeException: java.util.concurrent.ExecutionException:
org.apache.accumulo.core.metadata.schema.TabletDeletedException: Tablet 18;r
0531f was deleted while iterating
java.lang.RuntimeException: java.util.concurrent.ExecutionException:
org.apache.accumulo.core.metadata.schema.TabletDeletedException: Tablet
18;r0531f was deleted while iterating
at
org.apache.accumulo.core.clientImpl.bulk.BulkImport.computeFileToTabletMappings(BulkImport.java:579)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.core.clientImpl.bulk.BulkImport.computeMappingFromFiles(BulkImport.java:487)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.core.clientImpl.bulk.BulkImport.load(BulkImport.java:152)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.testing.randomwalk.bulk.BulkPlusOne.bulkLoadLots(BulkPlusOne.java:96)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.testing.randomwalk.bulk.BulkPlusOne.runLater(BulkPlusOne.java:105)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.testing.randomwalk.bulk.BulkTest.lambda$visit$0(BulkTest.java:31)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
~[accumulo-testing-shaded.jar:?]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
~[accumulo-testing-shaded.jar:?]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
~[?:?]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
~[?:?]
at
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
~[accumulo-testing-shaded.jar:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.util.concurrent.ExecutionException:
org.apache.accumulo.core.metadata.schema.TabletDeletedException: Tablet
18;r0531f was deleted while iterating
at
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
~[?:?]
at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
at
org.apache.accumulo.core.clientImpl.bulk.BulkImport.computeFileToTabletMappings(BulkImport.java:573)
~[accumulo-testing-shaded.jar:?]
... 13 more
Caused by: org.apache.accumulo.core.metadata.schema.TabletDeletedException:
Tablet 18;r0531f was deleted while iterating
at
org.apache.accumulo.core.metadata.schema.LinkingIterator.resetSource(LinkingIterator.java:127)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.core.metadata.schema.LinkingIterator.next(LinkingIterator.java:180)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.core.metadata.schema.LinkingIterator.next(LinkingIterator.java:50)
~[accumulo-testing-shaded.jar:?]
at
java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
~[?:?]
at
java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
~[?:?]
at
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
~[?:?]
at
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
~[?:?]
at
java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
~[?:?]
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
~[?:?]
at
org.apache.accumulo.core.clientImpl.bulk.ConcurrentKeyExtentCache.lookup(ConcurrentKeyExtentCache.java:124)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.core.clientImpl.bulk.BulkImport.findOverlappingTablets(BulkImport.java:331)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.core.clientImpl.bulk.BulkImport.findOverlappingTablets(BulkImport.java:356)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.core.clientImpl.bulk.BulkImport.lambda$computeFileToTabletMappings$7(BulkImport.java:549)
~[accumulo-testing-shaded.jar:?]
at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
~[?:?]
... 5 more
2022-10-07T07:40:57,804 [randomwalk.bulk.BulkMinusOne] INFO : Decrementing
```
The test kept running after the failure above and failed later during verify.
```
2022-10-07T07:43:28,304 [testing.randomwalk.Framework] ERROR: Error during
random walk
java.lang.Exception: Error running node bulk.Verify
at
org.apache.accumulo.testing.randomwalk.Module.visit(Module.java:360)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.testing.randomwalk.Framework.run(Framework.java:54)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.testing.randomwalk.Framework.main(Framework.java:98)
~[accumulo-testing-shaded.jar:?]
Caused by: java.lang.Exception: Bad key at r00000 cf:000 [] 1665128566432
false=-1
at
org.apache.accumulo.testing.randomwalk.bulk.Verify.visit(Verify.java:73)
~[accumulo-testing-shaded.jar:?]
at
org.apache.accumulo.testing.randomwalk.Module.lambda$visit$0(Module.java:295)
~[accumulo-testing-shaded.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
~[accumulo-testing-shaded.jar:?]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
~[accumulo-testing-shaded.jar:?]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
~[?:?]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
~[?:?]
at
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
~[accumulo-testing-shaded.jar:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
2022-10-07T07:43:28,305 [testing.randomwalk.Framework] INFO : Test finished
```
I analyzed the markers in the the test table and found that marker 00153 was
missing which corresponds to the bulk import that failed.
I suspect the bulk import code need to catch TabletDeletedException and
completely restart the operation. That exception is thrown by the
LinkingIterator that checks the consistency of the metadata table while
scanning. The LinkingIterator can handle concurrent splits, but there is no
way for it to handle concurrent merges so it throws that exception. Higher
level operations need to start over when they see that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]