kevinrr888 opened a new pull request, #5813:
URL: https://github.com/apache/accumulo/pull/5813
This PR partially addresses #5787
I have reached a dead end debugging this test.
As far as I can tell, the test logic has no issues, and the FATE logic has one
potential concurrency issue (which this PR addresses), but the failure still
occurs occasionally. From running jstack on the test process in a failure case,
it appears that the thread is either getting stuck on the
`workQueue.poll(100, MILLISECONDS)` call or repeatedly retrying it, neither of
which should be possible given the shutdown logic. Here is the code:
```
while (fate.getKeepRunning().get() && !stop.get()) {
FateId unreservedFateId = workQueue.poll(100, MILLISECONDS);
...
```
The jstack trace shows this for the entire time FATE is trying to shut down:
```
"accumulo.pool.manager.fate.user.commit_compaction.namespace_create.namespace_delete.namespace_rename.shutdown_tserver.system_split.system_merge.table_bulk_import2.table_cancel_compact.table_clone.table_compact-Worker-1" #57 daemon prio=5 os_prio=0 cpu=82600.00ms elapsed=86.73s tid=0x00007693e00058f0 nid=0x2304f runnable [0x00007694a5ef9000]
   java.lang.Thread.State: RUNNABLE
    at java.util.concurrent.LinkedTransferQueue.awaitMatch([email protected]/LinkedTransferQueue.java:652)
    at java.util.concurrent.LinkedTransferQueue.xfer([email protected]/LinkedTransferQueue.java:616)
    at java.util.concurrent.LinkedTransferQueue.poll([email protected]/LinkedTransferQueue.java:1294)
    at org.apache.accumulo.core.fate.FateExecutor$TransactionRunner.reserveFateTx(FateExecutor.java:349)
    at org.apache.accumulo.core.fate.FateExecutor$TransactionRunner.run(FateExecutor.java:378)
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
    at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
    at java.lang.Thread.run([email protected]/Thread.java:840)
```
This doesn't make sense because:
1) When we shut down FATE, we first set `keepRunning` to false, so the while
loop should terminate
2) The poll should return after, at most, 100ms
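To make that expectation concrete, here is a minimal standalone sketch of the same loop shape. It is not the FateExecutor code: the class name `ShutdownLoopSketch`, the method `signalShutdownAndMeasure`, and the `String` element type (standing in for `FateId`) are hypothetical, and it assumes a single worker polling a queue that is never fed. Under those assumptions the worker should exit within roughly one poll timeout of `keepRunning` being cleared.

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TransferQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the runner loop; not the actual FateExecutor code.
class ShutdownLoopSketch {
  // Stand-ins for fate.getKeepRunning() and the runner's stop flag.
  static final AtomicBoolean keepRunning = new AtomicBoolean(true);
  static final AtomicBoolean stop = new AtomicBoolean(false);
  // String stands in for FateId; nothing is ever added, so every poll times out.
  static final TransferQueue<String> workQueue = new LinkedTransferQueue<>();

  /**
   * Starts one worker, signals shutdown, and returns how many milliseconds the
   * worker took to exit, or -1 if it was still alive after 5 seconds.
   */
  static long signalShutdownAndMeasure() throws InterruptedException {
    keepRunning.set(true);
    stop.set(false);

    Thread worker = new Thread(() -> {
      try {
        while (keepRunning.get() && !stop.get()) {
          // Same call shape as in the PR: wait at most 100ms for work.
          String unreserved = workQueue.poll(100, TimeUnit.MILLISECONDS);
          // A real runner would reserve and execute the transaction here.
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    worker.setDaemon(true);
    worker.start();

    Thread.sleep(300); // let the worker go through a few poll cycles

    long start = System.nanoTime();
    keepRunning.set(false); // step 1 of the shutdown: flip the flag
    worker.join(5000);      // step 2: the in-flight poll should time out
    if (worker.isAlive()) {
      return -1;
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("worker exited after (ms): " + signalShutdownAndMeasure());
  }
}
```

The sketch only covers the simple case of one consumer and an empty queue, so it does not reproduce the failure. It just shows the behavior the two points above rely on.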
I have run out of ideas. This could use another set of eyes, if anyone has
the time. I can explain anything about the test logic or the FATE logic, if
needed.