>From Ritik Raj <[email protected]>: Ritik Raj has submitted this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20967?usp=email )
Change subject: [NO ISSUE][HYR] Fix NPE in AbstractIoOperation and improve cloud interrupt handling ...................................................................... [NO ISSUE][HYR] Fix NPE in AbstractIoOperation and improve cloud interrupt handling - user model changes: no - storage format changes: no - interface changes: no Details: During a distributed job cancellation (e.g., HYR0034), the Hyracks framework interrupts active threads. If a thread is interrupted during an LSM bulk load, it enters an abort sequence that triggers AbstractIoOperation.cleanup(). Because the interrupt originated externally, the I/O operation never registered an internal failure. This resulted in getFailure() returning null, causing a NullPointerException when cleanup() attempted to attach suppressed teardown exceptions. This NPE masked the true state of the system. Furthermore, CloudRetryableRequestUtil failed to recognize the AWS SDK's AbortedException as a standard interrupt, causing it to violently rethrow the exception instead of cleanly suppressing it during cloud teardowns. Change-Id: If36c1336f8e8377f2d410c8878d168f1209bcf87 Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20967 Integration-Tests: Michael Blow <[email protected]> Tested-by: Jenkins <[email protected]> Reviewed-by: Michael Blow <[email protected]> Reviewed-by: Ritik Raj <[email protected]> --- M asterixdb/asterix-app/src/test/java/org/apache/asterix/test/dataflow/GlobalVirtualBufferCacheTest.java M hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/util/CloudRetryableRequestUtil.java M hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/AbstractIoOperation.java M hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMIndexDiskComponentBulkLoader.java M hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java 5 files changed, 17 insertions(+), 7 deletions(-) Approvals: Michael Blow: Looks good to me, approved; Verified Ritik Raj: Looks good to me, but someone else must approve Jenkins: Verified diff --git a/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/dataflow/GlobalVirtualBufferCacheTest.java b/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/dataflow/GlobalVirtualBufferCacheTest.java index 1ea6941..a286a33 100644 --- a/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/dataflow/GlobalVirtualBufferCacheTest.java +++ b/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/dataflow/GlobalVirtualBufferCacheTest.java @@ -155,7 +155,7 @@ } if (exceptionRef.get() != null) { exceptionRef.get().printStackTrace(); - Assert.fail("Exception in insert threads: " + exceptionRef.get().getMessage()); + Assert.fail("Exception in insert threads: " + exceptionRef.get()); } for (int i = 0; i < NUM_PARTITIONS; i++) { List<ILSMDiskComponent> diskComponents = new ArrayList<>(primaryIndexes[i].getDiskComponents()); diff --git a/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/util/CloudRetryableRequestUtil.java b/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/util/CloudRetryableRequestUtil.java index 5461ed5..43fbfc9 100644 --- a/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/util/CloudRetryableRequestUtil.java +++ b/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/util/CloudRetryableRequestUtil.java @@ -111,10 +111,14 @@ try { return doRun(request, retry, RETRY_ALWAYS_PREDICATE); } catch (Throwable e) { - // First, clear the interrupted flag - interrupted |= Thread.interrupted(); - if (ExceptionUtils.causedByInterrupt(e)) { + // clear & check any interrupted status + boolean currentCycleInterrupted = Thread.interrupted(); + interrupted |= currentCycleInterrupted; + if (ExceptionUtils.causedByInterrupt(e) || currentCycleInterrupted) { interrupted = true; + LOGGER.debug( + "Suppressing interrupt in run(). causedByInterrupt: {}, currentCycleInterrupted: {}", + ExceptionUtils.causedByInterrupt(e), currentCycleInterrupted); } else { // The cause isn't an interruption, rethrow throw e; @@ -197,9 +201,12 @@ } try { if (Thread.currentThread().isInterrupted() || !retryPolicy.retry(e)) { + LOGGER.debug("Exiting doRun() loop. isInterrupted: {}, attempt: {}/{}", + Thread.currentThread().isInterrupted(), attempt, NUMBER_OF_RETRIES); throw HyracksDataException.create(e); } } catch (InterruptedException interruptedEx) { + LOGGER.debug("Thread interrupted while waiting for retry policy backoff."); throw HyracksDataException.create(interruptedEx); } attempt++; diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/AbstractIoOperation.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/AbstractIoOperation.java index 8ae0571..7176e22 100644 --- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/AbstractIoOperation.java +++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/AbstractIoOperation.java @@ -96,7 +96,7 @@ bufferCache.deleteFile(file); } } catch (HyracksDataException hde) { - getFailure().addSuppressed(hde); + ExceptionUtils.suppress(failure, hde); } } } diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMIndexDiskComponentBulkLoader.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMIndexDiskComponentBulkLoader.java index 841d389..3d73289 100644 --- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMIndexDiskComponentBulkLoader.java +++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMIndexDiskComponentBulkLoader.java @@ -84,9 +84,9 @@ @Override public void abort() throws HyracksDataException { - opCtx.getIoOperation().setStatus(LSMIOOperationStatus.FAILURE); - fail(null); try { + opCtx.getIoOperation().setStatus(LSMIOOperationStatus.FAILURE); + fail(null); try { componentBulkLoader.abort(); } finally { diff --git a/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java b/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java index 8169fb9..40020a6 100644 --- a/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java +++ b/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java @@ -1011,6 +1011,9 @@ sweepAndFlush(fInfo, false); try { if (fInfo.getReferenceCount() > 0) { + LOGGER.info( + "Resource leak detected: Force-deleting file with active references. fileId: {}, referenceCount: {}", + fileId, fInfo.getReferenceCount()); throw new HyracksDataException("Deleting open file"); } } finally { -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20967?usp=email To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings?usp=email Gerrit-MessageType: merged Gerrit-Project: asterixdb Gerrit-Branch: lumina Gerrit-Change-Id: If36c1336f8e8377f2d410c8878d168f1209bcf87 Gerrit-Change-Number: 20967 Gerrit-PatchSet: 8 Gerrit-Owner: Ritik Raj <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Michael Blow <[email protected]> Gerrit-Reviewer: Ritik Raj <[email protected]> Gerrit-CC: Anon. E. Moose #1000171 Gerrit-CC: Murtadha Hubail <[email protected]>
