>From Ritik Raj <[email protected]>:

Ritik Raj has submitted this change. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20967?usp=email )

Change subject: [NO ISSUE][HYR] Fix NPE in AbstractIoOperation and improve 
cloud interrupt handling
......................................................................

[NO ISSUE][HYR] Fix NPE in AbstractIoOperation and improve cloud interrupt 
handling

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
During a distributed job cancellation (e.g., HYR0034),
the Hyracks framework interrupts active threads.
If a thread is interrupted during an LSM bulk load,
it enters an abort sequence that triggers
AbstractIoOperation.cleanup().

Because the interrupt originated externally,
the I/O operation never registered an internal
failure. This resulted in getFailure() returning
null, causing a NullPointerException when
cleanup() attempted to attach suppressed
teardown exceptions. This NPE masked the true
state of the system.

Furthermore, CloudRetryableRequestUtil failed
to recognize the AWS SDK's AbortedException as
a standard interrupt, causing it to violently
rethrow the exception instead of cleanly
suppressing it during cloud teardowns.

Change-Id: If36c1336f8e8377f2d410c8878d168f1209bcf87
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20967
Integration-Tests: Michael Blow <[email protected]>
Tested-by: Jenkins <[email protected]>
Reviewed-by: Michael Blow <[email protected]>
Reviewed-by: Ritik Raj <[email protected]>
---
M 
asterixdb/asterix-app/src/test/java/org/apache/asterix/test/dataflow/GlobalVirtualBufferCacheTest.java
M 
hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/util/CloudRetryableRequestUtil.java
M 
hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/AbstractIoOperation.java
M 
hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMIndexDiskComponentBulkLoader.java
M 
hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java
5 files changed, 17 insertions(+), 7 deletions(-)

Approvals:
  Michael Blow: Looks good to me, approved; Verified
  Ritik Raj: Looks good to me, but someone else must approve
  Jenkins: Verified




diff --git 
a/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/dataflow/GlobalVirtualBufferCacheTest.java
 
b/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/dataflow/GlobalVirtualBufferCacheTest.java
index 1ea6941..a286a33 100644
--- 
a/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/dataflow/GlobalVirtualBufferCacheTest.java
+++ 
b/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/dataflow/GlobalVirtualBufferCacheTest.java
@@ -155,7 +155,7 @@
             }
             if (exceptionRef.get() != null) {
                 exceptionRef.get().printStackTrace();
-                Assert.fail("Exception in insert threads: " + 
exceptionRef.get().getMessage());
+                Assert.fail("Exception in insert threads: " + 
exceptionRef.get());
             }
             for (int i = 0; i < NUM_PARTITIONS; i++) {
                 List<ILSMDiskComponent> diskComponents = new 
ArrayList<>(primaryIndexes[i].getDiskComponents());
diff --git 
a/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/util/CloudRetryableRequestUtil.java
 
b/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/util/CloudRetryableRequestUtil.java
index 5461ed5..43fbfc9 100644
--- 
a/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/util/CloudRetryableRequestUtil.java
+++ 
b/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/util/CloudRetryableRequestUtil.java
@@ -111,10 +111,14 @@
                 try {
                     return doRun(request, retry, RETRY_ALWAYS_PREDICATE);
                 } catch (Throwable e) {
-                    // First, clear the interrupted flag
-                    interrupted |= Thread.interrupted();
-                    if (ExceptionUtils.causedByInterrupt(e)) {
+                    // clear & check any interrupted status
+                    boolean currentCycleInterrupted = Thread.interrupted();
+                    interrupted |= currentCycleInterrupted;
+                    if (ExceptionUtils.causedByInterrupt(e) || 
currentCycleInterrupted) {
                         interrupted = true;
+                        LOGGER.debug(
+                                "Suppressing interrupt in run(). 
causedByInterrupt: {}, currentCycleInterrupted: {}",
+                                ExceptionUtils.causedByInterrupt(e), 
currentCycleInterrupted);
                     } else {
                         // The cause isn't an interruption, rethrow
                         throw e;
@@ -197,9 +201,12 @@
                 }
                 try {
                     if (Thread.currentThread().isInterrupted() || 
!retryPolicy.retry(e)) {
+                        LOGGER.debug("Exiting doRun() loop. isInterrupted: {}, 
attempt: {}/{}",
+                                Thread.currentThread().isInterrupted(), 
attempt, NUMBER_OF_RETRIES);
                         throw HyracksDataException.create(e);
                     }
                 } catch (InterruptedException interruptedEx) {
+                    LOGGER.debug("Thread interrupted while waiting for retry 
policy backoff.");
                     throw HyracksDataException.create(interruptedEx);
                 }
                 attempt++;
diff --git 
a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/AbstractIoOperation.java
 
b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/AbstractIoOperation.java
index 8ae0571..7176e22 100644
--- 
a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/AbstractIoOperation.java
+++ 
b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/AbstractIoOperation.java
@@ -96,7 +96,7 @@
                     bufferCache.deleteFile(file);
                 }
             } catch (HyracksDataException hde) {
-                getFailure().addSuppressed(hde);
+                ExceptionUtils.suppress(failure, hde);
             }
         }
     }
diff --git 
a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMIndexDiskComponentBulkLoader.java
 
b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMIndexDiskComponentBulkLoader.java
index 841d389..3d73289 100644
--- 
a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMIndexDiskComponentBulkLoader.java
+++ 
b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMIndexDiskComponentBulkLoader.java
@@ -84,9 +84,9 @@

     @Override
     public void abort() throws HyracksDataException {
-        opCtx.getIoOperation().setStatus(LSMIOOperationStatus.FAILURE);
-        fail(null);
         try {
+            opCtx.getIoOperation().setStatus(LSMIOOperationStatus.FAILURE);
+            fail(null);
             try {
                 componentBulkLoader.abort();
             } finally {
diff --git 
a/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java
 
b/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java
index 8169fb9..40020a6 100644
--- 
a/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java
+++ 
b/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java
@@ -1011,6 +1011,9 @@
         sweepAndFlush(fInfo, false);
         try {
             if (fInfo.getReferenceCount() > 0) {
+                LOGGER.info(
+                        "Resource leak detected: Force-deleting file with 
active references. fileId: {}, referenceCount: {}",
+                        fileId, fInfo.getReferenceCount());
                 throw new HyracksDataException("Deleting open file");
             }
         } finally {

--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20967?usp=email
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: asterixdb
Gerrit-Branch: lumina
Gerrit-Change-Id: If36c1336f8e8377f2d410c8878d168f1209bcf87
Gerrit-Change-Number: 20967
Gerrit-PatchSet: 8
Gerrit-Owner: Ritik Raj <[email protected]>
Gerrit-Reviewer: Ali Alsuliman <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Michael Blow <[email protected]>
Gerrit-Reviewer: Ritik Raj <[email protected]>
Gerrit-CC: Anon. E. Moose #1000171
Gerrit-CC: Murtadha Hubail <[email protected]>

Reply via email to