Peeyush Gupta created ASTERIXDB-3557:
----------------------------------------

             Summary: Failure in reading atomic txn log file results in crash loop
                 Key: ASTERIXDB-3557
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3557
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: TX - Transactions
            Reporter: Peeyush Gupta


When an atomic transaction log file fails to deserialize during recovery, the 
CC enters a crash loop. In that case, the invalid files should be deleted so 
that recovery can continue.
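
The proposed behavior can be sketched roughly as follows. This is only an illustration, not the actual fix: the class name, the recover and deserialize methods, and the flat directory layout are all hypothetical and do not mirror TransactionManager. The empty-content check stands in for the null "bytes" case seen in the trace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; names and layout are assumptions, not AsterixDB code.
public class AtomicTxnLogRecoverySketch {

    // Stand-in for the real deserializer: rejects missing or empty content
    // instead of letting a NullPointerException escape.
    static String deserialize(byte[] bytes) {
        if (bytes == null || bytes.length == 0) {
            throw new IllegalStateException("empty or unreadable atomic txn log");
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Reads every regular file in logDir. A file that cannot be read or
    // deserialized is deleted and skipped, so one bad file does not abort
    // recovery and trigger the same failure on the next restart.
    static List<String> recover(Path logDir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(logDir)) {
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> recovered = new ArrayList<>();
        for (Path file : files) {
            try {
                recovered.add(deserialize(Files.readAllBytes(file)));
            } catch (IOException | RuntimeException e) {
                System.err.println("Deleting invalid atomic txn log " + file + ": " + e);
                Files.deleteIfExists(file);
            }
        }
        return recovered;
    }
}
```

Catching the failure per file, rather than around the whole loop, is what lets the remaining valid files still be processed.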

Sample failures:
 
{noformat}
2025-01-28T11:18:30.840+00:00 ERRO CBAS.replication.NcLifecycleCoordinator [Executor-13:ClusterController] Node b420f4d7c136b5e56bda9374743cde5a failed to complete startup
org.apache.asterix.common.exceptions.ACIDException: java.lang.NullPointerException: Cannot read the array length because "bytes" is null
        at org.apache.asterix.transaction.management.service.transaction.TransactionManager.rollbackMetadataTransactionsWithoutWAL(TransactionManager.java:225) ~[asterix-transactions-1.0.3-2467.jar:1.0.3-2467]
        at org.apache.asterix.app.nc.task.LocalStorageCleanupTask.perform(LocalStorageCleanupTask.java:51) ~[asterix-app-1.0.3-2467.jar:1.0.3-2467]
        at org.apache.asterix.app.replication.message.RegistrationTasksResponseMessage.handle(RegistrationTasksResponseMessage.java:63) ~[asterix-app-1.0.3-2467.jar:1.0.3-2467]
        at org.apache.asterix.messaging.NCMessageBroker.lambda$receivedMessage$0(NCMessageBroker.java:108) ~[asterix-app-1.0.3-2467.jar:1.0.3-2467]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.lang.NullPointerException
        at java.base/java.lang.String.<init>(String.java:1437) ~[?:?]
        at org.apache.asterix.transaction.management.service.transaction.TransactionManager.rollbackMetadataTransactionsWithoutWAL(TransactionManager.java:215) ~[asterix-transactions-1.0.3-2467.jar:1.0.3-2467]
        ... 8 more
{noformat}



