[ 
https://issues.apache.org/jira/browse/AMQ-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ritesh adval updated AMQ-9747:
------------------------------
    Description: 
This bug affects all versions of ActiveMQ; I have added a few, starting from 
5.18.4.  This is a really critical bug: it causes the checkpoint runner thread 
to SILENTLY GET KILLED (the exception is swallowed), so the KahaDB journal 
files KEEP GROWING until you restart ActiveMQ.  The bug is in the handling of 
the IO exception (see the screenshot below): the catch block calls 
brokerService.handleIOException(). 

!image-2025-07-21-11-14-20-510.png!

If you take a look at the default IO exception handler that is used, it throws 
a SuppressReplyException at 
[https://github.com/apache/activemq/blob/main/activemq-broker/src/main/java/org/apache/activemq/util/DefaultIOExceptionHandler.java#L165]
and, if stopStartConnectors is true (which is the case if you use 
LeaseLockerIOExceptionHandler), it also throws a SuppressReplyException at 
[https://github.com/apache/activemq/blob/main/activemq-broker/src/main/java/org/apache/activemq/util/DefaultIOExceptionHandler.java#L155].
Because of this, the checkpoint runner thread shown in the screenshot above 
silently dies, even though the broker is still running.

It seems the checkpoint runner should not be dying.  We had a situation where 
we were using EFS as our storage for KahaDB.  Due to a blip in the connection 
to EFS, an IO exception was thrown in page.flush in MessageDatabase (we were 
using LeaseLockerIOExceptionHandler).  That caused the 
DefaultIOExceptionHandler logic to stop and start the connectors and throw a 
SuppressReplyException as I mentioned above, causing the checkpoint runner to 
silently get killed while the broker was still running.

 

We hit this in production, and our fix was to extend 
LeaseLockerIOExceptionHandler, catch the exception thrown from the 
handle(IOException ex) method, log it as a warning, and not propagate it up to 
the checkpoint runner thread.  But this is a temporary fix.  I am not even sure 
whether CheckpointRunner needs to use DefaultIOExceptionHandler.  It shouldn't 
die silently.
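
The shape of that workaround can be sketched as follows. This is not the 
ActiveMQ API: BaseHandler is a hypothetical stand-in for a handler whose 
handle(IOException) ends by throwing an unchecked exception, and QuietHandler, 
checkpointPass and survives are illustration-only names. The real workaround 
would subclass LeaseLockerIOExceptionHandler instead.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

final class QuietHandlerSketch {

    /** Hypothetical stand-in for a handler that finishes by throwing an unchecked exception. */
    static class BaseHandler {
        public void handle(IOException e) {
            throw new IllegalStateException("stand-in for the handler's unchecked exception", e);
        }
    }

    /** Subclass that logs and swallows whatever the parent handler throws. */
    static final class QuietHandler extends BaseHandler {
        private final AtomicInteger swallowed = new AtomicInteger();

        @Override
        public void handle(IOException e) {
            try {
                super.handle(e);
            } catch (RuntimeException re) {
                swallowed.incrementAndGet();
                System.err.println("WARN: IO exception handler threw " + re + "; not propagating");
            }
        }

        int swallowedCount() {
            return swallowed.get();
        }
    }

    /** One simulated periodic pass: the flush fails and the handler is invoked from the catch block. */
    static void checkpointPass(BaseHandler handler) {
        try {
            throw new IOException("simulated flush failure");
        } catch (IOException ioe) {
            handler.handle(ioe);
        }
    }

    /** Returns true if a pass completes without an exception escaping to the caller. */
    static boolean survives(BaseHandler handler) {
        try {
            checkpointPass(handler);
            return true;
        } catch (RuntimeException escaped) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("base handler survives:  " + survives(new BaseHandler()));
        System.out.println("quiet handler survives: " + survives(new QuietHandler()));
    }
}
```

Swallowing the exception keeps the periodic pass alive, but it also hides 
whatever the parent handler meant to signal by throwing, which fits the 
description of this as a temporary fix.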

  was:
This bug affects all versions of ActiveMQ; I have added a few, starting from 
5.18.4.  This is a really critical bug: it causes the checkpoint runner thread 
to SILENTLY GET KILLED (the exception is swallowed), so the KahaDB journal 
files KEEP GROWING until you restart ActiveMQ.  The bug is in the handling of 
the IO exception (see the screenshot below): the catch block calls 
brokerService.handleIOException(). 

!image-2025-07-21-11-14-20-510.png!

If you take a look at the default IO exception handler that is used, it throws 
a SuppressReplyException at 
[https://github.com/apache/activemq/blob/main/activemq-broker/src/main/java/org/apache/activemq/util/DefaultIOExceptionHandler.java#L165]
and, if stopStartConnectors is true (which is the case if you use 
LeaseLockerIOExceptionHandler), it also throws a SuppressReplyException at 
[https://github.com/apache/activemq/blob/main/activemq-broker/src/main/java/org/apache/activemq/util/DefaultIOExceptionHandler.java#L155].
Because of this, the checkpoint runner thread shown in the screenshot above 
silently dies, even though the broker is still running.

It seems the checkpoint runner should not be dying.  We had a situation where 
we were using EFS as our storage for KahaDB.  Due to a blip in the connection 
to EFS, an IO exception was thrown in page.flush in MessageDatabase (we were 
using LeaseLockerIOExceptionHandler).  That caused the 
DefaultIOExceptionHandler logic to stop and start the connectors and throw a 
SuppressReplyException as I mentioned above, causing the checkpoint runner to 
silently get killed while the broker was still running.

We hit this in production, and our fix was to extend 
LeaseLockerIOExceptionHandler, catch the exception thrown from the 
handle(IOException ex) method, log it as a warning, and not propagate it up to 
the checkpoint runner thread.  But this is a temporary fix.  I am not even sure 
whether CheckpointRunner needs to use DefaultIOExceptionHandler.  It should 
die silently.


> Kahadb checkpoint runner thread dies without catching exception
> ---------------------------------------------------------------
>
>                 Key: AMQ-9747
>                 URL: https://issues.apache.org/jira/browse/AMQ-9747
>             Project: ActiveMQ Classic
>          Issue Type: Bug
>          Components: KahaDB
>    Affects Versions: 6.1.4, 6.1.6, 5.18.7, 6.1.7
>            Reporter: ritesh adval
>            Priority: Major
>         Attachments: image-2025-07-21-11-14-20-510.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
For further information, visit: https://activemq.apache.org/contact

