Jesse Fugitt created AMQ-5394:
---------------------------------

             Summary: Incorrect handling of duplicate update message commands 
in KahaDB can lead to broker startup errors
                 Key: AMQ-5394
                 URL: https://issues.apache.org/jira/browse/AMQ-5394
             Project: ActiveMQ
          Issue Type: Bug
          Components: Message Store
    Affects Versions: 5.10.0
            Reporter: Jesse Fugitt


When using the new (in 5.10) persistJMSRedelivered option to make sure all 
duplicates are marked as redelivered (the activemq.xml config file used 
<policyEntry queue=">" persistJMSRedelivered="true"></policyEntry>), we 
occasionally had a broker fail to start up with the following error:

2014-10-07 17:31:15,117 | ERROR | Looking for key 7 but not found in fileMap: 
{8=db-8.log number = 8 , length = 9132256} | 
org.apache.activemq.store.kahadb.disk.journal.Journal | main

2014-10-07 17:31:15,117 | ERROR | Failed to start Apache ActiveMQ ([broker0, 
null], java.io.IOException: Could not locate data file 
/local/temp/apache-activemq-5.10.0/data/kahadb/db-7.log) | 
org.apache.activemq.broker.BrokerService | main


The root cause seems to be when KahaDB processes a "duplicate" update message 
command.  The code in KahaDB logs a warning when this occurs from the following 
else statement and then updates the metadata location and then exits the 
function as shown below:

...
} else {
            LOG.warn("Non existent message update attempt rejected. 
Destination: {}://{}, Message id: {}", command.getDestination().getType(), 
command.getDestination().getName(), command.getMessageId());
        }
metadata.lastUpdate = location;
...

It turns out that the metadata.lastUpdate = location; line should not run if we 
took the else branch above so the simple fix is to move that line up into the 
if block so that it will not run after the log warning.  Once we did that, we 
no longer see the broker startup errors.  Note that this log warning does not 
always lead to a broker startup error as it is also related to writing at the 
end of a transaction log file or the checkpoint timer interval so it is not 
simple to reproduce but we have not see the startup error once the 
metadata.lastUpdate line was moved to the correct location.

A patch will be provided to show the change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to