cshannon commented on PR #1659: URL: https://github.com/apache/activemq/pull/1659#issuecomment-3923949758
> To clarify the background task involvement: in a standard serial flow, the Destination handles the guards correctly. However, in my ActiveMQTextMessageStressTest, the race occurs during high-concurrency scenarios where multiple threads interact with the message command. > > The failure specifically occurs here: Caused by: java.lang.AssertionError: Text should never be null during stress at line 138 > > This trace confirms that even though the message was produced with text, the unmarshalled state was cleared out from under the consumer thread before it could be read. > > Because clearUnmarshalledState() is public, it can be invoked by internal broker components (like Advisory dispatchers or NIO transport threads) to reduce memory footprint. Since the current implementation doesn't check if content is actually populated before nulling the text, it creates this 'double-null' state. > > Hardening the command itself makes it 'safe by default,' protecting data integrity regardless of which internal broker component calls the cleanup So taking a step back, at this point I'm not sure there's a real problem we need to solve here, or if there is I haven't seen the evidence yet for an actual broker problem to fix because the messages should be copied on dispatch. A few things to point out: First, the goal here should not be to make changes to the code to simply make the unit test pass. The unit test is an artificial recreation of an error by manually calling that method outside of normal operation in a multi threaded environment. As @tabish121 pointed out, those messages were **never** intended to be used by multiple threads so there will be race conditions if 2 threads are operating on the message. That alone makes the test invalid as the messages are inherently not thread safe so a stress test will break it. Second, simply making this change inside of clearUnmarshalledState() to serialize and marshal the data if not serialized does not make the messages thread safe. There is still a race condition and using that method is only safe from a single threaded environment. You could still get into trouble if calling the method from multiple threads so it doesn't really solve the issue if there are 2 threads touching the message by mistake. Third, yes the method is public, so in theory someone could certainly write code that would invoke the method and clear the unmarshaled state without marshaling the data first. However, if a user is going to write code to do that a user is also capable of checking the data has been marshaled before calling that method just like the broker does. If anything, it may make more sense to do a state check and throw an exception inside of clearUnmarshalledState() if the marshaled data is missing as if someone is calling that method they likely are going to be expecting the data to already be marshaled and transparently marshaling would just be hiding an error. Lastly, this issue and PR were originally opened up because of receiving null message bodies in a real environment but I have yet to see evidence or a demonstration of how this is possible to happen because the messages should be copied and be unique for each consumer. I'm not saying it's impossible to happen but so far receiving null bodies has only been demonstrated through a unit test that is not really a valid test because it creates a race condition scenario that should not be possible in a normal broker operation because those messages should always be copied. Originally I was ok adding synchronization because I was thinking there was a spot in the broker where multiple threads might be interacting with the same copy in the VM transport. However, so far that doesn't appear to be the case or at least we haven't found where that is yet as we should be copying the message on dispatch. If there is a bug to fix that fix would be to locate the spot where multiple threads are interacting with the same copy vs their own copy and fix that. Any client code that is calling clearUnmarshalledState() should make sure to do so on its own copy of the message and should verify it's safe to call that method (ie check it's marshaled first) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected] For further information, visit: https://activemq.apache.org/contact
