cshannon commented on PR #1659:
URL: https://github.com/apache/activemq/pull/1659#issuecomment-3923949758

   > To clarify the background task involvement: in a standard serial flow, the 
Destination handles the guards correctly. However, in my 
ActiveMQTextMessageStressTest, the race occurs during high-concurrency 
scenarios where multiple threads interact with the message command.
   > 
   > The failure specifically occurs here: Caused by: java.lang.AssertionError: 
Text should never be null during stress at line 138
   > 
   > This trace confirms that even though the message was produced with text, 
the unmarshalled state was cleared out from under the consumer thread before it 
could be read.
   > 
   > Because clearUnmarshalledState() is public, it can be invoked by internal 
broker components (like Advisory dispatchers or NIO transport threads) to 
reduce memory footprint. Since the current implementation doesn't check if 
content is actually populated before nulling the text, it creates this 
'double-null' state.
   > 
   > Hardening the command itself makes it 'safe by default,' protecting data 
integrity regardless of which internal broker component calls the cleanup
   
   So taking a step back, at this point I'm not sure there's a real problem we 
need to solve here, or if there is I haven't seen the evidence yet for an 
actual broker problem to fix because the messages should be copied on dispatch.
   
   A few things to point out:
   
   First, the goal here should not be to make changes to the code to simply 
make the unit test pass. The unit test is an artificial recreation of an error 
by manually calling that method outside of normal operation in a multi threaded 
environment.  As @tabish121 pointed out, those messages were **never** intended 
to be used by multiple threads so there will be race conditions if 2 threads 
are operating on the message. That alone makes the test invalid as the messages 
are inherently not thread safe so a stress test will break it.
   
   Second, simply making this change inside of clearUnmarshalledState() to 
serialize and marshal the data if not serialized does not make the messages 
thread safe. There is still a race condition and using that method is only safe 
from a single threaded environment. You could still get into trouble if calling 
the method from multiple threads so it doesn't really solve the issue if there 
are 2 threads touching the message by mistake.
   
   Third, yes the method is public, so in theory someone could certainly write 
code that would invoke the method and clear the unmarshaled state without 
marshaling the data first.  However, if a user is going to write code to do 
that a user is also capable of checking the data has been marshaled before 
calling that method just like the broker does. If anything, it may make more 
sense to do a state check and throw an exception inside of 
clearUnmarshalledState() if the marshaled data is missing as if someone is 
calling that method they likely are going to be expecting the data to already 
be marshaled and transparently marshaling would just be hiding an error.
   
   Lastly, this issue and PR were originally opened up because of receiving 
null message bodies in a real environment but I have yet to see evidence or a 
demonstration of how this is possible to happen because the messages should be 
copied and be unique for each consumer. I'm not saying it's impossible to 
happen but so far receiving null bodies has only been demonstrated through a 
unit test that is not really a valid test because it creates a race condition 
scenario that should not be possible in a normal broker operation because those 
messages should always be copied. 
   
   Originally I was ok adding synchronization because I was thinking there was 
a spot in the broker where multiple threads might be interacting with the same 
copy in the VM transport. However, so far that doesn't appear to be the case or 
at least we haven't found where that is yet as we should be copying the message 
on dispatch.  If there is a bug to fix that fix would be to locate the spot 
where multiple threads are interacting with the same copy vs their own copy and 
fix that. Any client code that is calling clearUnmarshalledState() should make 
sure to do so on its own copy of the message and should verify it's safe to 
call that method (ie check it's marshaled first)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
For further information, visit: https://activemq.apache.org/contact


Reply via email to