[jira] [Updated] (AMQ-8398) 4-byte Unicode message from JMS to STOMP will be corrupted

Simon Lundstrom (Jira) Wed, 06 Oct 2021 08:11:06 -0700


     [ https://issues.apache.org/jira/browse/AMQ-8398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Simon Lundstrom updated AMQ-8398:
---------------------------------
    Description: 
A message is corrupted when it is sent from:
JMS producer to STOMP consumer
or
STOMP producer to JMS consumer
and it contains a Unicode code point that takes 4 bytes in UTF-8, e.g. U+1F5A4
(https://unicode-table.com/en/1F5A4/).
In the JMS to STOMP case the code point arrives as
{{ef bf bd ef bf bd}} (two U+FFFD replacement characters) when it should be {{f0 9f 96 a4}}.
In the STOMP to JMS case the JMS client throws an exception:
{code}
Exception in thread "main" javax.jms.JMSException: java.io.UTFDataFormatException
        at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:72)
        at org.apache.activemq.command.ActiveMQTextMessage.decodeContent(ActiveMQTextMessage.java:104)
        at org.apache.activemq.command.ActiveMQTextMessage.getText(ActiveMQTextMessage.java:84)
        at testkonsument.App.JMS(App.java:86)
        at testkonsument.App.main(App.java:42)
Caused by: java.io.UTFDataFormatException
        at org.apache.activemq.util.MarshallingSupport.convertUTF8WithBuf(MarshallingSupport.java:389)
        at org.apache.activemq.util.MarshallingSupport.readUTF8(MarshallingSupport.java:358)
        at org.apache.activemq.command.ActiveMQTextMessage.decodeContent(ActiveMQTextMessage.java:101)
        ... 3 more
{code}
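
The same exception type can be reproduced with the JDK alone. The sketch below is an assumption about the mechanism and is not taken from the ActiveMQ source: it feeds standard UTF-8 bytes to java.io.DataInputStream.readUTF, which expects Java's modified UTF-8 and rejects 4-byte sequences. Whether MarshallingSupport.readUTF8 fails for the same reason is not confirmed by this report. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; uses only JDK classes, no ActiveMQ code.
class ModifiedUtf8ReadDemo {

    // Frames the standard UTF-8 bytes of `text` with the 2-byte length prefix
    // that readUTF expects, then reports whether the JDK's modified UTF-8
    // reader accepts them.
    static boolean readableAsModifiedUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        framed.write((utf8.length >>> 8) & 0xFF);
        framed.write(utf8.length & 0xFF);
        framed.write(utf8, 0, utf8.length);
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(framed.toByteArray()))) {
            in.readUTF();
            return true;
        } catch (UTFDataFormatException e) {
            // Thrown when the bytes are not valid modified UTF-8.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String heart = new String(Character.toChars(0x1F5A4)); // 4 bytes in UTF-8
        System.out.println("U+00F6 readable:  " + readableAsModifiedUtf8("\u00F6"));
        System.out.println("U+2614 readable:  " + readableAsModifiedUtf8("\u2614"));
        System.out.println("U+1F5A4 readable: " + readableAsModifiedUtf8(heart));
    }
}
```

In this sketch the 2-byte and 3-byte code points are accepted and only the 4-byte one is rejected with UTFDataFormatException.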

Sending 4-byte Unicode code points
from STOMP to STOMP
or
from JMS to JMS
is not a problem; both work and do not corrupt the code point.

Note that 2-byte (e.g. https://unicode-table.com/en/00F6/) and 3-byte (e.g.
https://unicode-table.com/en/2614/) Unicode code points do NOT get corrupted,
even if the same message includes a 4-byte Unicode code point.
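
One pattern that would affect only 4-byte code points is a mismatch between standard UTF-8 and Java's modified UTF-8, which agree on 1-, 2- and 3-byte code points but encode supplementary code points differently. The sketch below compares the two using only JDK classes. It is an illustration under that assumption, not a confirmed root cause, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; uses only JDK classes, no ActiveMQ code.
class Utf8VariantsDemo {

    // Standard UTF-8 encoding of the string.
    static byte[] standardUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Java's modified UTF-8 as written by DataOutputStream.writeUTF,
    // with the 2-byte length prefix stripped.
    static byte[] modifiedUtf8(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] framed = buffer.toByteArray();
        return Arrays.copyOfRange(framed, 2, framed.length);
    }

    // Lower-case hex dump with single spaces between bytes.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three code points linked in this issue.
        int[] codePoints = {0x00F6, 0x2614, 0x1F5A4};
        for (int cp : codePoints) {
            String s = new String(Character.toChars(cp));
            System.out.printf("U+%04X standard=[%s] modified=[%s]%n",
                    cp, hex(standardUtf8(s)), hex(modifiedUtf8(s)));
        }
    }
}
```

For U+00F6 and U+2614 both encodings give the same bytes. For U+1F5A4 standard UTF-8 gives f0 9f 96 a4, while modified UTF-8 encodes each UTF-16 surrogate separately and gives ed a0 bd ed b6 a4, which a strict UTF-8 decoder does not accept.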

  was:
A message is corrupted when it is sent from:
JMS producer to STOMP consumer
or
STOMP producer to JMS consumer
and it contains a Unicode code point that takes 4 bytes in UTF-8, e.g. U+1F5A4
(https://unicode-table.com/en/1F5A4/).
In the JMS to STOMP case the code point arrives as
{{ef bf bd ef bf bd}} (two U+FFFD replacement characters) when it should be {{f0 9f 96 a4}}.
In the STOMP to JMS case the JMS client throws an exception:
{code}
Exception in thread "main" javax.jms.JMSException: java.io.UTFDataFormatException
        at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:72)
        at org.apache.activemq.command.ActiveMQTextMessage.decodeContent(ActiveMQTextMessage.java:104)
        at org.apache.activemq.command.ActiveMQTextMessage.getText(ActiveMQTextMessage.java:84)
        at testkonsument.App.JMS(App.java:86)
        at testkonsument.App.main(App.java:42)
Caused by: java.io.UTFDataFormatException
        at org.apache.activemq.util.MarshallingSupport.convertUTF8WithBuf(MarshallingSupport.java:389)
        at org.apache.activemq.util.MarshallingSupport.readUTF8(MarshallingSupport.java:358)
        at org.apache.activemq.command.ActiveMQTextMessage.decodeContent(ActiveMQTextMessage.java:101)
        ... 3 more
{code}

Sending 4-byte Unicode code points
from STOMP to STOMP
or
from JMS to JMS
is not a problem; both work and do not corrupt the code point.

Note that 2-byte and 3-byte Unicode code points do NOT get corrupted, even if the
same message includes a 4-byte Unicode code point.


> 4-byte Unicode message from JMS to STOMP will be corrupted
> ----------------------------------------------------------
>
>                 Key: AMQ-8398
>                 URL: https://issues.apache.org/jira/browse/AMQ-8398
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker, STOMP, Transport
>    Affects Versions: 5.16.3
>            Reporter: Simon Lundstrom
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
