[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140439#comment-14140439 ]
Dominic Evans commented on PROTON-576:
--------------------------------------

[~gemmellr] I'm not sure which machine I ran the previous benchmarks on, but here are the results from my desktop just now, comparing a vanilla qpid-proton-0.7 with one that has my patch applied:

{code:title=# Results|borderStyle=solid}
Benchmark                                 Mode   Samples   Mean   Mean error   Units
t.TestStringType.testUsingPatchedProton   thrpt     4096  7.270        0.031  ops/ms
t.TestStringType.testUsingProton07        thrpt     4096  9.196        0.015  ops/ms
{code}

As you can see, there is a small drop of ~2 ops/ms (from 9.196 to 7.270), but the performance is still reasonably close to what it was before, and I'm not sure we can really consider a benchmark of the previously incomplete UTF-8 parser to be a valid performance comparison anyway :)

I could slightly optimise the isSurrogatePair check to call string.charAt(i+1) only if the current char is a high surrogate. I'll see if that brings us any closer.

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> -----------------------------------------------------------------------
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
>         Attachments: 02_fix_stringtype_encode_decode.patch
>
>
> It seems like Proton-J has its own custom UTF-8 encoder, but relies on Java
> String's built-in UTF-8 decoder.
> However, the code doesn't seem quite right
> and complex multi-byte UTF-8 like emoji ('📔🚢🍛🍴🍹🏊🏄') can quite easily fail to
> parse:
> | | Cause:1 :- java.lang.IllegalArgumentException: Cannot parse String
> | | Message:1 :- Cannot parse String
> | | StackTrace:1 :- java.lang.IllegalArgumentException: Cannot parse String
> | | at org.apache.qpid.proton.codec.StringType$1.decode(StringType.java:48)
> | | at org.apache.qpid.proton.codec.StringType$1.decode(StringType.java:36)
> | | at org.apache.qpid.proton.codec.DecoderImpl.readRaw(DecoderImpl.java:945)
> | | at org.apache.qpid.proton.codec.StringType$AllStringEncoding.readValue(StringType.java:172)
> | | at org.apache.qpid.proton.codec.StringType$AllStringEncoding.readValue(StringType.java:124)
> | | at org.apache.qpid.proton.codec.DynamicTypeConstructor.readValue(DynamicTypeConstructor.java:39)
> | | at org.apache.qpid.proton.codec.DecoderImpl.readObject(DecoderImpl.java:885)
> | | at org.apache.qpid.proton.message.impl.MessageImpl.decode(MessageImpl.java:629)
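
For illustration, the surrogate-pair handling discussed in the comment above might look roughly like the sketch below. It is not the code from the attached patch or from proton-j's StringType: the class name Utf8Sketch and the methods encodedLength and encode are hypothetical, and only JDK calls (Character.isHighSurrogate, Character.isLowSurrogate, Character.toCodePoint) are used. The point it shows is that charAt(i + 1) is looked at only once the current char is known to be a high surrogate, so ASCII and other BMP characters never pay for the extra lookup.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a surrogate-aware UTF-8 encoder; not the proton-j implementation. */
final class Utf8Sketch {

    /** True only if s[i] is a high surrogate followed by a low surrogate. */
    private static boolean isPairAt(String s, int i) {
        // charAt(i + 1) is evaluated only when the first two checks pass.
        return Character.isHighSurrogate(s.charAt(i))
                && i + 1 < s.length()
                && Character.isLowSurrogate(s.charAt(i + 1));
    }

    /** Number of UTF-8 bytes needed for s; a valid surrogate pair counts as one 4-byte code point. */
    static int encodedLength(String s) {
        int len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                len += 1;
            } else if (c < 0x800) {
                len += 2;
            } else if (isPairAt(s, i)) {
                len += 4;
                i++; // skip the low surrogate
            } else {
                // Other BMP chars. Assumption: unpaired surrogates are not rejected here,
                // which differs from String.getBytes (it substitutes '?').
                len += 3;
            }
        }
        return len;
    }

    /** Encodes s as UTF-8 using the same branches as encodedLength. */
    static byte[] encode(String s) {
        byte[] out = new byte[encodedLength(s)];
        int p = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out[p++] = (byte) c;
            } else if (c < 0x800) {
                out[p++] = (byte) (0xC0 | (c >> 6));
                out[p++] = (byte) (0x80 | (c & 0x3F));
            } else if (isPairAt(s, i)) {
                int cp = Character.toCodePoint(c, s.charAt(++i));
                out[p++] = (byte) (0xF0 | (cp >> 18));
                out[p++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out[p++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[p++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                out[p++] = (byte) (0xE0 | (c >> 12));
                out[p++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                out[p++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDCD4\uD83D\uDEA2"; // U+1F4D4 and U+1F6A2, written as escapes
        byte[] bytes = encode(emoji);
        // Decode with the JDK's UTF-8 decoder and compare with the input.
        System.out.println(new String(bytes, StandardCharsets.UTF_8).equals(emoji));
    }
}
```

Each emoji in the example from the description is a single code point outside the BMP, so it occupies two Java chars and four UTF-8 bytes. Comparing the output of such an encoder against String.getBytes(StandardCharsets.UTF_8) for well-formed input is a cheap way to check it before benchmarking.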