[ https://issues.apache.org/jira/browse/CASSANDRA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971341#comment-16971341 ]
Yifan Cai edited comment on CASSANDRA-15350 at 11/11/19 6:02 PM:
-----------------------------------------------------------------

-Calculating the exact size of a UTF-8 string has a negative performance impact. It needs to iterate through the entire string to determine the actual size in UTF-8.-

The previous [benchmark setup|https://issues.apache.org/jira/secure/attachment/12985494/Utf8StringEncodeBench.java] was wrong. In the cases that write with the exact size, `reserveAndWriteUtf8` should be called to avoid resizing the buffer.

I have refined the benchmarks and introduced 2 new ones that leverage the encodeSize from the previous step. The results show a performance improvement.

{code:java}
[java] Benchmark                                                  Mode  Cnt    Score     Error  Units
[java] Utf8StringEncodeBench.writeLongText                        avgt    6  571.949 ±  19.791  ns/op
[java] Utf8StringEncodeBench.writeLongTextWithExactSize           avgt    6  459.932 ±  27.790  ns/op
[java] Utf8StringEncodeBench.writeLongTextWithExactSizeSkipCalc   avgt    6  216.085 ±   3.480  ns/op
[java] Utf8StringEncodeBench.writeShortText                       avgt    6   62.775 ±   6.159  ns/op
[java] Utf8StringEncodeBench.writeShortTextWithExactSize          avgt    6   44.071 ±   5.645  ns/op
[java] Utf8StringEncodeBench.writeShortTextWithExactSizeSkipCalc  avgt    6   36.358 ±   5.135  ns/op
{code}

* writeLongText: the original implementation, which calls `ByteBufUtils.writeUtf8`. It over-estimates the size of the string, which causes the buffer to be resized.
* writeLongTextWithExactSize: calls `TypeSizes.encodeUTF8Length` to reserve the exact number of bytes to write.
* writeLongTextWithExactSizeSkipCalc: an optimization that skips calculating the UTF-8 length. The encodeSize of a message is calculated before the message is encoded, so the size of the final bytes is already known. We can leverage this information and simply reserve using the remaining capacity.


was (Author: yifanc):
The change of calculating the exact size of a UTF-8 string has a negative performance impact.
It needs to iterate through the entire string to determine the actual size in UTF-8.

The [benchmark setup|https://issues.apache.org/jira/secure/attachment/12985494/Utf8StringEncodeBench.java] and the result:

{code:java}
[java] Benchmark                                          Mode  Cnt    Score     Error  Units
[java] Utf8StringEncodeBench.writeLongText                avgt    6  552.458 ±   9.141  ns/op
[java] Utf8StringEncodeBench.writeLongTextWithExactSize   avgt    6  787.676 ± 120.057  ns/op
[java] Utf8StringEncodeBench.writeShortText               avgt    6   70.311 ±   8.031  ns/op
[java] Utf8StringEncodeBench.writeShortTextWithExactSize  avgt    6   71.716 ±   4.790  ns/op
{code}

I will revert the change.

> Add CAS “uncertainty” and “contention” messages that are currently propagated
> as a WriteTimeoutException.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15350
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15350
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Lightweight Transactions
>            Reporter: Alex Petrov
>            Assignee: Yifan Cai
>            Priority: Normal
>              Labels: protocolv5, pull-request-available
>         Attachments: Utf8StringEncodeBench.java
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, CAS uncertainty introduced in
> https://issues.apache.org/jira/browse/CASSANDRA-6013 is propagating as
> WriteTimeout. One of the conditions in which it manifests is when there’s at least
> one acceptor that has accepted the value, which means that this value _may_
> still get accepted during a later round, despite the proposer failure.
> A similar problem happens with CAS contention, which is also indistinguishable
> from the “regular” timeout, even though it is visible in metrics correctly.
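
The exact-size step discussed in the comment above (one pass over the string to find its UTF-8 length before reserving buffer space) can be sketched as follows. This is only an illustration and not the code behind `TypeSizes.encodeUTF8Length`: the class name `Utf8Length` and the method `encodedLength` are hypothetical, and counting an unpaired surrogate as one byte is an assumption chosen to match what `String.getBytes(StandardCharsets.UTF_8)` produces.

```java
// Hypothetical helper, for illustration only; not part of Cassandra or Netty.
final class Utf8Length {
    private Utf8Length() {}

    /** Returns the number of bytes needed to encode {@code s} as UTF-8, in one pass and without allocating. */
    static int encodedLength(String s) {
        int bytes = 0;
        final int len = s.length();
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;                       // two-byte sequence
            } else if (Character.isSurrogate(c)) {
                if (Character.isHighSurrogate(c) && i + 1 < len
                        && Character.isLowSurrogate(s.charAt(i + 1))) {
                    bytes += 4;                   // valid surrogate pair: one supplementary code point
                    i++;                          // skip the low surrogate
                } else {
                    bytes += 1;                   // assumption: unpaired surrogate is replaced by '?'
                }
            } else {
                bytes += 3;                       // remaining BMP characters
            }
        }
        return bytes;
    }
}
```

A caller would pass the returned length to whatever reserves the buffer space, so that the write never triggers a resize. The loop is the per-character cost that the first benchmark run attributed to the exact-size approach; the SkipCalc variants avoid it by reusing a size that was already computed.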