[ https://issues.apache.org/jira/browse/CASSANDRA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061842#comment-15061842 ]
Sylvain Lebresne commented on CASSANDRA-9258:
---------------------------------------------

bq. Sylvain, are you happy with the provided data or do you insist on writing a jmh bench for regression testing?

I think we should still write a jmh regression test for, well, regression testing, but also because it'll give us a good baseline for future improvements. That said, the provided data (and general reasoning) is convincing enough that this is a net improvement, so if no-one has time to write that jmh bench shortly, I won't hold back on committing, since this is potentially serious for some users. Still, let the record show that I dislike the idea of postponing the writing of such a regression test, because history so far shows that this is codename for "never getting it done".

> Range movement causes CPU & performance impact
> ----------------------------------------------
>
>                 Key: CASSANDRA-9258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9258
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.1.4
>            Reporter: Rick Branson
>            Assignee: Dikang Gu
>             Fix For: 2.1.x
>
>         Attachments: 0001-pending-ranges-map.patch, Screenshot 2015-12-16 16.11.36.png, Screenshot 2015-12-16 16.11.51.png
>
>
> Observing big CPU & latency regressions when doing range movements on clusters with many tens of thousands of vnodes. See CPU usage increase by ~80% when a single node is being replaced.
> Top methods are:
> 1) Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/dht/ComparableObjectToken;.compareTo
> 2) Lcom/google/common/collect/AbstractMapBasedMultimap;.wrapCollection in Lcom/google/common/collect/AbstractMapBasedMultimap$AsMap$AsMapIterator;.next
> 3) Lorg/apache/cassandra/db/DecoratedKey;.compareTo in Lorg/apache/cassandra/dht/Range;.contains
> Here's a sample stack from a thread dump:
> {code}
> "Thrift:50673" daemon prio=10 tid=0x00007f2f20164800 nid=0x3a04af runnable [0x00007f2d878d0000]
>    java.lang.Thread.State: RUNNABLE
>       at org.apache.cassandra.dht.Range.isWrapAround(Range.java:260)
>       at org.apache.cassandra.dht.Range.contains(Range.java:51)
>       at org.apache.cassandra.dht.Range.contains(Range.java:110)
>       at org.apache.cassandra.locator.TokenMetadata.pendingEndpointsFor(TokenMetadata.java:916)
>       at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:775)
>       at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:541)
>       at org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:616)
>       at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:1101)
>       at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:1083)
>       at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:976)
>       at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3996)
>       at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3980)
>       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>       at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
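
For readers following the stack trace, here is a rough sketch of why a per-write linear scan over pending ranges can get expensive. It is not the Cassandra code: `PendingRangeScan`, `TokenRange`, the use of `long` tokens and the `String` endpoints are all hypothetical simplifications (the profile in the report shows `BigInteger` tokens). It only assumes that a range is left-exclusive and right-inclusive, and that it wraps around the ring when left >= right.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PendingRangeScan {

    /** Hypothetical simplified range (left, right] over long tokens. */
    static final class TokenRange {
        final long left;
        final long right;

        TokenRange(long left, long right) {
            this.left = left;
            this.right = right;
        }

        /** A range wraps around the ring when its left bound is not below its right bound. */
        boolean isWrapAround() {
            return left >= right;
        }

        boolean contains(long token) {
            if (isWrapAround()) {
                // Covers everything after left plus everything up to and including right.
                return token > left || token <= right;
            }
            return token > left && token <= right;
        }
    }

    /**
     * Linear scan: each call performs one contains() check per pending range,
     * so the cost per write grows with the number of pending ranges.
     */
    static List<String> pendingEndpointsFor(Map<TokenRange, String> pending, long token) {
        List<String> endpoints = new ArrayList<>();
        for (Map.Entry<TokenRange, String> entry : pending.entrySet()) {
            if (entry.getKey().contains(token)) {
                endpoints.add(entry.getValue());
            }
        }
        return endpoints;
    }

    public static void main(String[] args) {
        Map<TokenRange, String> pending = new LinkedHashMap<>();
        pending.put(new TokenRange(10, 20), "nodeA");
        pending.put(new TokenRange(90, 5), "nodeB"); // wraps around the ring

        System.out.println(pendingEndpointsFor(pending, 15)); // prints [nodeA]
        System.out.println(pendingEndpointsFor(pending, 3));  // prints [nodeB]
    }
}
```

With tens of thousands of vnodes, a scan of this shape runs on every write while ranges are pending, which would be consistent with `Range.contains` and token `compareTo` showing up at the top of the profile. A jmh bench along the lines discussed in the comment could time such a lookup against varying numbers of pending ranges.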