[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535478#comment-16535478 ]
Benedict commented on CASSANDRA-7282: ------------------------------------- I've posted a slightly modified gist [here|https://gist.github.com/belliottsmith/b559b6a09882009fb229e1485dcb7ac4] This simply ensures that the tests for each algorithm run identical operations (by ensuring we use a consistent seed across forks/runs), randomises a larger number of search keys, introduces a new polluteCache parameter that reduces cache occupancy by generating many tests, and walking through them linearly for each invocation. I also fixed a couple of minor bugs - or at least inconsistencies in logic. It also introduces two new variants: A simpler (IMO) binary search variant, using some old code of mine from [here|https://github.com/belliottsmith/jjoost/blob/master/jjoost-base/src/org/jjoost/util/order/IntOrder.java], and a Van Emde Boas array layout, that should exhibit the best cache behaviour, by netting the possible benefits of a tree (highest level nodes retaining cache occupancy across operations) with those of the array (much smaller footprint). In reality there are better still variants, but the project is a long way from benefitting from those, here, I expect. ||Benchmark||(polluteCache)||(vnodes)||Mode||Cnt||Score Error Units|| |BSTBench.multiArrayBinarySearchFloor|1|4|thrpt|16|12065.122 ± 401.339 ops/ms| |BSTBench.multiArrayBinarySearchFloor|1|64|thrpt|16|6292.591 ± 135.290 ops/ms| |BSTBench.multiArrayBinarySearchFloor|1|512|thrpt|16|4702.458 ± 54.880 ops/ms| |BSTBench.multiArrayBinarySearchFloor|4096|4|thrpt|16|4691.372 ± 151.983 ops/ms| |BSTBench.multiArrayBinarySearchFloor|4096|64|thrpt|16|4218.482 ± 145.470 ops/ms| |BSTBench.multiArrayBinarySearchFloor|4096|512|thrpt|16|3737.212 ± 110.495 ops/ms| |BSTBench.multiArrayBinarySeek|1|4|thrpt|16|12657.483 ± 266.605 ops/ms| |BSTBench.multiArrayBinarySeek|1|64|thrpt|16|7002.695 ± 105.505 ops/ms| |BSTBench.multiArrayBinarySeek|1|512|thrpt|16|4339.367 ± 235.567 ops/ms| |BSTBench.multiArrayBinarySeek|4096|4|thrpt|16|4517.014 ± 94.628 ops/ms| |BSTBench.multiArrayBinarySeek|4096|64|thrpt|16|3984.565 ± 173.088 ops/ms| |BSTBench.multiArrayBinarySeek|4096|512|thrpt|16|3682.319 ± 125.062 ops/ms| |BSTBench.multiArrayVanEmdeBoasLayout|1|4|thrpt|16|15872.360 ± 390.277 ops/ms| |BSTBench.multiArrayVanEmdeBoasLayout|1|64|thrpt|16|7285.622 ± 100.053 ops/ms| |BSTBench.multiArrayVanEmdeBoasLayout|1|512|thrpt|16|4568.883 ± 603.805 ops/ms| |BSTBench.multiArrayVanEmdeBoasLayout|4096|4|thrpt|16|5217.134 ± 12.719 ops/ms| |BSTBench.multiArrayVanEmdeBoasLayout|4096|64|thrpt|16|4978.465 ± 55.890 ops/ms| |BSTBench.multiArrayVanEmdeBoasLayout|4096|512|thrpt|16|4361.356 ± 240.606 ops/ms| |BSTBench.treeSeek|1|4|thrpt|16|14766.468 ± 134.707 ops/ms| |BSTBench.treeSeek|1|64|thrpt|16|6400.449 ± 42.273 ops/ms| |BSTBench.treeSeek|1|512|thrpt|16|4116.524 ± 102.472 ops/ms| |BSTBench.treeSeek|4096|4|thrpt|16|5132.769 ± 183.395 ops/ms| |BSTBench.treeSeek|4096|64|thrpt|16|4003.251 ± 69.857 ops/ms| |BSTBench.treeSeek|4096|512|thrpt|16|3458.094 ± 17.092 ops/ms| > Faster Memtable map > ------------------- > > Key: CASSANDRA-7282 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7282 > Project: Cassandra > Issue Type: Improvement > Reporter: Benedict > Assignee: Michael Burman > Priority: Major > Labels: performance > Fix For: 4.x > > Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, > run1.svg, writes.svg > > > Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in > our memtables. Maintaining this is an O(lg(n)) operation; since the vast > majority of users use a hash partitioner, it occurs to me we could maintain a > hybrid ordered list / hash map. The list would impose the normal order on the > collection, but a hash index would live alongside as part of the same data > structure, simply mapping into the list and permitting O(1) lookups and > inserts. > I've chosen to implement this initial version as a linked-list node per item, > but we can optimise this in future by storing fatter nodes that permit a > cache-line's worth of hashes to be checked at once, further reducing the > constant factor costs for lookups. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org