[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

Benedict (JIRA) Fri, 06 Jul 2018 16:34:09 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535478#comment-16535478
 ]


Benedict commented on CASSANDRA-7282:
-------------------------------------

I've posted a slightly modified gist 
[here|https://gist.github.com/belliottsmith/b559b6a09882009fb229e1485dcb7ac4]

This simply ensures that the tests for each algorithm run identical operations 
(by ensuring we use a consistent seed across forks/runs), randomises a larger 
number of search keys, introduces a new polluteCache parameter that reduces 
cache occupancy by generating many tests, and walking through them linearly for 
each invocation.

I also fixed a couple of minor bugs - or at least inconsistencies in logic.

It also introduces two new variants: A simpler (IMO) binary search variant, 
using some old code of mine from 
[here|https://github.com/belliottsmith/jjoost/blob/master/jjoost-base/src/org/jjoost/util/order/IntOrder.java],
 and a Van Emde Boas array layout, that should exhibit the best cache 
behaviour, by netting the possible benefits of a tree (highest level nodes 
retaining cache occupancy across operations) with those of the array (much 
smaller footprint).  In reality there are better still variants, but the 
project is a long way from benefitting from those, here, I expect.

 
||Benchmark||(polluteCache)||(vnodes)||Mode||Cnt||Score Error Units||
|BSTBench.multiArrayBinarySearchFloor|1|4|thrpt|16|12065.122 ± 401.339 ops/ms|
|BSTBench.multiArrayBinarySearchFloor|1|64|thrpt|16|6292.591 ± 135.290 ops/ms|
|BSTBench.multiArrayBinarySearchFloor|1|512|thrpt|16|4702.458 ± 54.880 ops/ms|
|BSTBench.multiArrayBinarySearchFloor|4096|4|thrpt|16|4691.372 ± 151.983 ops/ms|
|BSTBench.multiArrayBinarySearchFloor|4096|64|thrpt|16|4218.482 ± 145.470 
ops/ms|
|BSTBench.multiArrayBinarySearchFloor|4096|512|thrpt|16|3737.212 ± 110.495 
ops/ms|
|BSTBench.multiArrayBinarySeek|1|4|thrpt|16|12657.483 ± 266.605 ops/ms|
|BSTBench.multiArrayBinarySeek|1|64|thrpt|16|7002.695 ± 105.505 ops/ms|
|BSTBench.multiArrayBinarySeek|1|512|thrpt|16|4339.367 ± 235.567 ops/ms|
|BSTBench.multiArrayBinarySeek|4096|4|thrpt|16|4517.014 ± 94.628 ops/ms|
|BSTBench.multiArrayBinarySeek|4096|64|thrpt|16|3984.565 ± 173.088 ops/ms|
|BSTBench.multiArrayBinarySeek|4096|512|thrpt|16|3682.319 ± 125.062 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|1|4|thrpt|16|15872.360 ± 390.277 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|1|64|thrpt|16|7285.622 ± 100.053 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|1|512|thrpt|16|4568.883 ± 603.805 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|4096|4|thrpt|16|5217.134 ± 12.719 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|4096|64|thrpt|16|4978.465 ± 55.890 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|4096|512|thrpt|16|4361.356 ± 240.606 
ops/ms|
|BSTBench.treeSeek|1|4|thrpt|16|14766.468 ± 134.707 ops/ms|
|BSTBench.treeSeek|1|64|thrpt|16|6400.449 ± 42.273 ops/ms|
|BSTBench.treeSeek|1|512|thrpt|16|4116.524 ± 102.472 ops/ms|
|BSTBench.treeSeek|4096|4|thrpt|16|5132.769 ± 183.395 ops/ms|
|BSTBench.treeSeek|4096|64|thrpt|16|4003.251 ± 69.857 ops/ms|
|BSTBench.treeSeek|4096|512|thrpt|16|3458.094 ± 17.092 ops/ms|

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Michael Burman
>            Priority: Major
>              Labels: performance
>             Fix For: 4.x
>
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

Reply via email to