[ 
https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633019#comment-14633019
 ] 

Stefania commented on CASSANDRA-8894:
-------------------------------------

[~benedict] thanks for your review comments. I've applied them in this [latest 
commit|https://github.com/stef1927/cassandra/commit/4b9ae3f08102cde5b72b596312662d1e6c390612],
 which is also duplicated on the pre-8099 branch and available for performance 
testing. In addition to addressing your comments, I've added a couple of unit 
tests and changed the *>* to *>=* when determining whether to add one page. 
This way, if the page cross chance is zero, we always add one page, even at the 
boundaries (where the record size is a multiple of the page size). If you want 
to override this during commit, that's fine, but you will need to change the 
unit tests' expected values too. Because of the *>=*, I've added a fuzzy 
comparison for doubles, using an epsilon of 10^-16.
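
For reference, here is a minimal sketch of the *>=* check with the fuzzy 
comparison. It is not the code from the commit: the class and method names, the 
4096-byte page size and the way the cross probability is derived from the 
record size are all assumptions made for illustration.

```java
// Illustrative sketch only: names, PAGE_SIZE and the probability formula
// are assumptions, not taken from the actual patch.
final class PageCrossSketch {
    static final int PAGE_SIZE = 4096;     // assumed page size in bytes
    static final double EPSILON = 1e-16;   // epsilon mentioned in the comment

    // Fuzzy "a >= b" for doubles: true when a is greater than b,
    // or equal to b within EPSILON.
    static boolean fuzzyGreaterOrEqual(double a, double b) {
        return a - b > -EPSILON;
    }

    // Number of pages to read for a record of the given size.
    // One extra page is added when the (assumed) probability of the record
    // crossing a page boundary is >= pageCrossChance.
    static long pagesFor(long recordSize, double pageCrossChance) {
        long pages = (recordSize + PAGE_SIZE - 1) / PAGE_SIZE; // round up
        double crossProbability = (recordSize % PAGE_SIZE) / (double) PAGE_SIZE;
        if (fuzzyGreaterOrEqual(crossProbability, pageCrossChance))
            pages += 1;
        return pages;
    }
}
```

With a strict *>*, a record of exactly one page and a page cross chance of 
zero would give 0 > 0, so no page would be added. With *>=* the extra page is 
added, which is the boundary case described above.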

I've updated the test files to remove the restriction on the population of the 
partition id; I had no idea the default was so big. Your two other comments, on 
the number of operations and threads, are well noted. I was planning to use a 
bigger number of operations (the small number was just to test the platform), 
but I was unsure about the number of threads. And yes, I will use syntax such 
as 100M from now on. :)

Unfortunately, cstar_perf is not available at the moment: tests are getting 
stuck and failing to progress on both blade_11 and blade_11_b, cc [~enigmacurry].


> Our default buffer size for (uncompressed) buffered reads should be smaller, 
> and based on the expected record size
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>              Labels: benedict-to-commit
>             Fix For: 3.x
>
>         Attachments: 8894_25pct.yaml, 8894_5pct.yaml, 8894_tiny.yaml
>
>
> A large contributor to buffered reads being slower than mmapped ones is 
> likely that we read a full 64KB at once, when average record sizes may be as 
> low as 140 bytes on our stress tests. The TLB has only 128 entries on a 
> modern core, and each read will touch 32 of these, meaning we will almost 
> never hit the TLB and will incur at least 30 unnecessary misses each time 
> (as well as the other costs of larger than necessary accesses). When working 
> with an SSD there is little to no benefit in reading more than 4KB at once, 
> and in either case reading more data than we need is wasteful. So, I propose 
> selecting a buffer size that is the next larger power of 2 than our average 
> record size (with a minimum of 4KB), so that we expect to read in one 
> operation. I also propose that we create a pool of these buffers up-front, 
> and that we ensure they are all exactly aligned to a virtual page, so that 
> the source and target operations each touch exactly one virtual page per 4KB 
> of expected record size.
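
The sizing rule proposed in the description above could be sketched roughly as 
follows. This is an illustration only: the class and method names are 
hypothetical, "next larger power of 2" is read here as the smallest power of 
two that is at least the record size, and the upper cap at the current 64KB 
default is an assumption that the description does not state.

```java
// Illustrative sketch only: names and the 64KB cap are assumptions.
final class ReadBufferSizing {
    static final int MIN_BUFFER_SIZE = 4 * 1024;   // minimum from the description
    static final int MAX_BUFFER_SIZE = 64 * 1024;  // assumed cap (current default)

    // Smallest power of two >= averageRecordSize, clamped to
    // [MIN_BUFFER_SIZE, MAX_BUFFER_SIZE].
    static int bufferSizeFor(long averageRecordSize) {
        if (averageRecordSize <= MIN_BUFFER_SIZE)
            return MIN_BUFFER_SIZE;
        if (averageRecordSize >= MAX_BUFFER_SIZE)
            return MAX_BUFFER_SIZE;
        long highest = Long.highestOneBit(averageRecordSize);
        // If the size is already a power of two keep it, otherwise double.
        long size = (highest == averageRecordSize) ? highest : highest << 1;
        return (int) size;
    }
}
```

For the 140-byte records mentioned in the description, this gives the 4KB 
minimum, so a read would touch a single page instead of the sixteen pages 
covered by a 64KB buffer.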



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
