[ https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631108#comment-14631108 ]
Benedict commented on CASSANDRA-8894:
-------------------------------------

A few comments on the stress testing:

* The blob_id population doesn't need to be constrained (it defaults to something like 1..100B)
* To perform the inserts, we want to construct a dataset large enough to spill to disk, i.e. we probably want to insert at least 100M items (perhaps 200M+) if they're only ~50 bytes each.
* We probably want to run with slightly more threads, say 300

The graphs that were produced don't actually appear to be broken: the stress run was simply extremely brief, since it only operated over 100K items :)

At risk of sounding like a broken record to everyone, it can help to use K, M, B syntax for your numbers in the profile/command line.

> Our default buffer size for (uncompressed) buffered reads should be smaller,
> and based on the expected record size
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>              Labels: benedict-to-commit
>             Fix For: 3.x
>
>         Attachments: 8894_25pct.yaml, 8894_5pct.yaml, 8894_tiny.yaml
>
>
> A large contributor to slower buffered reads than mmapped is likely that we
> read a full 64Kb at once, when average record sizes may be as low as 140
> bytes on our stress tests. The TLB has only 128 entries on a modern core, and
> each read will touch 32 of these, meaning we are unlikely ever to be
> hitting the TLB, and will be incurring at least 30 unnecessary misses each
> time (as well as the other costs of larger than necessary accesses). When
> working with an SSD there is little to no benefit reading more than 4Kb at
> once, and in either case reading more data than we need is wasteful. So, I
> propose selecting a buffer size that is the next larger power of 2 than our
> average record size (with a minimum of 4Kb), so that we expect to read in one
> operation. I also propose that we create a pool of these buffers up-front,
> and that we ensure they are all exactly aligned to a virtual page, so that
> the source and target operations each touch exactly one virtual page per 4Kb
> of expected record size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
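
For readers following the quoted description, the proposed sizing rule (next power of 2 at or above the average record size, with a 4Kb floor) can be sketched roughly as below. This is only an illustration and not the code from the ticket: the class name `BufferSizeSketch`, the methods `bufferSizeFor` and `pagesTouched`, and the 4Kb page size are all assumptions made for the example. It also assumes that a record size which is already a power of 2 is kept as-is, which the description leaves open.

```java
// Hypothetical sketch of the buffer sizing rule from the issue description.
// None of these names come from the Cassandra code base.
public final class BufferSizeSketch {
    // Assumed virtual page size (4Kb), which is also the proposed minimum buffer size.
    static final int PAGE_SIZE = 4096;

    // Returns the smallest power of 2 that is >= averageRecordSize,
    // but never less than PAGE_SIZE.
    static int bufferSizeFor(int averageRecordSize) {
        if (averageRecordSize <= 0 || averageRecordSize > (1 << 30)) {
            throw new IllegalArgumentException("record size out of range: " + averageRecordSize);
        }
        if (averageRecordSize <= PAGE_SIZE) {
            return PAGE_SIZE;
        }
        int highest = Integer.highestOneBit(averageRecordSize);
        // Already a power of 2: keep it (assumption); otherwise round up.
        return highest == averageRecordSize ? highest : highest << 1;
    }

    // Number of pages covered by a page-aligned buffer of the given size.
    static int pagesTouched(int bufferSize) {
        return (bufferSize + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    public static void main(String[] args) {
        // 140-byte records (the stress-test figure) fall back to the 4Kb minimum.
        System.out.println("140 bytes   -> " + bufferSizeFor(140));
        System.out.println("5000 bytes  -> " + bufferSizeFor(5000));
        // A 64Kb read spans 16 pages on each of the source and target sides.
        System.out.println("64Kb pages  -> " + pagesTouched(64 * 1024));
    }
}
```

Under these assumptions a 64Kb read covers 16 pages on each side, which is one way to arrive at the 32 TLB entries mentioned in the description, whereas a 4Kb aligned buffer covers a single page per side.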