[ https://issues.apache.org/jira/browse/CASSANDRA-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019402#comment-16019402 ]
Stefania commented on CASSANDRA-13120: -------------------------------------- To clarify further, the code before 8099 did not increment the number of sstables iterated if the sstable iterator had a [null column family|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/db/CollationController.java#L131], which is the equivalent of {{SSTableReader.getPosition()}} returning null. Therefore, sstables excluded by the BF or a failed partition lookup were not considered. This patch will restore this behavior. > Trace and Histogram output misleading > ------------------------------------- > > Key: CASSANDRA-13120 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13120 > Project: Cassandra > Issue Type: Bug > Components: Core > Reporter: Adam Hattrell > Assignee: Benjamin Lerer > Priority: Minor > > If we look at the following output: > {noformat} > [centos@cassandra-c-3]$ nodetool getsstables -- keyspace table > 60ea4399-6b9f-4419-9ccb-ff2e6742de10 > /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647146-big-Data.db > /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647147-big-Data.db > /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647145-big-Data.db > /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647152-big-Data.db > /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647157-big-Data.db > /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-648137-big-Data.db > {noformat} > We can see that this key value appears in just 6 sstables. However, when we > run a select against the table and key we get: > {noformat} > Tracing session: a6c81330-d670-11e6-b00b-c1d403fd6e84 > activity > | timestamp | source > | source_elapsed > -------------------------------------------------------------------------------------------------------------------+----------------------------+----------------+---------------- > > Execute CQL3 query | 2017-01-09 13:36:40.419000 | > 10.200.254.141 | 0 > Parsing SELECT * FROM keyspace.table WHERE id = > 60ea4399-6b9f-4419-9ccb-ff2e6742de10; [SharedPool-Worker-2] | > 2017-01-09 13:36:40.419000 | 10.200.254.141 | 104 > > Preparing statement [SharedPool-Worker-2] | 2017-01-09 13:36:40.419000 | > 10.200.254.141 | 220 > Executing single-partition query on > table [SharedPool-Worker-1] | 2017-01-09 13:36:40.419000 | > 10.200.254.141 | 450 > Acquiring > sstable references [SharedPool-Worker-1] | 2017-01-09 13:36:40.419000 | > 10.200.254.141 | 477 > Bloom filter allows skipping > sstable 648146 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419000 | > 10.200.254.141 | 496 > Bloom filter allows skipping > sstable 648145 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | > 10.200.254.141 | 503 > Key cache hit for > sstable 648140 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | > 10.200.254.141 | 513 > Bloom filter allows skipping > sstable 648135 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | > 10.200.254.141 | 520 > Bloom filter allows skipping > sstable 648130 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | > 10.200.254.141 | 526 > Bloom filter allows skipping > sstable 648048 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | > 10.200.254.141 | 530 > Bloom filter allows skipping > sstable 647749 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | > 10.200.254.141 | 535 > Bloom filter allows skipping > sstable 647404 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | > 10.200.254.141 | 540 > Key cache hit for > sstable 647145 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | > 10.200.254.141 | 548 > Key cache hit for > sstable 647146 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | > 10.200.254.141 | 556 > Key cache hit for > sstable 647147 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | > 10.200.254.141 | 564 > Bloom filter allows skipping > sstable 647148 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | > 10.200.254.141 | 570 > Bloom filter allows skipping > sstable 647149 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | > 10.200.254.141 | 575 > Bloom filter allows skipping > sstable 647150 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | > 10.200.254.141 | 580 > Bloom filter allows skipping > sstable 647151 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | > 10.200.254.141 | 585 > Key cache hit for > sstable 647152 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | > 10.200.254.141 | 591 > Bloom filter allows skipping > sstable 647153 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | > 10.200.254.141 | 597 > Bloom filter allows skipping > sstable 647154 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | > 10.200.254.141 | 601 > Bloom filter allows skipping > sstable 647155 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | > 10.200.254.141 | 606 > Bloom filter allows skipping > sstable 647156 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | > 10.200.254.141 | 611 > Key cache hit for > sstable 647157 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | > 10.200.254.141 | 617 > Bloom filter allows skipping > sstable 647158 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | > 10.200.254.141 | 623 > Skipped 0/22 non-slice-intersecting sstables, included 0 > due to tombstones [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | > 10.200.254.141 | 644 > Merging data from memtables > and 22 sstables [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | > 10.200.254.141 | 654 > Read 9 live and 0 > tombstone cells [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | > 10.200.254.141 | 732 > > Request complete | 2017-01-09 13:36:40.419808 | > 10.200.254.141 | 808 > {noformat} > You'll note we claim not to have skipped any files due to bloom filters - > even though we know the data is only in 6 files. > CFHistograms also report that we're hitting every sstables: > {noformat} > Percentile SSTables Write Latency Read Latency Partition Size Cell Count > (micros) (micros) (bytes) > 50% 24.00 14.24 182.79 103 1 > 75% 24.00 17.08 315.85 149 2 > 95% 24.00 20.50 7007.51 372 7 > 98% 24.00 24.60 10090.81 642 12 > 99% 24.00 29.52 12108.97 770 14 > Min 21.00 3.31 29.52 43 0 > Max 29.00 1358.10 62479.63 1597 35 > {noformat} > Code for the read is here: > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L561 > We seem to iterate over all the sstables and increment the metric as part of > that iteration. > Either the reporting is incorrect - or we should maybe check the bloom > filters first and then iterate the tombstones after? > In this particular case we were using TWCS which makes the problem more > apparent. TWCS guarantees that we'll keep more sstables in an un-merged > state. With STCS we have to search them all, but most of them should be > merged together if Compaction is keeping up. LCS the read path is restricted > which will disguise the impact. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org