[ https://issues.apache.org/jira/browse/CASSANDRA-19494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Haddad updated CASSANDRA-19494:
-----------------------------------
    Resolution: Duplicate
        Status: Resolved  (was: Triage Needed)

Will be resolved as part of CASSANDRA-15452, very exciting.

> Optimize I/O during table scans
> -------------------------------
>
>                 Key: CASSANDRA-19494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19494
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jon Haddad
>            Priority: Normal
>         Attachments: reads.txt
>
>
> The storage engine reads chunk by chunk during table scans.  We'd be much
> better off reading larger blocks into an internal buffer, issuing fewer,
> larger I/O operations and avoiding excessive system calls.
> For example, doing a scan against this table:
> {noformat}
> CREATE TABLE easy_cass_stress.keyvalue (
>     key text PRIMARY KEY,
>     value text
> ) WITH additional_write_policy = '99p'
>     AND allow_auto_snapshot = true
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND cdc = false
>     AND comment = ''
>     AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '16', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND memtable = 'default'
>     AND crc_check_chance = 1.0
>     AND default_time_to_live = 0
>     AND extensions = {}
>     AND gc_grace_seconds = 864000
>     AND incremental_backups = true
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair = 'BLOCKING'
>     AND speculative_retry = '99p';{noformat}
> I see the following I/O activity (sample only; see the attachment for a full
> accounting of all reads):
>  
> {noformat}
> TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
> 16:59:23 ReadStage-2    2523   R 15051   0           0.02 nb-6-big-Data.db
> 16:59:23 ReadStage-2    2523   R 15049   0           0.01 nb-8-big-Data.db
> 16:59:23 ReadStage-2    2523   R 15025   0           0.01 nb-5-big-Data.db
> 16:59:23 ReadStage-2    2523   R 15064   0           0.01 nb-7-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15051   0           0.01 nb-6-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15049   0           0.01 nb-8-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15025   0           0.01 nb-5-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15064   0           0.00 nb-7-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15064   14          0.01 nb-5-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15051   0           0.01 nb-6-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15049   0           0.00 nb-8-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15064   14          0.00 nb-5-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15064   0           0.00 nb-7-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15012   29          0.01 nb-5-big-Data.db{noformat}
> with a sample of our off-CPU time looking like this (after dropping caches):
> {noformat}
> cpudist -O -p $(cassandra-pid) -m 1 30
>      msecs               : count     distribution
>          0 -> 1          : 5259     |****************************************|
>          2 -> 3          : 486      |***                                     |
>          4 -> 7          : 0        |                                        |
>          8 -> 15         : 1        |                                        |
>         16 -> 31         : 0        |                                        |
>         32 -> 63         : 29       |                                        |
>         64 -> 127        : 77       |                                        |
>        128 -> 255        : 4        |                                        |
>        256 -> 511        : 6        |                                        |
>        512 -> 1023       : 6        |                                        |{noformat}
> We pay a pretty serious throughput penalty for excessive I/O.  
> We should be able to leverage the work in CASSANDRA-15452 for this.
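The read-ahead idea above can be sketched outside of Cassandra. This is a hypothetical illustration, not Cassandra code: it serves many small chunk-sized reads (matching the table's chunk_length_in_kb = 16) out of a larger internal buffer, and counts how many read() syscalls actually reach the file. The 256 KiB buffer size is an assumption chosen just for the demo.

```python
# Hypothetical sketch (not Cassandra code): satisfy many 16 KiB chunk reads
# from one larger read-ahead buffer, cutting the number of system calls.
import io
import os
import tempfile

CHUNK = 16 * 1024     # matches chunk_length_in_kb = 16 in the schema above
BUFFER = 256 * 1024   # assumed read-ahead size: 16 chunks per syscall

class CountingRaw(io.RawIOBase):
    """Wraps a file descriptor and counts read() syscalls issued against it."""
    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)
        self.calls = 0
    def readinto(self, b):
        self.calls += 1
        data = os.read(self.fd, len(b))
        b[:len(data)] = data
        return len(data)
    def readable(self):
        return True
    def close(self):
        os.close(self.fd)
        super().close()

def scan(path, bufsize):
    """Read the whole file one CHUNK at a time through a buffer of bufsize."""
    raw = CountingRaw(path)
    reader = io.BufferedReader(raw, buffer_size=bufsize)
    while reader.read(CHUNK):
        pass
    reader.close()
    return raw.calls

# Stand-in for an SSTable data file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (4 * 1024 * 1024))  # 4 MiB
    path = f.name

small = scan(path, CHUNK)   # roughly one syscall per 16 KiB chunk
large = scan(path, BUFFER)  # roughly one syscall per 256 KiB read-ahead
print(small, large)
os.unlink(path)
```

With the larger buffer, the same logical chunk-by-chunk scan reaches the kernel an order of magnitude less often, which is the effect we'd hope to get from buffering in the storage engine.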



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
