[ https://issues.apache.org/jira/browse/CASSANDRA-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-6134:
-----------------------------------------
    Reviewer:   (was: Aleksey Yeschenko)

> More efficient BatchlogManager
> ------------------------------
>
>                 Key: CASSANDRA-6134
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6134
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Oleg Anastasyev
>            Assignee: Oleg Anastasyev
>            Priority: Minor
>         Attachments: BatchlogManager.txt
>
>
> As we discussed earlier in CASSANDRA-6079, this is the new BatchlogManager.
> It stores batch records in
> {code}
> CREATE TABLE batchlog (
>   id_partition int,
>   id timeuuid,
>   data blob,
>   PRIMARY KEY (id_partition, id)
> ) WITH COMPACT STORAGE AND
> CLUSTERING ORDER BY (id DESC)
> {code}
> where id_partition is the minute-since-epoch of the id uuid.
> So when it scans for batches to replay, it scans within a single partition for
> a slice of ids from the last processed date until now minus the write timeout.
> So neither a full batchlog CF scan nor a lot of random reads are made on a
> normal cycle.
> Other improvements:
> 1. It runs every 1/2 of the write timeout and replays all batches written within
> 0.9 * write timeout from now. This way we ensure that batched updates will
> be replayed by the moment the client times out from the coordinator.
> 2. It submits all mutations from a single batch in parallel (like StorageProxy
> does). The old implementation played them one by one, so a client could see
> half-applied batches in the CF for a long time (depending on the size of the batch).
> 3. It fixes a subtle race bug with incorrect hint TTL calculation.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
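
The partitioning scheme described in the issue (id_partition as the minute-since-epoch of the timeuuid, and a replay scan bounded by "now minus write timeout") can be illustrated with a small Python sketch. This is not code from the attached patch. The function names unix_millis, id_partition, partitions_to_scan and time_uuid_from_millis are hypothetical, and the sketch assumes the replay window is the closed interval from the last processed time to now minus the write timeout, which may touch two minute partitions when it crosses a minute boundary.

```python
import uuid

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def unix_millis(time_uuid: uuid.UUID) -> int:
    """Unix timestamp in milliseconds embedded in a version 1 UUID."""
    if time_uuid.version != 1:
        raise ValueError("expected a version 1 (time-based) UUID")
    return (time_uuid.time - UUID_EPOCH_OFFSET) // 10_000


def id_partition(time_uuid: uuid.UUID) -> int:
    """Partition key as described in the issue: minutes since the Unix epoch."""
    return unix_millis(time_uuid) // 60_000


def partitions_to_scan(last_processed_ms: int, now_ms: int,
                       write_timeout_ms: int) -> list:
    """Minute partitions covering [last_processed_ms, now_ms - write_timeout_ms].

    Assumption: the window is closed on both ends. An empty list means there
    is nothing old enough to replay yet.
    """
    upper_ms = now_ms - write_timeout_ms
    if upper_ms < last_processed_ms:
        return []
    return list(range(last_processed_ms // 60_000, upper_ms // 60_000 + 1))


def time_uuid_from_millis(ms: int) -> uuid.UUID:
    """Helper for the example only: build a version 1 UUID for a given time."""
    ts = ms * 10_000 + UUID_EPOCH_OFFSET
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # set version to 1
    # clock_seq_hi_variant=0x80 marks the RFC 4122 variant; node is zeroed.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0))


if __name__ == "__main__":
    batch_id = time_uuid_from_millis(1_380_000_000_000)
    print(id_partition(batch_id))
    # Window that crosses a minute boundary touches two partitions.
    print(partitions_to_scan(1_380_000_059_000, 1_380_000_063_000, 2_000))
```

In the common case the window falls inside one minute, so the scan reads a slice of a single partition, which is the property the issue relies on to avoid a full batchlog scan.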