[ https://issues.apache.org/jira/browse/CASSANDRA-19334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yifan Cai updated CASSANDRA-19334:
----------------------------------
          Fix Version/s: NA
    Source Control Link: https://github.com/apache/cassandra-analytics/commit/e0ae9d7484e242f6af495aac2cb4d8dc121fba89
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

> [Analytics] Upgrade to Cassandra 4.0.12 and remove RowBufferMode and BatchSize options
> --------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19334
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19334
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: NA
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In cassandra-all:4.0.12, improvements were made to the CQLSSTableWriter. The
> sorted writer can now produce size-capped SSTables. It replaces the need for
> the unsorted SSTable writer, which has to buffer and sort data on flushing.
> The dataset to write in the Spark application is already sorted. Avoiding the
> unsorted writer saves the CPU time otherwise spent sorting already-sorted
> data. Since the sorted SSTable writer does not need to buffer data, its size
> estimation is more accurate than that of the unsorted one, meaning the
> produced SSTable files are closer to the expected size.
> With the unsorted SSTable writer removed, the RowBufferMode option is no
> longer required.
> With size-capping supported in the sorted writer, the BatchSize option is no
> longer required.
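
The size-capping behaviour described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: it does not use the Cassandra CQLSSTableWriter API, and the class name SizeCappedSortedWriterSketch, its methods, and the byte-size estimate are hypothetical names made up for illustration. It shows why a writer that receives rows in sorted order can append them directly and roll over to a new output once a size cap would be exceeded, without buffering and sorting first.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: NOT the CQLSSTableWriter API. All names here are hypothetical. */
public class SizeCappedSortedWriterSketch {
    private final long maxBytesPerFile;                      // hypothetical size cap per output "file"
    private final List<List<String>> files = new ArrayList<>();
    private List<String> current = new ArrayList<>();
    private long currentBytes = 0;
    private String lastKey = null;

    public SizeCappedSortedWriterSketch(long maxBytesPerFile) {
        this.maxBytesPerFile = maxBytesPerFile;
    }

    /** Appends a row directly; the input is assumed to arrive already sorted by key. */
    public void addRow(String key, String value) {
        if (lastKey != null && key.compareTo(lastKey) < 0) {
            throw new IllegalArgumentException(
                "rows must arrive in sorted order: " + key + " after " + lastKey);
        }
        long rowBytes = key.length() + value.length();       // crude size estimate for the sketch
        // Roll over before the cap would be exceeded, so each file stays within the cap
        // (a single oversized row still gets a file of its own).
        if (!current.isEmpty() && currentBytes + rowBytes > maxBytesPerFile) {
            roll();
        }
        current.add(key + "=" + value);
        currentBytes += rowBytes;
        lastKey = key;
    }

    /** Closes the current output and starts a new, empty one. */
    private void roll() {
        files.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
    }

    /** Flushes the last partially filled output and returns everything written. */
    public List<List<String>> close() {
        if (!current.isEmpty()) {
            roll();
        }
        return files;
    }

    public static void main(String[] args) {
        SizeCappedSortedWriterSketch writer = new SizeCappedSortedWriterSketch(10);
        for (String key : new String[] {"a", "b", "c", "d", "e"}) {
            writer.addRow(key, "vvvv");                      // each row is estimated at 5 bytes
        }
        System.out.println(writer.close());
    }
}
```

In this toy run, five 5-byte rows with a 10-byte cap end up in three outputs of two, two, and one row. No row is held back for sorting, which is the reason the ticket gives for the sorted writer's more accurate size estimation.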