yifan-c opened a new pull request, #36:
URL: https://github.com/apache/cassandra-analytics/pull/36

   … BatchSize options
   
   In cassandra-all:4.0.12, the CQLSSTableWriter was improved: the sorted 
writer can now produce size-capped SSTables. This removes the need for the 
unsorted SSTable writer, which has to buffer data and sort it on flush. The 
dataset written by the Spark application is already sorted, so dropping the 
unsorted writer avoids wasting CPU time re-sorting sorted data. Because the 
sorted SSTable writer does not need to buffer data, its size estimation is 
more accurate than that of the unsorted writer, so the produced SSTable files 
are closer to the expected size.
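
The idea can be illustrated with a toy model. This is only a sketch: it is not the CQLSSTableWriter API, and `write_sorted_size_capped`, `Row`, and `max_bytes` are hypothetical names chosen for illustration. It assumes rows arrive already sorted by key, tracks the running size as rows are appended, and starts a new chunk (standing in for an SSTable) once the cap would be exceeded, so nothing is buffered for sorting.

```python
from typing import Iterable, List, Tuple

# Hypothetical row shape for this sketch: (key, serialized payload).
Row = Tuple[str, bytes]


def write_sorted_size_capped(rows: Iterable[Row], max_bytes: int) -> List[List[Row]]:
    """Split an already-sorted row stream into size-capped chunks.

    Each returned chunk stands in for one SSTable. Rows are appended in
    arrival order, so no buffering or sorting step is needed.
    """
    sstables: List[List[Row]] = []
    current: List[Row] = []
    current_size = 0
    previous_key = None

    for key, payload in rows:
        # A sorted writer relies on its input order; reject violations.
        if previous_key is not None and key < previous_key:
            raise ValueError(f"rows out of order: {key!r} after {previous_key!r}")
        previous_key = key

        row_size = len(key) + len(payload)
        # Roll over to a new chunk when adding this row would exceed the cap.
        if current and current_size + row_size > max_bytes:
            sstables.append(current)
            current, current_size = [], 0

        current.append((key, payload))
        current_size += row_size

    # Flush the final, possibly partial, chunk.
    if current:
        sstables.append(current)
    return sstables
```

In this model the size is known exactly at every append, which mirrors why a writer that does not buffer can estimate output size more closely than one that sorts a buffer at flush time.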
   
   With the unsorted SSTable writer removed, the RowBufferMode option is no 
longer required. With size-capping supported in the sorted writer, the 
BatchSize option is no longer required either.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


