yifan-c opened a new pull request, #36: URL: https://github.com/apache/cassandra-analytics/pull/36
… BatchSize options

In cassandra-all:4.0.12, CQLSSTableWriter was improved so that the sorted writer can produce size-capped SSTables. This removes the need for the unsorted SSTable writer, which has to buffer data and sort it on flush.

The dataset written by the Spark application is already sorted, so avoiding the unsorted writer avoids spending CPU time re-sorting sorted data. Because the sorted SSTable writer does not buffer data, its size estimation is also more accurate than the unsorted writer's, so the produced SSTable files are closer to the expected size.

Removing the unsorted SSTable writer makes the RowBufferMode option unnecessary. Supporting size capping in the sorted writer makes the BatchSize option unnecessary.
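
To illustrate why a size-capped sorted writer needs no buffering or sorting step, here is a minimal Python sketch of the idea. It is a toy model, not the CQLSSTableWriter implementation: the function name `write_sorted_size_capped`, the `(key, payload)` row shape, and the rule of rolling over before the cap is exceeded are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical row shape: (sort key, serialized payload).
Row = Tuple[int, bytes]


def write_sorted_size_capped(rows: Iterable[Row], max_bytes: int) -> List[List[Row]]:
    """Stream pre-sorted rows into size-capped 'files' (modeled as lists).

    Rows are appended as they arrive, so nothing is buffered for sorting.
    A new file is started when adding the next row would exceed max_bytes.
    """
    files: List[List[Row]] = []
    current: List[Row] = []
    current_size = 0
    last_key = None

    for key, payload in rows:
        # A sorted writer can only accept input that is already in order.
        if last_key is not None and key < last_key:
            raise ValueError("input must be sorted by key")
        last_key = key

        row_size = len(payload)
        # Roll over to a new file once the cap would be exceeded.
        if current and current_size + row_size > max_bytes:
            files.append(current)
            current, current_size = [], 0

        current.append((key, payload))
        current_size += row_size

    if current:
        files.append(current)
    return files
```

In this model the size of each output file is known exactly as rows are appended, which mirrors the point above about size estimation being more accurate when no data is held in a buffer.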