Stefania created CASSANDRA-11053:
------------------------------------

Summary: COPY FROM on large datasets: fix progress report and debug performance
Key: CASSANDRA-11053
URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Stefania
Assignee: Stefania
Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
Attachments: copy_from_large_benchmark.txt

Running COPY FROM on a large dataset (20G divided into 20M records) revealed two issues:

* The progress report is incorrect: it advances very slowly until almost the end of the test, at which point it catches up extremely quickly.
* The performance in rows per second is similar to that of smaller tests run locally against a smaller cluster (approx. 35,000 rows per second). As a comparison, cassandra-stress manages 50,000 rows per second under the same set-up, which makes it about 1.5 times faster.

See the attached file _copy_from_large_benchmark.txt_ for the benchmark details.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)