[ https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171292#comment-15171292 ]
Stefania commented on CASSANDRA-11053:
--------------------------------------

bq. I've asked offline regarding the target version, hopefully we'll know soon.

From offline discussions it seems this patch can go into 2.1 provided the risk is not too high.

bq. I could build in another deserializer for BytesType that returns a bytearray.

This would be helpful for 2.2 and 3.0 since for 2.1 we shouldn't upgrade the driver from 2.7.2 to 3.0 and for trunk we should keep the formatting changes, see next point.

bq. I think I favor the cql type interpretation despite the complexity for one reason: this decouples formatting from driver return values.

I agree but I prefer not to have these changes in older releases if they are not necessary for COPY FROM performance. Therefore I've opened CASSANDRA-11274 to deliver these changes only on trunk. The formatting changes have also been removed from the main branch and the {{-no-formatting}} branch has been deleted. The old branch however still exists with the postfix {{-with-formatting}}.

bq. I generally err on the side of caution. Reasonable limits would prevent someone from inadvertently crushing a server with a basic command. The command options make it easy enough to dial up for big load operations.

It makes sense, I've reverted both values and fixed a spacing problem in the options documentation.

> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-11053
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>         Attachments: copy_from_large_benchmark.txt,
> copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt,
> worker_profiles.txt, worker_profiles_2.txt
>
>
> Running COPY FROM on a large dataset (20G divided into 20M records) revealed
> two issues:
> * The progress report is incorrect: it advances very slowly until almost the
> end of the test, at which point it catches up extremely quickly.
> * The performance in rows per second is similar to running smaller tests with
> a smaller cluster locally (approx 35,000 rows per second). As a comparison,
> cassandra-stress manages 50,000 rows per second under the same set-up,
> which makes it roughly 1.5 times faster.
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)