[ https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148394#comment-15148394 ]
Stefania commented on CASSANDRA-11053:
--------------------------------------

I have experimented with CPU affinity by pinning each worker process to a core, but whilst this gave slightly better results locally, on AWS it actually made things worse.

I've also examined the {{strace}} output locally and the most frequent system calls are {{futex}}, {{read}}, {{write}} and {{poll}}. To reduce contention I've replaced the Python queue with multiple point-to-point pipes (the queue was implemented over a single pipe with interprocess locks). I didn't see much improvement locally, but perhaps it matters more on AWS, since locally I can only run 2 worker processes before I max out the cluster, which also runs locally. By removing the Python queue I was also able to remove one thread, which in Python is a good thing due to the GIL (Global Interpreter Lock).

I plan to test this implementation on AWS, together with an additional suggestion to increase time slicing ({{schedtool -B}}). If everything works as expected, I will then move the ticket to patch available.

It's worth noting that the driver doesn't coalesce messages on the socket at present. This could be detrimental in virtualized environments like AWS, especially if [enhanced networking|https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#other-linux-enhanced-networking-instance-store] is not available. However, we probably only need to worry about this once our encoding functions are faster; at the moment the bottleneck is still encoding, so I would leave this for a future ticket.
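
For illustration only, and not the actual patch: a minimal sketch of the point-to-point pipe idea described above, where each worker process gets its own dedicated pipe to the parent instead of all workers sharing one lock-protected queue. The names {{worker}} and {{run_demo}}, the chunk format and the stand-in "work" are hypothetical, and the {{fork}} start method is an assumption that only holds on Unix-like systems.

```python
import multiprocessing as mp
from multiprocessing.connection import wait


def worker(conn):
    """Serve one dedicated pipe until a None sentinel arrives."""
    while True:
        chunk = conn.recv()
        if chunk is None:
            break
        # Hypothetical stand-in for the real work (encoding and sending rows).
        conn.send((len(chunk), sum(chunk)))
    conn.close()


def run_demo(chunks, num_workers=2):
    """Feed chunks to workers over per-worker pipes and collect the results."""
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    conns, procs = [], []
    for _ in range(num_workers):
        parent_end, child_end = ctx.Pipe()  # duplex, one pipe per worker
        proc = ctx.Process(target=worker, args=(child_end,))
        proc.start()
        child_end.close()  # the parent only needs its own end
        conns.append(parent_end)
        procs.append(proc)

    pending = iter(chunks)
    results = []
    in_flight = 0

    # Prime every worker with one chunk, where available.
    for conn in conns:
        chunk = next(pending, None)
        if chunk is not None:
            conn.send(chunk)
            in_flight += 1

    # Poll all pipes from a single thread; refill a worker as soon as it replies.
    while in_flight:
        for conn in wait(conns):
            results.append(conn.recv())
            in_flight -= 1
            chunk = next(pending, None)
            if chunk is not None:
                conn.send(chunk)
                in_flight += 1

    # Tell every worker to stop, then clean up.
    for conn in conns:
        conn.send(None)
    for proc in procs:
        proc.join()
    for conn in conns:
        conn.close()
    return results


if __name__ == "__main__":
    print(sorted(run_demo([[1, 2, 3], [4, 5], [6]])))
```

Because the parent polls all pipes itself with {{multiprocessing.connection.wait}}, no feeder thread and no shared interprocess lock are needed in this sketch. Results arrive in whatever order the workers finish, so callers that care about ordering would need to tag each chunk.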

> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-11053
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>         Attachments: copy_from_large_benchmark.txt,
> copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt,
> worker_profiles.txt, worker_profiles_2.txt
>
>
> Running COPY FROM on a large dataset (20G divided in 20M records) revealed
> two issues:
> * The progress report is incorrect: it is very slow until almost the end of
> the test, at which point it catches up extremely quickly.
> * The performance in rows per second is similar to running smaller tests with
> a smaller cluster locally (approx 35,000 rows per second). As a comparison,
> cassandra-stress manages 50,000 rows per second under the same set-up,
> making it about 1.5 times faster.
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)