[ 
https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148394#comment-15148394
 ] 

Stefania commented on CASSANDRA-11053:
--------------------------------------

I have experimented with CPU affinity by pinning each worker process to a core, 
but whilst this gave slightly better results locally, on AWS it actually made 
things worse. 
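
For readers unfamiliar with pinning, here is a minimal sketch of what "pinning each worker process to a core" can look like in Python. It is not the code from the patch: {{pin_to_core}} and its round-robin choice of core are hypothetical, and it assumes Python 3 on Linux, where {{os.sched_setaffinity}} is available.

```python
import os


def pin_to_core(worker_index):
    """Pin the calling process to a single core, chosen round-robin.

    Returns the chosen core id, or None if pinning is unavailable or refused.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # platform without the Linux affinity calls
    allowed = sorted(os.sched_getaffinity(0))  # 0 means "the calling process"
    core = allowed[worker_index % len(allowed)]
    try:
        os.sched_setaffinity(0, {core})
    except OSError:
        return None  # the kernel or sandbox refused the request
    return core
```

Each worker would call {{pin_to_core(i)}} with its own index at start-up. Whether this helps depends on the host, which matches the mixed results reported above.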

I've also examined the {{strace}} output locally and the most frequent system 
calls are {{futex}}, {{read}}, {{write}} and {{poll}}. To reduce contention I've 
replaced the Python queue with multiple point-to-point pipes (the queue was 
implemented over a single pipe with interprocess locks). I didn't see much 
improvement locally, but it may matter more on AWS: locally I can only run 2 
worker processes before I max out the cluster, which also runs locally. By 
removing the Python queue I was also able to remove one thread, which in Python 
is a good thing due to the GIL (Global Interpreter Lock).
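
As an illustration of the point-to-point approach, the sketch below gives every worker its own {{multiprocessing.Pipe}} instead of sharing one queue. It is not the patch itself: the names ({{worker_loop}}, {{make_channels}}, {{run}}), the {{None}} sentinel and the "reply with the chunk length" stand-in for real work are all assumptions, and it targets Python 3.

```python
import multiprocessing as mp
from multiprocessing.connection import wait


def worker_loop(conn):
    """Child side: process chunks from a private pipe until None arrives."""
    while True:
        chunk = conn.recv()
        if chunk is None:  # sentinel: no more work
            break
        conn.send(len(chunk))  # stand-in for "import these rows"
    conn.close()


def make_channels(num_workers):
    """Create one duplex pipe per worker: (parent_ends, child_ends)."""
    pairs = [mp.Pipe(duplex=True) for _ in range(num_workers)]
    return [p for p, _ in pairs], [c for _, c in pairs]


def run(chunks, num_workers=2):
    """Feed chunks to workers, one outstanding chunk per worker at a time."""
    parent_ends, child_ends = make_channels(num_workers)
    procs = [mp.Process(target=worker_loop, args=(c,)) for c in child_ends]
    for proc in procs:
        proc.start()
    for conn in child_ends:
        conn.close()  # the parent only uses its own ends

    pending = iter(chunks)
    busy = set()
    total = 0
    for conn in parent_ends:  # prime every worker with one chunk
        chunk = next(pending, None)
        if chunk is None:
            break
        conn.send(chunk)
        busy.add(conn)
    while busy:
        for conn in wait(list(busy)):  # block until some worker has replied
            total += conn.recv()
            chunk = next(pending, None)
            if chunk is None:
                busy.discard(conn)
            else:
                conn.send(chunk)

    for conn in parent_ends:
        conn.send(None)
    for proc in procs:
        proc.join()
    return total
```

No lock is shared between workers because each pipe has exactly one reader and one writer at each end, and the parent multiplexes replies with {{wait}} rather than a helper thread. {{run}} should be called from under an {{if __name__ == "__main__":}} guard so that it also works with the spawn start method.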

I plan to test this implementation on AWS, together with an additional 
suggestion to increase time slicing ({{schedtool -B}}). If everything works as 
expected, I will then move the ticket to Patch Available.
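
{{schedtool -B}} selects the Linux {{SCHED_BATCH}} policy, which tells the scheduler to treat the process as CPU-intensive and non-interactive. A rough in-process equivalent in Python 3 is sketched below; {{use_batch_scheduling}} is a hypothetical helper, not part of the patch, and it simply reports failure on platforms or sandboxes that do not allow the change.

```python
import os


def use_batch_scheduling():
    """Switch the calling process to SCHED_BATCH; return True on success."""
    if not hasattr(os, "sched_setscheduler") or not hasattr(os, "SCHED_BATCH"):
        return False  # not Linux, or the policy is not exposed
    try:
        # SCHED_BATCH requires a static priority of 0.
        os.sched_setscheduler(0, os.SCHED_BATCH, os.sched_param(0))
    except OSError:
        return False  # refused by the kernel or the sandbox
    return True
```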

It's worth noting that the driver doesn't coalesce messages on the socket at 
present. This could be detrimental on virtualized environments like AWS, 
especially if [enhanced 
networking|https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#other-linux-enhanced-networking-instance-store]
 is not available. However, we would probably only need to worry about this 
once our encoding functions are faster. At the moment the bottleneck is still 
encoding, so I would leave this for a future ticket.
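
To make "coalescing" concrete: the idea is to buffer several small messages and hand them to the socket in one write, so that fewer system calls and packets are needed. The sketch below is a generic illustration only. {{CoalescingWriter}} and its {{max_bytes}} threshold are hypothetical and are not an API of the driver.

```python
class CoalescingWriter:
    """Buffer small frames and pass them to `send` in larger batches."""

    def __init__(self, send, max_bytes=65536):
        self._send = send            # e.g. socket.sendall
        self._max_bytes = max_bytes  # flush threshold (assumed value)
        self._parts = []
        self._size = 0

    def write(self, frame):
        # Flush first if this frame would push the batch over the threshold.
        if self._parts and self._size + len(frame) > self._max_bytes:
            self.flush()
        self._parts.append(frame)
        self._size += len(frame)
        if self._size >= self._max_bytes:
            self.flush()

    def flush(self):
        # Emit everything buffered so far as a single write.
        if self._parts:
            self._send(b"".join(self._parts))
            self._parts = []
            self._size = 0
```

A caller would wrap the connection, for example {{CoalescingWriter(sock.sendall)}}, call {{write}} for each encoded message and {{flush}} whenever it is about to wait for responses. A real implementation would probably also flush on a timer to bound latency.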

> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-11053
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>         Attachments: copy_from_large_benchmark.txt, 
> copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt, 
> worker_profiles.txt, worker_profiles_2.txt
>
>
> Running COPY FROM on a large dataset (20G divided in 20M records) revealed 
> two issues:
> * The progress report is incorrect: it is very slow until almost the end of 
> the test, at which point it catches up extremely quickly.
> * The performance in rows per second is similar to running smaller tests with 
> a smaller cluster locally (approx 35,000 rows per second). As a comparison, 
> cassandra-stress manages 50,000 rows per second under the same set-up, 
> making it roughly 1.4 times faster. 
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
