[ https://issues.apache.org/jira/browse/CASSANDRA-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101519#comment-14101519 ]
Aleksey Yeschenko commented on CASSANDRA-7405:
----------------------------------------------

Maybe this should be handled separately, in another ticket, but there are a few more things we could optimize (all import-related):

1. If we assume that a significant subset of COPY FROM CSVs will be the results of a COPY TO command, then rows will be grouped by the partition key. In that case we'd win from batching (until another partition key is met, and constrained by some limit of rows per batch; we don't want huge batches).

2. Additionally, we could switch to prepared statements for writes (assuming that the Python serialization cost wouldn't outweigh the server-side benefits). It's a bit involved, but may be worth it. We should also prepare the SELECT, really; it doesn't win us a lot, but it's a trivial change, so probably worth it.

> Optimize cqlsh COPY TO and COPY FROM
> ------------------------------------
>
>                 Key: CASSANDRA-7405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7405
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Mikhail Stepura
>             Fix For: 2.1.1
>
>         Attachments: CASSANDRA-2.1-7405.patch
>
>
> Now that we are using native proto via python-driver, we can, and should, at
> the very least:
> 1. Use proto paging in COPY TO
> 2. Use async writes in COPY FROM

--
This message was sent by Atlassian JIRA
(v6.2#6252)
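The batching idea from point 1 of the comment above could be sketched as a small grouping helper: consume CSV rows in order, and cut a new batch whenever the partition key changes or a per-batch row cap is hit. This is a hypothetical illustration, not code from the attached patch; the function name, `pk_index` parameter, and `MAX_BATCH_ROWS` cap are all invented for the sketch.

```python
# Hypothetical sketch of partition-key-aware batching for COPY FROM.
# Rows produced by COPY TO arrive grouped by partition key, so consecutive
# rows with the same key can be collected into one batch, capped in size
# (huge batches would hurt the coordinator).

MAX_BATCH_ROWS = 50  # illustrative cap, not a value from the ticket


def group_rows_for_batching(rows, pk_index=0, max_rows=MAX_BATCH_ROWS):
    """Yield lists of consecutive rows that share a partition key.

    A new group starts when the partition key changes or when the
    current group reaches max_rows.
    """
    batch, current_key = [], object()  # sentinel matches no real key
    for row in rows:
        key = row[pk_index]
        if key != current_key or len(batch) >= max_rows:
            if batch:
                yield batch
            batch, current_key = [], key
        batch.append(row)
    if batch:
        yield batch


# Example: rows as they might come out of a COPY TO csv, already grouped
# by partition key ("a", "b", "c").
rows = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("b", 5), ("c", 6)]
groups = list(group_rows_for_batching(rows, max_rows=2))
# ("b", 5) starts a new group because the 2-row cap was reached.
print(groups)
```

In a real implementation each yielded group would be bound against a prepared INSERT (point 2 of the comment) and sent as one unlogged batch, e.g. via the python-driver's `session.execute_async`, which also covers the async-writes point from the ticket description.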