[ 
https://issues.apache.org/jira/browse/CASSANDRA-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101519#comment-14101519
 ] 

Aleksey Yeschenko commented on CASSANDRA-7405:
----------------------------------------------

Maybe this should be handled separately, in another ticket, but there are a few 
more things we could optimize (all import related):

1. If we assume that a significant subset of the CSV files fed to COPY FROM 
are going to be the output of a COPY TO command, then rows will be grouped by 
partition key. In that case we'd win from batching: keep adding rows to a 
batch until another partition key is met, constrained by some limit on rows 
per batch (we don't want huge batches).
2. Additionally, we could switch to prepared statements for writes (assuming 
that the Python serialization cost wouldn't outweigh the server-side benefits). 
It's a bit involved, but may be worth it.
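
A rough sketch of the grouping in item 1, in plain Python. This is not taken 
from the attached patch; the function name, `key_index` and `max_batch_rows` 
are made up for illustration, and it assumes the partition key is a single 
CSV column:

```python
from itertools import groupby, islice


def batches_by_partition(rows, key_index=0, max_batch_rows=50):
    """Yield (partition_key, chunk) for consecutive rows sharing a key.

    Hypothetical helper: only *adjacent* rows are grouped, which matches
    the "until another partition key is met" idea. A key that reappears
    later in the input simply starts a new batch.
    """
    for key, group in groupby(rows, key=lambda row: row[key_index]):
        while True:
            # Cap each batch so one large partition never becomes one huge batch.
            chunk = list(islice(group, max_batch_rows))
            if not chunk:
                break
            yield key, chunk


if __name__ == "__main__":
    sample = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("a", 5)]
    for key, chunk in batches_by_partition(sample, max_batch_rows=2):
        print(key, chunk)
```

Each yielded chunk would then be sent as one batch. Whether that actually 
beats individual async writes would have to be measured.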

We should also prepare the SELECT used by COPY TO, really - it doesn't win us 
a lot, but it is a trivial change, so probably worth it.
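
For the write side, preparing once and reusing the statement could look 
roughly like the sketch below. It assumes `session` behaves like a 
python-driver `Session` (`prepare()` and `execute_async()`, with futures that 
expose `result()`); the function name and the `max_in_flight` limit are 
hypothetical and not part of cqlsh:

```python
def write_rows_prepared(session, insert_cql, rows, max_in_flight=100):
    """Prepare insert_cql once, then write every row asynchronously.

    Hypothetical helper. insert_cql is expected to use '?' placeholders,
    one per value in each row. Returns the number of rows written.
    """
    prepared = session.prepare(insert_cql)  # parsed by the server only once
    in_flight = []
    written = 0
    for row in rows:
        in_flight.append(session.execute_async(prepared, row))
        if len(in_flight) >= max_in_flight:
            # Wait for the current window; result() raises if a write failed.
            for future in in_flight:
                future.result()
            written += len(in_flight)
            in_flight = []
    # Drain whatever is left after the last full window.
    for future in in_flight:
        future.result()
    written += len(in_flight)
    return written
```

With a real session the call would be something like 
`write_rows_prepared(session, "INSERT INTO ks.t (id, val) VALUES (?, ?)", rows)`, 
where `ks.t` and its columns are placeholders. The same `session.prepare()` 
call applies to the SELECT.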


> Optimize cqlsh COPY TO and COPY FROM
> ------------------------------------
>
>                 Key: CASSANDRA-7405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7405
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Mikhail Stepura
>             Fix For: 2.1.1
>
>         Attachments: CASSANDRA-2.1-7405.patch
>
>
> Now that we are using native proto via python-driver, we can, and should, at 
> the very least:
> 1. Use proto paging in COPY TO
> 2. Use async writes in COPY FROM



