[ https://issues.apache.org/jira/browse/CASSANDRA-8225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194433#comment-14194433 ]
Sylvain Lebresne commented on CASSANDRA-8225:
---------------------------------------------

For what it's worth, I do think we should go with bulk streaming right away. I see no particular point in having multiple code paths for bulk loading, so I think we have everything to gain by standardizing on a single one as soon as possible. Also, since we already have CQLSSTableWriter and sstableloader, I don't think a simple csvloader command-line tool (that cqlsh COPY FROM would use) is that much effort either, so I'm not a fan of "stalling" by improving the cqlsh code.

> Production-capable COPY FROM
> ----------------------------
>
>                 Key: CASSANDRA-8225
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8225
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>             Fix For: 2.1.2
>
>
> Via [~schumacr],
> bq. I pulled down a sourceforge data generator and created a mock file of 500,000 rows that had an incrementing sequence number, date, and SSN. I then used our COPY command and MySQL's LOAD DATA INFILE to load the file on my Mac. Results were:
> {noformat}
> mysql> load data infile '/Users/robin/dev/datagen3.txt' into table p_test fields terminated by ',';
> Query OK, 500000 rows affected (2.18 sec)
> {noformat}
> C* 2.1.0 (pre-CASSANDRA-7405)
> {noformat}
> cqlsh:dev> copy p_test from '/Users/robin/dev/datagen3.txt' with delimiter=',';
> 500000 rows imported in 16 minutes and 45.485 seconds.
> {noformat}
> Cassandra 2.1.1:
> {noformat}
> cqlsh:dev> copy p_test from '/Users/robin/dev/datagen3.txt' with delimiter=',';
> Processed 500000 rows; Write: 4037.46 rows/s
> 500000 rows imported in 2 minutes and 3.058 seconds.
> {noformat}
> [jbellis] 7405 gets us almost an order of magnitude improvement. Unfortunately we're still almost two orders of magnitude slower than MySQL.
> I don't think we can continue to tell people, "use sstableloader instead."
> The number of users sophisticated enough to use the sstable writers is small and (relatively) decreasing as our user base expands.
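
As a rough check on the figures quoted in the description, the sketch below recomputes the throughput and the relative slowdowns from the three reported wall-clock times. The labels and helper names are illustrative only and not part of any Cassandra or MySQL tooling. The cqlsh "Write: 4037.46 rows/s" figure is a rate reported by cqlsh itself, so it need not match the rate derived here from total elapsed time.

```python
import math

# Row count and wall-clock times as quoted in the issue description.
ROWS = 500_000

# Hypothetical labels; the values are the reported durations in seconds.
TIMINGS_SECONDS = {
    "mysql": 2.18,                    # "Query OK, 500000 rows affected (2.18 sec)"
    "cqlsh_2.1.0": 16 * 60 + 45.485,  # "16 minutes and 45.485 seconds"
    "cqlsh_2.1.1": 2 * 60 + 3.058,    # "2 minutes and 3.058 seconds"
}


def rows_per_second(seconds: float) -> float:
    """Average throughput over the whole run."""
    return ROWS / seconds


def slowdown(slow_seconds: float, fast_seconds: float) -> float:
    """How many times longer the slow run took than the fast run."""
    return slow_seconds / fast_seconds


def orders_of_magnitude(ratio: float) -> float:
    """Express a ratio as a base-10 order of magnitude."""
    return math.log10(ratio)


if __name__ == "__main__":
    for label, seconds in TIMINGS_SECONDS.items():
        print(f"{label}: {rows_per_second(seconds):,.0f} rows/s")

    gain = slowdown(TIMINGS_SECONDS["cqlsh_2.1.0"], TIMINGS_SECONDS["cqlsh_2.1.1"])
    gap = slowdown(TIMINGS_SECONDS["cqlsh_2.1.1"], TIMINGS_SECONDS["mysql"])
    print(f"2.1.0 -> 2.1.1 speedup: {gain:.1f}x ({orders_of_magnitude(gain):.2f} orders)")
    print(f"2.1.1 vs mysql slowdown: {gap:.1f}x ({orders_of_magnitude(gap):.2f} orders)")
```

On these numbers the 2.1.1 run is about 8x faster than 2.1.0 and about 56x slower than the MySQL load, which is consistent with "almost an order of magnitude" and "almost two orders" in the description.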