[ https://issues.apache.org/jira/browse/CASSANDRA-8225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191685#comment-14191685 ]
Aleksey Yeschenko commented on CASSANDRA-8225: ---------------------------------------------- Maybe I'm just too used to COPY being a development tool. Anyway, I do agree that 'we should beef up our data loading story". I don't necessarily agree with COPY FROM being the proper way to do it, though, and I do recall conversations about using Spark for the job in some way. > Production-capable COPY FROM > ---------------------------- > > Key: CASSANDRA-8225 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8225 > Project: Cassandra > Issue Type: New Feature > Components: Tools > Reporter: Jonathan Ellis > Fix For: 2.1.2 > > > Via [~schumacr], > bq. I pulled down a sourceforge data generator and created a moc file of > 500,000 rows that had an incrementing sequence number, date, and SSN. I then > used our COPY command and MySQL's LOAD DATA INFILE to load the file on my > Mac. Results were: > {noformat} > mysql> load data infile '/Users/robin/dev/datagen3.txt' into table p_test > fields terminated by ','; > Query OK, 500000 rows affected (2.18 sec) > {noformat} > C* 2.1.0 (pre-CASSANDRA-7405) > {noformat} > cqlsh:dev> copy p_test from '/Users/robin/dev/datagen3.txt' with > delimiter=','; > 500000 rows imported in 16 minutes and 45.485 seconds. > {noformat} > Cassandra 2.1.1: > {noformat} > cqlsh:dev> copy p_test from '/Users/robin/dev/datagen3.txt' with > delimiter=','; > Processed 500000 rows; Write: 4037.46 rows/s > 500000 rows imported in 2 minutes and 3.058 seconds. > {noformat} > [jbellis] 7405 gets us almost an order of magnitude improvement. > Unfortunately we're still almost 2 orders slower than mysql. > I don't think we can continue to tell people, "use sstableloader instead." > The number of users sophisticated enough to use the sstable writers is small > and (relatively) decreasing as our user base expands. -- This message was sent by Atlassian JIRA (v6.3.4#6332)