[ https://issues.apache.org/jira/browse/CASSANDRA-8225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197152#comment-14197152 ]

Aleksey Yeschenko commented on CASSANDRA-8225:
----------------------------------------------

bq. Aleksey Yeschenko, in your mind what is the right solution for Ryan's "I 
have a big file on my SAN that I want to load?"

Either a 10x faster COPY FROM (which would be good enough for most cases, since 
it would then be only ~5x slower than MySQL's LOAD DATA INFILE), or the 
Spark-based loader for truly huge files (in either standalone or distributed mode).
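As a rough sanity check of that estimate, here is a small Python sketch that uses the timings from the benchmark in the issue description. The "10x faster" run is hypothetical: it simply divides the measured 2.1.1 time by 10 rather than reflecting any real measurement.

```python
# Timings (seconds) taken from the benchmark in the issue description.
MYSQL_LOAD_DATA = 2.18          # MySQL LOAD DATA INFILE, 500,000 rows
COPY_2_1_1 = 2 * 60 + 3.058     # cqlsh COPY FROM on Cassandra 2.1.1, same file


def slowdown_vs_mysql(copy_seconds: float) -> float:
    """Return how many times slower a COPY FROM run is than the MySQL load."""
    return copy_seconds / MYSQL_LOAD_DATA


current = slowdown_vs_mysql(COPY_2_1_1)
# Hypothetical: assume COPY FROM becomes exactly 10x faster than in 2.1.1.
projected = slowdown_vs_mysql(COPY_2_1_1 / 10)

print(f"2.1.1 COPY FROM:      {current:.1f}x slower than MySQL")
print(f"10x faster COPY FROM: {projected:.1f}x slower than MySQL")
```

Running it prints about 56.4x for 2.1.1 and 5.6x for the hypothetical 10x-faster COPY FROM, in line with the ~5x figure above.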

My secret sources are telling me that the low-hanging-fruit (LHF) COPY FROM 
improvements needed to get us to 10x will only take a day or two. So we should 
do that for now, and start discussing the design of the Spark-based, 
not-just-CSV loader, either here or in a separate ticket.

> Production-capable COPY FROM
> ----------------------------
>
>                 Key: CASSANDRA-8225
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8225
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>             Fix For: 2.1.2
>
>
> Via [~schumacr],
> bq. I pulled down a SourceForge data generator and created a mock file of 
> 500,000 rows that had an incrementing sequence number, date, and SSN. I then 
> used our COPY command and MySQL's LOAD DATA INFILE to load the file on my 
> Mac. Results were:
> {noformat}
> mysql> load data infile '/Users/robin/dev/datagen3.txt'  into table p_test  
> fields terminated by ',';
> Query OK, 500000 rows affected (2.18 sec)
> {noformat}
> C* 2.1.0 (pre-CASSANDRA-7405):
> {noformat}
> cqlsh:dev> copy p_test from '/Users/robin/dev/datagen3.txt' with 
> delimiter=',';
> 500000 rows imported in 16 minutes and 45.485 seconds.
> {noformat}
> Cassandra 2.1.1:
> {noformat}
> cqlsh:dev> copy p_test from '/Users/robin/dev/datagen3.txt' with 
> delimiter=',';
> Processed 500000 rows; Write: 4037.46 rows/s
> 500000 rows imported in 2 minutes and 3.058 seconds.
> {noformat}
> [jbellis] CASSANDRA-7405 gets us almost an order of magnitude improvement (~8x).
> Unfortunately we're still almost 2 orders of magnitude slower than MySQL (~56x).
> I don't think we can continue to tell people, "use sstableloader instead."  
> The number of users sophisticated enough to use the sstable writers is small 
> and (relatively) decreasing as our user base expands.
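The throughput behind these comparisons can be derived directly from the wall-clock times in the issue description. The sketch below is only arithmetic on those reported numbers; the labels are descriptive names chosen here, not tool output.

```python
ROWS = 500_000

# Wall-clock times (seconds) reported for loading the same 500,000-row file.
runs = {
    "MySQL LOAD DATA INFILE": 2.18,
    "C* 2.1.0 COPY FROM (pre-7405)": 16 * 60 + 45.485,
    "C* 2.1.1 COPY FROM": 2 * 60 + 3.058,
}

# End-to-end throughput for each run.
for name, seconds in runs.items():
    print(f"{name:32} {ROWS / seconds:>10,.0f} rows/s")

# Improvement from CASSANDRA-7405: ratio of the two COPY FROM timings.
speedup = runs["C* 2.1.0 COPY FROM (pre-7405)"] / runs["C* 2.1.1 COPY FROM"]
print(f"CASSANDRA-7405 speedup: {speedup:.1f}x")
```

This gives roughly 229,358 rows/s for MySQL, 497 rows/s for 2.1.0, and 4,063 rows/s for 2.1.1 (close to the 4,037 rows/s that cqlsh itself printed), and an ~8.2x speedup from CASSANDRA-7405.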



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
