[ 
https://issues.apache.org/jira/browse/CASSANDRA-8404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-8404:
---------------------------------------
    Fix Version/s: 2.1.4

> CQLSSTableLoader can not create SSTable for csv file of 10M rows.
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-8404
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8404
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: I am using Cassandra 2.1.1 on 32-bit Ubuntu 12.04 and 
> running the program with -Xmx1000M.
> manish@manish[~]:> uname -a
> Linux manish 3.2.0-72-generic-pae #107-Ubuntu SMP Thu Nov 6 14:44:10 UTC 2014 
> i686 i686 i386 GNU/Linux
>            Reporter: Manish
>             Fix For: 2.1.4
>
>         Attachments: Test1.java, cassandra.yaml
>
>
> I can create an SSTable from one 10M-row file but not from another. The 
> data file that works is subscribers1.gz; the one that does not work is 
> subscribers2.gz. Both files have the same values in the first column but 
> different values in the second column. I do not understand why 
> CQLSSTableLoader fails on a different set of data. 
> The program expects unzipped text files, so please unzip the files before 
> running it. I observe heavy GC activity once the program has processed about 
> 5.2M lines of subscribers2.gz. It gets as far as 5.8M lines with very 
> frequent full GC runs, and cannot go beyond 5.8M rows because no more memory 
> is available.
> I have attached the Test1.java and cassandra.yaml I used to create the 
> SSTable. The classpath includes all jars from the lib folder of the extracted 
> apache-cassandra-2.1.1-bin.tar.gz. 
> JIRA does not allow attachments larger than 10 MB, so I am sharing the data 
> files on Google Drive.
> link to download subscribers1.gz
> https://drive.google.com/file/d/0B6_-ugKWlrfoOTRTa2FCNTFWU2c/view?usp=sharing
> link to download subscribers2.gz
> https://drive.google.com/file/d/0B6_-ugKWlrfocndycm9yM21rN0E/view?usp=sharing
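
To narrow down where the heap pressure described above begins, a minimal
sketch along these lines could log heap usage at fixed line intervals while
reading the unzipped data file. It is not taken from the attached Test1.java:
the class name HeapProbe, the reporting interval, and the command-line
argument are hypothetical, and the sketch only reads lines; in a real run the
row would be handed to the SSTable writer at the marked point.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HeapProbe {
    // Hypothetical reporting interval; pick a value that suits the file size.
    public static final long REPORT_EVERY = 100_000L;

    // Approximate heap currently in use, in megabytes.
    public static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    // Reads the file line by line, prints heap usage every reportEvery lines,
    // and returns the total number of lines read.
    public static long countLines(Path file, long reportEvery) throws IOException {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        long lines = 0;
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                lines++;
                // Assumption: the real loader would add the row to its writer here.
                if (lines % reportEvery == 0) {
                    System.out.printf("%d lines, ~%d MB used of %d MB max%n",
                                      lines, usedHeapMb(), maxMb);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HeapProbe <unzipped-data-file>");
            return;
        }
        long total = countLines(Paths.get(args[0]), REPORT_EVERY);
        System.out.println("total lines: " + total);
    }
}
```

Running it with the same -Xmx1000M setting and comparing the output for the
two files would show whether usage climbs steadily or jumps near the 5.2M-line
mark.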



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
