[ https://issues.apache.org/jira/browse/CASSANDRA-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xu Zhongxing updated CASSANDRA-7360:
------------------------------------

    Description: 
When using CQLSSTableWriter to write a table with a compound primary key, if the 
partition key is identical for a huge number of records, the sync() method is 
never called, and memory usage keeps growing until memory is exhausted. 

Could the code be improved to call sync() even when no new row is created? 
The relevant code is in SSTableSimpleUnsortedWriter.java and 
AbstractSSTableSimpleWriter.java. I am new to the code and cannot produce a 
reasonable patch for now.
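
To illustrate the behaviour asked about above, here is a minimal toy model. It is not Cassandra's code: the class ToyBufferedWriter, its fields, the 1024-byte threshold, and the 32-byte row size are all hypothetical. It only assumes, as this report describes, that the buffer-size check runs when a new partition starts, and contrasts that with a check on every added row.

```java
/** Toy model of a buffering writer. Hypothetical; NOT Cassandra's implementation. */
final class ToyBufferedWriter {
    private final long flushThresholdBytes;
    // false: size check only when the partition key changes (the reported behaviour)
    // true:  size check on every added row (the behaviour the report asks for)
    private final boolean checkOnEveryRow;
    private String currentPartition;
    private long bufferedBytes;
    private long peakBufferedBytes;
    private int flushCount;

    ToyBufferedWriter(long flushThresholdBytes, boolean checkOnEveryRow) {
        this.flushThresholdBytes = flushThresholdBytes;
        this.checkOnEveryRow = checkOnEveryRow;
    }

    void addRow(String partitionKey, int rowSizeBytes) {
        boolean newPartition = !partitionKey.equals(currentPartition);
        // Flush before buffering this row if a check is due and the buffer is full.
        if ((checkOnEveryRow || newPartition) && bufferedBytes >= flushThresholdBytes) {
            flush();
        }
        currentPartition = partitionKey;
        bufferedBytes += rowSizeBytes;
        peakBufferedBytes = Math.max(peakBufferedBytes, bufferedBytes);
    }

    /** Stand-in for writing the buffer out; here it only resets the counter. */
    void flush() {
        if (bufferedBytes > 0) {
            bufferedBytes = 0;
            flushCount++;
        }
    }

    long peakBufferedBytes() { return peakBufferedBytes; }
    int flushCount() { return flushCount; }

    /** Adds `rows` rows of 32 bytes each, all under one partition key. */
    static ToyBufferedWriter simulate(boolean checkOnEveryRow, int rows) {
        ToyBufferedWriter writer = new ToyBufferedWriter(1024, checkOnEveryRow);
        for (int i = 0; i < rows; i++) {
            writer.addRow("same-partition", 32);
        }
        return writer;
    }

    public static void main(String[] args) {
        ToyBufferedWriter partitionOnly = simulate(false, 10_000);
        ToyBufferedWriter everyRow = simulate(true, 10_000);
        System.out.println("check on new partition only: peak="
            + partitionOnly.peakBufferedBytes() + " flushes=" + partitionOnly.flushCount());
        System.out.println("check on every row:          peak="
            + everyRow.peakBufferedBytes() + " flushes=" + everyRow.flushCount());
    }
}
```

In this model, with a single partition key the first variant never flushes and its buffer grows with the number of rows, while the second variant keeps the buffer bounded by the threshold.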

The problem can be reproduced by the following test case:
{code}
import org.apache.cassandra.io.sstable.CQLSSTableWriter;
import org.apache.cassandra.exceptions.InvalidRequestException;

import java.io.IOException;
import java.util.UUID;

class SS {
    public static void main(String[] args) {
        String schema = "create table test.t (x uuid, y uuid, primary key (x, y))";


        String insert = "insert into test.t (x, y) values (?, ?)";
        CQLSSTableWriter writer = CQLSSTableWriter.builder()
            .inDirectory("/tmp/test/t")
            .forTable(schema).withBufferSizeInMB(32)
            .using(insert).build();

        UUID id = UUID.randomUUID();
        try {
            for (int i = 0; i < 50000000; i++) {
                UUID id2 = UUID.randomUUID();
                writer.addRow(id, id2);
            }

            writer.close();
        } catch (Exception e) {
            System.err.println("hell");
        }
    }
}
{code}

  was:
When using CQLSSTableWriter to write a table with a compound primary key, if the 
cluster key is identical for a huge number of records, the sync() method is 
never called, and memory usage keeps growing until memory is exhausted. 

Could the code be improved to call sync() even when no new row is created? 
The relevant code is in SSTableSimpleUnsortedWriter.java and 
AbstractSSTableSimpleWriter.java. I am new to the code and cannot produce a 
reasonable patch for now.

The problem can be reproduced by the following test case:
{code}
import org.apache.cassandra.io.sstable.CQLSSTableWriter;
import org.apache.cassandra.exceptions.InvalidRequestException;

import java.io.IOException;
import java.util.UUID;

class SS {
    public static void main(String[] args) {
        String schema = "create table test.t (x uuid, y uuid, primary key (x, y))";


        String insert = "insert into test.t (x, y) values (?, ?)";
        CQLSSTableWriter writer = CQLSSTableWriter.builder()
            .inDirectory("/tmp/test/t")
            .forTable(schema).withBufferSizeInMB(32)
            .using(insert).build();

        UUID id = UUID.randomUUID();
        try {
            for (int i = 0; i < 50000000; i++) {
                UUID id2 = UUID.randomUUID();
                writer.addRow(id, id2);
            }

            writer.close();
        } catch (Exception e) {
            System.err.println("hell");
        }
    }
}
{code}


> CQLSSTableWriter consumes all memory for table with compound primary key
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7360
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7360
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Xu Zhongxing
>            Assignee: Sylvain Lebresne
>
> When using CQLSSTableWriter to write a table with a compound primary key, if 
> the partition key is identical for a huge number of records, the sync() 
> method is never called, and memory usage keeps growing until memory is 
> exhausted. 
> Could the code be improved to call sync() even when no new row is created? 
> The relevant code is in SSTableSimpleUnsortedWriter.java and 
> AbstractSSTableSimpleWriter.java. I am new to the code and cannot produce a 
> reasonable patch for now.
> The problem can be reproduced by the following test case:
> {code}
> import org.apache.cassandra.io.sstable.CQLSSTableWriter;
> import org.apache.cassandra.exceptions.InvalidRequestException;
> import java.io.IOException;
> import java.util.UUID;
> class SS {
>     public static void main(String[] args) {
>         String schema = "create table test.t (x uuid, y uuid, primary key (x, y))";
>         String insert = "insert into test.t (x, y) values (?, ?)";
>         CQLSSTableWriter writer = CQLSSTableWriter.builder()
>             .inDirectory("/tmp/test/t")
>             .forTable(schema).withBufferSizeInMB(32)
>             .using(insert).build();
>         UUID id = UUID.randomUUID();
>         try {
>             for (int i = 0; i < 50000000; i++) {
>                 UUID id2 = UUID.randomUUID();
>                 writer.addRow(id, id2);
>             }
>             writer.close();
>         } catch (Exception e) {
>             System.err.println("hell");
>         }
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
