[ https://issues.apache.org/jira/browse/CASSANDRA-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Bailey reopened CASSANDRA-1898:
------------------------------------

This breaks if a row is skipped during the export of the sstable. Currently this can happen if there is an IOException while deserializing or an OOM for an extremely large row. CASSANDRA-1877 will also hopefully make export skip other corruption as well.

The problem here is the export will cause a line with only a ',' to be added to the json file.

> json2sstable should support streaming
> -------------------------------------
>
>                 Key: CASSANDRA-1898
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1898
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Nick Bailey
>            Assignee: Pavel Yaskevich
>             Fix For: 0.7.1
>
>         Attachments: CASSANDRA-1898-v2.patch, CASSANDRA-1898-v3.patch, CASSANDRA-1898-v4.patch, CASSANDRA-1898.patch
>
>   Original Estimate: 8h
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> json2sstable loads the entire json file into memory. This is so it can sort the file before creating an sstable. If the file was created using sstable2json and the partitioner isn't changing, this isn't necessary. For very large files this means json2sstable requires a huge amount of memory.
> There should be an option to stream the file. A simple check for out of order keys will prevent writing bad sstables.
> This should be possible with the SAX style parser available in our current json library.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
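
For illustration only, here is a minimal Python sketch of the two ideas discussed above: writing the row separator only once a row has actually been serialized, so a skipped row cannot leave a lone ',' in the output, and checking for out-of-order keys while streaming. This is not the code from any of the attached patches. The names `export_rows`, `check_key_order` and `serialize` are hypothetical, the output layout (one JSON object keyed by row key) is an assumption, and plain string comparison stands in for the partitioner's real ordering.

```python
import io
import json


def export_rows(rows, serialize, out):
    """Write rows as one JSON object, skipping rows that fail to serialize.

    `serialize(key, columns)` is a hypothetical callback that returns the
    '"key": [...]' fragment for one row, or raises OSError / MemoryError.
    """
    out.write("{\n")
    wrote_any = False
    for key, columns in rows:
        try:
            # Serialize the whole row before touching the output, so a
            # failure leaves nothing behind, not even a separator.
            fragment = serialize(key, columns)
        except (OSError, MemoryError):
            continue  # skipped row: no ',' is emitted for it
        if wrote_any:
            out.write(",\n")
        out.write(fragment)
        wrote_any = True
    out.write("\n}\n")


def check_key_order(keys):
    """Yield keys one at a time, raising ValueError if a key is not
    strictly greater than the one before it.

    Assumption: string comparison is used here as a stand-in; a real
    check would compare in the partitioner's order.
    """
    previous = None
    for key in keys:
        if previous is not None and key <= previous:
            raise ValueError(f"key {key!r} is out of order after {previous!r}")
        previous = key
        yield key


def simple_serialize(key, columns):
    # Hypothetical serializer used only for the demo below.
    return json.dumps(key) + ": " + json.dumps(columns)


if __name__ == "__main__":
    buf = io.StringIO()
    export_rows([("a", [1]), ("b", [2])], simple_serialize, buf)
    print(buf.getvalue())
```

Because the separator is written before each row after the first successful one, rather than after every row, the output stays valid JSON whether the skipped row is first, last or in the middle. The order check is a generator, so it holds only the previous key in memory and fits the streaming approach the issue asks for.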