[jira] [Commented] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987272#comment-14987272 ]

Andre Turgeon commented on CASSANDRA-10358:
-------------------------------------------

[~slebresne], I was not aware that {{CQLSSTableWriter}} was intended solely as a way of generating data to be loaded via {{sstableloader}}. I was using it differently: I used rsync to copy the output files directly to their destination Cassandra nodes. Prior to CASSANDRA-7360, that worked, so from my perspective the change was a regression.

Perhaps I should explain how I use {{CQLSSTableWriter}} a bit more clearly:

We have a Map/Reduce program (running on Hadoop) which reads terabytes of data and generates SSTables in parallel. Once generated, these SSTables are rsynced to their destination nodes. The generated SSTables are already at the appropriate leveled-compaction level, which saves a lot of compaction time. Because the Hadoop cluster is very large, it can crunch through the data much more quickly than the Cassandra cluster; at that point the bottleneck is simply the transfer time.

This saves a lot of time when we bulk-load data. Using this method, our dataset loads in about 3 hours. When I use {{sstableloader}}, again in parallel using Hadoop, it takes over a week for the load and compaction to finish.

> Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10358
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10358
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Andre Turgeon
>            Priority: Minor
>         Attachments: SSTableWriterCreationStrategy.patch, patch.txt
>
>
> I've created a patch for your consideration.
> This change to CQLSSTableWriter allows for a custom
> AbstractSSTableSimpleWriter to be specified.
> I needed this for a bulkload process I wrote. I believe the change would be
> beneficial for other people as well.
> Below are the reasons I needed a custom implementation of
> AbstractSSTableSimpleWriter:
> 1) The available implementations of AbstractSSTableSimpleWriter do not
> provide a way to specify the filename (or rather revision) of the sstable. I
> needed to control the name because my bulkload process writes sstables in
> parallel (on multiple machines) and I wish to avoid name collisions.
> 2) I discovered a problem with SSTableSimpleUnsortedWriter where it creates
> invalid level-compaction-style sstables: it allows a partition to span 2
> sstables, which violates the "no overlap of token ranges" constraint of level
> compaction.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
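The name-collision concern in point 1 can be illustrated with a small sketch. It assumes each parallel task knows its own index and the total number of tasks, and it hands out revision numbers in interleaved, non-overlapping sequences. The class name `RevisionAllocator` and the numbering scheme are hypothetical; they are not taken from Cassandra or from the attached patch.

```java
// Hypothetical helper: not part of Cassandra or of the attached patch.
final class RevisionAllocator {
    private final int taskId;     // index of this parallel task, 0-based
    private final int taskCount;  // total number of parallel tasks
    private int issued = 0;       // how many revisions this task has handed out so far

    RevisionAllocator(int taskId, int taskCount) {
        if (taskCount <= 0 || taskId < 0 || taskId >= taskCount) {
            throw new IllegalArgumentException("taskId must be in [0, taskCount)");
        }
        this.taskId = taskId;
        this.taskCount = taskCount;
    }

    /** Returns the next revision reserved for this task (1-based, interleaved across tasks). */
    int next() {
        int revision = 1 + taskId + issued * taskCount;
        issued++;
        return revision;
    }

    public static void main(String[] args) {
        RevisionAllocator a = new RevisionAllocator(0, 3);
        RevisionAllocator b = new RevisionAllocator(1, 3);
        System.out.println(a.next() + " " + a.next()); // task 0 gets 1 and 4
        System.out.println(b.next() + " " + b.next()); // task 1 gets 2 and 5
    }
}
```

Because task i only ever produces numbers congruent to i + 1 modulo the task count, two tasks cannot produce the same revision, so files copied into one directory would not collide on that component of the name.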
[jira] [Commented] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987496#comment-14987496 ]

Andre Turgeon commented on CASSANDRA-10358:
-------------------------------------------

[~slebresne], I see your point. I was using the unsorted writer because it breaks up my stream into roughly equally sized sstables. I could measure the stream myself and create a new {{CQLSSTableWriter}} for each sstable I generate.

That leaves me with the need to control the name of the sstable and its level. Have you had a chance to look at the {{SSTableWriterCreationStrategy}} idea? Could that be incorporated into the code base?
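The idea of measuring the stream and starting a new writer per sstable could look roughly like the sketch below. It uses plain Java only: `RollingChunker` and `Row` are hypothetical names, the input is assumed to arrive already grouped by partition key, and each resulting chunk stands for the rows that would be handed to one freshly created writer. A chunk is closed only at a partition boundary, so no partition ends up split across two chunks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch in plain Java; no Cassandra classes are involved.
final class RollingChunker {
    /** A row reduced to what matters here: its partition key and its serialized size. */
    static final class Row {
        final String partitionKey;
        final long sizeBytes;

        Row(String partitionKey, long sizeBytes) {
            this.partitionKey = partitionKey;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Splits rows (assumed to be grouped by partition key) into chunks of roughly targetBytes.
     * A chunk is closed only when the partition key changes, so a partition never spans two chunks.
     */
    static List<List<Row>> split(List<Row> rows, long targetBytes) {
        List<List<Row>> chunks = new ArrayList<>();
        List<Row> current = new ArrayList<>();
        long currentBytes = 0;
        String lastKey = null;
        for (Row row : rows) {
            boolean newPartition = lastKey != null && !lastKey.equals(row.partitionKey);
            if (newPartition && currentBytes >= targetBytes) {
                chunks.add(current);          // close the full chunk at the partition boundary
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += row.sizeBytes;
            lastKey = row.partitionKey;
        }
        if (!current.isEmpty()) {
            chunks.add(current);              // flush the final, possibly smaller chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("a", 60));
        rows.add(new Row("a", 60));
        rows.add(new Row("b", 50));
        rows.add(new Row("c", 10));
        System.out.println(split(rows, 100).size()); // 2 chunks: [a, a] and [b, c]
    }
}
```

The trade-off in this sketch is that a chunk can exceed the target size when a single partition is large, which is the price of never splitting a partition.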
[jira] [Updated] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andre Turgeon updated CASSANDRA-10358:
--------------------------------------
    Attachment: SSTableWriterCreationStrategy.patch

Alternate implementation using a SSTableWriterCreationFactory.
[jira] [Commented] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985731#comment-14985731 ]

Andre Turgeon commented on CASSANDRA-10358:
-------------------------------------------

Thanks for the feedback [~slebresne].

I forgot to mention another requirement I have: I need to control the level at which an SSTable is created.

How would you feel about having a SSTableWriter creation strategy? Something like this:

    public interface SSTableWriterCreationStrategy
    {
        SSTableWriter createWriter(File directory, CFMetaData metadata);
    }

I submitted a patch with more details.
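To make the shape of such a strategy concrete, here is a minimal, self-contained sketch. It does not use Cassandra classes: `TableMeta` and `Writer` are hypothetical stand-ins for `CFMetaData` and `SSTableWriter`, the `pinned` factory is invented for illustration, and the file name layout is illustrative rather than Cassandra's real naming scheme. The contents of the attached patch are not reproduced here.

```java
import java.io.File;

// Hypothetical sketch: TableMeta and Writer are stand-ins, not Cassandra classes.
final class CreationStrategySketch {
    /** Stand-in for CFMetaData; only the keyspace and table names are modeled. */
    static final class TableMeta {
        final String keyspace;
        final String table;

        TableMeta(String keyspace, String table) {
            this.keyspace = keyspace;
            this.table = table;
        }
    }

    /** Stand-in for SSTableWriter; it only records where and at which level it would write. */
    static final class Writer {
        final File dataFile;
        final int level;

        Writer(File dataFile, int level) {
            this.dataFile = dataFile;
            this.level = level;
        }
    }

    /** Same shape as the interface proposed in the comment, using the stand-in types. */
    interface WriterCreationStrategy {
        Writer createWriter(File directory, TableMeta metadata);
    }

    /** Builds a strategy that pins both the revision in the file name and the level. */
    static WriterCreationStrategy pinned(int revision, int level) {
        // The file name layout below is illustrative only.
        return (directory, metadata) -> new Writer(
                new File(directory, metadata.keyspace + "-" + metadata.table + "-" + revision + "-Data.db"),
                level);
    }

    public static void main(String[] args) {
        Writer w = pinned(7, 2).createWriter(new File("out"), new TableMeta("ks", "tbl"));
        System.out.println(w.dataFile.getName() + " at level " + w.level); // ks-tbl-7-Data.db at level 2
    }
}
```

The point of the single-method interface is that the caller, not the builder, decides the file name and the level, which are the two things the comment asks to control.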
[jira] [Comment Edited] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985731#comment-14985731 ]

Andre Turgeon edited comment on CASSANDRA-10358 at 11/2/15 6:53 PM:
--------------------------------------------------------------------

Thanks for the feedback [~slebresne].

I forgot to mention another requirement I have: I need to control the level at which a SSTable is created.

How would you feel about having a SSTableWriter creation strategy? Something like this:

{quote}
public interface SSTableWriterCreationStrategy
{
    SSTableWriter createWriter(File directory, CFMetaData metadata);
}
{quote}

I submitted a patch with more details.

was (Author: symbiosix):
Thanks for the feedback [~slebresne].

I forgot to mention another requirement I have: I need to control the level at which an SSTable is created.

How would you feel about having a SSTableWriter creation strategy? Something like this:

public interface SSTableWriterCreationStrategy
{
    SSTableWriter createWriter(File directory, CFMetaData metadata);
}

I submitted a patch with more details.
[jira] [Comment Edited] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985731#comment-14985731 ]

Andre Turgeon edited comment on CASSANDRA-10358 at 11/2/15 6:54 PM:
--------------------------------------------------------------------

Thanks for the feedback [~slebresne].

I forgot to mention another requirement I have: I need to control the level at which a SSTable is created.

How would you feel about having a SSTableWriter creation strategy? Something like this:

{noformat}
public interface SSTableWriterCreationStrategy
{
    SSTableWriter createWriter(File directory, CFMetaData metadata);
}
{noformat}

I submitted a patch with more details.

was (Author: symbiosix):
Thanks for the feedback [~slebresne].

I forgot to mention another requirement I have: I need to control the level at which a SSTable is created.

How would you feel about having a SSTableWriter creation strategy? Something like this:

{quote}
public interface SSTableWriterCreationStrategy
{
    SSTableWriter createWriter(File directory, CFMetaData metadata);
}
{quote}

I submitted a patch with more details.
[jira] [Commented] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986327#comment-14986327 ]

Andre Turgeon commented on CASSANDRA-10358:
-------------------------------------------

As for the bug mentioned in #2 above, it appears that the code change for CASSANDRA-7360 introduced it. I'm not sure how to keep the benefits of CASSANDRA-7360 and still fix the bug.

[~slebresne], you worked on CASSANDRA-7360. Do you have any suggestions? Should I create a new Jira to address the bug separately?
[jira] [Created] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter
Andre Turgeon created CASSANDRA-10358:
-----------------------------------------

             Summary: Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter
                 Key: CASSANDRA-10358
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10358
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Andre Turgeon
            Priority: Minor
             Fix For: 2.0.x
         Attachments: patch.txt

I've created a patch for your consideration.

This change to CQLSSTableWriter allows for a custom AbstractSSTableSimpleWriter to be specified. I needed this for a bulkload process I wrote. I believe the change would be beneficial for other people as well.

Below are the reasons I needed a custom implementation of AbstractSSTableSimpleWriter:

1) The available implementations of AbstractSSTableSimpleWriter do not provide a way to specify the filename (or rather revision) of the sstable. I needed to control the name because my bulkload process writes sstables in parallel (on multiple machines) and I wish to avoid name collisions.

2) I discovered a problem with SSTableSimpleUnsortedWriter where it creates invalid level-compaction-style sstables: it allows a partition to span 2 sstables, which violates the "no overlap of token ranges" constraint of level compaction.
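The "no overlap of token ranges" constraint in point 2 can be checked mechanically. The sketch below is a simplified model, not Cassandra code: `LevelOverlapCheck` and `Span` are hypothetical names, tokens are assumed to be plain `long` values, and each sstable is reduced to the inclusive range between its first and last token. A partition written into two sstables shows up as the same token ending one range and starting the next.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model: each sstable is reduced to its inclusive [first, last] token range.
final class LevelOverlapCheck {
    static final class Span {
        final long first;
        final long last;

        Span(long first, long last) {
            if (first > last) {
                throw new IllegalArgumentException("first must not exceed last");
            }
            this.first = first;
            this.last = last;
        }
    }

    /** Returns true if any two spans share at least one token. */
    static boolean hasOverlap(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingLong((Span s) -> s.first));
        // After sorting by first token, any overlap implies an overlap between neighbours.
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i).first <= sorted.get(i - 1).last) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Span> split = new ArrayList<>();
        split.add(new Span(10, 50));  // the partition with token 50 ends this sstable...
        split.add(new Span(50, 90));  // ...and also starts the next one
        System.out.println(hasOverlap(split)); // true

        List<Span> clean = new ArrayList<>();
        clean.add(new Span(10, 49));
        clean.add(new Span(50, 90));
        System.out.println(hasOverlap(clean)); // false
    }
}
```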