[jira] [Commented] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter

2015-11-03 Thread Andre Turgeon (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987272#comment-14987272
 ] 

Andre Turgeon commented on CASSANDRA-10358:
---

[~slebresne], I was not aware that {{CQLSSTableWriter}} was intended to be used 
solely as a way of generating data to be loaded via {{sstableloader}}. I was 
using it differently: I used rsync to copy the output files directly to their 
destination Cassandra nodes. That worked prior to CASSANDRA-7360, so from my 
perspective the change was a regression. Let me explain more clearly how I use 
{{CQLSSTableWriter}}:
We have a Map/Reduce program (running on Hadoop) that reads terabytes of data 
and generates SSTables in parallel. Once generated, these SSTables are 
rsynced to their destination nodes. The generated SSTables are already at the 
appropriate leveled-compaction level, which saves a lot of compaction time. 
Because the Hadoop cluster is very large, it can crunch through the data much 
more quickly than the Cassandra cluster, so at that point the bottleneck is 
simply the transfer time. 
This saves a lot of time when we bulk-load data: using this method, our dataset 
loads in about 3 hours. When I use {{sstableloader}} instead, again in parallel 
from Hadoop, the load and the subsequent compaction take over a week to finish.
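For readers wondering what "the appropriate level" might mean in practice: under leveled compaction each level L (L >= 1) is meant to hold roughly ten times as much data as the level below it, with level L capped at about sstable_size * 10^L. The sketch below picks the lowest level whose cap fits a node's share of the data. The fanout of 10, the capacity formula and the {{LevelPlanner}} class are assumptions for illustration, not the method used by the job described above.

```java
// Hypothetical helper: choose the lowest leveled-compaction level that can hold
// a node's data, ASSUMING level L (L >= 1) is capped at sstableSize * 10^L.
final class LevelPlanner {
    // Assumed fanout between consecutive levels.
    static final long FANOUT = 10;

    private LevelPlanner() {}

    // Approximate byte cap of the given level under the assumption above.
    static long levelCapacityBytes(int level, long sstableSizeBytes) {
        if (level < 1 || sstableSizeBytes <= 0) {
            throw new IllegalArgumentException("level must be >= 1 and size > 0");
        }
        long capacity = sstableSizeBytes;
        for (int i = 0; i < level; i++) {
            capacity = Math.multiplyExact(capacity, FANOUT); // fails loudly on overflow
        }
        return capacity;
    }

    // Lowest level >= 1 whose cap is at least nodeDataBytes.
    static int targetLevel(long nodeDataBytes, long sstableSizeBytes) {
        int level = 1;
        while (levelCapacityBytes(level, sstableSizeBytes) < nodeDataBytes) {
            level++;
        }
        return level;
    }
}
```

For example, with 160 MiB sstables and 100 GiB per node, level 2 caps out at about 15.6 GiB and level 3 at about 156 GiB, so {{targetLevel(100L << 30, 160L << 20)}} returns 3.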

> Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter 
> -
>
> Key: CASSANDRA-10358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10358
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Andre Turgeon
>Priority: Minor
> Attachments: SSTableWriterCreationStrategy.patch, patch.txt
>
>
> I've created a patch for your consideration. 
> This change to CQLSSTableWriter allows for a custom 
> AbstractSSTableSimpleWriter to be specified. 
> I needed this for a bulkload process I wrote. I believe the change would be 
> beneficial for other people as well. 
> Below are the reasons I needed a custom implementation of 
> AbstractSSTableSimpleWriter:
> 1) The available implementations of AbstractSSTableSimpleWriter do not 
> provide a way to specify the filename (or rather, the generation) of the 
> sstable. I needed to control the name because my bulkload process writes 
> sstables in parallel (on multiple machines) and I wish to avoid name 
> collisions.
> 2) I discovered a problem with SSTableSimpleUnsortedWriter: it creates 
> invalid leveled-compaction sstables, because it allows a partition to span 
> two sstables, which violates the "no overlap of token ranges" constraint of 
> leveled compaction.
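Point 2 can be made concrete: under leveled compaction, sstables in the same level (other than L0) must cover disjoint token ranges. If one partition's rows end up in two sstables, both files contain that partition's token, so their ranges overlap. The sketch below checks that invariant over hypothetical {{SSTableSummary}} records (name, level, first and last token). Representing tokens as {{long}} assumes Murmur3Partitioner, and none of these types are Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical summary of one sstable: its level and inclusive token bounds.
final class SSTableSummary {
    final String name;
    final int level;
    final long firstToken;
    final long lastToken;

    SSTableSummary(String name, int level, long firstToken, long lastToken) {
        if (firstToken > lastToken) {
            throw new IllegalArgumentException("firstToken must be <= lastToken");
        }
        this.name = name;
        this.level = level;
        this.firstToken = firstToken;
        this.lastToken = lastToken;
    }
}

final class LevelOverlapChecker {
    private LevelOverlapChecker() {}

    // Sorts by (level, firstToken) and compares neighbours within each level >= 1.
    // An empty result means every level >= 1 is overlap-free; otherwise it names
    // the neighbouring pairs that overlap (which may not be every overlapping pair).
    static List<String> findOverlaps(List<SSTableSummary> sstables) {
        List<SSTableSummary> sorted = new ArrayList<>(sstables);
        sorted.sort(Comparator.comparingInt((SSTableSummary s) -> s.level)
                .thenComparingLong(s -> s.firstToken));
        List<String> overlaps = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            SSTableSummary prev = sorted.get(i - 1);
            SSTableSummary cur = sorted.get(i);
            // L0 may overlap under leveled compaction, so it is skipped.
            if (cur.level == 0 || cur.level != prev.level) {
                continue;
            }
            if (cur.firstToken <= prev.lastToken) {
                overlaps.add(prev.name + " overlaps " + cur.name);
            }
        }
        return overlaps;
    }
}
```

In the scenario from point 2, a partition with token 42 split across two level-1 files {{a}} (0..42) and {{b}} (42..100) is reported as {{a overlaps b}}.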



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter

2015-11-03 Thread Andre Turgeon (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987496#comment-14987496
 ] 

Andre Turgeon commented on CASSANDRA-10358:
---

[~slebresne], I see your point. I was using the unsorted writer because it 
breaks up my stream into sstables of roughly equal size. I could instead 
measure the stream myself and create a new {{CQLSSTableWriter}} for each 
sstable I generate. 
That leaves me needing to control the name of each sstable and its level. 
Have you had a chance to look at the {{SSTableWriterCreationStrategy}} idea? 
Could that be incorporated into the code base?
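The approach described above (measuring the stream and starting a new writer per sstable) can be sketched as follows. The detail that matters for point 2 of the description is that the writer only rolls over at a partition boundary, so no partition is split across files. {{RowSink}}, {{RowSinkFactory}} and {{RollingWriter}} are hypothetical; in practice each {{RowSink}} might wrap one {{CQLSSTableWriter}} (converting its checked exceptions to unchecked ones). The per-row byte estimate is assumed to come from the caller, and rows are assumed to arrive grouped by partition key.

```java
import java.util.Objects;

// Hypothetical abstraction over one output sstable (for example, one
// CQLSSTableWriter wrapped so that checked exceptions become unchecked).
interface RowSink extends AutoCloseable {
    void addRow(Object... values);

    @Override
    void close();
}

// Hypothetical factory that opens the index-th sink, e.g. in its own directory.
@FunctionalInterface
interface RowSinkFactory {
    RowSink create(int index);
}

// Starts a new sink once the current one reaches targetBytes, but only when a
// new partition begins, so a single partition never spans two sstables.
final class RollingWriter implements AutoCloseable {
    private final RowSinkFactory factory;
    private final long targetBytes;
    private RowSink current;
    private int sinksOpened;
    private long bytesInCurrent;
    private Object lastPartitionKey;

    RollingWriter(RowSinkFactory factory, long targetBytes) {
        this.factory = Objects.requireNonNull(factory);
        this.targetBytes = targetBytes;
    }

    // Rows must arrive grouped by partition key; estimatedBytes is the caller's estimate.
    void addRow(Object partitionKey, long estimatedBytes, Object... values) {
        boolean partitionBoundary = !Objects.equals(partitionKey, lastPartitionKey);
        if (current != null && partitionBoundary && bytesInCurrent >= targetBytes) {
            current.close();
            current = null;
        }
        if (current == null) {
            current = factory.create(sinksOpened++);
            bytesInCurrent = 0;
        }
        current.addRow(values);
        bytesInCurrent += estimatedBytes;
        lastPartitionKey = partitionKey;
    }

    int sinkCount() {
        return sinksOpened;
    }

    @Override
    public void close() {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

With a 100-byte target, two 60-byte rows of partition {{p1}} stay together in the first file even though together they exceed the target, because splitting there would put one partition in two files. Grouping alone only prevents that split; for the resulting files to be non-overlapping within one level, the input would also have to be in token order across files.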



[jira] [Updated] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter

2015-11-02 Thread Andre Turgeon (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andre Turgeon updated CASSANDRA-10358:
--
Attachment: SSTableWriterCreationStrategy.patch

Alternative implementation using an SSTableWriterCreationFactory. 



[jira] [Commented] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter

2015-11-02 Thread Andre Turgeon (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985731#comment-14985731
 ] 

Andre Turgeon commented on CASSANDRA-10358:
---

Thanks for the feedback [~slebresne]. I forgot to mention another requirement I 
have: I need to control the level at which an SSTable is created. 
How would you feel about having an SSTableWriter creation strategy? Something 
like this:

{noformat}
public interface SSTableWriterCreationStrategy {
  SSTableWriter createWriter(File directory, CFMetaData metadata);
}
{noformat}

I submitted a patch with more details.
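To illustrate the kind of decisions such a strategy would encapsulate, here is a self-contained analogue that returns a hypothetical {{SSTableTarget}} (directory, generation, level) instead of a Cassandra {{SSTableWriter}}; building the real writer from these values is omitted because its constructor differs between Cassandra versions. Giving each parallel task its own block of generation numbers is one assumed way to avoid the name collisions mentioned in the description.

```java
import java.io.File;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical value object: where the next sstable goes, its generation
// number (which determines the file name) and the level to record for it.
final class SSTableTarget {
    final File directory;
    final int generation;
    final int level;

    SSTableTarget(File directory, int generation, int level) {
        this.directory = Objects.requireNonNull(directory);
        this.generation = generation;
        this.level = level;
    }
}

// Hypothetical analogue of the proposed SSTableWriterCreationStrategy.
interface SSTableTargetStrategy {
    SSTableTarget nextTarget(File directory);
}

// Gives each parallel task a disjoint block of generation numbers, so files
// written on different machines cannot share a name, and stamps a fixed level.
final class PerTaskTargetStrategy implements SSTableTargetStrategy {
    private final int firstGeneration;
    private final int blockSize;
    private final int level;
    private final AtomicInteger issued = new AtomicInteger();

    PerTaskTargetStrategy(int taskId, int blockSize, int level) {
        if (taskId < 0 || blockSize <= 0 || level < 0) {
            throw new IllegalArgumentException("invalid taskId, blockSize or level");
        }
        // Blocks are numbered from 1 in this sketch: task 0 gets 1..blockSize.
        this.firstGeneration = Math.addExact(Math.multiplyExact(taskId, blockSize), 1);
        this.blockSize = blockSize;
        this.level = level;
    }

    @Override
    public SSTableTarget nextTarget(File directory) {
        int offset = issued.getAndIncrement();
        if (offset >= blockSize) {
            throw new IllegalStateException("generation block exhausted");
        }
        return new SSTableTarget(directory, firstGeneration + offset, level);
    }
}
```

With a block size of 1000, task 0 hands out generations 1, 2, ... and task 3 starts at 3001. One further assumption: the blocks would need to start above any generation numbers already present on the destination nodes, otherwise rsync could still overwrite existing files.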




[jira] [Comment Edited] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter

2015-11-02 Thread Andre Turgeon (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985731#comment-14985731
 ] 

Andre Turgeon edited comment on CASSANDRA-10358 at 11/2/15 6:53 PM:


Thanks for the feedback [~slebresne]. I forgot to mention another requirement I 
have: I need to control the level at which an SSTable is created. 
How would you feel about having an SSTableWriter creation strategy? Something 
like this:

{quote}
public interface SSTableWriterCreationStrategy {
  SSTableWriter createWriter(File directory, CFMetaData metadata);
} 
{quote}

I submitted a patch with more details.




[jira] [Comment Edited] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter

2015-11-02 Thread Andre Turgeon (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985731#comment-14985731
 ] 

Andre Turgeon edited comment on CASSANDRA-10358 at 11/2/15 6:54 PM:


Thanks for the feedback [~slebresne]. I forgot to mention another requirement I 
have: I need to control the level at which an SSTable is created. 
How would you feel about having an SSTableWriter creation strategy? Something 
like this:

{noformat}
public interface SSTableWriterCreationStrategy {
  SSTableWriter createWriter(File directory, CFMetaData metadata);
} 
{noformat}

I submitted a patch with more details.





[jira] [Commented] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter

2015-11-02 Thread Andre Turgeon (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986327#comment-14986327
 ] 

Andre Turgeon commented on CASSANDRA-10358:
---

As for the bug mentioned in #2 above, it appears that the code change for 
CASSANDRA-7360 introduced it. I'm not sure how to keep the benefits of 
CASSANDRA-7360 and still fix the bug. [~slebresne], you worked on 
CASSANDRA-7360; do you have any suggestions? Should I create a new Jira to 
address the bug separately?



[jira] [Created] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter

2015-09-16 Thread Andre Turgeon (JIRA)
Andre Turgeon created CASSANDRA-10358:
-

 Summary: Allow CQLSSTableWriter.Builder to use custom 
AbstractSSTableSimpleWriter 
 Key: CASSANDRA-10358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10358
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Andre Turgeon
Priority: Minor
 Fix For: 2.0.x
 Attachments: patch.txt

I've created a patch for your consideration. 

This change to CQLSSTableWriter allows for a custom AbstractSSTableSimpleWriter 
to be specified. 

I needed this for a bulkload process I wrote. I believe the change would be 
beneficial for other people as well. 

Below are the reasons I needed a custom implementation of 
AbstractSSTableSimpleWriter:

1) The available implementations of AbstractSSTableSimpleWriter do not provide 
a way to specify the filename (or rather, the generation) of the sstable. I 
needed to control the name because my bulkload process writes sstables in 
parallel (on multiple machines) and I wish to avoid name collisions.

2) I discovered a problem with SSTableSimpleUnsortedWriter: it creates invalid 
leveled-compaction sstables, because it allows a partition to span two 
sstables, which violates the "no overlap of token ranges" constraint of 
leveled compaction.


