[ 
https://issues.apache.org/jira/browse/CASSANDRA-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216190#comment-15216190
 ] 

Thom Valley commented on CASSANDRA-7779:
----------------------------------------

The problem gets even bigger when you move up to 5 DCs.

sstableloader is very inefficient when loading into multiple DCs.  Most 
global multi-DC implementations are bandwidth sensitive, and streaming RF 
copies of the data to each target DC is very expensive and can be constrained 
by bandwidth limitations.

Being able to load data into a single DC and have Cassandra replicate it to 
the additional DCs would be much more efficient, as Cassandra does a great job 
of limiting resource consumption.

I realize that's not really sstableloader as it exists today, but I didn't 
want to file yet another ticket for something so closely related.

> Add option to sstableloader to only stream to the local dc
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-7779
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7779
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Nick Bailey
>             Fix For: 2.1.x
>
>
> This is meant to be a potential workaround for CASSANDRA-4756. Due to that 
> ticket, trying to load a cluster-wide snapshot via sstableloader will 
> potentially stream an enormous amount of data. In a 3-datacenter cluster with 
> rf=3 in each datacenter, 81 copies of the data would be streamed. Once we 
> have per-range sstables we can optimize sstableloader to merge data and only 
> stream one copy, but until then we need a workaround. By only streaming to 
> the local datacenter we can load the data locally in each datacenter and only 
> have 9 copies of the data rather than 81.
> This could potentially be achieved by the option to ignore certain nodes that 
> already exists in sstableloader, but in the case of vnodes and topology 
> changes in the cluster, this could require specifying every node in the 
> cluster as 'ignored' on the command line which could be problematic. This is 
> just a shortcut to avoid that.
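
For anyone checking the numbers, here is a rough sketch of where the 81 and 9 
figures could come from. It assumes a simple model that is not spelled out in 
the ticket: a cluster-wide snapshot holds one copy of the data per replica, 
and sstableloader streams every snapshot copy to every replica it targets. 
The function name and the reading of "9" as a per-datacenter count are my own 
assumptions.

```python
def copies_streamed(num_dcs: int, rf_per_dc: int, local_only: bool = False) -> int:
    """Count copies of the data set streamed under the assumed model.

    Assumption (not stated in the ticket): each replica contributes one
    snapshot copy, and each snapshot copy is streamed to every targeted replica.
    """
    if local_only:
        # Each DC loads only its own rf_per_dc snapshot copies and streams
        # them to its own rf_per_dc replicas. The result is per datacenter.
        return rf_per_dc * rf_per_dc
    # Cluster-wide load: every snapshot copy goes to every replica in every DC.
    total_replicas = num_dcs * rf_per_dc
    return total_replicas * total_replicas


if __name__ == "__main__":
    print(copies_streamed(3, 3))                   # 81
    print(copies_streamed(3, 3, local_only=True))  # 9 (per datacenter)
    print(copies_streamed(5, 3))                   # 225
```

Under the same assumptions, the 5-DC case mentioned in the comment above comes 
out to 225 copies with rf=3 in each DC, while the local-only count per 
datacenter stays at 9.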



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
