[ 
https://issues.apache.org/jira/browse/CASSANDRA-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253628#comment-14253628
 ] 

Sylvain Lebresne commented on CASSANDRA-8519:
---------------------------------------------

Just to clarify, the information to compute this *is* available through CQL 
since you have access to the tokens and the replication strategy. And in fact, 
the Java driver (since it was mentioned) already does this; it just doesn't 
expose the token ranges yet, but that will be fixed by 
https://datastax-oss.atlassian.net/browse/JAVA-312.

Now, we could provide the token ranges pre-computed to 1) save every driver 
from having to compute them and 2) ensure drivers don't need to be updated if 
we ever add a new replication strategy. But since 1) it's not terribly hard 
for a driver to do it (and I say that as the one who did it for the Java 
driver) and 2) we're far from releasing new replication strategies every other 
day, I'm going to mark this as low priority.
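To make the computation described above concrete, here is a minimal, self-contained sketch of what a driver can do client-side: given the ring's (token, host) pairs, which drivers already fetch over CQL from system.local and system.peers, derive each token range and its replicas under SimpleStrategy. All class and method names here are illustrative, not a real driver API, and NetworkTopologyStrategy would additionally need rack/DC awareness.

```java
import java.util.*;

// Illustrative sketch (not a real driver API): computing token-range
// ownership client-side from token metadata, assuming SimpleStrategy.
public class TokenRangeSketch {

    // A (start, end] range on the token ring; the range wraps when start > end.
    record TokenRange(long start, long end) {}

    // Under SimpleStrategy, each range's replicas are its primary node plus
    // the next (rf - 1) distinct nodes walking clockwise around the ring.
    static Map<TokenRange, List<String>> computeOwnership(
            SortedMap<Long, String> tokenToHost, int rf) {
        List<Long> tokens = new ArrayList<>(tokenToHost.keySet());
        Map<TokenRange, List<String>> ownership = new LinkedHashMap<>();
        int n = tokens.size();
        for (int i = 0; i < n; i++) {
            // The range primarily owned by tokens.get(i) is
            // (previous token on the ring, this token].
            long start = tokens.get((i - 1 + n) % n);
            long end = tokens.get(i);
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < n && replicas.size() < rf; r++) {
                String host = tokenToHost.get(tokens.get((i + r) % n));
                if (!replicas.contains(host)) replicas.add(host);
            }
            ownership.put(new TokenRange(start, end), replicas);
        }
        return ownership;
    }

    public static void main(String[] args) {
        // Three nodes with one token each (token values are made up).
        SortedMap<Long, String> ring = new TreeMap<>();
        ring.put(-3000L, "10.0.0.1");
        ring.put(0L, "10.0.0.2");
        ring.put(3000L, "10.0.0.3");
        computeOwnership(ring, 2).forEach((range, replicas) ->
            System.out.println(range + " -> " + replicas));
    }
}
```

With vnodes the same walk applies, just over many more tokens per host, which is exactly why pre-computing this server-side would mostly be a convenience rather than a necessity.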

> Mechanism to determine which nodes own which token ranges without Thrift
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8519
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8519
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Brian Hess
>
> Right now the only way to determine which nodes own which token ranges is 
> via the Thrift interface.  There is no Java/CQL driver mechanism to 
> determine this.  Applications that make multiple connections to Cassandra to 
> extract data in parallel need this ability so they can split the data into 
> pieces, and it is reasonable to want those splits to fall on token range 
> boundaries.  Of course, once you split this way, you would want to route 
> those queries to the nodes that own that token range / split, for 
> efficiency.
> This applies to both Hadoop and Spark, but to other applications as well.  
> Hadoop and Spark currently use Thrift to determine this topology.
> Additionally, different replication strategies and replication factors 
> result in different token range ownership, so the answer will differ 
> depending on which keyspace is used. 
> It would be useful if this data were stored in a CQL table that could simply 
> be queried.  One suggestion would be to add a column to the 
> SYSTEM.SCHEMA_KEYSPACES table (for example, a Map of host to a UDT holding a 
> list of (beginRange, endRange) pairs).  This table would need to be updated 
> on an ALTER KEYSPACE command or on a topology change event.  The server(s) 
> would then hold this information and the drivers could simply query it (as 
> opposed to each driver managing it separately).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
