Brian Hess created CASSANDRA-8519:
--------------------------------------

             Summary: Mechanism to determine which nodes own which token ranges without Thrift
                 Key: CASSANDRA-8519
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8519
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter:  Brian Hess


Right now the only way to determine which nodes own which token ranges is via 
the Thrift interface.  The Java/CQL drivers offer no mechanism to determine 
this.  Applications that make multiple connections to Cassandra to extract data 
in parallel need this ability so they can split the data into pieces, and it is 
reasonable to want those splits to fall on token range boundaries.  Once the 
data is split this way, each query should be routed to a node that owns the 
corresponding token range / split, for efficiency.
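
To illustrate the kind of splitting described above, here is a minimal sketch 
in Python.  It assumes the Murmur3Partitioner (tokens from -2**63 to 2**63 - 1) 
and cuts the ring into equal-width pieces rather than along real node 
boundaries, since those boundaries are exactly what a client cannot learn 
without Thrift today.  The function names and the keyspace/table/key arguments 
are hypothetical.

```python
# Sketch only: assumes Murmur3Partitioner, whose tokens span [-2**63, 2**63 - 1].
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_ring(num_splits):
    """Return num_splits contiguous (start, end] token ranges covering the ring."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    # Integer arithmetic keeps the boundaries exact for 64-bit tokens.
    bounds = [MIN_TOKEN + (span * i) // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def split_query(keyspace, table, partition_key, start, end):
    """Build a CQL query restricted to one (start, end] token range."""
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {start} AND token({partition_key}) <= {end}"
    )


if __name__ == "__main__":
    # Hypothetical keyspace, table, and partition key names.
    for start, end in split_ring(4):
        print(split_query("ks", "events", "id", start, end))
```

Each generated query could then be sent over its own connection.  Routing it 
to a node that actually owns the range still requires the ownership 
information this ticket asks for.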

This applies to Hadoop and Spark, and to other applications as well.  Hadoop 
and Spark currently use Thrift to determine this topology.

Additionally, different replication strategies and replication factors result 
in different token range ownership, so there will have to be a different answer 
based on which keyspace is used. 
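
As a rough illustration of why the answer depends on the keyspace, the sketch 
below computes range ownership for a toy ring under a SimpleStrategy-style 
placement: each range (previous token, token] is replicated on the next RF 
distinct nodes walking clockwise.  This is a simplified model, not Cassandra's 
actual code; NetworkTopologyStrategy also takes racks and datacenters into 
account.  The ring contents and the function name are made up for the example.

```python
def range_owners(ring, replication_factor):
    """Map each (start, end] token range to its replica nodes.

    ring: dict of token -> node name (a node may hold several tokens).
    Simplified SimpleStrategy-style placement; an assumption for illustration.
    """
    tokens = sorted(ring)
    distinct_nodes = len(set(ring.values()))
    wanted = min(replication_factor, distinct_nodes)
    owners = {}
    for i, end in enumerate(tokens):
        start = tokens[i - 1]  # i == 0 wraps around to the last token
        replicas = []
        j = i
        while len(replicas) < wanted:
            node = ring[tokens[j % len(tokens)]]
            if node not in replicas:
                replicas.append(node)
            j += 1
        owners[(start, end)] = replicas
    return owners


if __name__ == "__main__":
    toy_ring = {0: "A", 100: "B", 200: "C"}  # hypothetical three-node ring
    print(range_owners(toy_ring, 1))
    print(range_owners(toy_ring, 2))
```

With a replication factor of 1 the range (0, 100] belongs to B alone; with a 
replication factor of 2 it belongs to B and C, so the same ring gives different 
answers for keyspaces with different settings.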

It would be useful if this data were stored in a CQL table and could simply be 
queried.  One suggestion is to add a column to the SYSTEM.SCHEMA_KEYSPACES 
table (for example, a Map of Host to a UDT that holds a List of (beginRange, 
endRange) pairs).  This column would need to be updated on an ALTER KEYSPACE 
command or on a topology change event.  The server(s) would then hold this 
information and the drivers could simply query it (as opposed to having each 
driver manage this separately).
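
To make the suggested shape concrete, here is a sketch of how a client might 
use such a value once it has been read from the table.  The host-to-ranges 
dictionary mirrors the Map of Host to a List of (beginRange, endRange) pairs 
suggested above; the column does not exist today, and the sample data and 
function names are hypothetical.

```python
def in_range(token, begin, end):
    """True if token lies in (begin, end], allowing ranges that wrap the ring."""
    if begin < end:
        return begin < token <= end
    # begin >= end means the range wraps past the maximum token.
    return token > begin or token <= end


def hosts_for_token(ownership, token):
    """Return the hosts whose ranges contain the given token."""
    return sorted(
        host
        for host, ranges in ownership.items()
        if any(in_range(token, begin, end) for begin, end in ranges)
    )


if __name__ == "__main__":
    # Hypothetical value of the proposed column for one keyspace.
    ownership = {
        "A": [(200, 0), (100, 200)],
        "B": [(200, 0), (0, 100)],
        "C": [(0, 100), (100, 200)],
    }
    print(hosts_for_token(ownership, 50))
    print(hosts_for_token(ownership, 250))
```

A driver could use a lookup like this to send each split's query to one of the 
hosts that holds the data locally.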




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
