Paulo Ricardo Motta Gomes created CASSANDRA-6436:
----------------------------------------------------

             Summary: AbstractColumnFamilyInputFormat does not use start and 
end tokens configured via ConfigHelper.setInputRange()
                 Key: CASSANDRA-6436
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6436
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
            Reporter: Paulo Ricardo Motta Gomes
             Fix For: 1.2.6


ConfigHelper allows setting a token input range via the setInputRange(conf, 
startToken, endToken) call (ConfigHelper:254).

We used this feature to limit a Hadoop job's range to a single Cassandra node's 
range, or even to a single row key, mostly for testing purposes.

This worked before the fix for CASSANDRA-5536 
(https://github.com/apache/cassandra/commit/aaf18bd08af50bbaae0954d78d5e6cbb684aded9).
Since that change, ColumnFamilyInputFormat never uses the value of 
KeyRange.start_token when defining the input splits 
(AbstractColumnFamilyInputFormat:142-160). It only uses KeyRange.start_key, 
which needs an order-preserving partitioner to work.
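
To illustrate what honoring start_token/end_token during split generation 
could look like, here is a minimal, self-contained sketch that clips each ring 
range to a configured job range. It is not the attached patch and not 
Cassandra's code: the TokenRangeClip and Range names are hypothetical, tokens 
are modeled as plain BigInteger values, and ranges that wrap around the ring 
are deliberately not handled.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Cassandra.
final class TokenRangeClip {

    // A token range (start, end]: start exclusive, end inclusive.
    // Assumption: start < end, i.e. the range does not wrap around the ring.
    static final class Range {
        final BigInteger start;
        final BigInteger end;

        Range(BigInteger start, BigInteger end) {
            if (start.compareTo(end) >= 0) {
                throw new IllegalArgumentException("wrapping or empty ranges are not handled");
            }
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "(" + start + ", " + end + "]";
        }
    }

    // Keep only the part of each ring range that overlaps the job range.
    static List<Range> clip(List<Range> ring, Range job) {
        List<Range> result = new ArrayList<>();
        for (Range r : ring) {
            BigInteger lo = r.start.max(job.start);
            BigInteger hi = r.end.min(job.end);
            if (lo.compareTo(hi) < 0) {
                result.add(new Range(lo, hi));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Range> ring = new ArrayList<>();
        ring.add(new Range(BigInteger.valueOf(0), BigInteger.valueOf(100)));
        ring.add(new Range(BigInteger.valueOf(100), BigInteger.valueOf(200)));
        // Job range covering the single token 150.
        Range job = new Range(BigInteger.valueOf(149), BigInteger.valueOf(150));
        System.out.println(clip(ring, job));
    }
}
```

With a job range of (149, 150], only the second ring range contributes, so 
splits would be computed for that clipped sub-range alone.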

I propose the attached fix so that Cassandra token ranges can be defined for a 
given Hadoop job even when using a non-order-preserving partitioner.

Example use of ConfigHelper.setInputRange(conf, startToken, endToken) to limit 
the range to a single Cassandra Key with RandomPartitioner: 

// Compute the token of the row key with the job's configured partitioner.
IPartitioner part = ConfigHelper.getInputPartitioner(job.getConfiguration());
Token token = part.getToken(ByteBufferUtil.bytes("Cassandra Key"));
BigInteger endToken = (BigInteger) new BigIntegerConverter().convert(
        BigInteger.class, part.getTokenFactory().toString(token));
// Token ranges are (start, end], so start = end - 1 covers only this token.
BigInteger startToken = endToken.subtract(BigInteger.ONE);
ConfigHelper.setInputRange(job.getConfiguration(),
        startToken.toString(), endToken.toString());
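
For readers without a Cassandra classpath at hand, the same arithmetic can be 
tried in isolation. The sketch below assumes that RandomPartitioner derives a 
token as the absolute value of the key's MD5 digest read as a BigInteger; 
that is my understanding of its behavior, not something verified here. The 
SingleKeyTokenRange class and its method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only; not part of Cassandra.
final class SingleKeyTokenRange {

    // Assumption: approximates RandomPartitioner as abs(MD5(key)).
    static BigInteger md5Token(byte[] key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(md5.digest(key)).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Returns {startToken, endToken} as strings for the range (token - 1, token],
    // which contains exactly one token: the key's own.
    static String[] singleKeyRange(byte[] key) {
        BigInteger end = md5Token(key);
        BigInteger start = end.subtract(BigInteger.ONE);
        return new String[] { start.toString(), end.toString() };
    }

    public static void main(String[] args) {
        byte[] key = "Cassandra Key".getBytes(StandardCharsets.UTF_8);
        String[] range = singleKeyRange(key);
        System.out.println("startToken=" + range[0] + " endToken=" + range[1]);
    }
}
```

The two strings correspond to the startToken and endToken arguments passed to 
ConfigHelper.setInputRange() in the example above.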



--
This message was sent by Atlassian JIRA
(v6.1#6144)
