[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492709#comment-14492709 ]
Piotr Kołaczkowski edited comment on CASSANDRA-8576 at 4/13/15 5:37 PM:
------------------------------------------------------------------------

{noformat}
@@ -79,6 +90,7 @@ public class ColumnFamilySplit extends InputSplit implements Writable, org.apach
     {
         out.writeUTF(startToken);
         out.writeUTF(endToken);
+        out.writeBoolean(partitionKeyEqQuery);
         out.writeInt(dataNodes.length);
{noformat}

This is going to break mixed-version clusters: the new boolean shifts every field written after it, so Hadoop tasks will fail in confusing ways on a cluster with some nodes on 2.1.4 and some on 2.1.5. It is unfortunate that split serialization doesn't write a length or version header first, which would let clients detect the mismatch properly.

Are you sure we want to merge this feature in the middle of 2.1.x?


> Primary Key Pushdown For Hadoop
> -------------------------------
>
>                 Key: CASSANDRA-8576
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Russell Alexander Spitzer
>            Assignee: Alex Liu
>             Fix For: 2.1.5
>
>         Attachments: 8576-2.1-branch.txt, 8576-trunk.txt
>
>
> I've heard reports from several users that they would like to have predicate
> pushdown functionality for hadoop (Hive in particular) based services.
> Example use case
> Table with wide partitions, one per customer
> Application team has HQL they would like to run on a single customer
> Currently time to complete scales with number of customers since Input Format
> can't push down primary key predicate
> Current implementation requires a full table scan (since it can't recognize
> that a single partition was specified)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
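
The compatibility concern raised in the comment can be illustrated with a minimal sketch of a split format that writes a version header first. This is not the code from the attached patches and not Cassandra's ColumnFamilySplit: the class VersionedSplitSketch, the VERSION_1 and VERSION_2 constants, and the maxSupportedVersion parameter are all hypothetical, and only java.io is used so the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a split; not Cassandra's ColumnFamilySplit.
class VersionedSplitSketch {
    static final int VERSION_1 = 1; // assumed layout: tokens, then data nodes
    static final int VERSION_2 = 2; // assumed layout: adds partitionKeyEqQuery

    final String startToken;
    final String endToken;
    final boolean partitionKeyEqQuery;
    final String[] dataNodes;

    VersionedSplitSketch(String startToken, String endToken,
                         boolean partitionKeyEqQuery, String[] dataNodes) {
        this.startToken = startToken;
        this.endToken = endToken;
        this.partitionKeyEqQuery = partitionKeyEqQuery;
        this.dataNodes = dataNodes;
    }

    // Serializes the split in the requested format version, header first.
    byte[] write(int version) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(version);
            out.writeUTF(startToken);
            out.writeUTF(endToken);
            if (version >= VERSION_2)
                out.writeBoolean(partitionKeyEqQuery); // field added in version 2
            out.writeInt(dataNodes.length);
            for (String node : dataNodes)
                out.writeUTF(node);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a split, rejecting versions newer than this reader understands.
    static VersionedSplitSketch read(byte[] data, int maxSupportedVersion) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int version = in.readInt();
            if (version < VERSION_1 || version > maxSupportedVersion)
                throw new IllegalStateException("Unsupported split format version "
                        + version + " (reader supports up to " + maxSupportedVersion + ")");
            String start = in.readUTF();
            String end = in.readUTF();
            // Older writers did not emit the flag, so it defaults to false.
            boolean pkEq = version >= VERSION_2 && in.readBoolean();
            String[] nodes = new String[in.readInt()];
            for (int i = 0; i < nodes.length; i++)
                nodes[i] = in.readUTF();
            return new VersionedSplitSketch(start, end, pkEq, nodes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        VersionedSplitSketch split = new VersionedSplitSketch(
                "0", "100", true, new String[] {"node1", "node2"});

        // A newer reader accepts the older layout.
        VersionedSplitSketch old = read(split.write(VERSION_1), VERSION_2);
        System.out.println("v1 split read by v2 reader, flag = " + old.partitionKeyEqQuery);

        // An older reader fails fast instead of misreading shifted fields.
        try {
            read(split.write(VERSION_2), VERSION_1);
        } catch (IllegalStateException e) {
            System.out.println("v1 reader rejected v2 split: " + e.getMessage());
        }
    }
}
```

A header like this only helps if it is already part of the format before a field is added. Readers that predate the header have nothing to check, which is why the change as posted cannot be detected by 2.1.4 clients.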