[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492709#comment-14492709
 ] 

Piotr Kołaczkowski edited comment on CASSANDRA-8576 at 4/13/15 5:37 PM:
------------------------------------------------------------------------

{noformat}
@@ -79,6 +90,7 @@ public class ColumnFamilySplit extends InputSplit implements Writable, org.apach
     {
         out.writeUTF(startToken);
         out.writeUTF(endToken);
+        out.writeBoolean(partitionKeyEqQuery);
         out.writeInt(dataNodes.length);
{noformat}

This is going to break mixed-version clusters: Hadoop tasks will fail in 
confusing ways on a cluster where some nodes run 2.1.4 and others 2.1.5. It is 
unfortunate that split serialization doesn't write a length or version header 
first, so clients could detect the format properly. Are you sure we want to 
merge this feature in the middle of 2.1.x? 
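To illustrate the point about a version header, here is a minimal sketch (plain {{DataOutput}}/{{DataInput}}, not the real {{ColumnFamilySplit}}; class and field names are illustrative) of how writing a format version first would let a reader detect whether the writer included the new {{partitionKeyEqQuery}} field:

```java
import java.io.*;

// Hypothetical sketch, not the actual Cassandra code: a version byte written
// before the payload lets the reader guard the new field, which the current
// split serialization cannot do.
public class SplitVersioningSketch {
    static final byte FORMAT_VERSION = 2; // v2 = format with partitionKeyEqQuery

    static void write(DataOutput out, String startToken, String endToken,
                      boolean partitionKeyEqQuery) throws IOException {
        out.writeByte(FORMAT_VERSION);         // version header first
        out.writeUTF(startToken);
        out.writeUTF(endToken);
        out.writeBoolean(partitionKeyEqQuery); // new field, guarded by version
    }

    static boolean read(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version > FORMAT_VERSION)
            throw new IOException("Split written by a newer node: v" + version);
        in.readUTF(); // startToken
        in.readUTF(); // endToken
        // Only read the new field if the writer's format included it.
        return version >= 2 && in.readBoolean();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(new DataOutputStream(buf), "-9223372036854775808", "0", true);
        DataInput in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(read(in)); // prints true
    }
}
```

Without such a header, an old reader consumes the new boolean as part of the following {{dataNodes.length}} int and fails far from the real cause.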



> Primary Key Pushdown For Hadoop
> -------------------------------
>
>                 Key: CASSANDRA-8576
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Russell Alexander Spitzer
>            Assignee: Alex Liu
>             Fix For: 2.1.5
>
>         Attachments: 8576-2.1-branch.txt, 8576-trunk.txt
>
>
> I've heard reports from several users that they would like to have predicate 
> pushdown functionality for hadoop (Hive in particular) based services. 
> Example use case:
> * Table with wide partitions, one per customer
> * Application team has HQL they would like to run against a single customer
> * Currently, time to complete scales with the number of customers, since the 
> Input Format can't push down the primary key predicate
> * The current implementation requires a full table scan (since it can't 
> recognize that a single partition was specified)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
