[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2017-08-28 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144184#comment-16144184
 ] 

Jon Haddad commented on CASSANDRA-8576:
---

Unfortunately this patch is pretty stale now as 2.x is no longer getting 
feature improvements.  Is there anything here that would be relevant for 4.0? 

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Russell Spitzer
>Assignee: Alex Liu
> Fix For: 2.2.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v1-2.2-branch.txt, CASSANDRA-8576-v2-2.1-branch.txt, 
> CASSANDRA-8576-v3-2.1-branch.txt
>
>
> I've heard reports from several users that they would like to have predicate 
> pushdown functionality for hadoop (Hive in particular) based services. 
> Example usecase
> Table with wide partitions, one per customer
> Application team has HQL they would like to run on a single customer
> Currently time to complete scales with number of customers since Input Format 
> can't pushdown primary key predicate
> Current implementation requires a full table scan (since it can't recognize 
> that a single partition was specified)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-06-29 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605809#comment-14605809
 ] 

Aleksey Yeschenko commented on CASSANDRA-8576:
--

bq. (Edit: Piotr +1'd v3 already)

Doh, you are right.

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.2.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-06-29 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605803#comment-14605803
 ] 

Aleksey Yeschenko commented on CASSANDRA-8576:
--

Piotr's approval and +1 as the reviewer.

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-06-29 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605771#comment-14605771
 ] 

Jeremy Hanna commented on CASSANDRA-8576:
-

Is there anything else that needs to happen on this before committing?

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-06-02 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569713#comment-14569713
 ] 

Philip Thompson commented on CASSANDRA-8576:


This does not break any of the existing pig tests. I ran some additional tests, 
and found no major issues.

As for a mixed-version cluster: I spun up a 3-node C* cluster, with two nodes 
running this patch and the third without it. I connected Pig to the cluster, 
using the unmodified node as the initial address. I then ran some MapReduce 
jobs to select data from the cluster. The jobs succeeded, and I did not see 
any errors in the log.

+1 from me.

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-29 Thread Piotr Kołaczkowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564440#comment-14564440
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

Yes, it would be good to test it in a mixed version cluster. If cassandra.jar 
is part of the Hadoop job classpath, then there shouldn't be any problems. 
Problems might happen if cassandra.jar is on the classpath of the Hadoop TT 
(inherited by all jobs), and different TTs use mixed versions of it (with / 
without this patch).

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-27 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561632#comment-14561632
 ] 

Philip Thompson commented on CASSANDRA-8576:


Reading [~jjordan]'s comment, does this need a test of a hadoop job while in a 
mixed version cluster? 

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-12 Thread Piotr Kołaczkowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539417#comment-14539417
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

[~jjordan] yes, you're right.

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538280#comment-14538280
 ] 

Jonathan Ellis commented on CASSANDRA-8576:
---

So the ball is in [~philipthompson]'s court now for testing?

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-11 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538044#comment-14538044
 ] 

Alex Liu commented on CASSANDRA-8576:
-

It's not much different, but I will use your changes :)

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-09 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536484#comment-14536484
 ] 

Jeremiah Jordan commented on CASSANDRA-8576:


bq. It looks better now, but the mixed-cluster during rolling upgrade issue is 
still there. If someone upgrades half of the cluster to the version with this 
patch, Hadoop jobs will very likely report errors (not sure how bad that will 
be - need to test it).

This is only an issue if the jobs are pulling the C* jar off of the nodes and 
the jar isn't part of the job itself, right? So if this is a problem for 
someone, they have a workaround.

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-09 Thread Piotr Kołaczkowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536352#comment-14536352
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

It looks better now, but the mixed-cluster during rolling upgrade issue is 
still there. If someone upgrades half of the cluster to the version with this 
patch, Hadoop jobs will very likely report errors (not sure how bad that will 
be - need to test it). If this is not a problem, +1.


> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-09 Thread Piotr Kołaczkowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536341#comment-14536341
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

Some comments were not addressed.
{noformat}
boolean containToken;
for (Range subrange : ranges)
{
    //make sure subrange contains the token
    containToken = false;
    if (token != null)
    {
        if (subrange.contains(token))
            containToken = true;
        else
            continue;
    }

    ColumnFamilySplit split =
        new ColumnFamilySplit(
            factory.toString(subrange.left),
            factory.toString(subrange.right),
            subSplit.getRow_count(),
            endpoints);

    if (containToken)
        split.setPartitionKeyEqQuery(containToken);
    logger.debug("adding {}", split);
{noformat}
Multiple code smells in this fragment:
* The boolean flag is declared in a needlessly broad scope. If something is 
used only inside a loop, it should be declared inside the loop.
* continue is controlled by a boolean flag.
* The if is redundant: the code is equivalent without if (containToken).

I simplified it for you:
{noformat}
for (Range subrange : ranges)
{
    boolean containsToken = token != null && subrange.contains(token);
    if (token == null || containsToken) {
        ColumnFamilySplit split =
            new ColumnFamilySplit(
                factory.toString(subrange.left),
                factory.toString(subrange.right),
                subSplit.getRow_count(),
                endpoints);
        split.setPartitionKeyEqQuery(containsToken);
        logger.debug("adding {}", split);
        splits.add(split);
    }
}
{noformat}





> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
> CASSANDRA-8576-v2-2.1-branch.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-29 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520395#comment-14520395
 ] 

Alex Liu commented on CASSANDRA-8576:
-

If token == null, containToken is false. All other comments will be addressed 
in the new patch.

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-29 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520112#comment-14520112
 ] 

Jonathan Ellis commented on CASSANDRA-8576:
---

2.1.x means the next 2.1 release.  (2.1.5 is already released.)

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-29 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520084#comment-14520084
 ] 

Alex Liu commented on CASSANDRA-8576:
-

Which branch should this go into? Is it still going into 2.1.5, or another 
release?

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-29 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519685#comment-14519685
 ] 

Jonathan Ellis commented on CASSANDRA-8576:
---

Alex, is this still on your radar?

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.x
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-15 Thread Piotr Kołaczkowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496256#comment-14496256
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

I finished the review.

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.5
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-15 Thread Piotr Kołaczkowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496197#comment-14496197
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

CqlTableTest L336 and L368
{noformat}
count = 0;
while (it.hasNext())
{
    it.next();
    count++;
}
{noformat}

Use Guava Iterators.size(it).

---

Code style issues:
getToken, retrieveKeys: unused exceptions reported
getToken: too big and too nested for my taste

---

retrieveKeys L492:
{noformat}
CqlRow cqlRow = result.rows.get(0);
{noformat}

Will fail in a very cryptic way if the keyspace / table doesn't exist.
It is good to give the user hints what went wrong.
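
A minimal sketch of the kind of check meant here, assuming the schema query 
result can be treated as a plain list of rows. The class name, the helper 
firstRowOrFail and the String row type are hypothetical stand-ins, not code 
from the patch:

```java
import java.util.Collections;
import java.util.List;

final class RetrieveKeysSketch
{
    /**
     * Returns the first row of a schema query result, or fails with a message
     * that names the keyspace and table when no row came back.
     * Hypothetical helper: rows are modelled as plain strings.
     */
    static String firstRowOrFail(List<String> rows, String keyspace, String table)
    {
        if (rows == null || rows.isEmpty())
            throw new IllegalArgumentException(
                String.format("No schema found for %s.%s; check that the keyspace and table exist",
                              keyspace, table));
        return rows.get(0);
    }

    public static void main(String[] args)
    {
        try
        {
            firstRowOrFail(Collections.<String>emptyList(), "ks", "customers");
        }
        catch (IllegalArgumentException e)
        {
            // The message now points at the missing keyspace / table
            System.out.println(e.getMessage());
        }
    }
}
```

Without such a check, get(0) on an empty list surfaces as an 
IndexOutOfBoundsException that says nothing about the keyspace or table.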

---

retrieveKeys L503:
{noformat}
for (CfDef cfDef : ksDef.cf_defs)
{
    if (cfDef.name.equalsIgnoreCase(cfName))
    {
        CFMetaData cfMeta = ThriftConversion.fromThrift(cfDef);
{noformat}
Why equalsIgnoreCase?

--

retrieveKeys L512:
{noformat}
return Pair.create(parseType(ByteBufferUtil.string(ByteBuffer.wrap(cqlRow.columns.get(1).getValue()))),
                   keys);
{noformat}
Code style: Expression too complex, too many nesting levels, hard to read.


--
getToken L410:
{noformat}
int i = 0;
{noformat}
This should be declared in the first branch of the following if, because it is 
used only there, in order not to pollute the wider scope.

--
{noformat}
catch (Exception e)
{
    //not a Terminal term
}
{noformat}

Are you sure you really want to swallow all the exceptions here? Or did you 
have some specific exception in mind like {{InvalidRequestException}}?
Swallowing exceptions by a very general catch-all clause is very dangerous.
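
To illustrate the difference, here is a small sketch that swallows only the 
one expected exception and lets everything else propagate. 
NotATerminalException and parseTerminal are hypothetical stand-ins for 
whatever the real parsing code throws (possibly InvalidRequestException); they 
are not taken from the patch:

```java
final class NarrowCatchSketch
{
    /** Hypothetical stand-in for the specific exception the parser is expected to throw. */
    static final class NotATerminalException extends Exception
    {
        NotATerminalException(String message)
        {
            super(message);
        }
    }

    /** Hypothetical parser: accepts integer literals only. */
    static int parseTerminal(String term) throws NotATerminalException
    {
        try
        {
            return Integer.parseInt(term.trim());
        }
        catch (NumberFormatException e)
        {
            throw new NotATerminalException("not a terminal term: " + term);
        }
    }

    /** Returns the parsed value, or null when the term is not a terminal. */
    static Integer terminalOrNull(String term)
    {
        try
        {
            return parseTerminal(term);
        }
        catch (NotATerminalException e)
        {
            // Expected case: nothing to push down for this term
            return null;
        }
        // Anything else (e.g. a NullPointerException) is a bug and propagates
    }

    public static void main(String[] args)
    {
        System.out.println(terminalOrNull("42"));
        System.out.println(terminalOrNull("now()"));
    }
}
```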

--
getToken L456-L462:
{noformat}
for (String key : validators.keySet())
    keyValues[i++] = eqColumns.get(key);
IPartitioner partitioner = ConfigHelper.getInputPartitioner(conf);
if (keyValidator instanceof CompositeType)
    return partitioner.getToken(((CompositeType) keyValidator).build(keyValues));
else
    return partitioner.getToken(eqColumns.get(keys.get(0)));
{noformat}

validators is a HashMap and HashMaps do not preserve key order. The order of 
items in the keyValues array here may not match the order of the key columns in 
the keyValidator, therefore the values may be misplaced. If all key components 
are of the same type, this may fail in a very subtle / silent way.

Besides that: Cassandra style of writing this would be to use a ternary 
operator:
{noformat}
return (keyValidator instanceof CompositeType) 
  ?  ...
  : ...
{noformat}
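
The key-ordering concern above can be sketched as follows. This assumes the 
partition key column names are available as an ordered list (like keys in the 
fragment) and builds the value array by walking that list instead of a map's 
key set. The class, the method valuesInKeyOrder and the column names are 
hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeyOrderSketch
{
    /**
     * Collects key values in the declared order of the partition key columns,
     * independent of the iteration order of the map that holds them.
     */
    static List<String> valuesInKeyOrder(List<String> keyColumns, Map<String, String> eqColumns)
    {
        List<String> values = new ArrayList<>(keyColumns.size());
        for (String column : keyColumns)
            values.add(eqColumns.get(column));
        return values;
    }

    public static void main(String[] args)
    {
        // Declared order of a composite partition key (hypothetical columns)
        List<String> keyColumns = Arrays.asList("zone", "customer_id", "bucket");

        // HashMap iteration order is unspecified, so it must not drive the result
        Map<String, String> eqColumns = new HashMap<>();
        eqColumns.put("bucket", "7");
        eqColumns.put("zone", "eu");
        eqColumns.put("customer_id", "42");

        System.out.println(valuesInKeyOrder(keyColumns, eqColumns));
    }
}
```

Keeping validators in a LinkedHashMap populated in key order would be another 
option, but driving the loop from the ordered key list makes the dependency 
explicit.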

-
getSplits L140-L147
{noformat}
try
{
    token = getToken(conf);
}
catch (Exception e)
{
    throw new IOException(e);
}
{noformat}
Given that this change is going to be included in a patch version of Cassandra, 
we should not increase the likelihood of failure here by throwing some 
additional exceptions, that previously could never happen. If getting a token 
fails, we should log the failure with the exception at ERROR level and continue 
without the token, because all this token thing is only an optimization. 
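
Roughly what I have in mind, as a sketch only: tokenOrNull is a hypothetical 
wrapper, and java.util.logging stands in for the project's own logger so the 
example runs on its own. A null result means "no pushdown, scan as before":

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

final class TokenFallbackSketch
{
    private static final Logger logger = Logger.getLogger(TokenFallbackSketch.class.getName());

    /**
     * Runs the token lookup and returns its result. On any failure, logs the
     * exception and returns null so the caller can continue without the
     * optimization instead of failing the whole job.
     */
    static <T> T tokenOrNull(Callable<T> tokenLookup)
    {
        try
        {
            return tokenLookup.call();
        }
        catch (Exception e)
        {
            logger.log(Level.SEVERE, "Could not compute partition key token; falling back to a full scan", e);
            return null;
        }
    }

    public static void main(String[] args)
    {
        Long ok = tokenOrNull(() -> 42L);
        Long failed = tokenOrNull(() -> { throw new IllegalStateException("schema lookup failed"); });
        System.out.println(ok + " / " + failed);
    }
}
```

In getSplits the call would then become something like token = 
tokenOrNull(() -> getToken(conf)), with the existing null handling taking 
care of the fallback.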

-

ColumnFamilySplit L74:
getPartitionKeyQuery should be called isPartitionKeyQuery

-

SplitCallable#call L293:
{noformat}
if (containToken)
split.setPartitionKeyEqQuery(containToken);
{noformat}
Can be simplified to:
{noformat}
  split.setPartitionKeyEqQuery(true);
{noformat}

containToken is always true at the point of reaching the if statement.
Therefore you really don't need the containToken variable at all, and you can 
remove some earlier code related to setting it as well.

==
Overall I vote against putting this into 2.1.5, because it is too big a feature 
and may affect correctness and performance.

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.5
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt

[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-15 Thread Piotr Kołaczkowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496182#comment-14496182
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

I know it was this way from the beginning, but that is no reason not to 
change it for the better :)


> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.5
>
> Attachments: 8576-2.1-branch.txt, 8576-trunk.txt
>
>
> I've heard reports from several users that they would like to have predicate 
> pushdown functionality for hadoop (Hive in particular) based services. 
> Example usecase
> Table with wide partitions, one per customer
> Application team has HQL they would like to run on a single customer
> Currently time to complete scales with number of customers since Input Format 
> can't pushdown primary key predicate
> Current implementation requires a full table scan (since it can't recognize 
> that a single partition was specified)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-13 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493123#comment-14493123
 ] 

Alex Liu commented on CASSANDRA-8576:
-

Someone from Product Management should be able to answer it.



[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-13 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493122#comment-14493122
 ] 

Alex Liu commented on CASSANDRA-8576:
-

Someone from Product Management should be able to answer it.



[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-13 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493117#comment-14493117
 ] 

Alex Liu commented on CASSANDRA-8576:
-

It's been this way from the very beginning. Internally, URL decoding is used. I 
don't think there is an easy way around it here.



[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492934#comment-14492934
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

{noformat}
pig.registerQuery("composite_rows = LOAD 'cql://cql3ks/compositekeytable?" +
                  defaultParameters + nativeParameters +
                  "&where_clause=key1%20%3D%20%27key1%27%20and%20key2%20%3D%20111%20and%20column1%3D100&page_size=2' USING CqlNativeStorage();");
{noformat}

Things like this make my eyes cry. I know it was already like this, but why 
can't we just specify the query in human-readable form and call a function to 
URL-encode it?
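
A minimal sketch of what such a helper could look like, assuming plain 
java.net.URLEncoder is acceptable in the tests. WhereClauseEncoder, encode and 
loadStatement are hypothetical names for illustration only; they are not part 
of the attached patches.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the patch: keeps the CQL predicate readable
// in the test source and encodes it only when the LOAD statement is built.
final class WhereClauseEncoder
{
    static String encode(String whereClause)
    {
        try
        {
            // URLEncoder emits '+' for spaces, while the URLs in this thread use %20.
            // A literal '+' in the input is already escaped as %2B at this point,
            // so replacing the remaining '+' characters is safe.
            return URLEncoder.encode(whereClause, "UTF-8").replace("+", "%20");
        }
        catch (UnsupportedEncodingException e)
        {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    // baseParameters stands in for defaultParameters + nativeParameters above.
    static String loadStatement(String baseParameters, String whereClause, int pageSize)
    {
        return "composite_rows = LOAD 'cql://cql3ks/compositekeytable?" + baseParameters
               + "&where_clause=" + encode(whereClause)
               + "&page_size=" + pageSize
               + "' USING CqlNativeStorage();";
    }
}
```

With something like this the test could pass "key1 = 'key1' and key2 = 111 and 
column1=100" directly and leave the encoding to the helper.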



[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492929#comment-14492929
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

The whole {{AbstractColumnFamilyInputFormat#getToken}} thing is quite a complex 
piece of logic, and it is always invoked. I'm not sure we really want to merge 
it into 2.1.5. I'm afraid this may destabilize things.



[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492720#comment-14492720
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

AbstractColumnFamilyInputFormat#getToken:
{noformat}
if (keyValidator instanceof CompositeType)
    return partitioner.getToken(((CompositeType) keyValidator).build(keyValues));   // <<< should be CompositeType.build, because this is a static method
else
    return partitioner.getToken(eqColumns.get(keys.get(0)));
{noformat}
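
To illustrate why the class-qualified form is preferable, here is a small 
self-contained sketch. KeyCodec and StaticCallDemo are hypothetical stand-ins, 
not Cassandra classes. In Java a static method invoked through an instance 
expression is bound to the declared type at compile time and the instance is 
ignored, so the cast adds noise without changing which method runs.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a type with a static builder method.
final class KeyCodec
{
    // Static: the result depends only on the arguments, never on an instance.
    static int build(ByteBuffer... parts)
    {
        int total = 0;
        for (ByteBuffer part : parts)
            total += part.remaining();
        return total;
    }
}

final class StaticCallDemo
{
    static int viaInstance(Object validator, ByteBuffer... parts)
    {
        // Compiles, but misleading: the call binds to KeyCodec.build at compile
        // time and the cast value is discarded, so even a null validator works.
        return ((KeyCodec) validator).build(parts);
    }

    static int viaClass(ByteBuffer... parts)
    {
        // Same method, and the call site now says what actually happens.
        return KeyCodec.build(parts);
    }
}
```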



[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492709#comment-14492709
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

{noformat}
@@ -79,6 +90,7 @@ public class ColumnFamilySplit extends InputSplit implements 
Writable, org.apach
 {
 out.writeUTF(startToken);
 out.writeUTF(endToken);
+out.writeBoolean(partitionKeyEqQuery);
 out.writeInt(dataNodes.length);
{noformat}

This is going to break mixed-version clusters. Hadoop tasks will error out in 
weird ways on a cluster with some nodes on 2.1.4 and some on 2.1.5. It is very 
unfortunate that split serialization doesn't write a length or version header 
first, which would let us detect the mismatch properly on the clients. Are you 
sure we want to merge this feature in the middle of 2.1.x? 
Are we 
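
For reference, a rough sketch of the kind of version header meant above. 
VersionedSplit is a hypothetical class, not the real ColumnFamilySplit, and the 
version numbers are made up for illustration. Writing a version first lets a 
reader reject an unknown format explicitly instead of misreading the bytes, but 
it only helps readers that already expect the header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a split; field names mirror the diff above.
final class VersionedSplit
{
    static final int CURRENT_VERSION = 2; // assumed: version 2 adds partitionKeyEqQuery

    final String startToken;
    final String endToken;
    final boolean partitionKeyEqQuery;

    VersionedSplit(String startToken, String endToken, boolean partitionKeyEqQuery)
    {
        this.startToken = startToken;
        this.endToken = endToken;
        this.partitionKeyEqQuery = partitionKeyEqQuery;
    }

    void write(DataOutput out) throws IOException
    {
        out.writeInt(CURRENT_VERSION); // header goes first so readers can check it
        out.writeUTF(startToken);
        out.writeUTF(endToken);
        out.writeBoolean(partitionKeyEqQuery);
    }

    static VersionedSplit read(DataInput in) throws IOException
    {
        int version = in.readInt();
        if (version < 1 || version > CURRENT_VERSION)
            throw new IOException("Unsupported split version: " + version);
        String start = in.readUTF();
        String end = in.readUTF();
        // Version 1 payloads have no flag; default to false for them.
        boolean eqQuery = version >= 2 && in.readBoolean();
        return new VersionedSplit(start, end, eqQuery);
    }

    static byte[] toBytes(VersionedSplit split)
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            split.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    static VersionedSplit fromBytes(byte[] data)
    {
        try
        {
            return read(new DataInputStream(new ByteArrayInputStream(data)));
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }
}
```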



[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-07 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484093#comment-14484093
 ] 

Aleksey Yeschenko commented on CASSANDRA-8576:
--

CASSANDRA-8358 is taking a bit longer than I expected to review/commit. It 
could be delayed by another week or so.

Can you guys go ahead and review/commit this without 8358?

I'll rebase CASSANDRA-8358 afterwards.



[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-04-01 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391765#comment-14391765
 ] 

Alex Liu commented on CASSANDRA-8576:
-

This is pending on CASSANDRA-8358.



[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-03-05 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349527#comment-14349527
 ] 

Alex Liu commented on CASSANDRA-8576:
-

Pig-test fails on trunk; Philip Thompson is fixing it. I've attached the patch 
for trunk, but we need to merge it with Philip Thompson's fix.

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>Assignee: Alex Liu
> Fix For: 2.1.4
>
> Attachments: 8576-2.1-branch.txt
>
>
> I've heard reports from several users that they would like to have predicate 
> pushdown functionality for hadoop (Hive in particular) based services. 
> Example usecase
> Table with wide partitions, one per customer
> Application team has HQL they would like to run on a single customer
> Currently time to complete scales with number of customers since Input Format 
> can't pushdown primary key predicate
> Current implementation requires a full table scan (since it can't recognize 
> that a single partition was specified)





[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-02-26 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339343#comment-14339343
 ] 

Brandon Williams commented on CASSANDRA-8576:
-

LGTM, can you attach a version for trunk as well?



[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-02-20 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329142#comment-14329142
 ] 

Alex Liu commented on CASSANDRA-8576:
-

[~brandon.williams] Do you have time to review this ticket? 



[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-01-13 Thread Russell Alexander Spitzer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276183#comment-14276183
 ] 

Russell Alexander Spitzer commented on CASSANDRA-8576:
--

For this particular use-case they only need EQ, but IN would be nice as well. 

> Primary Key Pushdown For Hadoop
> ---
>
> Key: CASSANDRA-8576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Hadoop
>Reporter: Russell Alexander Spitzer
>
> I've heard reports from several users that they would like to have predicate 
> pushdown functionality for hadoop (Hive in particular) based services. 
> Example usecase
> Table with wide partitions, one per customer
> Application team has HQL they would like to run on a single customer
> Currently time to complete scales with number of customers since Input Format 
> can't pushdown primary key predicate
> Current implementation requires a full table scan (since it can't recognize 
> that a single partition was specified)





[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-01-13 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276140#comment-14276140
 ] 

Alex Liu commented on CASSANDRA-8576:
-

Should it work only for EQ predicates? Should it also include IN predicates?
