[ https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956508#comment-14956508 ]

qian, chen edited comment on SPARK-6910 at 10/14/15 9:03 AM:
-------------------------------------------------------------

I'm using spark-sql (Spark 1.5.1, Hadoop 2.4.0) and found something
interesting in the spark-sql shell.

First I ran this query, which took about 3 minutes:
select * from table1 where date='20151010' and hour='12' and name='x' limit 5;
Time taken: 164.502 seconds

Then I ran this one, which took only about 11 seconds:
select * from table1 where date='20151010' and hour='13' limit 5;
Time taken: 10.881 seconds

date, hour and name are partition columns in this Hive table, and the table
has more than 4000 partitions.

Is the first query slow because it has to fetch the information for all
partitions from the Hive metastore, and is the second one faster because
those partitions are now cached in memory? Any suggestions for speeding up
the first query?
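One possible thing to try, offered as a sketch rather than a verified fix: the fix for this issue (Fix For: 1.5.0) is, as far as I know, gated behind the setting spark.sql.hive.metastorePartitionPruning, which is off by default in 1.5. With it enabled, predicates on partition columns are pushed to the Hive metastore so that only matching partitions are requested, instead of listing all 4000+ partitions first. This assumes the table is read through the Hive metastore from the spark-sql shell; whether a given predicate can be pushed down also depends on the metastore version and the partition column types, so it may not help in every setup.

```sql
-- Assumption: Spark 1.5.x spark-sql shell backed by a Hive metastore.
-- Push partition-column predicates down to the metastore (off by default in 1.5).
SET spark.sql.hive.metastorePartitionPruning=true;

-- Same query as before; with pruning enabled, only partitions matching
-- date/hour/name should be fetched from the metastore.
select * from table1 where date='20151010' and hour='12' and name='x' limit 5;
```

Comparing the "Time taken" line with and without the setting on a fresh shell would show whether the metastore partition listing was the bottleneck.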


was (Author: nedqian): identical to the comment above, except that it did
not include the final question about speeding up the first query.

> Support for pushing predicates down to metastore for partition pruning
> ----------------------------------------------------------------------
>
>                 Key: SPARK-6910
>                 URL: https://issues.apache.org/jira/browse/SPARK-6910
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Cheolsoo Park
>            Priority: Critical
>             Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
