[ https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956508#comment-14956508 ]
qian, chen edited comment on SPARK-6910 at 10/14/15 9:03 AM:
-
I'm using spark-sql (Spark 1.5.1 with Hadoop 2.4.0) and noticed something
interesting in the spark-sql shell.

First I ran this query, and it took about 3 minutes:
select * from table1 where date='20151010' and hour='12' and name='x' limit 5;
Time taken: 164.502 seconds

Then I ran this one, and it took only about 10 seconds. date, hour, and name
are partition columns in this Hive table, which has more than 4,000 partitions:
select * from table1 where date='20151010' and hour='13' limit 5;
Time taken: 10.881 seconds

Is the first query slow because all partition information has to be downloaded
from the Hive metastore, and the second one fast because all partitions are now
cached in memory?

Any suggestions for speeding up the first query?
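If the slow first query really is metastore partition loading, one thing worth trying is the metastore-side partition pruning that this very issue (SPARK-6910) added. A hypothetical sketch, assuming Spark 1.5+ where the spark.sql.hive.metastorePartitionPruning flag exists (it defaults to false in 1.5.x) and the same table1 as above:

```shell
# Sketch, not a confirmed fix: enable metastore-side partition pruning so Spark
# pushes the partition predicates (date/hour/name) down to the Hive metastore
# and fetches metadata only for matching partitions, instead of loading all
# >4000 partitions up front on the first query.
spark-sql --conf spark.sql.hive.metastorePartitionPruning=true \
  -e "select * from table1 where date='20151010' and hour='12' and name='x' limit 5;"
```

Whether this helps depends on the metastore backend supporting predicate pushdown (get_partitions_by_filter); if it does not, the metastore may still fall back to listing all partitions.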
> Support for pushing predicates down to metastore for partition pruning
> --
>
> Key: SPARK-6910
> URL: https://issues.apache.org/jira/browse/SPARK-6910
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Michael Armbrust
> Assignee: Cheolsoo Park
> Priority: Critical
> Fix For: 1.5.0
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org