[ https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956508#comment-14956508 ]
qian, chen edited comment on SPARK-6910 at 10/14/15 9:03 AM (Author: nedqian):
-------------------------------------------------------------

I'm using spark-sql (Spark 1.5.1, Hadoop 2.4.0) and found something interesting.

In the spark-sql shell, I first ran this query, which took about 3 minutes:

select * from table1 where date='20151010' and hour='12' and name='x' limit 5;
Time taken: 164.502 seconds

Then I ran this one, which took only about 10 seconds:

select * from table1 where date='20151010' and hour='13' limit 5;
Time taken: 10.881 seconds

date, hour and name are partition columns in this Hive table, and the table has more than 4000 partitions.

Is the first query slow because all partition information has to be downloaded from the Hive metastore the first time, and the second query faster because the partitions are now cached in memory?

Any suggestions for speeding up the first query?
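
The question above turns on where partition filtering happens, which is what the issue title refers to. The toy Python sketch below contrasts the two strategies. It is not Spark or Hive code: `ToyMetastore`, `list_all_partitions`, `list_partitions_by_filter`, and the 7 x 24 x 25 partition layout are hypothetical, made up only to model how many partition records leave the metastore in each case.

```python
from itertools import product


class ToyMetastore:
    """Hypothetical in-memory stand-in for a metastore; not a Hive API."""

    def __init__(self, partitions):
        self._partitions = list(partitions)
        self.records_returned = 0  # partition records shipped to the client

    def list_all_partitions(self):
        # No pushdown: every partition record is sent to the client.
        self.records_returned += len(self._partitions)
        return list(self._partitions)

    def list_partitions_by_filter(self, **predicate):
        # Pushdown: the metastore evaluates the equality predicate itself.
        matched = [p for p in self._partitions
                   if all(p[k] == v for k, v in predicate.items())]
        self.records_returned += len(matched)
        return matched


def build_partitions():
    # Assumed layout: 7 dates x 24 hours x 25 names = 4200 partitions.
    dates = ["201510%02d" % d for d in range(4, 11)]
    hours = ["%02d" % h for h in range(24)]
    names = ["n%d" % i for i in range(24)] + ["x"]
    return [{"date": d, "hour": h, "name": n}
            for d, h, n in product(dates, hours, names)]


def prune_client_side(store, **predicate):
    # Fetch everything, then filter on the client.
    return [p for p in store.list_all_partitions()
            if all(p[k] == v for k, v in predicate.items())]


def prune_with_pushdown(store, **predicate):
    # Let the metastore return only the matching partitions.
    return store.list_partitions_by_filter(**predicate)


if __name__ == "__main__":
    parts = build_partitions()
    for prune in (prune_client_side, prune_with_pushdown):
        store = ToyMetastore(parts)
        hits = prune(store, date="20151010", hour="12", name="x")
        print(prune.__name__, len(hits), store.records_returned)
```

Both functions select the same partitions; they differ only in how many records the metastore has to return. Spark exposes predicate pushdown to the metastore through the `spark.sql.hive.metastorePartitionPruning` setting; whether it is enabled by default depends on the version, so check the configuration of the release in use.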
> Support for pushing predicates down to metastore for partition pruning
> ----------------------------------------------------------------------
>
> Key: SPARK-6910
> URL: https://issues.apache.org/jira/browse/SPARK-6910
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Michael Armbrust
> Assignee: Cheolsoo Park
> Priority: Critical
> Fix For: 1.5.0