[ https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956508#comment-14956508 ]
qian, chen commented on SPARK-6910:
-----------------------------------

I'm using spark-sql (Spark 1.5.1, Hadoop 2.4.0) and found something interesting. date, hour and name are partition columns of this Hive table, and the table has more than 4000 partitions.

In the spark-sql shell, I first ran this query, which took about 3 minutes:

    select * from table1 where date='20151010' and hour='12' and name='x' limit 5;
    Time taken: 164.502 seconds

Then I ran this one, which took only about 10 seconds:

    select * from table1 where date='20151010' and hour='13' limit 5;
    Time taken: 10.881 seconds

Is this because the first query has to download all partition information from the Hive metastore, and the second query is faster because all partitions are now cached in memory?

> Support for pushing predicates down to metastore for partition pruning
> ----------------------------------------------------------------------
>
>                 Key: SPARK-6910
>                 URL: https://issues.apache.org/jira/browse/SPARK-6910
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Cheolsoo Park
>            Priority: Critical
>             Fix For: 1.5.0
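
One way to test that guess, offered as a sketch and not a confirmed diagnosis: to my knowledge, the metastore-side pruning added by this issue is controlled by the setting spark.sql.hive.metastorePartitionPruning, which I believe is off by default in the 1.5.x line. Treat both the setting's default and its effect on this particular table as assumptions to verify. Enabling it in a fresh spark-sql session and re-running the first query should show whether the first-query cost comes from fetching every partition:

```sql
-- Assumption: this setting enables pushing partition predicates down to the
-- Hive metastore, and it is disabled by default in Spark 1.5.x.
SET spark.sql.hive.metastorePartitionPruning=true;

-- Re-run the slow query in a fresh session and compare the reported time
-- with the 164.502 seconds seen earlier.
select * from table1 where date='20151010' and hour='12' and name='x' limit 5;
```

If the time drops sharply, the slow first run was most likely dominated by loading metadata for all partitions. If it does not, the cause is probably elsewhere.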