Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Thanks Yong for the response. Adding my responses inline On Tue, Jan 17, 2017 at 10:27 PM, Yong Zhang wrote: > What DB you are using for your Hive meta store, and what types are your > partition columns? > I am using MySql for Hive metastore. Partition columns are

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Yong Zhang
What DB you are using for your Hive meta store, and what types are your partition columns? You maybe want to read the discussion in SPARK-6910, and especially the comments in PR. There are some limitation about partition pruning in Hive/Spark, maybe yours is one of them. Yong

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Had a high level look into the code. Seems getHiveQlPartitions method from HiveMetastoreCatalog is getting called irrespective of metastorePartitionPruning conf value. It should not fetch all partitions if we set metastorePartitionPruning to true (Default value for this is false) def

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-15 Thread Raju Bairishetti
Waiting for suggestions/help on this... On Wed, Jan 11, 2017 at 12:14 PM, Raju Bairishetti wrote: > Hello, > >Spark sql is generating query plan with all partitions information even > though if we apply filters on partitions in the query. Due to this, spark > driver/hive

Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-10 Thread Raju Bairishetti
Hello, Spark sql is generating query plan with all partitions information even though if we apply filters on partitions in the query. Due to this, spark driver/hive metastore is hitting with OOM as each table is with lots of partitions. We can confirm from hive audit logs that it tries to