[ https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235370#comment-15235370 ]

Mick Davies commented on SPARK-6910:
------------------------------------

Hi, 

We are seeing something similar, but in our case subsequent queries are still 
expensive. Looking at HiveMetastoreCatalog.lookupRelation (we are using 1.5, 
but 1.6 looks the same), we seem to create a new MetastoreRelation for each 
query. Part of the analysis phase then tries to convert this to a 
ParquetRelation via convertToParquetRelation, which always calls 
metastoreRelation.getHiveQlPartitions(), and that fetches all partition 
information. So every query incurs the cost of retrieving the full partition 
metadata.
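
To make the path concrete, here is a heavily simplified sketch of the flow as 
we read it; all types below are local stand-ins, not the actual Spark 1.5/1.6 
classes:
{code}
// Heavily simplified stand-ins for the Spark classes named above; the real
// logic lives in HiveMetastoreCatalog in the Spark source tree.
case class TableMeta(properties: Map[String, String])

case class MetastoreRelationSketch(meta: TableMeta) {
  // Stand-in for metastoreRelation.getHiveQlPartitions(): in real Spark this
  // round-trips to the Hive metastore for every partition of the table.
  def getHiveQlPartitions(): Seq[String] = {
    println("fetching ALL partition metadata from the metastore...")
    Seq("dt=2016-01-01", "dt=2016-01-02")
  }
}

object LookupRelationSketch {
  // Mirrors the shape of lookupRelation + convertToParquetRelation: tables
  // without the data source provider property are rebuilt and re-converted
  // on every query, repeating the partition fetch each time.
  def lookupRelation(meta: TableMeta): String =
    if (meta.properties.get("spark.sql.sources.provider").isDefined) {
      "cached data source relation" // cache path
    } else {
      val relation = MetastoreRelationSketch(meta)
      relation.getHiveQlPartitions() // incurred on EVERY query
      "freshly converted ParquetRelation"
    }
}
{code}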

We don't see how the code can make effective use of cachedDataSourceTables 
under the circumstances just described.

We changed HiveMetastoreCatalog.lookupRelation to use the cache even when the 
Hive table property "spark.sql.sources.provider" is unset, which caused 
subsequent queries to use the cached relation and therefore run much more 
quickly.

E.g., we changed 
{code}
if (table.properties.get("spark.sql.sources.provider").isDefined) 
{code}

to 
{code}
if (cachedDataSourceTables.getIfPresent(
      QualifiedTableName(databaseName, tblName).toLowerCase) != null ||
    table.properties.get("spark.sql.sources.provider").isDefined)
{code}
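
For what it's worth, here is a minimal self-contained sketch of why the extra 
getIfPresent check makes later queries hit the cache. It uses Guava directly, 
since cachedDataSourceTables is a Guava LoadingCache; the string keys and 
values are stand-ins for QualifiedTableName/LogicalPlan:
{code}
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}

object GetIfPresentDemo {
  // Stand-in for cachedDataSourceTables, with strings in place of Spark types.
  val cache: LoadingCache[String, String] = CacheBuilder.newBuilder()
    .maximumSize(1000)
    .build(new CacheLoader[String, String] {
      // Simulates the expensive construction of a cached relation.
      override def load(key: String): String = {
        println(s"expensive load for $key")
        s"relation-for-$key"
      }
    })

  def main(args: Array[String]): Unit = {
    val key = "default.my_table"
    // getIfPresent never invokes the loader; null simply means "not cached
    // yet", so the first query falls through to the property check.
    assert(cache.getIfPresent(key) == null)
    cache.get(key) // first query pays the full cost and populates the cache
    // From now on getIfPresent is non-null, so the patched condition routes
    // every later lookup to the cached relation.
    assert(cache.getIfPresent(key) != null)
  }
}
{code}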

Are we doing something wrong?




> Support for pushing predicates down to metastore for partition pruning
> ----------------------------------------------------------------------
>
>                 Key: SPARK-6910
>                 URL: https://issues.apache.org/jira/browse/SPARK-6910
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Cheolsoo Park
>            Priority: Critical
>             Fix For: 1.5.0
>
>



