[GitHub] spark pull request #13686: [SPARK-15968][SQL] HiveMetastoreCatalog does not ...

2016-06-15 Thread mallman
Github user mallman closed the pull request at:

https://github.com/apache/spark/pull/13686


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13686: [SPARK-15968][SQL] HiveMetastoreCatalog does not ...

2016-06-15 Thread mallman
GitHub user mallman opened a pull request:

https://github.com/apache/spark/pull/13686

[SPARK-15968][SQL] HiveMetastoreCatalog does not correctly validate
partitioned metastore relation when searching the internal table cache

## What changes were proposed in this pull request?

The `getCached` method of `HiveMetastoreCatalog` computes 
`pathsInMetastore` from the metastore relation's catalog table. This only 
returns the table base path, which is not correct for partitioned
tables. As a result, cached lookups on partitioned tables always miss, and 
these relations are always recomputed.

Rather than computing `pathsInMetastore` inside the method from

metastoreRelation.catalogTable.storage.locationUri.toSeq

I modified the `getCached` method to take a `pathsInMetastore` argument.
Callers now pass in the paths computed from calls to the Hive
metastore. This is how `getCached` was implemented in Spark 1.5:


https://github.com/apache/spark/blob/e0c3212a9b42e3e704b070da4ac25b68c584427f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L444
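
The signature change can be sketched roughly as follows. This is a
simplified illustration of the idea described above, not the actual
Spark code: the real `getCached` has additional parameters and logic,
and the types shown are stand-ins for the real internals.

```scala
// Sketch only -- simplified stand-ins for Spark's internal types.

// Before: the method derives the paths from the catalog table itself.
// For a partitioned table this yields only the table's base path, so
// the comparison against the cached relation's paths always fails.
def getCachedBefore(
    metastoreRelation: MetastoreRelation): Option[LogicalRelation] = {
  val pathsInMetastore: Seq[String] =
    metastoreRelation.catalogTable.storage.locationUri.toSeq
  // ... compare pathsInMetastore against the cached relation's paths ...
  None // placeholder
}

// After: the caller computes the paths (including partition locations)
// via calls to the Hive metastore and passes them in explicitly.
def getCached(
    pathsInMetastore: Seq[String],
    metastoreRelation: MetastoreRelation): Option[LogicalRelation] = {
  // ... compare pathsInMetastore against the cached relation's paths ...
  None // placeholder
}
```

With the paths supplied by the caller, a partitioned table's partition
locations participate in the cache-key comparison, so lookups can hit.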

## How was this patch tested?

I tested by temporarily adding logging to the `getCached` method and
running `spark.table("...")` on a partitioned table in a spark-shell
before and after this patch. Before this patch, the value of `useCached`
in `getCached` was `false`. After the patch it was `true`. I also
validated that caching still works for unpartitioned tables.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/VideoAmp/spark-public spark-15968

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13686.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13686


commit 60bfe10e350d245632e940aa758cec4f0d2c4006
Author: Michael Allman 
Date:   2016-06-15T16:52:17Z

[SPARK-15968][SQL] HiveMetastoreCatalog does not correctly validate
partitioned metastore relation when searching the internal table cache

The `getCached` method of `HiveMetastoreCatalog` computes
`pathsInMetastore` from the metastore relation's catalog table. This
only returns the table base path, which is not correct for partitioned
tables. As a result, cached lookups on partitioned tables always miss,
and these relations are always recomputed.



