[2/2] spark git commit: [SPARK-16980][SQL] Load only catalog table partition metadata required to answer a query

2016-10-14 Thread rxin
[SPARK-16980][SQL] Load only catalog table partition metadata required to answer a query (This PR addresses https://issues.apache.org/jira/browse/SPARK-16980.) ## What changes were proposed in this pull request? In a new Spark session, when a partitioned Hive table is converted to use Spark's…
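As a rough sketch of the kind of query this optimization benefits (not taken from the PR; the table name `logs` and partition column `ds` are invented), a PySpark read with a partition filter might look like:

```python
from pyspark.sql import SparkSession

# Hive support is needed for metastore-backed partitioned tables.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# With partition metadata loaded lazily, a filter on the partition column
# lets the catalog fetch only the matching partitions' metadata instead of
# enumerating every partition of the table up front.
spark.table("logs").where("ds = '2016-10-14'").count()
```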

[1/2] spark git commit: [SPARK-16980][SQL] Load only catalog table partition metadata required to answer a query

2016-10-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2d96d35dc -> 6ce1b675e http://git-wip-us.apache.org/repos/asf/spark/blob/6ce1b675/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClient.scala -- diff --git…

spark git commit: [SPARK-17946][PYSPARK] Python crossJoin API similar to Scala

2016-10-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 72adfbf94 -> 2d96d35dc [SPARK-17946][PYSPARK] Python crossJoin API similar to Scala ## What changes were proposed in this pull request? Add a crossJoin function to the DataFrame API similar to that in Scala. Joins with no condition…
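A minimal sketch of the new Python API (the DataFrames here are invented for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

left = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "tag"])
right = spark.createDataFrame([("x",), ("y",)], ["label"])

# Explicit Cartesian product: 2 x 2 = 4 rows. A join() with no condition
# is otherwise rejected unless cross joins are explicitly enabled.
left.crossJoin(right).show()
```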

spark git commit: [SPARK-17900][SQL] Graduate a list of Spark SQL APIs to stable

2016-10-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master f00df40cf -> 72adfbf94 [SPARK-17900][SQL] Graduate a list of Spark SQL APIs to stable ## What changes were proposed in this pull request? This patch graduates a list of Spark SQL APIs and marks them stable. The following are marked stable: …

spark git commit: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Java UDF

2016-10-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 5aeb7384c -> f00df40cf [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Java UDF Currently PySpark can only call built-in Java UDFs, but cannot call custom Java UDFs. It would be better to allow that. Two benefits: * Leverage the…
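A hedged sketch of registering a Java UDF from Python; the PR adds the entry point on `SQLContext`, while newer releases expose the equivalent `spark.udf.registerJavaFunction`. The class `com.example.StrLen` is a placeholder for your own UDF, which must already be on the classpath (e.g. via `--jars`):

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()

# "com.example.StrLen" stands in for a class of yours that implements one
# of Spark's org.apache.spark.sql.api.java.UDF1/UDF2/... interfaces.
spark.udf.registerJavaFunction("strLen", "com.example.StrLen", IntegerType())

spark.sql("SELECT strLen('hello') AS n").show()
```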

spark git commit: [SPARK-16063][SQL] Add storageLevel to Dataset

2016-10-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master da9aeb0fd -> 5aeb7384c [SPARK-16063][SQL] Add storageLevel to Dataset [SPARK-11905](https://issues.apache.org/jira/browse/SPARK-11905) added support for `persist`/`cache` for `Dataset`. However, there is no user-facing API to check if a…
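The Scala change adds `Dataset.storageLevel`; a hedged sketch of the mirrored PySpark property (available in later releases) looks like:

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

print(df.storageLevel)   # not cached yet: all flags are False
df.persist(StorageLevel.MEMORY_AND_DISK)
print(df.storageLevel)   # now reflects the level passed to persist()
```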

spark git commit: [SPARK-17863][SQL] should not add column into Distinct

2016-10-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 d7fa3e324 -> c53b83749 [SPARK-17863][SQL] should not add column into Distinct ## What changes were proposed in this pull request? We are trying to resolve the sort attribute by pulling up some columns from the grandchild into the child, but…

spark git commit: [SPARK-17863][SQL] should not add column into Distinct

2016-10-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 522dd0d0e -> da9aeb0fd [SPARK-17863][SQL] should not add column into Distinct ## What changes were proposed in this pull request? We are trying to resolve the sort attribute by pulling up some columns from the grandchild into the child, but…
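A hedged illustration of the shape of query involved (schema invented; the expected behavior is an assumption based on the fix description): sorting by a column that is not in the DISTINCT output must not cause the analyzer to quietly add that column to the Distinct, since that would change which rows count as duplicates.

```python
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

spark = SparkSession.builder.getOrCreate()
spark.createDataFrame([(1, 10), (1, 20), (2, 30)], ["a", "b"]) \
    .createOrReplaceTempView("t")

try:
    # `b` is not part of the DISTINCT output; silently adding it would
    # turn the two (a = 1) rows into distinct results.
    spark.sql("SELECT DISTINCT a FROM t ORDER BY b").show()
except AnalysisException as err:
    print("query rejected instead of returning wrong results:", err)
```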

spark git commit: Revert "[SPARK-17620][SQL] Determine Serde by hive.default.fileformat when Creating Hive Serde Tables"

2016-10-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 7ab86244e -> 522dd0d0e Revert "[SPARK-17620][SQL] Determine Serde by hive.default.fileformat when Creating Hive Serde Tables" This reverts commit 7ab86244e30ca81eb4fa40ea77b4c2b8881cbab2. Project: …

spark git commit: [SPARK-17620][SQL] Determine Serde by hive.default.fileformat when Creating Hive Serde Tables

2016-10-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master de1c1ca5c -> 7ab86244e [SPARK-17620][SQL] Determine Serde by hive.default.fileformat when Creating Hive Serde Tables ## What changes were proposed in this pull request? Make sure hive.default.fileformat is used when creating the…
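A sketch of the intended behavior (table name invented; note the message above this one reverts the commit, so the exact behavior depends on the release you run): with no explicit STORED AS clause, a new Hive serde table should pick up its format from hive.default.fileformat.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("SET hive.default.fileformat=orc")
# No STORED AS / USING clause, so the serde comes from the setting above.
spark.sql("CREATE TABLE events_demo (id INT, name STRING)")
spark.sql("DESC FORMATTED events_demo").show(truncate=False)
```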

spark git commit: [SPARK-17941][ML][TEST] Logistic regression tests should use sample weights.

2016-10-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 05800b4b4 -> de1c1ca5c [SPARK-17941][ML][TEST] Logistic regression tests should use sample weights. ## What changes were proposed in this pull request? The sample weight testing for logistic regressions is not robust. Logistic regression…
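A hedged sketch of the identity such tests typically lean on: fitting with an integer weight column should match fitting on a dataset where each row is duplicated that many times (data and column names invented):

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.getOrCreate()

data = spark.createDataFrame(
    [(0.0, 1.0, 2.0), (1.0, 2.0, 1.0), (1.0, 3.0, 3.0)],
    ["label", "x", "weight"],
)
train = VectorAssembler(inputCols=["x"], outputCol="features").transform(data)

# weightCol makes each row count as `weight` copies of itself in the loss.
model = LogisticRegression(weightCol="weight", regParam=0.1).fit(train)
print(model.coefficients, model.intercept)
```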

spark git commit: [TEST] Ignore flaky test in StreamingQueryListenerSuite

2016-10-14 Thread tdas
Repository: spark Updated Branches: refs/heads/master fa37877af -> 05800b4b4 [TEST] Ignore flaky test in StreamingQueryListenerSuite ## What changes were proposed in this pull request? Ignoring the flaky test introduced in #15307

spark git commit: [SPARK-17884][SQL] To resolve Null pointer exception when casting from empty string to interval type

2016-10-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.6 18b173cfc -> 745c5e70f [SPARK-17884][SQL] To resolve Null pointer exception when casting from empty string to interval type ## What changes were proposed in this pull request? This change adds a check in the castToInterval method of Cast…
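A hedged reproduction sketch of the failing case (assuming the cast-to-interval syntax is accepted by the parser in the release you run); after the fix the cast should return NULL rather than hitting a NullPointerException:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# An empty (or unparsable) string cannot be converted to an interval,
# so the expected result is a NULL value, not an exception.
spark.sql("SELECT CAST('' AS interval) AS i").show()
```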

spark git commit: Typo: form -> from

2016-10-14 Thread srowen
Repository: spark Updated Branches: refs/heads/master a0ebcb3a3 -> fa37877af Typo: form -> from ## What changes were proposed in this pull request? Minor typo fix ## How was this patch tested? Existing unit tests on Jenkins Author: Andrew Ash Closes #15486 from…

spark git commit: [DOC] Fix typo in sql hive doc

2016-10-14 Thread srowen
Repository: spark Updated Branches: refs/heads/master 7486442fe -> a0ebcb3a3 [DOC] Fix typo in sql hive doc Change is too trivial to file a JIRA. Author: Dhruve Ashar Closes #15485 from dhruve/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo

spark git commit: [SPARK-17073][SQL][FOLLOWUP] generate column-level statistics

2016-10-14 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 28b645b1e -> 7486442fe [SPARK-17073][SQL][FOLLOWUP] generate column-level statistics ## What changes were proposed in this pull request? This PR adds some test cases for statistics: case-sensitive column names, non-ASCII column names,…
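For context, a hedged sketch of how column-level statistics are gathered (table and column names invented; the per-column DESC syntax is only available in later releases):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.range(1000).selectExpr("id", "id % 10 AS bucket") \
    .write.saveAsTable("stats_demo")

# Collect per-column statistics (distinct count, nulls, min/max, ...)
# for the cost-based optimizer.
spark.sql("ANALYZE TABLE stats_demo COMPUTE STATISTICS FOR COLUMNS id, bucket")

# On newer releases, per-column stats can be inspected with:
# spark.sql("DESC EXTENDED stats_demo bucket").show(truncate=False)
```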

spark git commit: [SPARK-17870][MLLIB][ML] Change statistic to pValue for SelectKBest and SelectPercentile because of DoF difference

2016-10-14 Thread srowen
Repository: spark Updated Branches: refs/heads/master a1b136d05 -> c8b612dec [SPARK-17870][MLLIB][ML] Change statistic to pValue for SelectKBest and SelectPercentile because of DoF difference ## What changes were proposed in this pull request? For the feature selection method ChiSquareSelector,…
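A minimal PySpark sketch of the selector the change affects (toy data invented); with the change, the "top" features under SelectKBest/SelectPercentile are those with the smallest p-values rather than the largest raw chi-squared statistics:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import ChiSqSelector
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(Vectors.dense([0.0, 1.0, 2.0]), 1.0),
     (Vectors.dense([1.0, 1.0, 3.0]), 0.0),
     (Vectors.dense([2.0, 0.0, 1.0]), 1.0)],
    ["features", "label"],
)

# Ranking by p-value is comparable across features whose chi-squared
# tests have different degrees of freedom.
selector = ChiSqSelector(numTopFeatures=2, outputCol="selected")
selector.fit(df).transform(df).show(truncate=False)
```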

spark git commit: [SPARK-14634][ML] Add BisectingKMeansSummary

2016-10-14 Thread yliang
Repository: spark Updated Branches: refs/heads/master 1db8feab8 -> a1b136d05 [SPARK-14634][ML] Add BisectingKMeansSummary ## What changes were proposed in this pull request? Add BisectingKMeansSummary ## How was this patch tested? unit test Author: Zheng RuiFeng…
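The summary is added on the Scala side; a hedged sketch of the mirrored PySpark access (available in later releases), with toy data:

```python
from pyspark.sql import SparkSession
from pyspark.ml.clustering import BisectingKMeans
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),
     (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)],
    ["features"],
)

model = BisectingKMeans(k=2, seed=1).fit(df)

# The training summary exposes k, cluster sizes, and the clustered output.
summary = model.summary
print(summary.k, summary.clusterSizes)
```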

spark git commit: [SPARK-15402][ML][PYSPARK] PySpark ml.evaluation should support save/load

2016-10-14 Thread yliang
Repository: spark Updated Branches: refs/heads/master 2fb12b0a3 -> 1db8feab8 [SPARK-15402][ML][PYSPARK] PySpark ml.evaluation should support save/load ## What changes were proposed in this pull request? Since ```ml.evaluation``` has supported save/load on the Scala side, supporting it on the Python side…
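A minimal sketch of the Python round trip this enables (the evaluator choice and path are arbitrary):

```python
import tempfile

from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

path = tempfile.mkdtemp() + "/rmse_evaluator"

# Persist the evaluator together with its params, then read it back.
RegressionEvaluator(metricName="rmse").save(path)
evaluator = RegressionEvaluator.load(path)
print(evaluator.getMetricName())   # -> rmse
```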

spark git commit: [SPARK-17903][SQL] MetastoreRelation should talk to external catalog instead of hive client

2016-10-14 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 6c29b3de7 -> 2fb12b0a3 [SPARK-17903][SQL] MetastoreRelation should talk to external catalog instead of hive client ## What changes were proposed in this pull request? `HiveExternalCatalog` should be the only interface to talk to the hive…

spark git commit: [SPARK-17925][SQL] Break fileSourceInterfaces.scala into multiple pieces

2016-10-14 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 8543996c3 -> 6c29b3de7 [SPARK-17925][SQL] Break fileSourceInterfaces.scala into multiple pieces ## What changes were proposed in this pull request? This patch makes a few changes to the file structure of data sources: - Break…