spark git commit: [SPARK-19618][SQL] Inconsistency wrt max. buckets allowed from Dataframe API vs SQL

2017-02-15 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 8487902a9 -> f041e55ee [SPARK-19618][SQL] Inconsistency wrt max. buckets allowed from Dataframe API vs SQL ## What changes were proposed in this pull request? Jira: https://issues.apache.org/jira/browse/SPARK-19618 Moved the check for

spark git commit: [SPARK-19399][SPARKR][BACKPORT-2.1] fix tests broken by merge

2017-02-15 Thread felixcheung
Repository: spark Updated Branches: refs/heads/branch-2.1 db7adb61b -> 252dd05f0 [SPARK-19399][SPARKR][BACKPORT-2.1] fix tests broken by merge ## What changes were proposed in this pull request? Fix test broken by git merge for #16739 ## How was this patch tested? manual Author: Felix

spark git commit: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 4th batch

2017-02-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fc02ef95c -> 8487902a9 [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 4th batch ## What changes were proposed in this pull request? This is the 4th batch of test cases for IN/NOT IN subquery. In this PR, it has these test

spark git commit: [SPARK-19603][SS] Fix StreamingQuery explain command

2017-02-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 08c1972a0 -> fc02ef95c [SPARK-19603][SS] Fix StreamingQuery explain command ## What changes were proposed in this pull request? `StreamingQuery.explain` doesn't show the correct streaming physical plan right now because `ExplainCommand`

spark git commit: [SPARK-19603][SS] Fix StreamingQuery explain command

2017-02-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 b9ab4c0e9 -> db7adb61b [SPARK-19603][SS] Fix StreamingQuery explain command ## What changes were proposed in this pull request? `StreamingQuery.explain` doesn't show the correct streaming physical plan right now because

spark git commit: [SPARK-19604][TESTS] Log the start of every Python test

2017-02-15 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.1 88c43f4fb -> b9ab4c0e9 [SPARK-19604][TESTS] Log the start of every Python test ## What changes were proposed in this pull request? Right now, we only have info level log after we finish the tests of a Python test file. We should also

spark git commit: [SPARK-18080][ML][PYTHON] Python API & Examples for Locality Sensitive Hashing

2017-02-15 Thread yliang
Repository: spark Updated Branches: refs/heads/master 21b4ba2d6 -> 08c1972a0 [SPARK-18080][ML][PYTHON] Python API & Examples for Locality Sensitive Hashing ## What changes were proposed in this pull request? This pull request includes the Python API and examples for LSH. The API changes were

spark git commit: [SPARK-19599][SS] Clean up HDFSMetadataLog

2017-02-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 6c3539906 -> 88c43f4fb [SPARK-19599][SS] Clean up HDFSMetadataLog ## What changes were proposed in this pull request? SPARK-19464 removed support for Hadoop 2.5 and earlier, so we can do some cleanup for HDFSMetadataLog. This PR

spark git commit: [SPARK-19599][SS] Clean up HDFSMetadataLog

2017-02-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master f6c3bba22 -> 21b4ba2d6 [SPARK-19599][SS] Clean up HDFSMetadataLog ## What changes were proposed in this pull request? SPARK-19464 removed support for Hadoop 2.5 and earlier, so we can do some cleanup for HDFSMetadataLog. This PR

spark git commit: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing

2017-02-15 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 6a9a85b84 -> 865b2fd84 [SPARK-18937][SQL] Timezone support in CSV/JSON parsing ## What changes were proposed in this pull request? This is a follow-up PR of #16308. This PR enables timezone support in CSV/JSON parsing. We should

spark git commit: [SPARK-19329][SQL] Reading from or writing to a datasource table with a non pre-existing location should succeed

2017-02-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 59dc26e37 -> 6a9a85b84 [SPARK-19329][SQL] Reading from or writing to a datasource table with a non pre-existing location should succeed ## What changes were proposed in this pull request? When we insert data into a datasource table use

spark git commit: [SPARK-19607][HOTFIX] Finding QueryExecution that matches provided executionId

2017-02-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3755da76c -> 59dc26e37 [SPARK-19607][HOTFIX] Finding QueryExecution that matches provided executionId ## What changes were proposed in this pull request? #16940 adds a test case which does not stop the spark job. It causes many failures

spark git commit: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column

2017-02-15 Thread felixcheung
Repository: spark Updated Branches: refs/heads/branch-2.1 8ee4ec812 -> 6c3539906 [SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column Add coalesce on DataFrame for down partitioning without shuffle and coalesce on Column manual, unit tests Author: Felix Cheung

spark git commit: [SPARK-19331][SQL][TESTS] Improve the test coverage of SQLViewSuite

2017-02-15 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 671bc08ed -> 3755da76c [SPARK-19331][SQL][TESTS] Improve the test coverage of SQLViewSuite Move `SQLViewSuite` from `sql/hive` to `sql/core`, so we can test view support without the Hive metastore. Also moved the test cases that

spark git commit: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column

2017-02-15 Thread felixcheung
Repository: spark Updated Branches: refs/heads/master c97f4e17d -> 671bc08ed [SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column ## What changes were proposed in this pull request? Add coalesce on DataFrame for down partitioning without shuffle and coalesce on Column ## How

spark git commit: [SPARK-19160][PYTHON][SQL] Add udf decorator

2017-02-15 Thread holden
Repository: spark Updated Branches: refs/heads/master 6eca21ba8 -> c97f4e17d [SPARK-19160][PYTHON][SQL] Add udf decorator ## What changes were proposed in this pull request? This PR adds `udf` decorator syntax as proposed in [SPARK-19160](https://issues.apache.org/jira/browse/SPARK-19160).

spark git commit: [SPARK-19590][PYSPARK][ML] Update the document for QuantileDiscretizer in pyspark

2017-02-15 Thread holden
Repository: spark Updated Branches: refs/heads/master acf71c63c -> 6eca21ba8 [SPARK-19590][PYSPARK][ML] Update the document for QuantileDiscretizer in pyspark ## What changes were proposed in this pull request? This PR is to document the changes on QuantileDiscretizer in pyspark for PR:

spark git commit: [SPARK-16475][SQL] broadcast hint for SQL queries - disallow space as the delimiter

2017-02-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master a8a139820 -> acf71c63c [SPARK-16475][SQL] broadcast hint for SQL queries - disallow space as the delimiter ## What changes were proposed in this pull request? A follow-up to disallow space as the delimiter in broadcast hint. ## How was

spark git commit: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subquery (Joins + CTE)

2017-02-15 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master 5ad10c531 -> a8a139820 [SPARK-18872][SQL][TESTS] New test cases for EXISTS subquery (Joins + CTE) ## What changes were proposed in this pull request? This PR adds the third and final set of tests for EXISTS subquery. File name

spark git commit: [SPARK-18873][SQL][TEST] New test cases for scalar subquery (part 2 of 2) - scalar subquery in predicate context

2017-02-15 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master d22db6278 -> 5ad10c531 [SPARK-18873][SQL][TEST] New test cases for scalar subquery (part 2 of 2) - scalar subquery in predicate context ## What changes were proposed in this pull request? This PR adds new test cases for scalar subquery in

spark git commit: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 2nd batch

2017-02-15 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master 601b9c3e6 -> d22db6278 [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 2nd batch ## What changes were proposed in this pull request? This is the 2nd batch of test cases for IN/NOT IN subquery. In this PR, it has these test

spark git commit: [SPARK-17076][SQL] Cardinality estimation for join based on basic column statistics

2017-02-15 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 8b75f8c1c -> 601b9c3e6 [SPARK-17076][SQL] Cardinality estimation for join based on basic column statistics ## What changes were proposed in this pull request? Support cardinality estimation and stats propagation for all join types.

spark git commit: [SPARK-19587][SQL] bucket sorting columns should not be picked from partition columns

2017-02-15 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 733c59ec1 -> 8b75f8c1c [SPARK-19587][SQL] bucket sorting columns should not be picked from partition columns ## What changes were proposed in this pull request? We will throw an exception if bucket columns are part of partition columns,

spark git commit: [SPARK-16475][SQL] broadcast hint for SQL queries - follow up

2017-02-15 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master b55563c17 -> 733c59ec1 [SPARK-16475][SQL] broadcast hint for SQL queries - follow up ## What changes were proposed in this pull request? A small update to https://github.com/apache/spark/pull/16925 1. Rename SubstituteHints ->

spark git commit: [SPARK-19607] Finding QueryExecution that matches provided executionId

2017-02-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3973403d5 -> b55563c17 [SPARK-19607] Finding QueryExecution that matches provided executionId ## What changes were proposed in this pull request? Implementing a mapping between executionId and corresponding QueryExecution in

spark git commit: [SPARK-19456][SPARKR] Add LinearSVC R API

2017-02-15 Thread felixcheung
Repository: spark Updated Branches: refs/heads/master 447b2b530 -> 3973403d5 [SPARK-19456][SPARKR] Add LinearSVC R API ## What changes were proposed in this pull request? The linear SVM classifier was newly added to ML and a Python API has been added. This JIRA is to add the R-side API. Marked as