spark git commit: [SPARK-24190][SQL] Allow saving of JSON files in UTF-16 and UTF-32

2018-06-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 4e7d8678a -> c7e2742f9 [SPARK-24190][SQL] Allow saving of JSON files in UTF-16 and UTF-32 ## What changes were proposed in this pull request? Currently, restrictions in JSONOptions for `encoding` and `lineSep` are the same for read and
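
A minimal sketch of what the relaxed write-side options look like, assuming Spark 2.4+ (where SPARK-24190 landed); the output path and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("b", 2)).toDF("k", "v")

// Write JSON in a non-UTF-8 charset; for a multi-byte encoding such as UTF-16
// an explicit lineSep is also expected on the write side.
df.write
  .option("encoding", "UTF-16")
  .option("lineSep", "\n")
  .json("/tmp/json-utf16")   // hypothetical output path
```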

spark git commit: [SPARK-24588][SS] streaming join should require HashClusteredPartitioning from children

2018-06-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 3a4b6f3be -> a1e964007 [SPARK-24588][SS] streaming join should require HashClusteredPartitioning from children ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/19080 we simplified the

spark git commit: [SPARK-24588][SS] streaming join should require HashClusteredPartitioning from children

2018-06-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b9a6f7499 -> dc8a6befa [SPARK-24588][SS] streaming join should require HashClusteredPartitioning from children ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/19080 we simplified the

spark git commit: [SPARK-24613][SQL] Cache with UDF could not be matched with subsequent dependent caches

2018-06-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c8e909cd4 -> b9a6f7499 [SPARK-24613][SQL] Cache with UDF could not be matched with subsequent dependent caches ## What changes were proposed in this pull request? Wrap the logical plan with an `AnalysisBarrier` for execution plan

spark git commit: [SPARK-24571][SQL] Support Char literals

2018-06-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9de11d3f9 -> 54fcaafb0 [SPARK-24571][SQL] Support Char literals ## What changes were proposed in this pull request? In the PR, I propose to automatically convert a `Literal` with `Char` type to a `Literal` of `String` type. Currently,
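
A small illustration of the conversion, assuming Spark 2.4+ and a toy DataFrame; after this change a Scala `Char` on the right-hand side behaves like a one-character `String` literal:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("a", "b", "c").toDF("letter")

// A Char literal is converted to a String literal automatically,
// so the two filters below are equivalent.
df.filter($"letter" === 'a').show()
df.filter($"letter" === "a").show()
```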

spark git commit: [SPARK-24583][SQL] Wrong schema type in InsertIntoDataSourceCommand

2018-06-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2cb976355 -> bc0498d58 [SPARK-24583][SQL] Wrong schema type in InsertIntoDataSourceCommand ## What changes were proposed in this pull request? Change insert input schema type: "insertRelationType" -> "insertRelationType.asNullable", in

spark git commit: [SPARK-24583][SQL] Wrong schema type in InsertIntoDataSourceCommand

2018-06-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 50cdb4138 -> d687d97b1 [SPARK-24583][SQL] Wrong schema type in InsertIntoDataSourceCommand ## What changes were proposed in this pull request? Change insert input schema type: "insertRelationType" -> "insertRelationType.asNullable",

spark git commit: [SPARK-24521][SQL][TEST] Fix ineffective test in CachedTableSuite

2018-06-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9a75c1829 -> a78a90464 [SPARK-24521][SQL][TEST] Fix ineffective test in CachedTableSuite ## What changes were proposed in this pull request? test("withColumn doesn't invalidate cached dataframe") in CachedTableSuite does not work

spark git commit: [SPARK-24495][SQL] EnsureRequirement returns wrong plan when reordering equal keys

2018-06-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 a2f65eb79 -> e6bf325de [SPARK-24495][SQL] EnsureRequirement returns wrong plan when reordering equal keys `EnsureRequirement` in its `reorder` method currently assumes that the same key appears only once in the join condition. This

spark git commit: [SPARK-24495][SQL] EnsureRequirement returns wrong plan when reordering equal keys

2018-06-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 534065efe -> fdadc4be0 [SPARK-24495][SQL] EnsureRequirement returns wrong plan when reordering equal keys ## What changes were proposed in this pull request? `EnsureRequirement` in its `reorder` method currently assumes that the same key

spark git commit: [SPARK-24531][TESTS] Replace 2.3.0 version with 2.3.1

2018-06-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1b46f41c5 -> 3bf76918f [SPARK-24531][TESTS] Replace 2.3.0 version with 2.3.1 ## What changes were proposed in this pull request? The PR updates the 2.3 version tested to the new release 2.3.1. ## How was this patch tested? existing UTs

spark git commit: [SPARK-24531][TESTS] Remove version 2.2.0 from testing versions in HiveExternalCatalogVersionsSuite

2018-06-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 c306a8461 -> bf0b21298 [SPARK-24531][TESTS] Remove version 2.2.0 from testing versions in HiveExternalCatalogVersionsSuite Removing version 2.2.0 from testing versions in HiveExternalCatalogVersionsSuite as it is not present anymore

spark git commit: [SPARK-24531][TESTS] Remove version 2.2.0 from testing versions in HiveExternalCatalogVersionsSuite

2018-06-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 bf5868757 -> 63e1da162 [SPARK-24531][TESTS] Remove version 2.2.0 from testing versions in HiveExternalCatalogVersionsSuite ## What changes were proposed in this pull request? Removing version 2.2.0 from testing versions in

spark git commit: [SPARK-24531][TESTS] Remove version 2.2.0 from testing versions in HiveExternalCatalogVersionsSuite

2018-06-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5d6a53d98 -> 2824f1436 [SPARK-24531][TESTS] Remove version 2.2.0 from testing versions in HiveExternalCatalogVersionsSuite ## What changes were proposed in this pull request? Removing version 2.2.0 from testing versions in

spark git commit: [SPARK-23786][SQL] Checking column names of csv headers

2018-06-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 416cd1fd9 -> 1d9338bb1 [SPARK-23786][SQL] Checking column names of csv headers ## What changes were proposed in this pull request? Currently column names of headers in CSV files are not checked against provided schema of CSV data. It
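
A sketch of how the header check might be triggered, assuming the `enforceSchema` CSV option this change introduces; the input path is hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val schema = new StructType().add("id", LongType).add("name", StringType)

// With enforceSchema disabled, the CSV header names are checked against the
// supplied schema and a mismatch is reported instead of being silently ignored.
val people = spark.read
  .schema(schema)
  .option("header", "true")
  .option("enforceSchema", "false")
  .csv("/tmp/people.csv")   // hypothetical input path
```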

spark git commit: [SPARK-24340][CORE] Clean up non-shuffle disk block manager files following executor exits on a Standalone cluster

2018-06-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 09e78c1ea -> 8ef167a5f [SPARK-24340][CORE] Clean up non-shuffle disk block manager files following executor exits on a Standalone cluster ## What changes were proposed in this pull request? Currently we only clean up the local

spark git commit: [SPARK-24337][CORE] Improve error messages for Spark conf values

2018-05-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 24ef7fbfa -> 0053e153f [SPARK-24337][CORE] Improve error messages for Spark conf values ## What changes were proposed in this pull request? Improve the exception messages when retrieving Spark conf values to include the key name when the

spark git commit: [SPARK-24276][SQL] Order of literals in IN should not affect semantic equality

2018-05-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1b36f1488 -> 24ef7fbfa [SPARK-24276][SQL] Order of literals in IN should not affect semantic equality ## What changes were proposed in this pull request? When two `In` operators are created with the same list of values, but different
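
A rough way to observe the effect, assuming canonicalization now orders the `In` value list; `sameResult` compares canonicalized plans, which is what exchange and cache reuse rely on:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3, 4).toDF("a")

// The two predicates differ only in the order of the IN values.
val p1 = df.filter($"a".isin(1, 2, 3))
val p2 = df.filter($"a".isin(3, 2, 1))

// Expected to print true once the IN list order no longer affects semantic equality.
println(p1.queryExecution.analyzed.sameResult(p2.queryExecution.analyzed))
```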

spark git commit: [SPARK-24366][SQL] Improve error messages for type conversion

2018-05-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fd315f588 -> 1b1528a50 [SPARK-24366][SQL] Improve error messages for type conversion ## What changes were proposed in this pull request? Currently, users are getting the following error messages on type conversions: ```

spark git commit: [SPARK-24244][SPARK-24368][SQL] Passing only required columns to the CSV parser

2018-05-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3b20b34ab -> 64fad0b51 [SPARK-24244][SPARK-24368][SQL] Passing only required columns to the CSV parser ## What changes were proposed in this pull request? The uniVocity parser allows specifying only the required column names or indexes for

spark git commit: [SPARK-24350][SQL] Fixes ClassCastException in the "array_position" function

2018-05-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f45793329 -> 230f14419 [SPARK-24350][SQL] Fixes ClassCastException in the "array_position" function ## What changes were proposed in this pull request? ### Fixes `ClassCastException` in the `array_position` function -
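
A minimal usage example of the Spark 2.4+ built-in function; the mixed `bigint` array and `int` search value reflect the kind of input the fix targets:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// array_position is 1-based and returns 0 when the value is not found.
spark.sql("SELECT array_position(array(1L, 2L, 3L), 2)").show()   // expected: 2
```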

spark git commit: [SPARK-24294] Throw SparkException when OOM in BroadcastExchangeExec

2018-05-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 84557bc9f -> b7a036b75 [SPARK-24294] Throw SparkException when OOM in BroadcastExchangeExec ## What changes were proposed in this pull request? When an OutOfMemoryError is thrown from BroadcastExchangeExec, scala.concurrent.Future will hit

spark git commit: [SPARK-24206][SQL] Improve DataSource read benchmark code

2018-05-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5a5a868dc -> 84557bc9f [SPARK-24206][SQL] Improve DataSource read benchmark code ## What changes were proposed in this pull request? This PR adds benchmark code `DataSourceReadBenchmark` for `orc`, `parquet`, `csv`, and `json` based on

spark git commit: Revert "[SPARK-24244][SQL] Passing only required columns to the CSV parser"

2018-05-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master df125062c -> 5a5a868dc Revert "[SPARK-24244][SQL] Passing only required columns to the CSV parser" This reverts commit 8086acc2f676a04ce6255a621ffae871bd09ceea. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-24348][SQL] "element_at" error fix

2018-05-22 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f9f055afa -> bc6ea614a [SPARK-24348][SQL] "element_at" error fix ## What changes were proposed in this pull request? ### Fixes a `scala.MatchError` in the `element_at` operation -
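
A minimal usage example of `element_at` on an array and a map (Spark 2.4+ built-in function):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// element_at works on arrays (1-based index) and maps (lookup by key).
spark.sql("SELECT element_at(array(10, 20, 30), 2)").show()       // 20
spark.sql("SELECT element_at(map('a', 1, 'b', 2), 'b')").show()   // 2
```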

spark git commit: [SPARK-24325] Tests for Hadoop's LinesReader

2018-05-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ffaefe755 -> b550b2a1a [SPARK-24325] Tests for Hadoop's LinesReader ## What changes were proposed in this pull request? The tests cover basic functionality of [Hadoop

spark git commit: [SPARK-24308][SQL] Handle DataReaderFactory to InputPartition rename in left over classes

2018-05-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master a53ea70c1 -> 710e4e81a [SPARK-24308][SQL] Handle DataReaderFactory to InputPartition rename in left over classes ## What changes were proposed in this pull request? SPARK-24073 renames DataReaderFactory -> InputPartition and DataReader

spark git commit: [SPARK-24312][SQL] Upgrade to 2.3.3 for Hive Metastore Client 2.3

2018-05-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1c4553d67 -> 7f82c4a47 [SPARK-24312][SQL] Upgrade to 2.3.3 for Hive Metastore Client 2.3 ## What changes were proposed in this pull request? Hive 2.3.3 was [released on April

spark git commit: Revert "[SPARK-24277][SQL] Code clean up in SQL module: HadoopMapReduceCommitProtocol"

2018-05-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ed7ba7db8 -> 1c4553d67 Revert "[SPARK-24277][SQL] Code clean up in SQL module: HadoopMapReduceCommitProtocol" This reverts commit 7b2dca5b12164b787ec4e8e7e9f92c60a3f9563e. Project: http://git-wip-us.apache.org/repos/asf/spark/repo

spark git commit: [SPARK-24027][SQL] Support MapType with StringType for keys as the root type by from_json

2018-05-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 075d678c8 -> 8cd83acf4 [SPARK-24027][SQL] Support MapType with StringType for keys as the root type by from_json ## What changes were proposed in this pull request? Currently, the from_json function supports StructType or ArrayType as the
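
A sketch of the newly supported root type, assuming a toy JSON column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, MapType, StringType}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("""{"a": 1, "b": 2}""").toDF("json")

// MapType with StringType keys is now accepted as the root schema of from_json,
// in addition to StructType and ArrayType.
df.select(from_json($"json", MapType(StringType, IntegerType)).as("m")).show(truncate = false)
```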

spark git commit: [SPARK-24246][SQL] Improve AnalysisException by setting the cause when it's available

2018-05-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1430fa80e -> c26f67325 [SPARK-24246][SQL] Improve AnalysisException by setting the cause when it's available ## What changes were proposed in this pull request? If there is an exception, it's better to set it as the cause of

spark git commit: [SPARK-24246][SQL] Improve AnalysisException by setting the cause when it's available

2018-05-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 88003f02c -> 2f60df09d [SPARK-24246][SQL] Improve AnalysisException by setting the cause when it's available ## What changes were proposed in this pull request? If there is an exception, it's better to set it as the cause of

spark git commit: [SPARK-24172][SQL] we should not apply operator pushdown to data source v2 many times

2018-05-11 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 54032682b -> 928845a42 [SPARK-24172][SQL] we should not apply operator pushdown to data source v2 many times ## What changes were proposed in this pull request? In `PushDownOperatorsToDataSource`, we use `transformUp` to match

spark git commit: [SPARK-24171] Adding a note for non-deterministic functions

2018-05-10 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 94d671448 -> f4fed0512 [SPARK-24171] Adding a note for non-deterministic functions ## What changes were proposed in this pull request? I propose to add a clear statement for functions like `collect_list()` about non-deterministic
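
A small illustration of why the note matters; the `repartition` is only there to make the shuffle-dependent ordering visible:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// collect_list is non-deterministic: the order of the collected values depends
// on the order of rows after the shuffle, so two runs may return different orderings.
Seq(("a", 1), ("a", 2), ("a", 3), ("b", 4))
  .toDF("k", "v")
  .repartition(4)
  .groupBy($"k")
  .agg(collect_list($"v").as("vs"))
  .show(truncate = false)
```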

[2/2] spark git commit: [SPARK-24073][SQL] Rename DataReaderFactory to InputPartition.

2018-05-09 Thread lixiao
[SPARK-24073][SQL] Rename DataReaderFactory to InputPartition. ## What changes were proposed in this pull request? Renames: * `DataReaderFactory` to `InputPartition` * `DataReader` to `InputPartitionReader` * `createDataReaderFactories` to `planInputPartitions` *

[1/2] spark git commit: [SPARK-24073][SQL] Rename DataReaderFactory to InputPartition.

2018-05-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9341c951e -> 62d01391f http://git-wip-us.apache.org/repos/asf/spark/blob/62d01391/sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaSchemaRequiredDataSource.java

spark git commit: [SPARK-23852][SQL] Add test that fails if PARQUET-1217 is not fixed

2018-05-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9e3bb3136 -> 9341c951e [SPARK-23852][SQL] Add test that fails if PARQUET-1217 is not fixed ## What changes were proposed in this pull request? Add a new test that triggers if PARQUET-1217 - a predicate pushdown bug - is not fixed in

spark git commit: [SPARK-24017][SQL] Refactor ExternalCatalog to be an interface

2018-05-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master dd4b1b9c7 -> f38ea00e8 [SPARK-24017][SQL] Refactor ExternalCatalog to be an interface ## What changes were proposed in this pull request? This refactors the external catalog to be an interface. It can be easier for the future work in the

spark git commit: [SPARK-24168][SQL] WindowExec should not access SQLConf at executor side

2018-05-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 8509284e1 -> d35eb2f9b [SPARK-24168][SQL] WindowExec should not access SQLConf at executor side ## What changes were proposed in this pull request? This PR is extracted from #21190 , to make it easier to backport.

spark git commit: [SPARK-24168][SQL] WindowExec should not access SQLConf at executor side

2018-05-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e3201e165 -> e646ae67f [SPARK-24168][SQL] WindowExec should not access SQLConf at executor side ## What changes were proposed in this pull request? This PR is extracted from #21190 , to make it easier to backport.

spark git commit: [SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalogVersionsSuite should verify the downloaded file

2018-05-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 154bbc959 -> 768d0b7ce [SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalogVersionsSuite should verify the downloaded file ## What changes were proposed in this pull request? This is a backport of #21210 because `branch-2.2` also

spark git commit: [SPARK-24035][SQL] SQL syntax for Pivot

2018-05-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 94641fe6c -> e3201e165 [SPARK-24035][SQL] SQL syntax for Pivot ## What changes were proposed in this pull request? Add SQL support for Pivot according to Pivot grammar defined by Oracle
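
A sketch of the new SQL syntax via `spark.sql`, assuming Spark 2.4+; the column and alias names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

Seq((2018, 1, 100), (2018, 2, 200), (2019, 1, 150), (2019, 2, 250))
  .toDF("year", "quarter", "sales")
  .createOrReplaceTempView("sales")

// PIVOT in SQL; previously only the DataFrame pivot() API was available.
spark.sql("""
  SELECT *
  FROM sales
  PIVOT (
    SUM(sales) FOR quarter IN (1 AS q1, 2 AS q2)
  )
""").show()
```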

spark git commit: [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark

2018-05-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5be8aab14 -> e4c91c089 [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark ## What changes were proposed in this pull request? This pr added the TPCDS v2.7 (latest) queries in `TPCDSQueryBenchmark`. These query

spark git commit: [SPARK-24123][SQL] Fix precision issues in monthsBetween with more than 8 digits

2018-05-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8bd27025b -> 504c9cfd2 [SPARK-24123][SQL] Fix precision issues in monthsBetween with more than 8 digits ## What changes were proposed in this pull request? SPARK-23902 introduced the ability to retrieve more than 8 digits in

spark git commit: [SPARK-24133][SQL] Check for integer overflows when resizing WritableColumnVectors

2018-05-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8dbf56c05 -> 8bd27025b [SPARK-24133][SQL] Check for integer overflows when resizing WritableColumnVectors ## What changes were proposed in this pull request? `ColumnVector`s store string data in one big byte array. Since the array size

spark git commit: [SPARK-23971][BACKPORT-2.3] Should not leak Spark sessions across test suites

2018-05-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 88abf7b9b -> b3adb5300 [SPARK-23971][BACKPORT-2.3] Should not leak Spark sessions across test suites This PR is to backport the PR https://github.com/apache/spark/pull/21058 to Apache 2.3. This should be the reason why we saw the test

spark git commit: [SPARK-24013][SQL] Remove unneeded compress in ApproximatePercentile

2018-05-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 152eaf6ae -> 8dbf56c05 [SPARK-24013][SQL] Remove unneeded compress in ApproximatePercentile ## What changes were proposed in this pull request? `ApproximatePercentile` contains a workaround logic to compress the samples since at the

spark git commit: [SPARK-24072][SQL] clearly define pushed filters

2018-04-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3121b411f -> b42ad165b [SPARK-24072][SQL] clearly define pushed filters ## What changes were proposed in this pull request? filters like parquet row group filter, which is actually pushed to the data source but still to be evaluated by

spark git commit: [SPARK-24085][SQL] Query returns UnsupportedOperationException when scalar subquery is present in partitioning expression

2018-04-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2824f12b8 -> 3fd297af6 [SPARK-24085][SQL] Query returns UnsupportedOperationException when scalar subquery is present in partitioning expression ## What changes were proposed in this pull request? In this case, the partition pruning

spark git commit: [SPARK-24012][SQL][TEST][FOLLOWUP] add unit test

2018-04-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 396938ef0 -> ac4ca7c4d [SPARK-24012][SQL][TEST][FOLLOWUP] add unit test ## What changes were proposed in this pull request? a followup of https://github.com/apache/spark/pull/21100 ## How was this patch tested? N/A Author: Wenchen Fan

spark git commit: Revert "[SPARK-23799][SQL] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics"

2018-04-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 8eb9a411d -> 1c3e8205d Revert "[SPARK-23799][SQL] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics" This reverts commit c2f4ee7baf07501cc1f8a23dd21d14aea53606c7. Project:

spark git commit: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics

2018-04-22 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 d91410029 -> c2f4ee7ba [SPARK-23799][SQL] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics ## What changes were proposed in this pull request? During evaluation of IN

spark git commit: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics

2018-04-22 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7bc853d08 -> c48085aa9 [SPARK-23799][SQL] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics ## What changes were proposed in this pull request? During evaluation of IN conditions, if

spark git commit: [SPARK-24033][SQL] Fix mismatched Window Frame specifiedwindowframe(RowFrame, -1, -1)

2018-04-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 8eb64a5e2 -> d91410029 [SPARK-24033][SQL] Fix mismatched Window Frame specifiedwindowframe(RowFrame, -1, -1) ## What changes were proposed in this pull request? When the OffsetWindowFunction's frame is `UnaryMinus(Literal(1))` but
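
A sketch of the kind of query affected: `lag`/`lead` carry an offset row frame internally (here rows -1 to -1), shown over a hypothetical DataFrame:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1, 10), ("a", 2, 20), ("a", 3, 30)).toDF("k", "ts", "v")

// lag()/lead() use an offset row frame under the hood; the fix avoids a
// frame-mismatch analysis error for such offset window functions.
val w = Window.partitionBy($"k").orderBy($"ts")
df.select($"k", $"ts", lag($"v", 1).over(w).as("prev_v")).show()
```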

spark git commit: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4.3

2018-04-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 fb968215c -> be184d16e [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4.3 ## What changes were proposed in this pull request? This PR updates Apache ORC dependencies to 1.4.3 released on February 9th. Apache ORC 1.4.2 release

spark git commit: [SPARK-23963][SQL] Properly handle large number of columns in query on text-based Hive table

2018-04-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 a1c56b669 -> 5bcb7bdcc [SPARK-23963][SQL] Properly handle large number of columns in query on text-based Hive table ## What changes were proposed in this pull request? TableReader would get disproportionately slower as the number of

spark git commit: [SPARK-24007][SQL] EqualNullSafe for FloatType and DoubleType might generate a wrong result by codegen.

2018-04-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 e957c4e88 -> a902323fb [SPARK-24007][SQL] EqualNullSafe for FloatType and DoubleType might generate a wrong result by codegen. `EqualNullSafe` for `FloatType` and `DoubleType` might generate a wrong result by codegen. ```scala

spark git commit: [SPARK-24007][SQL] EqualNullSafe for FloatType and DoubleType might generate a wrong result by codegen.

2018-04-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f81fa478f -> f09a9e941 [SPARK-24007][SQL] EqualNullSafe for FloatType and DoubleType might generate a wrong result by codegen. ## What changes were proposed in this pull request? `EqualNullSafe` for `FloatType` and `DoubleType` might
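
A sketch reconstructing the shape of the reproduction with assumed column names; `<=>` is the null-safe equality operator the fix concerns:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq[(Option[Double], Option[Double])]((Some(-1.0), None), (None, Some(-1.0))).toDF("a", "b")

// <=> (EqualNullSafe) treats two NULLs as equal and NULL vs. a value as not equal;
// the codegen bug could return wrong rows for float/double columns.
df.filter($"a" <=> $"b").show()   // expected: no rows
```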

spark git commit: [SPARK-24002][SQL] Task not serializable caused by org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes

2018-04-17 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 310a8cd06 -> cce469435 [SPARK-24002][SQL] Task not serializable caused by org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes ## What changes were proposed in this pull request? ``` Py4JJavaError: An error occurred while

spark git commit: [SPARK-23917][SQL] Add array_max function

2018-04-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c0964935d -> 693102203 [SPARK-23917][SQL] Add array_max function ## What changes were proposed in this pull request? The PR adds the SQL function `array_max`. It takes an array as argument and returns the maximum value in it. ## How was
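
A minimal usage example of the Spark 2.4+ built-in function:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// array_max returns the maximum element of the array.
spark.sql("SELECT array_max(array(1, 20, 3))").show()   // 20
```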

spark git commit: [SPARK-23963][SQL] Properly handle large number of columns in query on text-based Hive table

2018-04-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 25892f3cc -> 558f31b31 [SPARK-23963][SQL] Properly handle large number of columns in query on text-based Hive table ## What changes were proposed in this pull request? TableReader would get disproportionately slower as the number of

spark git commit: [SPARK-23905][SQL] Add UDF weekday

2018-04-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 4b0703679 -> 0323e6146 [SPARK-23905][SQL] Add UDF weekday ## What changes were proposed in this pull request? Add UDF weekday ## How was this patch tested? A new test Author: yucai Closes #21009 from yucai/SPARK-23905.
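
A minimal usage example, assuming the 0 = Monday convention of the new function:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// weekday(date) returns the day of the week as 0 (Monday) through 6 (Sunday).
spark.sql("SELECT weekday('2018-04-13')").show()   // 2018-04-13 is a Friday -> 4
```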

spark git commit: [SPARK-23971] Should not leak Spark sessions across test suites

2018-04-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ab7b961a4 -> 1018be44d [SPARK-23971] Should not leak Spark sessions across test suites ## What changes were proposed in this pull request? Many suites currently leak Spark sessions (sometimes with stopped SparkContexts) via the

spark git commit: Revert "[SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars as transient"

2018-04-11 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9d960de08 -> e904dfaf0 Revert "[SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars as transient" This reverts commit 271c891b91917d660d1f6b995de397c47c7a6058. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-19724][SQL][FOLLOW-UP] Check location of managed table when ignoreIfExists is true

2018-04-10 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3323b156f -> e17965891 [SPARK-19724][SQL][FOLLOW-UP] Check location of managed table when ignoreIfExists is true ## What changes were proposed in this pull request? In the PR #20886, I mistakenly check the table location only when

spark git commit: [SPARK-23898][SQL] Simplify add & subtract code generation

2018-04-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f94f3624e -> 649888415 [SPARK-23898][SQL] Simplify add & subtract code generation ## What changes were proposed in this pull request? Code generation for the `Add` and `Subtract` expressions was not done using the

spark git commit: [SPARK-23947][SQL] Add hashUTF8String convenience method to hasher classes

2018-04-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 61b724724 -> f94f3624e [SPARK-23947][SQL] Add hashUTF8String convenience method to hasher classes ## What changes were proposed in this pull request? Add `hashUTF8String()` to the hasher classes to allow Spark SQL codegen to generate

spark git commit: [SPARK-22856][SQL] Add wrappers for codegen output and nullability

2018-04-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 10f45bb82 -> 7c1654e21 [SPARK-22856][SQL] Add wrappers for codegen output and nullability ## What changes were proposed in this pull request? The codegen output of `Expression`, aka `ExprCode`, now encapsulates only strings of output

spark git commit: [SPARK-23881][CORE][TEST] Fix flaky test JobCancellationSuite."interruptible iterator of shuffle reader"

2018-04-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 1a537a2ad -> bf1dabede [SPARK-23881][CORE][TEST] Fix flaky test JobCancellationSuite."interruptible iterator of shuffle reader" ## What changes were proposed in this pull request? The test case JobCancellationSuite."interruptible

spark git commit: [SPARK-23881][CORE][TEST] Fix flaky test JobCancellationSuite."interruptible iterator of shuffle reader"

2018-04-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 32471ba0a -> d81f29eca [SPARK-23881][CORE][TEST] Fix flaky test JobCancellationSuite."interruptible iterator of shuffle reader" ## What changes were proposed in this pull request? The test case JobCancellationSuite."interruptible

spark git commit: [SPARK-23809][SQL][BACKPORT] Active SparkSession should be set by getOrCreate

2018-04-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 ccc4a2045 -> 1a537a2ad [SPARK-23809][SQL][BACKPORT] Active SparkSession should be set by getOrCreate This backports https://github.com/apache/spark/pull/20927 to branch-2.3 ## What changes were proposed in this pull request?

spark git commit: [SPARK-23822][SQL] Improve error message for Parquet schema mismatches

2018-04-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 f93667f84 -> ccc4a2045 [SPARK-23822][SQL] Improve error message for Parquet schema mismatches ## What changes were proposed in this pull request? This pull request tries to improve the error message for spark while reading parquet

spark git commit: [SPARK-23822][SQL] Improve error message for Parquet schema mismatches

2018-04-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6ade5cbb4 -> 945240193 [SPARK-23822][SQL] Improve error message for Parquet schema mismatches ## What changes were proposed in this pull request? This pull request tries to improve the error message for spark while reading parquet files

spark git commit: [SPARK-19724][SQL] create a managed table with an existing default table should throw an exception

2018-04-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master d65e531b4 -> 249007e37 [SPARK-19724][SQL] create a managed table with an existing default table should throw an exception ## What changes were proposed in this pull request? This PR is to finish https://github.com/apache/spark/pull/17272

spark git commit: [SPARK-23823][SQL] Keep origin in transformExpression

2018-04-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 0b7b8cced -> f93667f84 [SPARK-23823][SQL] Keep origin in transformExpression Fixes https://issues.apache.org/jira/browse/SPARK-23823 Keep origin for all the methods using transformExpression ## What changes were proposed in this pull

spark git commit: [SPARK-23823][SQL] Keep origin in transformExpression

2018-04-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f2ac08795 -> d65e531b4 [SPARK-23823][SQL] Keep origin in transformExpression Fixes https://issues.apache.org/jira/browse/SPARK-23823 Keep origin for all the methods using transformExpression ## What changes were proposed in this pull

spark git commit: [SPARK-23802][SQL] PropagateEmptyRelation can leave query plan in unresolved state

2018-04-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 f36bdb401 -> 28c9adbd6 [SPARK-23802][SQL] PropagateEmptyRelation can leave query plan in unresolved state ## What changes were proposed in this pull request? Add cast to nulls introduced by PropagateEmptyRelation so in cases they're

spark git commit: [SPARK-23802][SQL] PropagateEmptyRelation can leave query plan in unresolved state

2018-04-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 359375eff -> 5cfd5fabc [SPARK-23802][SQL] PropagateEmptyRelation can leave query plan in unresolved state ## What changes were proposed in this pull request? Add cast to nulls introduced by PropagateEmptyRelation so in cases they're part

spark git commit: [SPARK-23809][SQL] Active SparkSession should be set by getOrCreate

2018-04-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1035aaa61 -> 359375eff [SPARK-23809][SQL] Active SparkSession should be set by getOrCreate ## What changes were proposed in this pull request? Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to
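
A small sketch of the expected behavior after the change; `getActiveSession` is the accessor the fix makes consistent:

```scala
import org.apache.spark.sql.SparkSession

// getOrCreate() now also sets the active session, so code that relies on
// SparkSession.getActiveSession sees it consistently on the driver.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
assert(SparkSession.getActiveSession.contains(spark))
```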

spark git commit: [SPARK-23808][SQL] Set default Spark session in test-only spark sessions.

2018-03-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/master a7755fd8c -> b34890119 [SPARK-23808][SQL] Set default Spark session in test-only spark sessions. ## What changes were proposed in this pull request? Set default Spark session in the TestSparkSession and TestHiveSparkSession constructors.

spark git commit: [SPARK-23808][SQL] Set default Spark session in test-only spark sessions.

2018-03-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 516304521 -> 1365d739d [SPARK-23808][SQL] Set default Spark session in test-only spark sessions. ## What changes were proposed in this pull request? Set default Spark session in the TestSparkSession and TestHiveSparkSession

spark git commit: Revert "[SPARK-23096][SS] Migrate rate source to V2"

2018-03-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 34c4b9c57 -> 761565a3c Revert "[SPARK-23096][SS] Migrate rate source to V2" This reverts commit c68ec4e6a1ed9ea13345c7705ea60ff4df7aec7b. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-23549][SQL] Cast to timestamp when comparing timestamp with date

2018-03-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5f653d4f7 -> e4bec7cb8 [SPARK-23549][SQL] Cast to timestamp when comparing timestamp with date ## What changes were proposed in this pull request? This PR fixes an incorrect comparison in SQL between timestamp and date. This is because
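
A sketch of the corrected comparison using typed literals; with the fix the date operand is promoted to a timestamp before comparing, so the result below is expected to be true:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// A timestamp one second after midnight should compare as greater than that date.
spark.sql("SELECT TIMESTAMP '2018-03-25 00:00:01' > DATE '2018-03-25' AS cmp").show()   // true
```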

[1/2] spark git commit: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQuerySuite

2018-03-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 816a5496b -> 5f653d4f7 http://git-wip-us.apache.org/repos/asf/spark/blob/5f653d4f/sql/core/src/test/resources/tpcds-v2.7.0/q98.sql -- diff --git

[2/2] spark git commit: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQuerySuite

2018-03-25 Thread lixiao
[SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQuerySuite ## What changes were proposed in this pull request? This PR adds the TPCDS v2.7 (latest) queries in `TPCDSQuerySuite` because the current `TPCDSQuerySuite` tests the older ones (v1.4) and some queries differ between v1.4 and v2.7. Since

spark git commit: [SPARK-23500][SQL] Fix complex type simplification rules to apply to entire plan

2018-03-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2c4b9962f -> 477d6bd72 [SPARK-23500][SQL] Fix complex type simplification rules to apply to entire plan ## What changes were proposed in this pull request? Complex type simplification optimizer rules were not applied to the entire plan,

spark git commit: [SPARK-23553][TESTS] Tests should not assume the default value of `spark.sql.sources.default`

2018-03-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 d9e1f7040 -> 21b6de459 [SPARK-23553][TESTS] Tests should not assume the default value of `spark.sql.sources.default` ## What changes were proposed in this pull request? Currently, some tests have an assumption that

spark git commit: [SPARK-23553][TESTS] Tests should not assume the default value of `spark.sql.sources.default`

2018-03-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c95200048 -> 5414abca4 [SPARK-23553][TESTS] Tests should not assume the default value of `spark.sql.sources.default` ## What changes were proposed in this pull request? Currently, some tests have an assumption that
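
A sketch of making the default data source explicit in a test instead of assuming it is parquet; the table name is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Set the default source explicitly so the test does not depend on the build-time default.
spark.conf.set("spark.sql.sources.default", "orc")
spark.range(10).write.saveAsTable("t_default_source")   // stored with the configured default source
```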

spark git commit: [SPARK-23523][SQL][BACKPORT-2.3] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery

2018-03-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 a8e357ada -> 33ba8db8d [SPARK-23523][SQL][BACKPORT-2.3] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery This PR is to backport https://github.com/apache/spark/pull/20684 and

spark git commit: [SPARK-23624][SQL] Revise doc of method pushFilters in Datasource V2

2018-03-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 b083bd107 -> 5bd306c38 [SPARK-23624][SQL] Revise doc of method pushFilters in Datasource V2 ## What changes were proposed in this pull request? Revise doc of method pushFilters in

spark git commit: [SPARK-23624][SQL] Revise doc of method pushFilters in Datasource V2

2018-03-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2ca9bb083 -> 10b0657b0 [SPARK-23624][SQL] Revise doc of method pushFilters in Datasource V2 ## What changes were proposed in this pull request? Revise doc of method pushFilters in SupportsPushDownFilters/SupportsPushDownCatalystFilters

spark git commit: [SPARK-23173][SQL] Avoid creating corrupt parquet files when loading data from JSON

2018-03-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 3ec25d5a8 -> b083bd107 [SPARK-23173][SQL] Avoid creating corrupt parquet files when loading data from JSON ## What changes were proposed in this pull request? The from_json() function accepts an additional parameter, where the user

spark git commit: [SPARK-23173][SQL] Avoid creating corrupt parquet files when loading data from JSON

2018-03-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2c3673680 -> 2ca9bb083 [SPARK-23173][SQL] Avoid creating corrupt parquet files when loading data from JSON ## What changes were proposed in this pull request? The from_json() function accepts an additional parameter, where the user might

spark git commit: [SPARK-23525][BACKPORT][SQL] Support ALTER TABLE CHANGE COLUMN COMMENT for external hive table

2018-03-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 4864d2104 -> 175b221bc [SPARK-23525][BACKPORT][SQL] Support ALTER TABLE CHANGE COLUMN COMMENT for external hive table ## What changes were proposed in this pull request? The following query doesn't work as expected: ``` CREATE
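
A sketch of the statement the backport enables, assuming a Hive-enabled session; the table name and location are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").enableHiveSupport().getOrCreate()

spark.sql("CREATE EXTERNAL TABLE ext_t (id INT) LOCATION '/tmp/ext_t'")   // hypothetical location
// Only the comment may change: the column name and type must stay the same.
spark.sql("ALTER TABLE ext_t CHANGE COLUMN id id INT COMMENT 'primary id'")
```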

spark git commit: [SPARK-23490][BACKPORT][SQL] Check storage.locationUri with existing table in CreateTable

2018-03-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 86ca91551 -> 1dd37ff3b [SPARK-23490][BACKPORT][SQL] Check storage.locationUri with existing table in CreateTable Backport #20660 to branch 2.3. ## What changes were proposed in this pull request?

spark git commit: [SPARK-23550][CORE] Cleanup `Utils`.

2018-03-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 53561d27c -> c99fc9ad9 [SPARK-23550][CORE] Cleanup `Utils`. A few different things going on: - Remove unused methods. - Move JSON methods to the only class that uses them. - Move test-only methods to TestUtils. - Make getMaxResultSize() a

spark git commit: [MINOR][DOCS] Fix a link in "Compatibility with Apache Hive"

2018-03-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 c8aa6fbb0 -> 88dd335f6 [MINOR][DOCS] Fix a link in "Compatibility with Apache Hive" ## What changes were proposed in this pull request? This PR fixes a broken link as below: **Before:**

spark git commit: [MINOR][DOCS] Fix a link in "Compatibility with Apache Hive"

2018-03-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7965c91d8 -> 269cd5359 [MINOR][DOCS] Fix a link in "Compatibility with Apache Hive" ## What changes were proposed in this pull request? This PR fixes a broken link as below: **Before:**

spark git commit: [SPARK-23570][SQL] Add Spark 2.3.0 in HiveExternalCatalogVersionsSuite

2018-03-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 8fe20e151 -> f12fa13f1 [SPARK-23570][SQL] Add Spark 2.3.0 in HiveExternalCatalogVersionsSuite ## What changes were proposed in this pull request? Add Spark 2.3.0 in HiveExternalCatalogVersionsSuite since Spark 2.3.0 is released for

spark git commit: [SPARK-23570][SQL] Add Spark 2.3.0 in HiveExternalCatalogVersionsSuite

2018-03-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 707e6506d -> 487377e69 [SPARK-23570][SQL] Add Spark 2.3.0 in HiveExternalCatalogVersionsSuite ## What changes were proposed in this pull request? Add Spark 2.3.0 in HiveExternalCatalogVersionsSuite since Spark 2.3.0 is released for
