spark git commit: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonemptyLocation should have legacy prefix

2018-09-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master d25f425c9 -> 4a1120953 [SPARK-19724][SQL] allowCreatingManagedTableUsingNonemptyLocation should have legacy prefix One more legacy config to go ... Closes #22515 from rxin/allowCreatingManagedTableUsingNonemptyLocation. Authored-by:
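A minimal sketch of how the renamed flag would be used, assuming the post-rename key is `spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation` (the exact key is not shown in the truncated message above):

```scala
import org.apache.spark.sql.SparkSession

// Assumes a local SparkSession and that the renamed key carries the spark.sql.legacy. prefix.
val spark = SparkSession.builder().master("local[*]").appName("legacy-conf-demo").getOrCreate()
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")
// With the flag on, creating a managed table is allowed even if its default location is non-empty.
spark.sql("CREATE TABLE demo_t(id INT) USING parquet")
```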

spark git commit: [SPARK-25494][SQL] Upgrade Spark's use of Janino to 3.0.10

2018-09-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5d25e1544 -> 596af211a [SPARK-25494][SQL] Upgrade Spark's use of Janino to 3.0.10 ## What changes were proposed in this pull request? This PR upgrades Spark's use of Janino from 3.0.9 to 3.0.10. Note that 3.0.10 is an out-of-band release

spark git commit: [SPARK-24777][SQL] Add write benchmark for AVRO

2018-09-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 43c62e797 -> 51f3659b7 [SPARK-24777][SQL] Add write benchmark for AVRO ## What changes were proposed in this pull request? Refactor `DataSourceWriteBenchmark` and add write benchmark for AVRO. ## How was this patch tested? Build and

spark git commit: [SPARK-24777][SQL] Add write benchmark for AVRO

2018-09-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 77e52448e -> 950ab7995 [SPARK-24777][SQL] Add write benchmark for AVRO ## What changes were proposed in this pull request? Refactor `DataSourceWriteBenchmark` and add write benchmark for AVRO. ## How was this patch tested? Build and run

spark git commit: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the same exprId for project expressions in each Union child, causing mistakes in constant propagation

2018-09-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 dad5c48b2 -> 7edfdfcec [SPARK-25450][SQL] PushProjectThroughUnion rule uses the same exprId for project expressions in each Union child, causing mistakes in constant propagation ## What changes were proposed in this pull request?

spark git commit: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the same exprId for project expressions in each Union child, causing mistakes in constant propagation

2018-09-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 fc036729c -> c67c597b6 [SPARK-25450][SQL] PushProjectThroughUnion rule uses the same exprId for project expressions in each Union child, causing mistakes in constant propagation ## What changes were proposed in this pull request?

spark git commit: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the same exprId for project expressions in each Union child, causing mistakes in constant propagation

2018-09-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 88e7e87bd -> 88446b6ad [SPARK-25450][SQL] PushProjectThroughUnion rule uses the same exprId for project expressions in each Union child, causing mistakes in constant propagation ## What changes were proposed in this pull request? The

spark git commit: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled

2018-09-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 06efed290 -> dfcff3839 [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled ## What changes were proposed in this pull request? This patch adds an "optimizer" prefix to nested schema pruning. ## How was this

spark git commit: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled

2018-09-19 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 47d6e80a2 -> 76399d75e [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled ## What changes were proposed in this pull request? This patch adds an "optimizer" prefix to nested schema pruning. ## How was this patch
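A short sketch of the renamed flag in use; the key comes from the commit title, while the Parquet path and query are illustrative only:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// Key name taken from the commit title; nested schema pruning is still opt-in at this point.
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
// With pruning on, selecting only name.first should read just that leaf column from Parquet.
spark.read.parquet("/tmp/people").select("name.first").explain()
```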

spark git commit: [SPARK-24626] Add statistics prefix to parallelFileListingInStatsComputation

2018-09-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6c7db7fd1 -> 4193c7623 [SPARK-24626] Add statistics prefix to parallelFileListingInStatsComputation ## What changes were proposed in this pull request? To be more consistent with other statistics based configs. ## How was this patch

spark git commit: [SPARK-24626] Add statistics prefix to parallelFileListingInStatsComputation

2018-09-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 00ede120e -> f11f44548 [SPARK-24626] Add statistics prefix to parallelFileListingInStatsComputation ## What changes were proposed in this pull request? To be more consistent with other statistics based configs. ## How was this patch
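A hedged sketch of the renamed flag; the key below assumes the `statistics` prefix described in the commit, and the `sales` table is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// Assumed post-rename key; verify the exact spelling in SQLConf.
spark.conf.set("spark.sql.statistics.parallelFileListingInStatsComputation.enabled", "true")
// File listing during stats computation can then run in parallel across partitions.
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS")
```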

spark git commit: [SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema

2018-09-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 ba8560a96 -> 00ede120e [SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema ## What changes were proposed in this pull request? `spark.sql.fromJsonForceNullableSchema` -> `spark.sql.function.fromJson.forceNullable` ## How

spark git commit: [SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema

2018-09-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 497f00f62 -> 6c7db7fd1 [SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema ## What changes were proposed in this pull request? `spark.sql.fromJsonForceNullableSchema` -> `spark.sql.function.fromJson.forceNullable` ## How was
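A minimal sketch of the renamed flag, using the new key quoted in the commit message; the exact nullability semantics are assumed:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, StructType}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// New key as quoted in the commit message; when true, from_json relaxes user schemas to nullable.
spark.conf.set("spark.sql.function.fromJson.forceNullable", "true")
val schema = new StructType().add("a", IntegerType, nullable = false)
Seq("""{"a": 1}""").toDF("json")
  .select(from_json($"json", schema).as("parsed"))
  .printSchema()  // field `a` should be reported as nullable with the flag on
```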

spark git commit: [SPARK-24151][SQL] Case insensitive resolution of CURRENT_DATE and CURRENT_TIMESTAMP

2018-09-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 c40375190 -> ffd448bb0 [SPARK-24151][SQL] Case insensitive resolution of CURRENT_DATE and CURRENT_TIMESTAMP ## What changes were proposed in this pull request? SPARK-22333 introduced a regression in the resolution of `CURRENT_DATE`

spark git commit: [SPARK-24151][SQL] Case insensitive resolution of CURRENT_DATE and CURRENT_TIMESTAMP

2018-09-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master acc645257 -> ba838fee0 [SPARK-24151][SQL] Case insensitive resolution of CURRENT_DATE and CURRENT_TIMESTAMP ## What changes were proposed in this pull request? SPARK-22333 introduced a regression in the resolution of `CURRENT_DATE` and
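A small illustration of the behavior this fix restores (case-insensitive resolution of the two special functions); output is not shown:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// After the fix, lower-, mixed- and upper-case spellings should all resolve to the same functions.
spark.sql("SELECT current_date, CURRENT_DATE, Current_Timestamp, CURRENT_TIMESTAMP").show()
```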

spark git commit: [SPARK-25439][TESTS][SQL] Fixes TPCHQuerySuite datatype of customer.c_nationkey to BIGINT according to spec

2018-09-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fefaa3c30 -> 02c2963f8 [SPARK-25439][TESTS][SQL] Fixes TPCHQuerySuite datatype of customer.c_nationkey to BIGINT according to spec ## What changes were proposed in this pull request? Fixes TPCH DDL datatype of `customer.c_nationkey` from

spark git commit: [SPARK-25439][TESTS][SQL] Fixes TPCHQuerySuite datatype of customer.c_nationkey to BIGINT according to spec

2018-09-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 b40e5feec -> b839721f3 [SPARK-25439][TESTS][SQL] Fixes TPCHQuerySuite datatype of customer.c_nationkey to BIGINT according to spec ## What changes were proposed in this pull request? Fixes TPCH DDL datatype of `customer.c_nationkey`

spark git commit: [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT

2018-09-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5ebef33c8 -> bb2f069cf [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT ## What changes were proposed in this pull request? In the dev list, we can still discuss whether the next version is 2.5.0 or 3.0.0. Let us first bump the

spark git commit: [SPARK-25426][SQL] Remove the duplicate fallback logic in UnsafeProjection

2018-09-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master be454a7ce -> 5ebef33c8 [SPARK-25426][SQL] Remove the duplicate fallback logic in UnsafeProjection ## What changes were proposed in this pull request? This pr removed the duplicate fallback logic in `UnsafeProjection`. This pr comes from

spark git commit: [SPARK-25431][SQL][EXAMPLES] Fix function examples and unify the format of the example results.

2018-09-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 8cdf7f4c9 -> 59054fa89 [SPARK-25431][SQL][EXAMPLES] Fix function examples and unify the format of the example results. ## What changes were proposed in this pull request? There are some mistakes in examples of newly added functions.

spark git commit: [SPARK-25431][SQL][EXAMPLES] Fix function examples and unify the format of the example results.

2018-09-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master a81ef9e1f -> 9c25d7f73 [SPARK-25431][SQL][EXAMPLES] Fix function examples and unify the format of the example results. ## What changes were proposed in this pull request? There are some mistakes in examples of newly added functions. Also

spark git commit: [SPARK-25418][SQL] The metadata of DataSource table should not include Hive-generated storage properties.

2018-09-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9deddbb13 -> a81ef9e1f [SPARK-25418][SQL] The metadata of DataSource table should not include Hive-generated storage properties. ## What changes were proposed in this pull request? When Hive support is enabled, the Hive catalog puts extra

spark git commit: [SPARK-25415][SQL] Make plan change log in RuleExecutor configurable by SQLConf

2018-09-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 08c76b5d3 -> 8b702e1e0 [SPARK-25415][SQL] Make plan change log in RuleExecutor configurable by SQLConf ## What changes were proposed in this pull request? In RuleExecutor, after applying a rule, if the plan has changed, the before and
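A sketch of how the new logging knobs might be set; both key names below are assumptions (check SQLConf for the exact entries introduced by this change):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// Assumed keys: log level for plan-change messages and an allow-list of rules to log.
spark.conf.set("spark.sql.optimizer.planChangeLog.level", "WARN")
spark.conf.set("spark.sql.optimizer.planChangeLog.rules",
  "org.apache.spark.sql.catalyst.optimizer.PushDownPredicate")
// Any plan change produced by the listed rule is then logged at WARN while this query is optimized.
spark.range(10).filter("id > 5").collect()
```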

[5/7] spark git commit: [SPARK-24882][SQL] Revert [] improve data source v2 API from branch 2.4

2018-09-12 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark/blob/15d2e9d7/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java -- diff --git

[3/7] spark git commit: [SPARK-24882][SQL] Revert [] improve data source v2 API from branch 2.4

2018-09-12 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark/blob/15d2e9d7/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/MicroBatchWritSupport.scala -- diff --git

[4/7] spark git commit: [SPARK-24882][SQL] Revert [] improve data source v2 API from branch 2.4

2018-09-12 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark/blob/15d2e9d7/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala -- diff --git

[1/7] spark git commit: [SPARK-24882][SQL] Revert [] improve data source v2 API from branch 2.4

2018-09-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 4c1428fa2 -> 15d2e9d7d http://git-wip-us.apache.org/repos/asf/spark/blob/15d2e9d7/sql/core/src/test/scala/org/apache/spark/sql/streaming/sources/StreamingDataSourceV2Suite.scala

[7/7] spark git commit: [SPARK-24882][SQL] Revert [] improve data source v2 API from branch 2.4

2018-09-12 Thread lixiao
[SPARK-24882][SQL] Revert [] improve data source v2 API from branch 2.4 ## What changes were proposed in this pull request? As discussed in the dev list, we don't want to include https://github.com/apache/spark/pull/22009 in Spark 2.4, as it needs data source v2 users to change the

[6/7] spark git commit: [SPARK-24882][SQL] Revert [] improve data source v2 API from branch 2.4

2018-09-12 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark/blob/15d2e9d7/sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchReadSupportProvider.java -- diff --git

[2/7] spark git commit: [SPARK-24882][SQL] Revert [] improve data source v2 API from branch 2.4

2018-09-12 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark/blob/15d2e9d7/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWriteSupportSuite.scala -- diff --git

spark git commit: Revert "[SPARK-25072][PYSPARK] Forbid extra value for custom Row"

2018-09-10 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 5ad644a4c -> 4b578184f Revert "[SPARK-25072][PYSPARK] Forbid extra value for custom Row" This reverts commit 31dab7140a4b271e7b976762af7a36f8bfbb8381. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark-website git commit: Fix the case.

2018-09-10 Thread lixiao
Repository: spark-website Updated Branches: refs/heads/asf-site 2f6290154 -> 633724167 Fix the case. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/63372416 Tree:

spark-website git commit: adding myself to committers.md

2018-09-10 Thread lixiao
Repository: spark-website Updated Branches: refs/heads/asf-site 9d5aa3ea4 -> 2f6290154 adding myself to committers.md Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/2f629015 Tree:

spark git commit: [SPARK-25368][SQL] Incorrect predicate pushdown returns wrong result

2018-09-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 5b8b6b4e9 -> 5ad644a4c [SPARK-25368][SQL] Incorrect predicate pushdown returns wrong result How to reproduce: ```scala val df1 = spark.createDataFrame(Seq( (1, 1) )).toDF("a", "b").withColumn("c", lit(null).cast("int")) val df2 =

spark git commit: [SPARK-25368][SQL] Incorrect predicate pushdown returns wrong result

2018-09-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 6b7ea78ae -> c1c1bda3c [SPARK-25368][SQL] Incorrect predicate pushdown returns wrong result ## What changes were proposed in this pull request? How to reproduce: ```scala val df1 = spark.createDataFrame(Seq( (1, 1) )).toDF("a",

spark git commit: [SPARK-25368][SQL] Incorrect predicate pushdown returns wrong result

2018-09-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 88a930dfa -> 77c996403 [SPARK-25368][SQL] Incorrect predicate pushdown returns wrong result ## What changes were proposed in this pull request? How to reproduce: ```scala val df1 = spark.createDataFrame(Seq( (1, 1) )).toDF("a",
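The reproduction in the commit message is cut off above; the sketch below reconstructs the setup from the visible part and only illustrates the class of query affected (predicates over a literal-null column whose inferred constraints were pushed down incorrectly):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Setup as far as it is visible in the truncated message; the rest is illustrative.
val df1 = spark.createDataFrame(Seq((1, 1))).toDF("a", "b").withColumn("c", lit(null).cast("int"))
val df2 = df1.union(df1)
// Filters on the all-null column `c` are the kind of predicate affected by the bug;
// since `c` is null in every row, the correct result here is an empty output.
df2.filter($"c".isNotNull).show()
```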

spark git commit: [SPARK-20636] Add new optimization rule to transpose adjacent Window expressions.

2018-09-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 26f74b7cb -> 78981efc2 [SPARK-20636] Add new optimization rule to transpose adjacent Window expressions. ## What changes were proposed in this pull request? Add new optimization rule to eliminate unnecessary shuffling by flipping

spark git commit: [SPARK-25375][SQL][TEST] Reenable qualified perm. function checks in UDFSuite

2018-09-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 904192ad1 -> 8f7d8a097 [SPARK-25375][SQL][TEST] Reenable qualified perm. function checks in UDFSuite ## What changes were proposed in this pull request? At Spark 2.0.0, SPARK-14335 adds some [commented-out test

spark git commit: [SPARK-25375][SQL][TEST] Reenable qualified perm. function checks in UDFSuite

2018-09-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 08c02e637 -> 26f74b7cb [SPARK-25375][SQL][TEST] Reenable qualified perm. function checks in UDFSuite ## What changes were proposed in this pull request? At Spark 2.0.0, SPARK-14335 adds some [commented-out test

spark git commit: [SPARK-12321][SQL][FOLLOW-UP] Add tests for fromString

2018-09-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6d7bc5af4 -> f96a8bf8f [SPARK-12321][SQL][FOLLOW-UP] Add tests for fromString ## What changes were proposed in this pull request? Add test cases for fromString ## How was this patch tested? N/A Closes #22345 from gatorsmile/addTest.

spark git commit: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation in the test cases of sql/core and sql/hive

2018-09-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ed249db9c -> 6d7bc5af4 [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation in the test cases of sql/core and sql/hive ## What changes were proposed in this pull request? In SharedSparkSession and TestHive, we need to disable the rule

spark git commit: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation in the test cases of sql/core and sql/hive

2018-09-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.4 f9b476c6a -> 872bad161 [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation in the test cases of sql/core and sql/hive ## What changes were proposed in this pull request? In SharedSparkSession and TestHive, we need to disable the

spark git commit: [SPARK-23243][CORE] Fix RDD.repartition() data correctness issue

2018-09-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 559b899ac -> 71bd79651 [SPARK-23243][CORE] Fix RDD.repartition() data correctness issue ## What changes were proposed in this pull request? An alternative fix for https://github.com/apache/spark/pull/21698 When Spark reruns tasks for an

spark git commit: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` in FilterPushdownBenchmark

2018-09-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 39a02d8f7 -> c66eef844 [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` in FilterPushdownBenchmark ## What changes were proposed in this pull request? This is a follow-up of #22313 and aim to ignore the micro benchmark test which

spark-website git commit: Added Yinan Li to the list of committers

2018-09-04 Thread lixiao
Repository: spark-website Updated Branches: refs/heads/asf-site 92a85c6c2 -> afdb6cbb8 Added Yinan Li to the list of committers Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/afdb6cbb Tree:

spark git commit: [SPARK-19355][SQL][FOLLOWUP][TEST] Properly recycle SparkSession on TakeOrderedAndProjectSuite finishes

2018-09-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0b9b6b7d1 -> 3aa60282c [SPARK-19355][SQL][FOLLOWUP][TEST] Properly recycle SparkSession on TakeOrderedAndProjectSuite finishes ## What changes were proposed in this pull request? Previously in `TakeOrderedAndProjectSuite` the

spark git commit: [SPARK-25286][CORE] Removing the dangerous parmap

2018-08-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7fc8881b0 -> 32da87dfa [SPARK-25286][CORE] Removing the dangerous parmap ## What changes were proposed in this pull request? I propose to remove one of the `parmap` methods which accepts an execution context as a parameter. The method should

spark git commit: [SPARK-25296][SQL][TEST] Create ExplainSuite

2018-08-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 339859c4e -> 7fc8881b0 [SPARK-25296][SQL][TEST] Create ExplainSuite ## What changes were proposed in this pull request? Move the output verification of Explain test cases to a new suite ExplainSuite. ## How was this patch tested? N/A

spark git commit: [DOC] Fix comment on SparkPlanGraphEdge

2018-08-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 20b7c684c -> 6b1b10ca8 [DOC] Fix comment on SparkPlanGraphEdge ## What changes were proposed in this pull request? `fromId` is the child, and `toId` is the parent, see line 127 in `buildSparkPlanGraphNode` above. The edges in Spark UI

spark git commit: [SPARK-25212][SQL] Support Filter in ConvertToLocalRelation

2018-08-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7ad18ee9f -> 103854028 [SPARK-25212][SQL] Support Filter in ConvertToLocalRelation ## What changes were proposed in this pull request? Support Filter in ConvertToLocalRelation, similar to how Project works. Additionally, in Optimizer, run
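A quick way to observe the effect; with Filter handled by ConvertToLocalRelation, the optimized plan for a filter over in-memory data should collapse into a LocalRelation (illustrative sketch):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// After this change the filter is evaluated at optimization time, so the optimized
// plan printed below should contain only a LocalRelation, not a separate Filter node.
Seq(1, 2, 3).toDF("id").filter($"id" > 1).explain(true)
```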

spark git commit: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITIONS

2018-08-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 4e3f3cebe -> aff8f15c1 [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITIONS ## What changes were proposed in this pull request? In the PR, I propose to not perform recursive parallel listing of files in the `scanPartitions`

spark git commit: [SPARK-23997][SQL] Configurable maximum number of buckets

2018-08-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1149c4efb -> de46df549 [SPARK-23997][SQL] Configurable maximum number of buckets ## What changes were proposed in this pull request? This PR implements the possibility of the user to override the maximum number of buckets when saving to a
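A hedged sketch of the new knob; the key name `spark.sql.sources.bucketing.maxBuckets` is an assumption based on the commit description:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Assumed key; raising it allows bucket counts above the previously hard-coded cap.
spark.conf.set("spark.sql.sources.bucketing.maxBuckets", "200000")
Seq((1, "a"), (2, "b")).toDF("id", "v")
  .write.bucketBy(8, "id").sortBy("id")
  .saveAsTable("bucketed_demo")
```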

spark git commit: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL][BACKPORT-2.1] Shuffle+Repartition on a DataFrame could lead to incorrect answers

2018-08-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.1 09f70f5fd -> 4d2d3d47e [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL][BACKPORT-2.1] Shuffle+Repartition on a DataFrame could lead to incorrect answers ## What changes were proposed in this pull request? Back port of

spark git commit: [SPARK-25164][SQL] Avoid rebuilding column and path list for each column in parquet reader

2018-08-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 d7c3aae20 -> af41dedc6 [SPARK-25164][SQL] Avoid rebuilding column and path list for each column in parquet reader ## What changes were proposed in this pull request? VectorizedParquetRecordReader::initializeInternal rebuilds the

spark git commit: [SPARK-25164][SQL] Avoid rebuilding column and path list for each column in parquet reader

2018-08-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 f5983823e -> 8db935f97 [SPARK-25164][SQL] Avoid rebuilding column and path list for each column in parquet reader ## What changes were proposed in this pull request? VectorizedParquetRecordReader::initializeInternal rebuilds the

spark git commit: [SPARK-25029][BUILD][CORE] Janino "Two non-abstract methods ..." errors

2018-08-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f2d35427e -> 9b6baeb7b [SPARK-25029][BUILD][CORE] Janino "Two non-abstract methods ..." errors ## What changes were proposed in this pull request? Update to janino 3.0.9 to address Java 8 + Scala 2.12 incompatibility. The error manifests

spark git commit: [SPARK-4502][SQL] Parquet nested column pruning - foundation

2018-08-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master cd6dff78b -> f2d35427e [SPARK-4502][SQL] Parquet nested column pruning - foundation (Link to Jira: https://issues.apache.org/jira/browse/SPARK-4502) _N.B. This is a restart of PR #16578 which includes a subset of that code. Relevant

spark git commit: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL][BACKPORT-2.2] Shuffle+Repartition on a DataFrame could lead to incorrect answers

2018-08-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 124789b62 -> d7c3aae20 [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL][BACKPORT-2.2] Shuffle+Repartition on a DataFrame could lead to incorrect answers ## What changes were proposed in this pull request? Back port of

[6/7] spark git commit: [SPARK-24882][SQL] improve data source v2 API

2018-08-22 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark/blob/e7548871/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala -- diff --git

[5/7] spark git commit: [SPARK-24882][SQL] improve data source v2 API

2018-08-22 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark/blob/e7548871/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousInputPartitionReader.java -- diff --git

[3/7] spark git commit: [SPARK-24882][SQL] improve data source v2 API

2018-08-22 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark/blob/e7548871/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/ForeachWriteSupportProvider.scala -- diff --git

[1/7] spark git commit: [SPARK-24882][SQL] improve data source v2 API

2018-08-22 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 55f36641f -> e75488718 http://git-wip-us.apache.org/repos/asf/spark/blob/e7548871/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --

[4/7] spark git commit: [SPARK-24882][SQL] improve data source v2 API

2018-08-22 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark/blob/e7548871/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala -- diff --git

[2/7] spark git commit: [SPARK-24882][SQL] improve data source v2 API

2018-08-22 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark/blob/e7548871/sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaSimpleDataSourceV2.java -- diff --git

[7/7] spark git commit: [SPARK-24882][SQL] improve data source v2 API

2018-08-22 Thread lixiao
[SPARK-24882][SQL] improve data source v2 API ## What changes were proposed in this pull request? Improve the data source v2 API according to the [design doc](https://docs.google.com/document/d/1DDXCTCrup4bKWByTalkXWgavcPdvur8a4eEu8x1BzPM/edit?usp=sharing) summary of the changes 1. rename

spark git commit: [SPARK-25159][SQL] json schema inference should only trigger one job

2018-08-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 07737c87d -> 4a9c9d8f9 [SPARK-25159][SQL] json schema inference should only trigger one job ## What changes were proposed in this pull request? This fixes a perf regression caused by https://github.com/apache/spark/pull/21376 . We

spark git commit: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions when expr codegen fails

2018-08-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master a998e9d82 -> 07737c87d [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions when expr codegen fails ## What changes were proposed in this pull request? This pr is to fix bugs when expr codegen fails; we need to catch

spark git commit: [SPARK-25129][SQL] Make the mapping of com.databricks.spark.avro to built-in module configurable

2018-08-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6c5cb8585 -> ac0174e55 [SPARK-25129][SQL] Make the mapping of com.databricks.spark.avro to built-in module configurable ## What changes were proposed in this pull request? In https://issues.apache.org/jira/browse/SPARK-24924, the data
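A sketch of the configurable mapping; the config key is an assumption, and the example presumes the built-in Avro module is on the classpath:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// Assumed key; when enabled, the legacy package name is redirected to the built-in Avro source.
spark.conf.set("spark.sql.legacy.replaceDatabricksSparkAvro.enabled", "true")
spark.read.format("com.databricks.spark.avro").load("/tmp/events.avro")  // hypothetical path
```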

spark git commit: [SPARK-25114][2.3][CORE][FOLLOWUP] Fix RecordBinaryComparatorSuite build failure

2018-08-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 8bde46781 -> 9cb9d7201 [SPARK-25114][2.3][CORE][FOLLOWUP] Fix RecordBinaryComparatorSuite build failure ## What changes were proposed in this pull request? Fix RecordBinaryComparatorSuite build failure ## How was this patch tested?

spark git commit: [SPARK-25114][CORE] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.

2018-08-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 9702bb637 -> 8bde46781 [SPARK-25114][CORE] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE. https://github.com/apache/spark/pull/22079#discussion_r209705612 It is possible for two

spark git commit: [SPARK-25114][CORE] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.

2018-08-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f984ec75e -> 4fb96e510 [SPARK-25114][CORE] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE. ## What changes were proposed in this pull request?

spark git commit: [SPARK-24959][SQL] Speed up count() for JSON and CSV

2018-08-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 14d7c1c3e -> a8a1ac01c [SPARK-24959][SQL] Speed up count() for JSON and CSV ## What changes were proposed in this pull request? In the PR, I propose to skip invoking of the CSV/JSON parser per each line in the case if the required schema

spark git commit: [SPARK-25092][SQL][FOLLOWUP] Add RewriteCorrelatedScalarSubquery in list of nonExcludableRules

2018-08-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e50192494 -> e59dd8fa0 [SPARK-25092][SQL][FOLLOWUP] Add RewriteCorrelatedScalarSubquery in list of nonExcludableRules ## What changes were proposed in this pull request? Add RewriteCorrelatedScalarSubquery in the list of

spark git commit: [SPARK-25113][SQL] Add logging to CodeGenerator when any generated method's bytecode size goes above HugeMethodLimit

2018-08-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b81e3031f -> 3c614d056 [SPARK-25113][SQL] Add logging to CodeGenerator when any generated method's bytecode size goes above HugeMethodLimit ## What changes were proposed in this pull request? Add logging for all generated methods from

spark git commit: [SPARK-25051][SQL] FixNullability should not stop on AnalysisBarrier

2018-08-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 0856b82b3 -> 34191e663 [SPARK-25051][SQL] FixNullability should not stop on AnalysisBarrier ## What changes were proposed in this pull request? The introduction of `AnalysisBarrier` prevented `FixNullability` to go through all the

spark git commit: [SPARK-25081][CORE] Nested spill in ShuffleExternalSorter should not access released memory page (branch-2.2)

2018-08-11 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 051ea3a62 -> 1e73ee248 [SPARK-25081][CORE] Nested spill in ShuffleExternalSorter should not access released memory page (branch-2.2) ## What changes were proposed in this pull request? Backport

spark git commit: [SPARK-25092] Add RewriteExceptAll and RewriteIntersectAll in the list of nonExcludableRules

2018-08-11 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8ec25cd67 -> c3be2cd34 [SPARK-25092] Add RewriteExceptAll and RewriteIntersectAll in the list of nonExcludableRules ## What changes were proposed in this pull request? Add RewriteExceptAll and RewriteIntersectAll in the list of

spark git commit: [SPARK-25068][SQL] Add exists function.

2018-08-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fec67ed7e -> 9b8521e53 [SPARK-25068][SQL] Add exists function. ## What changes were proposed in this pull request? This pr adds `exists` function which tests whether a predicate holds for one or more elements in the array. ```sql >
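A minimal example of the new function, following the lambda syntax of the other higher-order functions added in this release:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// Returns true: at least one element of the array satisfies the predicate.
spark.sql("SELECT exists(array(1, 2, 3), x -> x % 2 == 0)").show()
```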

spark git commit: [SPARK-25076][SQL] SQLConf should not be retrieved from a stopped SparkSession

2018-08-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 7d465d8f4 -> 9bfc55b1b [SPARK-25076][SQL] SQLConf should not be retrieved from a stopped SparkSession ## What changes were proposed in this pull request? When a `SparkSession` is stopped, `SQLConf.get` should use the fallback conf to

spark git commit: [SPARK-25076][SQL] SQLConf should not be retrieved from a stopped SparkSession

2018-08-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master bd6db1505 -> fec67ed7e [SPARK-25076][SQL] SQLConf should not be retrieved from a stopped SparkSession ## What changes were proposed in this pull request? When a `SparkSession` is stopped, `SQLConf.get` should use the fallback conf to

spark git commit: [SPARK-25077][SQL] Delete unused variable in WindowExec

2018-08-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master eb9a696dd -> bd6db1505 [SPARK-25077][SQL] Delete unused variable in WindowExec ## What changes were proposed in this pull request? Just delete the unused variable `inputFields` in WindowExec, avoid making others confused while reading

spark git commit: [SPARK-24626][SQL] Improve location size calculation in Analyze Table command

2018-08-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2949a835f -> d36539741 [SPARK-24626][SQL] Improve location size calculation in Analyze Table command ## What changes were proposed in this pull request? Currently, Analyze table calculates table size sequentially for each partition. We

spark git commit: [SPARK-25063][SQL] Rename class KnowNotNull to KnownNotNull

2018-08-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1a7e747ce -> 2949a835f [SPARK-25063][SQL] Rename class KnowNotNull to KnownNotNull ## What changes were proposed in this pull request? Correct the class name typo checked in through SPARK-24891 ## How was this patch tested? Passed all

spark git commit: [SPARK-25046][SQL] Fix Alter View can execute sql like "ALTER VIEW ... AS INSERT INTO"

2018-08-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8c13cb2ae -> f6356f9bc [SPARK-25046][SQL] Fix Alter View can execute sql like "ALTER VIEW ... AS INSERT INTO" ## What changes were proposed in this pull request? Alter View can currently execute sql like "ALTER VIEW ... AS INSERT INTO". We should

spark git commit: [SPARK-25031][SQL] Fix MapType schema print

2018-08-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/master cb6cb3136 -> 8c13cb2ae [SPARK-25031][SQL] Fix MapType schema print ## What changes were proposed in this pull request? The PR fixes the bug in the `buildFormattedString` function in `MapType`, which makes the printed schema misleading. ## How

spark git commit: [SPARK-24979][SQL] add AnalysisHelper#resolveOperatorsUp

2018-08-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6a143e3eb -> 1a29fec8e [SPARK-24979][SQL] add AnalysisHelper#resolveOperatorsUp ## What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/21822 Similar to `TreeNode`, `AnalysisHelper`

spark git commit: [SPARK-24996][SQL] Use DSL in DeclarativeAggregate

2018-08-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 408a3ff2c -> 0f3fa2f28 [SPARK-24996][SQL] Use DSL in DeclarativeAggregate ## What changes were proposed in this pull request? The PR refactors the aggregate expressions which were not using DSL in order to simplify them. ## How was this

spark git commit: [SPARK-25036][SQL] Should compare ExprValue.isNull with LiteralTrue/LiteralFalse

2018-08-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 87ca7396c -> 408a3ff2c [SPARK-25036][SQL] Should compare ExprValue.isNull with LiteralTrue/LiteralFalse ## What changes were proposed in this pull request? This PR fixes a comparison of `ExprValue.isNull` with `String`.

spark git commit: [SPARK-25025][SQL] Remove the default value of isAll in INTERSECT/EXCEPT

2018-08-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master d063e3a47 -> c1760da5d [SPARK-25025][SQL] Remove the default value of isAll in INTERSECT/EXCEPT ## What changes were proposed in this pull request? Having the default value of isAll in the logical plan nodes INTERSECT/EXCEPT could

spark git commit: [SPARK-24940][SQL] Use IntegerLiteral in ResolveCoalesceHints

2018-08-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 64ad7b841 -> d063e3a47 [SPARK-24940][SQL] Use IntegerLiteral in ResolveCoalesceHints ## What changes were proposed in this pull request? Follow up to fix an unmerged review comment. ## How was this patch tested? Unit test

spark git commit: [SPARK-24940][SQL] Coalesce and Repartition Hint for SQL Queries

2018-08-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 41c2227a2 -> 36ea55e97 [SPARK-24940][SQL] Coalesce and Repartition Hint for SQL Queries ## What changes were proposed in this pull request? Many Spark SQL users in my company have asked for a way to control the number of output files in
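A short illustration of the hint syntax this commit adds; the partition counts in the comments are the intended effect, not verified output:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.range(100).createOrReplaceTempView("t")
// COALESCE reduces the number of output partitions; REPARTITION sets it exactly.
spark.sql("SELECT /*+ COALESCE(3) */ * FROM t").rdd.getNumPartitions      // at most 3
spark.sql("SELECT /*+ REPARTITION(8) */ * FROM t").rdd.getNumPartitions   // exactly 8
```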

spark git commit: [SPARK-24997][SQL] Enable support of MINUS ALL

2018-08-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b0d6967d4 -> 19a453191 [SPARK-24997][SQL] Enable support of MINUS ALL ## What changes were proposed in this pull request? Enable support for MINUS ALL which was gated at AstBuilder. ## How was this patch tested? Added tests in
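A small sketch of MINUS ALL once the AstBuilder gate is removed; the semantics mirror EXCEPT ALL:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

Seq(1, 1, 2).toDF("v").createOrReplaceTempView("t1")
Seq(1).toDF("v").createOrReplaceTempView("t2")
// Duplicate-preserving set difference: one of the two 1s is removed, leaving rows 1 and 2.
spark.sql("SELECT v FROM t1 MINUS ALL SELECT v FROM t2").show()
```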

spark git commit: [SPARK-24788][SQL] RelationalGroupedDataset.toString with unresolved exprs should not fail

2018-08-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f45d60a5a -> b0d6967d4 [SPARK-24788][SQL] RelationalGroupedDataset.toString with unresolved exprs should not fail ## What changes were proposed in this pull request? In the current master, `toString` throws an exception when

spark git commit: [SPARK-24966][SQL] Implement precedence rules for set operations.

2018-08-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b3f2911ee -> 73dd6cf9b [SPARK-24966][SQL] Implement precedence rules for set operations. ## What changes were proposed in this pull request? Currently the set operations INTERSECT, UNION and EXCEPT are assigned the same precedence. This
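Under the SQL-standard precedence this commit implements, INTERSECT binds tighter than UNION and EXCEPT, so the two queries below should be equivalent (illustrative only):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// Both parse as: SELECT 1 UNION (SELECT 2 INTERSECT SELECT 2), i.e. rows 1 and 2.
spark.sql("SELECT 1 UNION SELECT 2 INTERSECT SELECT 2").show()
spark.sql("SELECT 1 UNION (SELECT 2 INTERSECT SELECT 2)").show()
```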

spark git commit: [SPARK-24705][SQL] ExchangeCoordinator broken when duplicate exchanges reused

2018-08-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 02f967795 -> efef55388 [SPARK-24705][SQL] ExchangeCoordinator broken when duplicate exchanges reused ## What changes were proposed in this pull request? In the current master, `EnsureRequirements` sets the number of exchanges in

spark git commit: [SPARK-23908][SQL] Add transform function.

2018-08-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0df6bf882 -> 02f967795 [SPARK-23908][SQL] Add transform function. ## What changes were proposed in this pull request? This pr adds `transform` function which transforms elements in an array using the function. Optionally we can take the
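A minimal example of the new function; the two-argument lambda form (element plus index) is the optional variant the truncated description alludes to:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sql("SELECT transform(array(1, 2, 3), x -> x + 1)").show()       // [2, 3, 4]
spark.sql("SELECT transform(array(1, 2, 3), (x, i) -> x + i)").show()  // [1, 3, 5]
```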

spark git commit: [SPARK-24598][DOCS] State in the documentation the behavior when arithmetic operations cause overflow

2018-08-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 15fc23722 -> ad2e63662 [SPARK-24598][DOCS] State in the documentation the behavior when arithmetic operations cause overflow ## What changes were proposed in this pull request? According to the discussion in

spark git commit: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c9914cf04 -> 166f34618 [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE ## What changes were proposed in this pull request? This PR is to refactor the code in AVERAGE by dsl. ## How was this patch tested? N/A Author: Xiao Li

spark git commit: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal followed by aggregation returns wrong result

2018-08-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 c4b37696f -> 22ce8051f [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal followed by aggregation returns wrong result ## What changes were proposed in this pull request? When we do an average, the result is computed dividing the

spark git commit: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWithSchema

2018-08-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9f558601e -> ce084d3e0 [SPARK-24990][SQL] merge ReadSupport and ReadSupportWithSchema ## What changes were proposed in this pull request? Regarding user-specified schema, data sources may have 3 different behaviors: 1. must have a
