spark git commit: [SPARK-23518][SQL] Avoid metastore access when the users only want to read and write data frames

2018-03-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0b6ceadeb -> 3a4d15e5d [SPARK-23518][SQL] Avoid metastore access when the users only want to read and write data frames ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/18944 added one patch, which

[2/3] spark-website git commit: spark summit 2018 agenda

2018-03-02 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark-website/blob/5885a07f/site/news/spark-2-3-0-released.html -- diff --git a/site/news/spark-2-3-0-released.html b/site/news/spark-2-3-0-released.html index 0297b18..0ed39bb 100644 ---

[1/3] spark-website git commit: spark summit 2018 agenda

2018-03-02 Thread lixiao
Repository: spark-website Updated Branches: refs/heads/asf-site fefc3ba29 -> 5885a07fd http://git-wip-us.apache.org/repos/asf/spark-website/blob/5885a07f/site/releases/spark-release-1-2-2.html -- diff --git

[3/3] spark-website git commit: spark summit 2018 agenda

2018-03-02 Thread lixiao
spark summit 2018 agenda Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/5885a07f Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/5885a07f Diff:

spark git commit: [SPARK-23523][SQL][FOLLOWUP] Minor refactor of OptimizeMetadataOnlyQuery

2018-02-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 476a7f026 -> 25c2776dd [SPARK-23523][SQL][FOLLOWUP] Minor refactor of OptimizeMetadataOnlyQuery ## What changes were proposed in this pull request? Inside `OptimizeMetadataOnlyQuery.getPartitionAttrs`, avoid using `zip` to generate

spark git commit: [SPARK-23514] Use SessionState.newHadoopConf() to propage hadoop configs set in SQLConf.

2018-02-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fab563b9b -> 476a7f026 [SPARK-23514] Use SessionState.newHadoopConf() to propage hadoop configs set in SQLConf. ## What changes were proposed in this pull request? A few places in `spark-sql` were using `sc.hadoopConfiguration` directly.

spark git commit: [SPARK-23523][SQL] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery

2018-02-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master eac0b0672 -> 414ee867b [SPARK-23523][SQL] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery ## What changes were proposed in this pull request? ```Scala val tablePath = new

[3/3] spark git commit: [SPARK-23445] ColumnStat refactoring

2018-02-26 Thread lixiao
[SPARK-23445] ColumnStat refactoring ## What changes were proposed in this pull request? Refactor ColumnStat to be more flexible. * Split `ColumnStat` and `CatalogColumnStat` just like `CatalogStatistics` is split from `Statistics`. This detaches how the statistics are stored from how they

[1/3] spark git commit: [SPARK-23445] ColumnStat refactoring

2018-02-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7ec83658f -> 8077bb04f http://git-wip-us.apache.org/repos/asf/spark/blob/8077bb04/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala -- diff --git

[2/3] spark git commit: [SPARK-23445] ColumnStat refactoring

2018-02-26 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark/blob/8077bb04/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala -- diff --git

spark git commit: [SPARK-23459][SQL] Improve the error message when unknown column is specified in partition columns

2018-02-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 855ce13d0 -> 1a198ce8f [SPARK-23459][SQL] Improve the error message when unknown column is specified in partition columns ## What changes were proposed in this pull request? This PR avoids to print schema internal information when

spark git commit: [SPARK-23490][SQL] Check storage.locationUri with existing table in CreateTable

2018-02-22 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c5abb3c2d -> 049f243c5 [SPARK-23490][SQL] Check storage.locationUri with existing table in CreateTable ## What changes were proposed in this pull request? For CreateTable with Append mode, we should check if `storage.locationUri` is the

spark git commit: [SPARK-23475][WEBUI] Skipped stages should be evicted before completed stages

2018-02-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 23ba4416e -> a0d794989 [SPARK-23475][WEBUI] Skipped stages should be evicted before completed stages ## What changes were proposed in this pull request? The root cause of missing completed stages is because `cleanupStages` will never

spark git commit: [SPARK-23475][WEBUI] Skipped stages should be evicted before completed stages

2018-02-21 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 744d5af65 -> 45cf714ee [SPARK-23475][WEBUI] Skipped stages should be evicted before completed stages ## What changes were proposed in this pull request? The root cause of missing completed stages is because `cleanupStages` will never

spark git commit: [SPARK-23456][SPARK-21783] Turn on `native` ORC impl and PPD by default

2018-02-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 189f56f3d -> 83c008762 [SPARK-23456][SPARK-21783] Turn on `native` ORC impl and PPD by default ## What changes were proposed in this pull request? Apache Spark 2.3 introduced `native` ORC supports with vectorization and many fixes.

spark git commit: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3

2018-02-17 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 15ad4a7f1 -> 3ee3b2ae1 [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3 ## What changes were proposed in this pull request? This PR updates Apache ORC dependencies to 1.4.3 released on February 9th. Apache ORC 1.4.2 release removes

spark git commit: [SPARK-23381][CORE] Murmur3 hash generates a different value from other implementations

2018-02-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0a73aa31f -> d5ed2108d [SPARK-23381][CORE] Murmur3 hash generates a different value from other implementations ## What changes were proposed in this pull request? Murmur3 hash generates a different value from the original and other

spark git commit: [SPARK-23381][CORE] Murmur3 hash generates a different value from other implementations

2018-02-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 ccb0a59d7 -> 8360da071 [SPARK-23381][CORE] Murmur3 hash generates a different value from other implementations ## What changes were proposed in this pull request? Murmur3 hash generates a different value from the original and other
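
The snippet above is cut off before it says where the values diverge. As a rough illustration only, the sketch below contrasts the reference 32-bit Murmur3 tail handling with a variant that mixes each trailing byte as its own block. That the pre-fix Spark code behaved like the second function is an assumption here, not something the snippet states, and the function names are hypothetical.

```python
MASK = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def _rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k1(k1):
    k1 = (k1 * 0xCC9E2D51) & MASK
    k1 = _rotl(k1, 15)
    return (k1 * 0x1B873593) & MASK


def _mix_h1(h1, k1):
    h1 ^= k1
    h1 = _rotl(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def _fmix(h1, length):
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    h1 ^= h1 >> 16
    return h1


def _body(data, seed):
    # Mix all complete 4-byte little-endian blocks; return the state and the tail.
    h1 = seed & MASK
    aligned = len(data) - len(data) % 4
    for i in range(0, aligned, 4):
        k1 = int.from_bytes(data[i:i + 4], "little")
        h1 = _mix_h1(h1, _mix_k1(k1))
    return h1, data[aligned:]


def murmur3_reference(data, seed=0):
    # Reference behaviour: the 1 to 3 trailing bytes form one partial block.
    h1, tail = _body(data, seed)
    if tail:
        h1 ^= _mix_k1(int.from_bytes(tail, "little"))
    return _fmix(h1, len(data))


def murmur3_bytewise_tail(data, seed=0):
    # Assumed legacy behaviour: each trailing byte is sign-extended (like a
    # Java byte) and mixed as if it were a full block.
    h1, tail = _body(data, seed)
    for b in tail:
        k1 = (b - 256 if b >= 128 else b) & MASK
        h1 = _mix_h1(h1, _mix_k1(k1))
    return _fmix(h1, len(data))


if __name__ == "__main__":
    for sample in (b"abcd", b"abcde"):
        print(sample, hex(murmur3_reference(sample)), hex(murmur3_bytewise_tail(sample)))
```

The two functions agree whenever the input length is a multiple of 4, since there is no tail to treat differently; they would be expected to disagree otherwise.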

spark git commit: [SPARK-23446][PYTHON] Explicitly check supported types in toPandas

2018-02-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1dc2c1d5e -> c5857e496 [SPARK-23446][PYTHON] Explicitly check supported types in toPandas ## What changes were proposed in this pull request? This PR explicitly specifies and checks the types we supported in `toPandas`. This was a hole.

spark git commit: [SPARK-23446][PYTHON] Explicitly check supported types in toPandas

2018-02-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 75bb19a01 -> ccb0a59d7 [SPARK-23446][PYTHON] Explicitly check supported types in toPandas ## What changes were proposed in this pull request? This PR explicitly specifies and checks the types we supported in `toPandas`. This was a

spark git commit: [MINOR][SQL] Fix an error message about inserting into bucketed tables

2018-02-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 bae4449ad -> 03960faa6 [MINOR][SQL] Fix an error message about inserting into bucketed tables ## What changes were proposed in this pull request? This replaces `Sparkcurrently` to `Spark currently` in the following error message.

spark git commit: [MINOR][SQL] Fix an error message about inserting into bucketed tables

2018-02-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2f0498d1e -> 6968c3cfd [MINOR][SQL] Fix an error message about inserting into bucketed tables ## What changes were proposed in this pull request? This replaces `Sparkcurrently` to `Spark currently` in the following error message.

spark git commit: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD for Spark 2.3.0

2018-02-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 d24d13179 -> bae4449ad [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD for Spark 2.3.0 ## What changes were proposed in this pull request? To prevent any regressions, this PR changes ORC implementation to `hive` by default

spark git commit: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD for Spark 2.3.0

2018-02-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f217d7d9b -> 2f0498d1e [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD for Spark 2.3.0 ## What changes were proposed in this pull request? To prevent any regressions, this PR changes ORC implementation to `hive` by default like

spark git commit: [SPARK-23421][SPARK-22356][SQL] Document the behavior change in

2018-02-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 658d9d9d7 -> a77ebb092 [SPARK-23421][SPARK-22356][SQL] Document the behavior change in ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/19579 introduces a behavior change. We need to document it in

spark git commit: [SPARK-23094] Revert [] Fix invalid character handling in JsonDataSource

2018-02-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 bd83f7ba0 -> 129fd45ef [SPARK-23094] Revert [] Fix invalid character handling in JsonDataSource ## What changes were proposed in this pull request? This PR is to revert the PR https://github.com/apache/spark/pull/20302, because it

spark git commit: [SPARK-23094] Revert [] Fix invalid character handling in JsonDataSource

2018-02-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master a77ebb092 -> 95e4b4916 [SPARK-23094] Revert [] Fix invalid character handling in JsonDataSource ## What changes were proposed in this pull request? This PR is to revert the PR https://github.com/apache/spark/pull/20302, because it causes

spark git commit: [SPARK-23421][SPARK-22356][SQL] Document the behavior change in

2018-02-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 a5a8a86e2 -> bd83f7ba0 [SPARK-23421][SPARK-22356][SQL] Document the behavior change in ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/19579 introduces a behavior change. We need to document it

spark git commit: Revert "[SPARK-23249][SQL] Improved block merging logic for partitions"

2018-02-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 fd66a3b7b -> a5a8a86e2 Revert "[SPARK-23249][SQL] Improved block merging logic for partitions" This reverts commit f5f21e8c4261c0dfe8e3e788a30b38b188a18f67. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: Revert "[SPARK-23249][SQL] Improved block merging logic for partitions"

2018-02-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 140f87533 -> 400a1d9e2 Revert "[SPARK-23249][SQL] Improved block merging logic for partitions" This reverts commit 8c21170decfb9ca4d3233e1ea13bd1b6e3199ed9. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-23230][SQL][BRANCH-2.2] When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error

2018-02-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 73263b215 -> a95c3e29d [SPARK-23230][SQL][BRANCH-2.2] When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error When hive.default.fileformat is other kinds of file types, create textfile

spark git commit: Revert "[SPARK-23303][SQL] improve the explain result for data source v2 relations"

2018-02-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master a5a4b8350 -> d6f5e172b Revert "[SPARK-23303][SQL] improve the explain result for data source v2 relations" This reverts commit f17b936f0ddb7d46d1349bd42f9a64c84c06e48d. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-23316][SQL] AnalysisException after max iteration reached for IN query

2018-02-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 dbb1b399b -> ab01ba718 [SPARK-23316][SQL] AnalysisException after max iteration reached for IN query ## What changes were proposed in this pull request? Added flag ignoreNullability to DataType.equalsStructurally. The previous semantic

spark git commit: [SPARK-23316][SQL] AnalysisException after max iteration reached for IN query

2018-02-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 263531466 -> 05d051293 [SPARK-23316][SQL] AnalysisException after max iteration reached for IN query ## What changes were proposed in this pull request? Added flag ignoreNullability to DataType.equalsStructurally. The previous semantic is

spark git commit: [SPARK-23303][SQL] improve the explain result for data source v2 relations

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ed4e78bd6 -> f17b936f0 [SPARK-23303][SQL] improve the explain result for data source v2 relations ## What changes were proposed in this pull request? The current explain result for data source v2 relation is unreadable: ``` == Parsed

spark git commit: [SPARK-23379][SQL] skip when setting the same current database in HiveClientImpl

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c1bcef876 -> ed4e78bd6 [SPARK-23379][SQL] skip when setting the same current database in HiveClientImpl ## What changes were proposed in this pull request? If the target database name is as same as the current database, we should be able

spark git commit: [SPARK-23352][PYTHON][BRANCH-2.3] Explicitly specify supported types in Pandas UDFs

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 befb22de8 -> 43f5e4067 [SPARK-23352][PYTHON][BRANCH-2.3] Explicitly specify supported types in Pandas UDFs ## What changes were proposed in this pull request? This PR backports https://github.com/apache/spark/pull/20531: It

spark git commit: [SPARK-23230][SQL] When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 2b80571e2 -> befb22de8 [SPARK-23230][SQL] When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error When hive.default.fileformat is other kinds of file types, create textfile table cause a

spark git commit: [SPARK-23230][SQL] When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6cb59708c -> 4104b68e9 [SPARK-23230][SQL] When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error When hive.default.fileformat is other kinds of file types, create textfile table cause a serde

spark git commit: [SPARK-23313][DOC] Add a migration guide for ORC

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 9632c461e -> 2b80571e2 [SPARK-23313][DOC] Add a migration guide for ORC ## What changes were proposed in this pull request? This PR adds a migration guide documentation for ORC.

spark git commit: [SPARK-23313][DOC] Add a migration guide for ORC

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fba01b9a6 -> 6cb59708c [SPARK-23313][DOC] Add a migration guide for ORC ## What changes were proposed in this pull request? This PR adds a migration guide documentation for ORC.

spark git commit: [SPARK-23378][SQL] move setCurrentDatabase from HiveExternalCatalog to HiveClientImpl

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0c66fe4f2 -> fba01b9a6 [SPARK-23378][SQL] move setCurrentDatabase from HiveExternalCatalog to HiveClientImpl ## What changes were proposed in this pull request? This removes the special case that `alterPartitions` call from

spark git commit: [SPARK-22002][SQL][FOLLOWUP][TEST] Add a test to check if the original schema doesn't have metadata.

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 4e138207e -> 9632c461e [SPARK-22002][SQL][FOLLOWUP][TEST] Add a test to check if the original schema doesn't have metadata. ## What changes were proposed in this pull request? This is a follow-up pr of #19231 which modified the

spark git commit: [SPARK-22002][SQL][FOLLOWUP][TEST] Add a test to check if the original schema doesn't have metadata.

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5bb11411a -> 0c66fe4f2 [SPARK-22002][SQL][FOLLOWUP][TEST] Add a test to check if the original schema doesn't have metadata. ## What changes were proposed in this pull request? This is a follow-up pr of #19231 which modified the behavior

spark git commit: [SPARK-23388][SQL] Support for Parquet Binary DecimalType in VectorizedColumnReader

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 70be6038d -> 4e138207e [SPARK-23388][SQL] Support for Parquet Binary DecimalType in VectorizedColumnReader ## What changes were proposed in this pull request? Re-add support for parquet binary DecimalType in VectorizedColumnReader

spark git commit: [SPARK-23388][SQL] Support for Parquet Binary DecimalType in VectorizedColumnReader

2018-02-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 4a4dd4f36 -> 5bb11411a [SPARK-23388][SQL] Support for Parquet Binary DecimalType in VectorizedColumnReader ## What changes were proposed in this pull request? Re-add support for parquet binary DecimalType in VectorizedColumnReader ##

spark git commit: [SPARK-23275][SQL] fix the thread leaking in hive/tests

2018-02-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 49771ac8d -> f3a9a7f6b [SPARK-23275][SQL] fix the thread leaking in hive/tests ## What changes were proposed in this pull request? This is a follow up of https://github.com/apache/spark/pull/20441. The two lines actually can trigger

spark git commit: [SPARK-23275][SQL] fix the thread leaking in hive/tests

2018-02-09 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 557938e28 -> 6d7c38330 [SPARK-23275][SQL] fix the thread leaking in hive/tests ## What changes were proposed in this pull request? This is a follow up of https://github.com/apache/spark/pull/20441. The two lines actually can trigger the

spark git commit: [SPARK-23348][SQL] append data using saveAsTable should adjust the data types

2018-02-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 053830256 -> 0c2a2100d [SPARK-23348][SQL] append data using saveAsTable should adjust the data types ## What changes were proposed in this pull request? For inserting/appending data to an existing table, Spark should adjust the data

spark git commit: [SPARK-23348][SQL] append data using saveAsTable should adjust the data types

2018-02-08 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3473fda6d -> 7f5f5fb12 [SPARK-23348][SQL] append data using saveAsTable should adjust the data types ## What changes were proposed in this pull request? For inserting/appending data to an existing table, Spark should adjust the data

spark git commit: [SPARK-23345][SQL] Remove open stream record even closing it fails

2018-02-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 cb22e830b -> 05239afc9 [SPARK-23345][SQL] Remove open stream record even closing it fails ## What changes were proposed in this pull request? When `DebugFilesystem` closes opened stream, if any exception occurs, we still need to

spark git commit: [SPARK-23345][SQL] Remove open stream record even closing it fails

2018-02-07 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 71cfba04a -> 9841ae031 [SPARK-23345][SQL] Remove open stream record even closing it fails ## What changes were proposed in this pull request? When `DebugFilesystem` closes opened stream, if any exception occurs, we still need to remove

spark git commit: [SPARK-23327][SQL] Update the description and tests of three external API or functions

2018-02-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b96a083b1 -> c36fecc3b [SPARK-23327][SQL] Update the description and tests of three external API or functions ## What changes were proposed in this pull request? Update the description and tests of three external API or functions

spark git commit: [SPARK-23327][SQL] Update the description and tests of three external API or functions

2018-02-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 f9c913263 -> 874d3f89f [SPARK-23327][SQL] Update the description and tests of three external API or functions ## What changes were proposed in this pull request? Update the description and tests of three external API or functions

spark git commit: [SPARK-23315][SQL] failed to get output from canonicalized data source v2 related plans

2018-02-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 775e1 -> f9c913263 [SPARK-23315][SQL] failed to get output from canonicalized data source v2 related plans ## What changes were proposed in this pull request? `DataSourceV2Relation` keeps a `fullOutput` and resolves the real

spark git commit: [SPARK-23315][SQL] failed to get output from canonicalized data source v2 related plans

2018-02-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master caf304456 -> b96a083b1 [SPARK-23315][SQL] failed to get output from canonicalized data source v2 related plans ## What changes were proposed in this pull request? `DataSourceV2Relation` keeps a `fullOutput` and resolves the real output

spark git commit: [MINOR][TEST] Fix class name for Pandas UDF tests

2018-02-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 036a04b29 -> 775e1 [MINOR][TEST] Fix class name for Pandas UDF tests In https://github.com/apache/spark/commit/b2ce17b4c9fea58140a57ca1846b2689b15c0d61, I mistakenly renamed `VectorizedUDFTests` to `ScalarPandasUDF`. This PR

spark git commit: [MINOR][TEST] Fix class name for Pandas UDF tests

2018-02-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ac7454cac -> caf304456 [MINOR][TEST] Fix class name for Pandas UDF tests ## What changes were proposed in this pull request? In https://github.com/apache/spark/commit/b2ce17b4c9fea58140a57ca1846b2689b15c0d61, I mistakenly renamed

spark git commit: [SPARK-23312][SQL][FOLLOWUP] add a config to turn off vectorized cache reader

2018-02-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 7782fd03a -> 036a04b29 [SPARK-23312][SQL][FOLLOWUP] add a config to turn off vectorized cache reader ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/20483 tried to provide a way to turn off

spark git commit: [SPARK-23312][SQL][FOLLOWUP] add a config to turn off vectorized cache reader

2018-02-06 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7db9979ba -> ac7454cac [SPARK-23312][SQL][FOLLOWUP] add a config to turn off vectorized cache reader ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/20483 tried to provide a way to turn off the

spark git commit: [SPARK-22036][SQL][FOLLOWUP] Fix decimalArithmeticOperations.sql

2018-02-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 45f0f4ff7 -> 430025cba [SPARK-22036][SQL][FOLLOWUP] Fix decimalArithmeticOperations.sql ## What changes were proposed in this pull request? Fix decimalArithmeticOperations.sql test ## How was this patch tested? N/A Author: Yuming

spark git commit: [SPARK-22036][SQL][FOLLOWUP] Fix decimalArithmeticOperations.sql

2018-02-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 715047b02 -> 6fb3fd153 [SPARK-22036][SQL][FOLLOWUP] Fix decimalArithmeticOperations.sql ## What changes were proposed in this pull request? Fix decimalArithmeticOperations.sql test ## How was this patch tested? N/A Author: Yuming Wang

spark git commit: [SPARK-21658][SQL][PYSPARK] Revert "[] Add default None for value in na.replace in PySpark"

2018-02-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 be3de8791 -> 45f0f4ff7 [SPARK-21658][SQL][PYSPARK] Revert "[] Add default None for value in na.replace in PySpark" This reverts commit 0fcde87aadc9a92e138f11583119465ca4b5c518. See the discussion in

spark git commit: [SPARK-21658][SQL][PYSPARK] Revert "[] Add default None for value in na.replace in PySpark"

2018-02-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 4aaa7d40b -> 551dff2bc [SPARK-21658][SQL][PYSPARK] Revert "[] Add default None for value in na.replace in PySpark" This reverts commit 0fcde87aadc9a92e138f11583119465ca4b5c518. See the discussion in

spark git commit: [MINOR][DOC] Use raw triple double quotes around docstrings where there are occurrences of backslashes.

2018-02-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 4de206182 -> be3de8791 [MINOR][DOC] Use raw triple double quotes around docstrings where there are occurrences of backslashes. From [PEP 257](https://www.python.org/dev/peps/pep-0257/): > For consistency, always use """triple double

spark git commit: [MINOR][DOC] Use raw triple double quotes around docstrings where there are occurrences of backslashes.

2018-02-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 522e0b186 -> 4aaa7d40b [MINOR][DOC] Use raw triple double quotes around docstrings where there are occurrences of backslashes. From [PEP 257](https://www.python.org/dev/peps/pep-0257/): > For consistency, always use """triple double
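
A small sketch of why the raw prefix matters for docstrings that contain backslashes. The two functions are hypothetical and are not taken from the Spark code base; they only show how the interpreter treats the same text with and without the `r` prefix.

```python
def plain_docstring():
    """Fields are separated by \t in the output."""


def raw_docstring():
    r"""Fields are separated by \t in the output."""


if __name__ == "__main__":
    # Without the r prefix, \t is interpreted as a tab character.
    print(repr(plain_docstring.__doc__))
    # With the r prefix, the backslash and the letter t are kept as written.
    print(repr(raw_docstring.__doc__))
```

In the first function the documentation silently loses the backslash, which is the kind of surprise the raw form avoids.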

spark git commit: [SPARK-23305][SQL][TEST] Test `spark.sql.files.ignoreMissingFiles` for all file-based data sources

2018-02-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 1bcb3728d -> 4de206182 [SPARK-23305][SQL][TEST] Test `spark.sql.files.ignoreMissingFiles` for all file-based data sources ## What changes were proposed in this pull request? Like Parquet, all file-based data source handles

spark git commit: [SPARK-23305][SQL][TEST] Test `spark.sql.files.ignoreMissingFiles` for all file-based data sources

2018-02-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 63b49fa2e -> 522e0b186 [SPARK-23305][SQL][TEST] Test `spark.sql.files.ignoreMissingFiles` for all file-based data sources ## What changes were proposed in this pull request? Like Parquet, all file-based data source handles

spark git commit: [SPARK-23311][SQL][TEST] add FilterFunction test case for test CombineTypedFilters

2018-02-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 b614c083a -> 1bcb3728d [SPARK-23311][SQL][TEST] add FilterFunction test case for test CombineTypedFilters ## What changes were proposed in this pull request? In the current test case for CombineTypedFilters, we lack the test of

spark git commit: [SPARK-23311][SQL][TEST] add FilterFunction test case for test CombineTypedFilters

2018-02-03 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fe73cb4b4 -> 63b49fa2e [SPARK-23311][SQL][TEST] add FilterFunction test case for test CombineTypedFilters ## What changes were proposed in this pull request? In the current test case for CombineTypedFilters, we lack the test of

spark git commit: [SPARK-23317][SQL] rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3ff83ad43 -> fe73cb4b4 [SPARK-23317][SQL] rename ContinuousReader.setOffset to setStartOffset ## What changes were proposed in this pull request? In the document of `ContinuousReader.setOffset`, we say this method is used to specify the

spark git commit: [SPARK-23317][SQL] rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 dcd0af4be -> b614c083a [SPARK-23317][SQL] rename ContinuousReader.setOffset to setStartOffset ## What changes were proposed in this pull request? In the document of `ContinuousReader.setOffset`, we say this method is used to specify

spark git commit: [SQL] Minor doc update: Add an example in DataFrameReader.schema

2018-02-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/master eaf35de24 -> 3ff83ad43 [SQL] Minor doc update: Add an example in DataFrameReader.schema ## What changes were proposed in this pull request? This patch adds a small example to the schema string definition of schema function. It isn't

spark git commit: [SQL] Minor doc update: Add an example in DataFrameReader.schema

2018-02-02 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 56eb9a310 -> dcd0af4be [SQL] Minor doc update: Add an example in DataFrameReader.schema ## What changes were proposed in this pull request? This patch adds a small example to the schema string definition of schema function. It isn't

spark git commit: [SPARK-23301][SQL] data source column pruning should work for arbitrary expressions

2018-02-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 7baae3aef -> 2b07452ca [SPARK-23301][SQL] data source column pruning should work for arbitrary expressions This PR fixes a mistake in the `PushDownOperatorsToDataSource` rule, the column pruning logic is incorrect about `Project`. a

spark git commit: [SPARK-23301][SQL] data source column pruning should work for arbitrary expressions

2018-02-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b3a04283f -> 19c7c7ebd [SPARK-23301][SQL] data source column pruning should work for arbitrary expressions ## What changes were proposed in this pull request? This PR fixes a mistake in the `PushDownOperatorsToDataSource` rule, the

spark git commit: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 2db7e49db -> 07a8f4ddf [SPARK-23293][SQL] fix data source v2 self join `DataSourceV2Relation` should extend `MultiInstanceRelation`, to take care of self-join. a new test Author: Wenchen Fan Closes #20466

spark git commit: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f051f8340 -> 73da3b696 [SPARK-23293][SQL] fix data source v2 self join ## What changes were proposed in this pull request? `DataSourceV2Relation` should extend `MultiInstanceRelation`, to take care of self-join. ## How was this patch

spark git commit: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--hiveconf" and ''--hivevar" variables since 2.0

2018-02-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 2549beae2 -> 2db7e49db [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--hiveconf" and ''--hivevar" variables since 2.0 ## What changes were proposed in this pull request? `--hiveconf` and `--hivevar` variables no longer work

spark git commit: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--hiveconf" and ''--hivevar" variables since 2.0

2018-02-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ec63e2d07 -> f051f8340 [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--hiveconf" and ''--hivevar" variables since 2.0 ## What changes were proposed in this pull request? `--hiveconf` and `--hivevar` variables no longer work since

spark git commit: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRegexp` instead of `assertRaisesRegex`.

2018-01-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 4b7cd479a -> 07cee3373 [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRegexp` instead of `assertRaisesRegex`. ## What changes were proposed in this pull request? This is a follow-up pr of #19872 which uses `assertRaisesRegex` but

spark git commit: [SQL][MINOR] Inline SpecifiedWindowFrame.defaultWindowFrame().

2018-01-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master cc41245fa -> b6b50efc8 [SQL][MINOR] Inline SpecifiedWindowFrame.defaultWindowFrame(). ## What changes were proposed in this pull request? SpecifiedWindowFrame.defaultWindowFrame(hasOrderSpecification, acceptWindowFrame) was designed to

spark git commit: [SPARK-21396][SQL] Fixes MatchError when UDTs are passed through Hive Thriftserver

2018-01-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 56ae32657 -> b2e7677f4 [SPARK-21396][SQL] Fixes MatchError when UDTs are passed through Hive Thriftserver Signed-off-by: Atallah Hezbor ## What changes were proposed in this pull request? This PR proposes modifying the match statement

spark git commit: [SPARK-21396][SQL] Fixes MatchError when UDTs are passed through Hive Thriftserver

2018-01-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 59e89a299 -> 871fd48dc [SPARK-21396][SQL] Fixes MatchError when UDTs are passed through Hive Thriftserver Signed-off-by: Atallah Hezbor ## What changes were proposed in this pull request? This PR proposes modifying the match

[1/2] spark git commit: [SPARK-23268][SQL] Reorganize packages in data source V2

2018-01-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2ac895be9 -> 56ae32657 http://git-wip-us.apache.org/repos/asf/spark/blob/56ae3265/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala

[2/2] spark git commit: [SPARK-23268][SQL] Reorganize packages in data source V2

2018-01-31 Thread lixiao
[SPARK-23268][SQL] Reorganize packages in data source V2 ## What changes were proposed in this pull request? 1. create a new package for partitioning/distribution related classes. As Spark will add new concrete implementations of `Distribution` in new releases, it is good to have a new

[1/2] spark git commit: [SPARK-23268][SQL] Reorganize packages in data source V2

2018-01-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 0d0f57936 -> 59e89a299 http://git-wip-us.apache.org/repos/asf/spark/blob/59e89a29/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala

[2/2] spark git commit: [SPARK-23268][SQL] Reorganize packages in data source V2

2018-01-31 Thread lixiao
[SPARK-23268][SQL] Reorganize packages in data source V2 ## What changes were proposed in this pull request? 1. create a new package for partitioning/distribution related classes. As Spark will add new concrete implementations of `Distribution` in new releases, it is good to have a new

spark git commit: revert the removal of import in SPARK-23281

2018-01-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 5273cc791 -> cb73ecd2f revert the removal of import in SPARK-23281 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb73ecd2 Tree:

spark git commit: [SPARK-23281][SQL] Query produces results in incorrect order when a composite order by clause refers to both original columns and aliases

2018-01-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 0e58fee9d -> 5273cc791 [SPARK-23281][SQL] Query produces results in incorrect order when a composite order by clause refers to both original columns and aliases ## What changes were proposed in this pull request? Here is the test

spark git commit: [SPARK-23281][SQL] Query produces results in incorrect order when a composite order by clause refers to both original columns and aliases

2018-01-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 f5f21e8c4 -> 8ee3a71c9 [SPARK-23281][SQL] Query produces results in incorrect order when a composite order by clause refers to both original columns and aliases ## What changes were proposed in this pull request? Here is the test

spark git commit: [SPARK-23281][SQL] Query produces results in incorrect order when a composite order by clause refers to both original columns and aliases

2018-01-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master dd242bad3 -> 9ff1d96f0 [SPARK-23281][SQL] Query produces results in incorrect order when a composite order by clause refers to both original columns and aliases ## What changes were proposed in this pull request? Here is the test snippet.

spark git commit: [SPARK-23274][SQL] Fix ReplaceExceptWithFilter when the right's Filter contains the references that are not in the left output

2018-01-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 6ed0d57f8 -> b8778321b [SPARK-23274][SQL] Fix ReplaceExceptWithFilter when the right's Filter contains the references that are not in the left output ## What changes were proposed in this pull request? This PR is to fix the

spark git commit: [SPARK-23274][SQL] Fix ReplaceExceptWithFilter when the right's Filter contains the references that are not in the left output

2018-01-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 778661673 -> ca04c3ff2 [SPARK-23274][SQL] Fix ReplaceExceptWithFilter when the right's Filter contains the references that are not in the left output ## What changes were proposed in this pull request? This PR is to fix the

spark git commit: [SPARK-23276][SQL][TEST] Enable UDT tests in (Hive)OrcHadoopFsRelationSuite

2018-01-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9623a9824 -> 778661673 [SPARK-23276][SQL][TEST] Enable UDT tests in (Hive)OrcHadoopFsRelationSuite ## What changes were proposed in this pull request? Like Parquet, ORC test suites should enable UDT tests. ## How was this patch tested?

spark git commit: [SPARK-23276][SQL][TEST] Enable UDT tests in (Hive)OrcHadoopFsRelationSuite

2018-01-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 7b9fe0865 -> 6ed0d57f8 [SPARK-23276][SQL][TEST] Enable UDT tests in (Hive)OrcHadoopFsRelationSuite ## What changes were proposed in this pull request? Like Parquet, ORC test suites should enable UDT tests. ## How was this patch

spark git commit: [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs

2018-01-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 f4802dc88 -> 7b9fe0865 [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs This PR is to backport https://github.com/apache/spark/pull/20428 to Spark 2.3 without adding the changes regarding `GROUPED AGG PANDAS UDF` --- ## What

spark git commit: [SPARK-23275][SQL] hive/tests have been failing when run locally on the laptop (Mac) with OOM

2018-01-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 31c00ad8b -> 58fcb5a95 [SPARK-23275][SQL] hive/tests have been failing when run locally on the laptop (Mac) with OOM ## What changes were proposed in this pull request? hive tests have been failing when they are run locally (Mac Os) after

spark git commit: [SPARK-23275][SQL] hive/tests have been failing when run locally on the laptop (Mac) with OOM

2018-01-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 2e0c1e5f3 -> f4802dc88 [SPARK-23275][SQL] hive/tests have been failing when run locally on the laptop (Mac) with OOM ## What changes were proposed in this pull request? hive tests have been failing when they are run locally (Mac Os)

spark git commit: [SPARK-23267][SQL] Increase spark.sql.codegen.hugeMethodLimit to 65535

2018-01-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 7d96dc1ac -> 2e0c1e5f3 [SPARK-23267][SQL] Increase spark.sql.codegen.hugeMethodLimit to 65535 ## What changes were proposed in this pull request? Still saw the performance regression introduced by `spark.sql.codegen.hugeMethodLimit`
