spark git commit: [SPARK-24937][SQL] Datasource partition table should load empty static partitions

2018-08-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f5113ea8d -> 9f558601e [SPARK-24937][SQL] Datasource partition table should load empty static partitions ## What changes were proposed in this pull request? How to reproduce: ```sql spark-sql> CREATE TABLE tbl AS SELECT 1; spark-sql>

spark git commit: [SPARK-24982][SQL] UDAF resolution should not throw AssertionError

2018-08-01 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1f7e22c72 -> 1efffb799 [SPARK-24982][SQL] UDAF resolution should not throw AssertionError ## What changes were proposed in this pull request? When a user calls a UDAF with the wrong number of arguments, Spark previously throws an

spark git commit: [SPARK-24951][SQL] Table valued functions should throw AnalysisException

2018-07-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5f3441e54 -> 1f7e22c72 [SPARK-24951][SQL] Table valued functions should throw AnalysisException ## What changes were proposed in this pull request? Previously TVF resolution could throw IllegalArgumentException if the data type is null

spark git commit: [SPARK-24536] Validate that an evaluated limit clause cannot be null

2018-07-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 25ea27b09 -> fc3df4517 [SPARK-24536] Validate that an evaluated limit clause cannot be null It proposes a version in which nullable expressions are not valid in the limit clause. It was tested with unit and e2e tests. Please review

spark git commit: [SPARK-24536] Validate that an evaluated limit clause cannot be null

2018-07-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b4fd75fb9 -> 4ac2126bc [SPARK-24536] Validate that an evaluated limit clause cannot be null ## What changes were proposed in this pull request? It proposes a version in which nullable expressions are not valid in the limit clause ## How
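The validation this commit adds can be sketched in a few lines of pure Python — a hypothetical helper, not Spark's actual code: the evaluated limit expression (here modeled as `None` for SQL NULL) is rejected before the plan is accepted.

```python
def validated_limit(limit_value):
    """Reject a limit expression that evaluates to NULL (None here).
    Illustrative sketch of the check described, not Spark's implementation."""
    if limit_value is None:
        raise ValueError("The evaluated limit expression must not be null")
    if limit_value < 0:
        raise ValueError("The limit expression must be equal to or greater than 0")
    return limit_value
```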

spark git commit: [SPARK-24972][SQL] PivotFirst could not handle pivot columns of complex types

2018-07-31 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8141d5592 -> b4fd75fb9 [SPARK-24972][SQL] PivotFirst could not handle pivot columns of complex types ## What changes were proposed in this pull request? When the pivot column is of a complex type, the eval() result will be an UnsafeRow,

spark git commit: [SPARK-24865][SQL] Remove AnalysisBarrier addendum

2018-07-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master d6b7545b5 -> abbb4ab4d [SPARK-24865][SQL] Remove AnalysisBarrier addendum ## What changes were proposed in this pull request? I didn't want to pollute the diff in the previous PR and left some TODOs. This is a follow-up to address those

spark git commit: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC partition column

2018-07-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b90bfe3c4 -> 47d84e4d0 [SPARK-22814][SQL] Support Date/Timestamp in a JDBC partition column ## What changes were proposed in this pull request? This pr supported Date/Timestamp in a JDBC partition column (a numeric column is only
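The idea of striding a Date partition column the way a numeric one is strided can be illustrated with a small pure-Python sketch. The helper name and clause layout are hypothetical, not Spark's internals; as in Spark's numeric case, the first and last partitions are left open-ended to cover values outside the given bounds.

```python
from datetime import date, timedelta

def date_partition_predicates(column, lower, upper, num_partitions):
    """Split [lower, upper] into per-partition WHERE clauses by a fixed
    day stride. Illustrative sketch only."""
    total_days = (upper - lower).days
    stride = max(total_days // num_partitions, 1)
    preds = []
    bound = lower
    for i in range(num_partitions):
        nxt = bound + timedelta(days=stride)
        if i == 0:
            preds.append(f"{column} < '{nxt}'")          # open below
        elif i == num_partitions - 1:
            preds.append(f"{column} >= '{bound}'")       # open above
        else:
            preds.append(f"{column} >= '{bound}' AND {column} < '{nxt}'")
        bound = nxt
    return preds
```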

spark git commit: [SPARK-24771][BUILD] Upgrade Apache AVRO to 1.8.2

2018-07-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fca0b8528 -> b90bfe3c4 [SPARK-24771][BUILD] Upgrade Apache AVRO to 1.8.2 ## What changes were proposed in this pull request? Upgrade Apache Avro from 1.7.7 to 1.8.2. The major new features: 1. More logical types. From the spec of 1.8.2

spark git commit: [SPARK-21274][SQL] Implement INTERSECT ALL clause

2018-07-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 6690924c4 -> 65a4bc143 [SPARK-21274][SQL] Implement INTERSECT ALL clause ## What changes were proposed in this pull request? Implements INTERSECT ALL clause through query rewrites using existing operators in Spark. Please refer to
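The multiset semantics of INTERSECT ALL — each row appears min(count in left, count in right) times — can be sketched in pure Python with `collections.Counter`. This illustrates the SQL semantics only, not the query rewrite the commit implements.

```python
from collections import Counter

def intersect_all(left, right):
    """INTERSECT ALL over two multisets: keep each element with the
    minimum of its two multiplicities (Counter's & operator)."""
    counts = Counter(left) & Counter(right)
    return list(counts.elements())
```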

spark git commit: [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error

2018-07-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.1 7d50fec3f -> a3eb07db3 [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error When join key is long or int in broadcast join, Spark will use `LongToUnsafeRowMap` to store key-values of the table which

spark git commit: [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error

2018-07-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 d5f340f27 -> 71eb7d468 [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error When join key is long or int in broadcast join, Spark will use `LongToUnsafeRowMap` to store key-values of the table which

spark git commit: [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error

2018-07-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 73764737d -> f52d0c451 [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error When join key is long or int in broadcast join, Spark will use `LongToUnsafeRowMap` to store key-values of the table which

spark git commit: [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error

2018-07-29 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8fe5d2c39 -> 2c54aae1b [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error When join key is long or int in broadcast join, Spark will use `LongToUnsafeRowMap` to store key-values of the table which will

spark git commit: [MINOR] Update docs for functions.scala to make it clear not all the built-in functions are defined there

2018-07-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 34ebcc6b5 -> 6424b146c [MINOR] Update docs for functions.scala to make it clear not all the built-in functions are defined there The title summarizes the change. Author: Reynold Xin Closes #21318 from rxin/functions. Project:

spark git commit: [MINOR] Improve documentation for HiveStringType's

2018-07-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 10f1f1965 -> 34ebcc6b5 [MINOR] Improve documentation for HiveStringType's The diff should be self-explanatory. Author: Reynold Xin Closes #21897 from rxin/hivestringtypedoc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo

spark git commit: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5828f41a5 -> 10f1f1965 [SPARK-21274][SQL] Implement EXCEPT ALL clause. ## What changes were proposed in this pull request? Implements EXCEPT ALL clause through query rewrites using existing operators in Spark. In this PR, an internal UDTF
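The complementary EXCEPT ALL semantics — each left-side row survives count(left) minus count(right) times, floored at zero — can likewise be sketched with `collections.Counter`, whose subtraction drops non-positive counts. A sketch of the SQL semantics, not the rewrite-based implementation in the PR.

```python
from collections import Counter

def except_all(left, right):
    """EXCEPT ALL over two multisets: multiplicity in the result is
    max(count_left - count_right, 0), which Counter subtraction gives."""
    counts = Counter(left) - Counter(right)
    return list(counts.elements())
```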

spark git commit: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided"

2018-07-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ef6c8395c -> c9bec1d37 [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided" ## What changes were proposed in this pull request? Please see [SPARK-24927][1] for more details. [1]:

spark git commit: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided"

2018-07-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.2 f339e2fd7 -> 73764737d [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided" ## What changes were proposed in this pull request? Please see [SPARK-24927][1] for more details. [1]:

spark git commit: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided"

2018-07-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 fa552c3c1 -> d5f340f27 [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided" ## What changes were proposed in this pull request? Please see [SPARK-24927][1] for more details. [1]:

spark git commit: [SPARK-24288][SQL] Add a JDBC Option to enable preventing predicate pushdown

2018-07-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e6e9031d7 -> 21fcac164 [SPARK-24288][SQL] Add a JDBC Option to enable preventing predicate pushdown ## What changes were proposed in this pull request? Add a JDBC Option "pushDownPredicate" (default `true`) to allow/disallow predicate

spark git commit: [SPARK-24919][BUILD] New linter rule for sparkContext.hadoopConfiguration

2018-07-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2c8274568 -> fa09d9192 [SPARK-24919][BUILD] New linter rule for sparkContext.hadoopConfiguration ## What changes were proposed in this pull request? In most cases, we should use `spark.sessionState.newHadoopConf()` instead of

spark git commit: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e3486e1b9 -> 2c8274568 [SPARK-24307][CORE] Add conf to revert to old code. In case there are any issues in converting FileSegmentManagedBuffer to ChunkedByteBuffer, add a conf to go back to old code path. Followup to

spark git commit: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 5ed7660d1 -> e3486e1b9 [SPARK-24795][CORE] Implement barrier execution mode ## What changes were proposed in this pull request? Propose new APIs and modify job/task scheduling to support barrier execution mode, which requires all tasks

spark git commit: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optimization Rule Exclusion

2018-07-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 58353d7f4 -> 5ed7660d1 [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optimization Rule Exclusion ## What changes were proposed in this pull request? This is an extension to the original PR, in which rule exclusion did not work for

spark git commit: [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter

2018-07-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 740606eb8 -> fa552c3c1 [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter ```Scala val udf1 = udf({(x: Int, y: Int) => x + y}) val df = spark.range(0, 3).toDF("a") .withColumn("b", udf1($"a", udf1($"a",

spark git commit: [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter

2018-07-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 17f469bc8 -> d2e7deb59 [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter ## What changes were proposed in this pull request? ```Scala val udf1 = udf({(x: Int, y: Int) => x + y}) val df = spark.range(0, 3).toDF("a")

spark git commit: [SPARK-24860][SQL] Support setting of partitionOverWriteMode in output options for writing DataFrame

2018-07-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0c83f718e -> 17f469bc8 [SPARK-24860][SQL] Support setting of partitionOverWriteMode in output options for writing DataFrame ## What changes were proposed in this pull request? Besides spark setting

spark git commit: [SPARK-24849][SPARK-24911][SQL] Converting a value of StructType to a DDL string

2018-07-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 571a6f057 -> 2f77616e1 [SPARK-24849][SPARK-24911][SQL] Converting a value of StructType to a DDL string ## What changes were proposed in this pull request? In the PR, I propose to extend the `StructType`/`StructField` classes by new

[2/3] spark-website git commit: spark summit eu 2018

2018-07-25 Thread lixiao
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d86cffd1/site/news/spark-2-2-1-released.html -- diff --git a/site/news/spark-2-2-1-released.html b/site/news/spark-2-2-1-released.html index df7c2f0..b9d465f 100644 ---

[1/3] spark-website git commit: spark summit eu 2018

2018-07-25 Thread lixiao
Repository: spark-website Updated Branches: refs/heads/asf-site f5d7dfafe -> d86cffd19 http://git-wip-us.apache.org/repos/asf/spark-website/blob/d86cffd1/site/releases/spark-release-1-1-1.html -- diff --git

[3/3] spark-website git commit: spark summit eu 2018

2018-07-25 Thread lixiao
spark summit eu 2018 Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/d86cffd1 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/d86cffd1 Diff:

spark git commit: [SPARK-24768][FOLLOWUP][SQL] Avro migration followup: change artifactId to spark-avro

2018-07-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 7a5fd4a91 -> c44eb561e [SPARK-24768][FOLLOWUP][SQL] Avro migration followup: change artifactId to spark-avro ## What changes were proposed in this pull request? After rethinking on the artifactId, I think it should be `spark-avro` instead

spark git commit: [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatched message

2018-07-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 78e0a725e -> 7a5fd4a91 [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatched message ## What changes were proposed in this pull request? Improvement `IN` predicate type mismatched message: ```sql Mismatched columns: [(, t, 4, ., `, t,

spark git commit: [SPARK-23957][SQL] Sorts in subqueries are redundant and can be removed

2018-07-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master d4c341589 -> afb062753 [SPARK-23957][SQL] Sorts in subqueries are redundant and can be removed ## What changes were proposed in this pull request? Thanks to henryr for the original idea at https://github.com/apache/spark/pull/21049

spark git commit: [SPARK-24890][SQL] Short circuiting the `if` condition when `trueValue` and `falseValue` are the same

2018-07-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c26b09216 -> d4c341589 [SPARK-24890][SQL] Short circuiting the `if` condition when `trueValue` and `falseValue` are the same ## What changes were proposed in this pull request? When `trueValue` and `falseValue` are semantic equivalence,
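The short-circuit rule reads naturally as a one-step rewrite: when both branches of an `if` are the same expression, the result is that expression regardless of the condition (even a NULL condition selects the false branch, which is identical). A toy expression model below illustrates this; it is not Catalyst code.

```python
def simplify_if(cond, true_value, false_value):
    """Rewrite If(cond, t, f) -> t when both branches are equal, so the
    condition never needs to be evaluated. Toy model of the rule."""
    if true_value == false_value:
        return true_value
    return ("if", cond, true_value, false_value)
```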

spark git commit: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

2018-07-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 740a23d7d -> 6a5999286 [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule The HandleNullInputsForUDF would always add a new `If` node every time it is applied. That would cause a difference between the same plan being analyzed once

spark git commit: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

2018-07-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 15fff7903 -> c26b09216 [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule ## What changes were proposed in this pull request? The HandleNullInputsForUDF would always add a new `If` node every time it is applied. That would cause a

spark git commit: [SPARK-24812][SQL] Last Access Time in the table description is not valid

2018-07-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9d27541a8 -> d4a277f0c [SPARK-24812][SQL] Last Access Time in the table description is not valid ## What changes were proposed in this pull request? Last Access Time will always be displayed as the wrong date Thu Jan 01 05:30:00 IST 1970 when the user

spark git commit: [SPARK-23325] Use InternalRow when reading with DataSourceV2.

2018-07-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3d5c61e5f -> 9d27541a8 [SPARK-23325] Use InternalRow when reading with DataSourceV2. ## What changes were proposed in this pull request? This updates the DataSourceV2 API to use InternalRow instead of Row for the default case with no

spark git commit: [SPARK-24870][SQL] Cache can't work normally if there are case letters in SQL

2018-07-24 Thread lixiao
Repository: spark Updated Branches: refs/heads/master d2436a852 -> 13a67b070 [SPARK-24870][SQL] Cache can't work normally if there are case letters in SQL ## What changes were proposed in this pull request? Modified the canonicalization to not be case-insensitive. Before the PR, cache can't work

spark git commit: [SPARK-24339][SQL] Prunes the unused columns from child of ScriptTransformation

2018-07-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 61f0ca4f1 -> cfc3e1aaa [SPARK-24339][SQL] Prunes the unused columns from child of ScriptTransformation ## What changes were proposed in this pull request? Modify the strategy in ColumnPruning to add a Project between ScriptTransformation

spark git commit: [SPARK-24850][SQL] fix str representation of CachedRDDBuilder

2018-07-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 08e315f63 -> 2edf17eff [SPARK-24850][SQL] fix str representation of CachedRDDBuilder ## What changes were proposed in this pull request? As of https://github.com/apache/spark/pull/21018, InMemoryRelation includes its cacheBuilder when

spark git commit: [SPARK-24887][SQL] Avro: use SerializableConfiguration in Spark utils to deduplicate code

2018-07-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 434319e73 -> 08e315f63 [SPARK-24887][SQL] Avro: use SerializableConfiguration in Spark utils to deduplicate code ## What changes were proposed in this pull request? To implement the method `buildReader` in `FileFormat`, it is required to

spark git commit: [SPARK-24802][SQL] Add a new config for Optimization Rule Exclusion

2018-07-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ab18b02e6 -> 434319e73 [SPARK-24802][SQL] Add a new config for Optimization Rule Exclusion ## What changes were proposed in this pull request? Since Spark has provided fairly clear interfaces for adding user-defined optimization rules,

spark git commit: [SPARK-24811][SQL] Avro: add new function from_avro and to_avro

2018-07-22 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 81af88687 -> 8817c68f5 [SPARK-24811][SQL] Avro: add new function from_avro and to_avro ## What changes were proposed in this pull request? 1. Add a new function from_avro for parsing a binary column of avro format and converting it into

spark git commit: [SPARK-24836][SQL] New option for Avro datasource - ignoreExtension

2018-07-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master bbd6f0c25 -> 106880edc [SPARK-24836][SQL] New option for Avro datasource - ignoreExtension ## What changes were proposed in this pull request? I propose to add new option for AVRO datasource which should control ignoring of files without

spark git commit: [SPARK-24879][SQL] Fix NPE in Hive partition pruning filter pushdown

2018-07-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 db1f3cc76 -> bd6bfacb2 [SPARK-24879][SQL] Fix NPE in Hive partition pruning filter pushdown ## What changes were proposed in this pull request? We get a NPE when we have a filter on a partition column of the form `col in (x, null)`.

spark git commit: [SPARK-24879][SQL] Fix NPE in Hive partition pruning filter pushdown

2018-07-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 96f312076 -> bbd6f0c25 [SPARK-24879][SQL] Fix NPE in Hive partition pruning filter pushdown ## What changes were proposed in this pull request? We get a NPE when we have a filter on a partition column of the form `col in (x, null)`. This

spark git commit: [PYSPARK][TEST][MINOR] Fix UDFInitializationTests

2018-07-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 597bdeff2 -> 96f312076 [PYSPARK][TEST][MINOR] Fix UDFInitializationTests ## What changes were proposed in this pull request? Fix a typo in pyspark sql tests Author: William Sheu Closes #21833 from PenguinToast/fix-test-typo. Project:

spark git commit: [SPARK-24880][BUILD] Fix the group id for spark-kubernetes-integration-tests

2018-07-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 00b864aa7 -> f765bb782 [SPARK-24880][BUILD] Fix the group id for spark-kubernetes-integration-tests ## What changes were proposed in this pull request? The correct group id should be `org.apache.spark`. This is causing the nightly build

spark git commit: [SPARK-24876][SQL] Avro: simplify schema serialization

2018-07-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 2333a34d3 -> 00b864aa7 [SPARK-24876][SQL] Avro: simplify schema serialization ## What changes were proposed in this pull request? Previously in the refactoring of Avro Serializer and Deserializer, a new class SerializableSchema is

spark git commit: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC datasource

2018-07-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 9ad77b303 -> 2333a34d3 [SPARK-22880][SQL] Add cascadeTruncate option to JDBC datasource This commit adds the `cascadeTruncate` option to the JDBC datasource API, for databases that support this functionality (PostgreSQL and Oracle at the

spark git commit: Revert "[SPARK-24811][SQL] Avro: add new function from_avro and to_avro"

2018-07-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3cb1b5780 -> 9ad77b303 Revert "[SPARK-24811][SQL] Avro: add new function from_avro and to_avro" This reverts commit 244bcff19463d82ec72baf15bc0a5209f21f2ef3. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-24811][SQL] Avro: add new function from_avro and to_avro

2018-07-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master cc4d64bb1 -> 244bcff19 [SPARK-24811][SQL] Avro: add new function from_avro and to_avro ## What changes were proposed in this pull request? Add a new function from_avro for parsing a binary column of avro format and converting it into its

spark git commit: [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for GROUPING SET

2018-07-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master a5925c163 -> 2b91d9918 [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for GROUPING SET ## What changes were proposed in this pull request? Enhances the parser and analyzer to support ANSI compliant syntax for GROUPING SET. As part

[2/2] spark git commit: [SPARK-24268][SQL] Use datatype.catalogString in error messages

2018-07-20 Thread lixiao
[SPARK-24268][SQL] Use datatype.catalogString in error messages ## What changes were proposed in this pull request? As stated in https://github.com/apache/spark/pull/21321, in the error messages we should use `catalogString`. This is not the case, as SPARK-22893 used `simpleString` in order to

[1/2] spark git commit: [SPARK-24268][SQL] Use datatype.catalogString in error messages

2018-07-20 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1462b1766 -> a5925c163 http://git-wip-us.apache.org/repos/asf/spark/blob/a5925c16/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java

spark git commit: [SPARK-24163][SPARK-24164][SQL] Support column list as the pivot column in Pivot

2018-07-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1272b2034 -> cd203e0df [SPARK-24163][SPARK-24164][SQL] Support column list as the pivot column in Pivot ## What changes were proposed in this pull request? 1. Extend the Parser to enable parsing a column list as the pivot column. 2.

spark git commit: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2

2018-07-18 Thread lixiao
Repository: spark Updated Branches: refs/heads/master fc2e18963 -> 3b59d326c [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2 ## What changes were proposed in this pull request? This issue aims to upgrade Apache ORC library from 1.4.4 to 1.5.2 in order to bring the following benefits into

spark git commit: [SPARK-24681][SQL] Verify nested column names in Hive metastore

2018-07-17 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 912634b00 -> 2a4dd6f06 [SPARK-24681][SQL] Verify nested column names in Hive metastore ## What changes were proposed in this pull request? This PR added code to check if nested column names do not include ',', ':', and ';' because Hive

spark git commit: [SPARK-24402][SQL] Optimize `In` expression when only one element in the collection or collection is empty

2018-07-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ba437fc5c -> 0f0d1865f [SPARK-24402][SQL] Optimize `In` expression when only one element in the collection or collection is empty ## What changes were proposed in this pull request? Two new rules in the logical plan optimizers are added.
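The two rules described have a simple shape: `In(v, [x])` collapses to an equality test, and `In(v, [])` collapses to a literal. The pure-Python sketch below models them on toy expression tuples; for brevity it ignores the SQL subtlety that an empty IN with a NULL value yields NULL rather than false, which a real rule must handle.

```python
def optimize_in(value, items):
    """Toy rewrite of the In expression:
    In(v, [])  -> literal false (simplified: assumes v is non-null)
    In(v, [x]) -> EqualTo(v, x)
    otherwise unchanged."""
    if not items:
        return ("literal", False)
    if len(items) == 1:
        return ("=", value, items[0])
    return ("in", value, items)
```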

spark git commit: [SPARK-24805][SQL] Do not ignore avro files without extensions by default

2018-07-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b0c95a1d6 -> ba437fc5c [SPARK-24805][SQL] Do not ignore avro files without extensions by default ## What changes were proposed in this pull request? In the PR, I propose to change default behaviour of AVRO datasource which currently

spark git commit: [SPARK-23901][SQL] Removing masking functions

2018-07-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b045315e5 -> b0c95a1d6 [SPARK-23901][SQL] Removing masking functions The PR reverts #21246. Author: Marek Novotny Closes #21786 from mn-mikke/SPARK-23901. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-24810][SQL] Fix paths to test files in AvroSuite

2018-07-16 Thread lixiao
Repository: spark Updated Branches: refs/heads/master d463533de -> 9f929458f [SPARK-24810][SQL] Fix paths to test files in AvroSuite ## What changes were proposed in this pull request? In the PR, I propose to move `testFile()` to the common trait `SQLTestUtilsBase` and wrap test files in

spark git commit: [SPARK-24676][SQL] Project required data from CSV parsed data when column pruning disabled

2018-07-15 Thread lixiao
Repository: spark Updated Branches: refs/heads/master bcf7121ed -> d463533de [SPARK-24676][SQL] Project required data from CSV parsed data when column pruning disabled ## What changes were proposed in this pull request? This PR modified code to project required data from CSV parsed data when

spark git commit: [SPARK-24807][CORE] Adding files/jars twice: output a warning and add a note

2018-07-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3e7dc8296 -> 69993217f [SPARK-24807][CORE] Adding files/jars twice: output a warning and add a note ## What changes were proposed in this pull request? In the PR, I propose to output a warning if the `addFile()` or `addJar()` methods

spark git commit: [SPARK-24776][SQL] Avro unit test: deduplicate code and replace deprecated methods

2018-07-14 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 43e4e851b -> 3e7dc8296 [SPARK-24776][SQL] Avro unit test: deduplicate code and replace deprecated methods ## What changes were proposed in this pull request? Improve Avro unit test: 1. use QueryTest/SharedSQLContext/SQLTestUtils, instead

spark git commit: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClientLoader

2018-07-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 3b6005b8a -> a75571b46 [SPARK-23831][SQL] Add org.apache.derby to IsolatedClientLoader ## What changes were proposed in this pull request? Add `org.apache.derby` to `IsolatedClientLoader`, otherwise it may throw an exception: ```scala

spark git commit: Revert "[SPARK-24776][SQL] Avro unit test: use SQLTestUtils and replace deprecated methods"

2018-07-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c1b62e420 -> 3bcb1b481 Revert "[SPARK-24776][SQL] Avro unit test: use SQLTestUtils and replace deprecated methods" This reverts commit c1b62e420a43aa7da36733ccdbec057d87ac1b43. Project: http://git-wip-us.apache.org/repos/asf/spark/repo

spark git commit: [SPARK-24776][SQL] Avro unit test: use SQLTestUtils and replace deprecated methods

2018-07-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master dfd7ac988 -> c1b62e420 [SPARK-24776][SQL] Avro unit test: use SQLTestUtils and replace deprecated methods ## What changes were proposed in this pull request? Improve Avro unit test: 1. use QueryTest/SharedSQLContext/SQLTestUtils, instead

spark git commit: [SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort might not work

2018-07-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 32429256f -> 9cf375f5b [SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort might not work ## What changes were proposed in this pull request? When we use a reference from Dataset in filter or sort, which was not used in

spark git commit: [SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort might not work

2018-07-13 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 0f24c6f8a -> dfd7ac988 [SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort might not work ## What changes were proposed in this pull request? When we use a reference from Dataset in filter or sort, which was not used in the

spark git commit: [SPARK-23486] cache the function name from the external catalog for lookupFunctions

2018-07-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e0f4f206b -> 0ce11d0e3 [SPARK-23486] cache the function name from the external catalog for lookupFunctions ## What changes were proposed in this pull request? This PR will cache the function name from external catalog, it is used by

spark git commit: [SPARK-24790][SQL] Allow complex aggregate expressions in Pivot

2018-07-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 11384893b -> 75725057b [SPARK-24790][SQL] Allow complex aggregate expressions in Pivot ## What changes were proposed in this pull request? Relax the check to allow complex aggregate expressions, like `ceil(sum(col1))` or `sum(col1) + 1`,

spark git commit: [SPARK-24208][SQL][FOLLOWUP] Move test cases to proper locations

2018-07-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 07704c971 -> 11384893b [SPARK-24208][SQL][FOLLOWUP] Move test cases to proper locations ## What changes were proposed in this pull request? The PR is a followup to move the test cases introduced by the original PR in their proper

spark git commit: [SPARK-23007][SQL][TEST] Add read schema suite for file-based data sources

2018-07-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 395860a98 -> 07704c971 [SPARK-23007][SQL][TEST] Add read schema suite for file-based data sources ## What changes were proposed in this pull request? The reader schema is said to be evolved (or projected) when it changed after the data

[2/2] spark git commit: [SPARK-24768][SQL] Have a built-in AVRO data source implementation

2018-07-12 Thread lixiao
[SPARK-24768][SQL] Have a built-in AVRO data source implementation ## What changes were proposed in this pull request? Apache Avro (https://avro.apache.org) is a popular data serialization format. It is widely used in the Spark and Hadoop ecosystem, especially for Kafka-based data pipelines.

[1/2] spark git commit: [SPARK-24768][SQL] Have a built-in AVRO data source implementation

2018-07-12 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 1055c94cd -> 395860a98 http://git-wip-us.apache.org/repos/asf/spark/blob/395860a9/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala -- diff --git

spark git commit: [SPARK-24761][SQL] Adding of isModifiable() to RuntimeConfig

2018-07-11 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e008ad175 -> 3ab48f985 [SPARK-24761][SQL] Adding of isModifiable() to RuntimeConfig ## What changes were proposed in this pull request? In the PR, I propose to extend `RuntimeConfig` by new method `isModifiable()` which returns `true` if

spark git commit: [SPARK-24782][SQL] Simplify conf retrieval in SQL expressions

2018-07-11 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ff7f6ef75 -> e008ad175 [SPARK-24782][SQL] Simplify conf retrieval in SQL expressions ## What changes were proposed in this pull request? The PR simplifies the retrieval of config in `size`, as we can access them from tasks too thanks to

spark git commit: [SPARK-24208][SQL] Fix attribute deduplication for FlatMapGroupsInPandas

2018-07-11 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 86457a16d -> 32429256f [SPARK-24208][SQL] Fix attribute deduplication for FlatMapGroupsInPandas A self-join on a dataset which contains a `FlatMapGroupsInPandas` fails because of duplicate attributes. This happens because we are not

spark git commit: [SPARK-24208][SQL] Fix attribute deduplication for FlatMapGroupsInPandas

2018-07-11 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 592cc8458 -> ebf4bfb96 [SPARK-24208][SQL] Fix attribute deduplication for FlatMapGroupsInPandas ## What changes were proposed in this pull request? A self-join on a dataset which contains a `FlatMapGroupsInPandas` fails because of

spark git commit: [SPARK-24675][SQL] Rename table: validate existence of new location

2018-07-05 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ac78bcce0 -> 33952cfa8 [SPARK-24675][SQL] Rename table: validate existence of new location ## What changes were proposed in this pull request? If a table is renamed to an existing new location, data won't show up. ``` scala>
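
The repro in this excerpt is cut off; a minimal sketch of the scenario the fix guards against (table names are illustrative, not taken from the commit):

```sql
CREATE TABLE t1 (a INT);
INSERT INTO t1 VALUES (1);
-- Suppose the directory backing the default location of `t2` already
-- exists on the filesystem. Before this fix the rename could succeed
-- while the data stayed under a location that no longer matched the
-- metastore entry, so queries on the renamed table came back empty.
-- After the fix, the rename is rejected when the target location exists.
ALTER TABLE t1 RENAME TO t2;
```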

spark git commit: [SPARK-22384][SQL][FOLLOWUP] Refine partition pruning when attribute is wrapped in Cast

2018-07-04 Thread lixiao
Repository: spark Updated Branches: refs/heads/master ca8243f30 -> bf764a33b [SPARK-22384][SQL][FOLLOWUP] Refine partition pruning when attribute is wrapped in Cast ## What changes were proposed in this pull request? As mentioned in https://github.com/apache/spark/pull/21586 ,

spark git commit: [SPARK-24696][SQL] ColumnPruning rule fails to remove extra Project

2018-06-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 8ff4b9727 -> 3c0af793f [SPARK-24696][SQL] ColumnPruning rule fails to remove extra Project The ColumnPruning rule tries adding an extra Project if an input node produces more fields than needed, but as a post-processing step, it needs

spark git commit: simplify rand in dsl/package.scala

2018-06-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 0f534d3da -> 8ff4b9727 simplify rand in dsl/package.scala (cherry picked from commit d54d8b86301581142293341af25fd78b3278a2e8) Signed-off-by: Xiao Li Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

[1/2] spark git commit: [SPARK-24696][SQL] ColumnPruning rule fails to remove extra Project

2018-06-30 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 03545ce6d -> d54d8b863 [SPARK-24696][SQL] ColumnPruning rule fails to remove extra Project ## What changes were proposed in this pull request? The ColumnPruning rule tries adding an extra Project if an input node produces fields more

[2/2] spark git commit: simplify rand in dsl/package.scala

2018-06-30 Thread lixiao
simplify rand in dsl/package.scala Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d54d8b86 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d54d8b86 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d54d8b86

spark git commit: [SPARK-24553][WEB-UI] http 302 fixes for href redirect

2018-06-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 893ea224c -> c5aa54d54 [SPARK-24553][WEB-UI] http 302 fixes for href redirect ## What changes were proposed in this pull request? Updated URL/href links to include a '/' before '?id' to make links consistent and avoid http 302 redirect

spark git commit: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFileFormat

2018-06-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 221d03acc -> 893ea224c [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFileFormat ## What changes were proposed in this pull request? This PR added code to verify a schema in Json/Orc/ParquetFileFormat along with CSVFileFormat. ##

spark git commit: [SPARK-24613][SQL] Cache with UDF could not be matched with subsequent dependent caches

2018-06-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/branch-2.3 db538b25a -> 6e1f5e018 [SPARK-24613][SQL] Cache with UDF could not be matched with subsequent dependent caches Wrap the logical plan with an `AnalysisBarrier` for execution plan compilation in CacheManager, in order to avoid the plan

spark git commit: [SPARK-21687][SQL] Spark SQL should set createTime for Hive partition

2018-06-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 78ecb6d45 -> c04cb2d1b [SPARK-21687][SQL] Spark SQL should set createTime for Hive partition ## What changes were proposed in this pull request? Set createTime for every hive partition created in Spark SQL, which could be used to manage

spark git commit: [SPARK-24215][PYSPARK][FOLLOW UP] Implement eager evaluation for DataFrame APIs in PySpark

2018-06-27 Thread lixiao
Repository: spark Updated Branches: refs/heads/master a1a64e358 -> 6a0b77a55 [SPARK-24215][PYSPARK][FOLLOW UP] Implement eager evaluation for DataFrame APIs in PySpark ## What changes were proposed in this pull request? Address comments in #21370 and add more test. ## How was this patch
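
The eager-evaluation feature described here is driven by SQL confs; assuming the conf names that shipped with this work (stated from the released configuration docs, not quoted in the excerpt), enabling it looks like:

```sql
-- Render DataFrames eagerly in notebook/REPL environments:
SET spark.sql.repl.eagerEval.enabled=true;
-- Optional knobs controlling the rendered preview:
SET spark.sql.repl.eagerEval.maxNumRows=20;
SET spark.sql.repl.eagerEval.truncate=20;
```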

spark git commit: [SPARK-24423][SQL] Add a new option for JDBC sources

2018-06-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master dcaa49ff1 -> 02f8781fa [SPARK-24423][SQL] Add a new option for JDBC sources ## What changes were proposed in this pull request? Here is the description in the JIRA - Currently, our JDBC connector provides the option `dbtable` for users
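
The excerpt cuts off before naming the new option; in the released API this work added a `query` option as an alternative to `dbtable` (stated here from the released JDBC data source docs, not from the excerpt). A hypothetical registration, with connection details that are purely illustrative:

```sql
CREATE TEMPORARY VIEW dept_view
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:postgresql://localhost/testdb',  -- illustrative connection
  query 'SELECT id, name FROM departments WHERE active = true',
  user 'test',
  password 'secret'
);
```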

spark git commit: [SPARK-24658][SQL] Remove workaround for ANTLR bug

2018-06-26 Thread lixiao
Repository: spark Updated Branches: refs/heads/master e07aee216 -> dcaa49ff1 [SPARK-24658][SQL] Remove workaround for ANTLR bug ## What changes were proposed in this pull request? Issue antlr/antlr4#781 has already been fixed, so the workaround of extracting the pattern into a separate rule

spark git commit: [SPARK-24596][SQL] Non-cascading Cache Invalidation

2018-06-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master 8ab8ef773 -> bac50aa37 [SPARK-24596][SQL] Non-cascading Cache Invalidation ## What changes were proposed in this pull request? 1. Add parameter 'cascade' in CacheManager.uncacheQuery(). Under 'cascade=false' mode, only invalidate the
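
A sketch of the dependent-cache situation the `cascade=false` mode addresses (table names are illustrative):

```sql
CACHE TABLE base AS SELECT * FROM src;
CACHE TABLE derived AS SELECT a, count(*) AS c FROM base GROUP BY a;
-- Cascading invalidation would also drop `derived` when `base` is
-- uncached; under the non-cascading mode described above, `derived`
-- keeps its materialized data, which no longer depends on `base`.
UNCACHE TABLE base;
```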

spark git commit: [SPARK-23931][SQL][FOLLOW-UP] Make `arrays_zip` in function.scala `@scala.annotation.varargs`.

2018-06-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master f596ebe4d -> 6e0596e26 [SPARK-23931][SQL][FOLLOW-UP] Make `arrays_zip` in function.scala `@scala.annotation.varargs`. ## What changes were proposed in this pull request? This is a follow-up PR of #21045 which added `arrays_zip`. The
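
`arrays_zip` itself, the function the varargs annotation targets, merges same-index elements of its argument arrays into structs; a small SQL usage, with the result shape paraphrased rather than quoted from any output:

```sql
SELECT arrays_zip(array(1, 2, 3), array('a', 'b', 'c'));
-- Yields an array of structs pairing the i-th elements of each input:
-- roughly [(1, 'a'), (2, 'b'), (3, 'c')], struct field names per
-- Spark's naming rules for the inputs.
```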

spark git commit: [SPARK-24327][SQL] Verify and normalize a partition column name based on the JDBC resolved schema

2018-06-25 Thread lixiao
Repository: spark Updated Branches: refs/heads/master a5849ad9a -> f596ebe4d [SPARK-24327][SQL] Verify and normalize a partition column name based on the JDBC resolved schema ## What changes were proposed in this pull request? This PR modified JDBC datasource code to verify and normalize a

spark git commit: [SPARK-24206][SQL] Improve FilterPushdownBenchmark benchmark code

2018-06-23 Thread lixiao
Repository: spark Updated Branches: refs/heads/master c7e2742f9 -> 98f363b77 [SPARK-24206][SQL] Improve FilterPushdownBenchmark benchmark code ## What changes were proposed in this pull request? This PR added benchmark code `FilterPushdownBenchmark` for string pushdown and updated
