[GitHub] spark pull request #22211: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-251...

2018-09-11 Thread henryr
Github user henryr closed the pull request at: https://github.com/apache/spark/pull/22211 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22211: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL...

2018-09-11 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/22211 Merged to 2.1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22211: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-251...

2018-08-23 Thread henryr
GitHub user henryr opened a pull request: https://github.com/apache/spark/pull/22211 [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL][BACKPORT-2.… ## What changes were proposed in this pull request? Back port of #20393 and #22079. Currently

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-21 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21482 Any further comments here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-21 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r197234189 --- Diff: python/pyspark/sql/functions.py --- @@ -468,6 +468,18 @@ def input_file_name(): return Column(sc._jvm.functions.input_file_name

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-07 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21482 I think consistency in Spark's naming convention (and therefore increased discoverability by users) outweighs the advantage of naming it exactly for the Impala equivalent. I do agree

[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-07 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r193898812 --- Diff: R/pkg/NAMESPACE --- @@ -281,6 +281,8 @@ exportMethods("%<=>%", "initcap",

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-07 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21482 @rxin, that in itself is a bit weird, but there are ways to express inf values in Scala and thus inf values can show up flowing through Spark plans. I'm not sure MySQL has any such facility

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-06 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21482 @rxin Other engines are all over the place: * MySQL doesn't have support for infinity (based on my cursory look) - 1.0 / 0.0 is written as `null`. Also seems to be true of SQLite

[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-01 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192522027 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1

[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-01 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192520713 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala --- @@ -56,6 +56,16 @@ class

[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-01 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192521881 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1

[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-01 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192520834 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1107,6 +1107,14 @@ object functions { */ def input_file_name

[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-01 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192520566 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala --- @@ -24,7 +24,7 @@ import

[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-14 Thread henryr
Github user henryr closed the pull request at: https://github.com/apache/spark/pull/21302 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-14 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21302 Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-14 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21302#discussion_r188042296 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -602,6 +602,16 @@ class

[GitHub] spark pull request #21323: [SPARK-23582][SQL] Add withSQLConf(...) to test c...

2018-05-14 Thread henryr
GitHub user henryr opened a pull request: https://github.com/apache/spark/pull/21323 [SPARK-23582][SQL] Add withSQLConf(...) to test case ## What changes were proposed in this pull request? Add a `withSQLConf(...)` wrapper to force Parquet filter pushdown for a test

[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21302 Sounds good, done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread henryr
GitHub user henryr opened a pull request: https://github.com/apache/spark/pull/21302 [SPARK-23852][SQL] Upgrade to Parquet 1.8.3 ## What changes were proposed in this pull request? Upgrade Parquet dependency to 1.8.3 to avoid PARQUET-1217 ## How was this patch

[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...

2018-05-09 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21049 @dilipbiswal I tried hard to find something in the SQL standard that clarified the situation, but couldn't (of course that could be because the standard is pretty hard to parse... :)). So

[GitHub] spark pull request #21284: [SPARK-23852][SQL] Add test that fails if PARQUET...

2018-05-09 Thread henryr
GitHub user henryr opened a pull request: https://github.com/apache/spark/pull/21284 [SPARK-23852][SQL] Add test that fails if PARQUET-1217 is not fixed ## What changes were proposed in this pull request? Add a new test that triggers if PARQUET-1217 - a predicate pushdown

[GitHub] spark issue #21201: [SPARK-24128][SQL] Mention configuration option in impli...

2018-05-07 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21201 Any chance to get this merged now the tests are working again? Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...

2018-05-04 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21049 I might be a bit of a hardliner on this, but I think it's correct to eliminate the {{ORDER BY}} from common table expressions (e.g. MSSQL agrees with me, see [this link](https://docs.microsoft.com

[GitHub] spark pull request #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to R...

2018-05-04 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21145#discussion_r186143060 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java --- @@ -22,20 +22,20 @@ import

[GitHub] spark issue #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to ReadTask...

2018-05-02 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21145 I don't mind `ReadTask`. It's imperfect because 'task' implies that this is a thing that can be executed, whereas this interface doesn't have a way to pass control to the task object. It's more like

[GitHub] spark issue #21201: [SPARK-24128][SQL] Mention configuration option in impli...

2018-05-02 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21201 No problem! It's a small usability fix. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21201: [SPARK-24128][SQL] Mention configuration option in impli...

2018-05-01 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21201 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21201: [SPARK-24128][SQL] Mention configuration option i...

2018-04-30 Thread henryr
GitHub user henryr opened a pull request: https://github.com/apache/spark/pull/21201 [SPARK-24128][SQL] Mention configuration option in implicit CROSS JOIN error ## What changes were proposed in this pull request? Mention `spark.sql.crossJoin.enabled` in error message when

[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...

2018-04-27 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21049 @dilipbiswal thanks for the clarification. I agree that this particular case - where the alias is the root of a logical plan - might need special handling. Is there any reason to actually

[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...

2018-04-27 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21122 cc @rdblue --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

2018-04-25 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21070 yes, thanks @maropu! +1 to the idea of making this a jenkins job. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-24 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21073 @gatorsmile this looks ready for your review (asking because you filed the JIRA) if you time, thanks! --- - To unsubscribe, e

[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

2018-04-24 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21070 Ok, thanks for the context. I've worked on other projects where it's sometimes ok to take a small risk of a perf hit on trunk as long as the community is committed to addressing the issues before

[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

2018-04-24 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21070 @cloud-fan since there’s probably quite some time before this lands in a release, what do you think about merging this now if it’s ready, and filing the perf jira as a blocker against 2.4? My

[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-23 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r183560436 --- Diff: python/pyspark/sql/functions.py --- @@ -2186,6 +2186,29 @@ def map_values(col): return Column(sc._jvm.functions.map_values(_to_java_column

[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-23 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r183558371 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,154 @@ case class MapValues

[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-23 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r183559826 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -376,6 +376,35 @@ class DataFrameFunctionsSuite extends

[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-23 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r183559190 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,154 @@ case class MapValues

[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-23 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r183560201 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,154 @@ case class MapValues

[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-23 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r183559429 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,154 @@ case class MapValues

[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-23 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r183559663 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -56,6 +58,26 @@ class

[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...

2018-04-23 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21049 @dilipbiswal Thanks! Although Spark doesn't necessarily parse the query in the `from` clause as a subquery, is it fair to say it plans it as one? (Since the planner puts the alias under

[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

2018-04-23 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21070 This looks pretty good to me - are there any committers that can give it a (hopefully) final review? --- - To unsubscribe, e

[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-18 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r182547477 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -115,6 +116,62 @@ case class MapValues

[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

2018-04-17 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21070 @scottcarey I agree that's important. Perhaps it could be done as a follow-up PR? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r181915827 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -115,6 +116,62 @@ case class MapValues

[GitHub] spark pull request #21068: [SPARK-16630][YARN] Blacklist a node if executors...

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21068#discussion_r181908529 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala --- @@ -0,0 +1,155

[GitHub] spark pull request #21068: [SPARK-16630][YARN] Blacklist a node if executors...

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21068#discussion_r181911744 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/FailureWithinTimeIntervalTracker.scala --- @@ -0,0 +1,80

[GitHub] spark pull request #21068: [SPARK-16630][YARN] Blacklist a node if executors...

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21068#discussion_r181910914 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/FailureWithinTimeIntervalTracker.scala --- @@ -0,0 +1,80

[GitHub] spark pull request #21068: [SPARK-16630][YARN] Blacklist a node if executors...

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21068#discussion_r181907316 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala --- @@ -0,0 +1,155

[GitHub] spark pull request #21068: [SPARK-16630][YARN] Blacklist a node if executors...

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21068#discussion_r181912300 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/FailureWithinTimeIntervalTracker.scala --- @@ -0,0 +1,80

[GitHub] spark pull request #21068: [SPARK-16630][YARN] Blacklist a node if executors...

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21068#discussion_r181907395 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala --- @@ -0,0 +1,155

[GitHub] spark pull request #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10....

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181901031 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -63,115 +59,159 @@ public final

[GitHub] spark pull request #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10....

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181902045 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -63,115 +59,159 @@ public final

[GitHub] spark pull request #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10....

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181889287 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -63,115 +58,139 @@ public final

[GitHub] spark pull request #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10....

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181882476 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -63,115 +58,139 @@ public final

[GitHub] spark pull request #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21072#discussion_r181847445 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantSortsSuite.scala --- @@ -98,4 +98,31 @@ class

[GitHub] spark pull request #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21072#discussion_r181851593 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -736,12 +736,22 @@ object EliminateSorts extends Rule

[GitHub] spark pull request #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21072#discussion_r181849951 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantSortsSuite.scala --- @@ -98,4 +98,31 @@ class

[GitHub] spark pull request #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10....

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181846514 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -63,115 +58,139 @@ public final

[GitHub] spark pull request #21049: [SPARK-23957][SQL] Remove redundant sort operator...

2018-04-16 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21049#discussion_r181839715 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -307,6 +309,32 @@ object RemoveRedundantProject

[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...

2018-04-16 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21049 In SQL, the sort in a subquery doesn't make sense because of the relational model - the output of a subquery is an unordered bag of tuples. Some engines still allow the sort, some silently drop

[GitHub] spark pull request #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-14 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21072#discussion_r181563918 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -736,12 +736,15 @@ object EliminateSorts extends Rule

[GitHub] spark pull request #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181540952 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -63,115 +58,139 @@ public final

[GitHub] spark pull request #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181528729 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -63,115 +58,139 @@ public final

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-12 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20560#discussion_r181253369 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -733,6 +735,17 @@ object EliminateSorts extends Rule

[GitHub] spark pull request #21049: [SPARK-23957][SQL] Remove redundant sort operator...

2018-04-11 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21049#discussion_r180964957 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -307,6 +309,32 @@ object RemoveRedundantProject

[GitHub] spark pull request #21049: [SPARK-23957][SQL] Remove redundant sort operator...

2018-04-11 Thread henryr
GitHub user henryr opened a pull request: https://github.com/apache/spark/pull/21049 [SPARK-23957][SQL] Remove redundant sort operators from subqueries ## What changes were proposed in this pull request? Subqueries (at least in SQL) have 'bag of tuples' semantics. Ordering

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-10 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20560#discussion_r179003707 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -274,3 +279,7 @@ abstract class BinaryNode

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-10 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20560#discussion_r180601594 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -733,6 +735,17 @@ object EliminateSorts extends Rule

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-10 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20560#discussion_r180592716 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -22,10 +22,11 @@ import org.apache.spark.sql.{execution, Row

[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-19 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/20687 @gatorsmile ok, I think the coverage right now is a reasonable start - the other test cases I can think of would act more like they're exercising the expression-walking code, not the actual

[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-18 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/20687 @gatorsmile thank you for the reviews! Are there specific test cases you'd like to see? I've checked correlated and uncorrelated subqueries, various flavours of join, aggregates with HAVING clauses

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-14 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r174637789 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala --- @@ -22,54 +22,34 @@ import

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-09 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r173561557 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala --- @@ -22,32 +22,24 @@ import

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-09 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r173561516 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala --- @@ -22,32 +22,24 @@ import

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-09 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r173561415 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala --- @@ -22,32 +22,24 @@ import

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-08 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r173323846 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala --- @@ -331,4 +330,31 @@ class ComplexTypesSuite

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-08 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r173268999 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala --- @@ -331,4 +330,31 @@ class ComplexTypesSuite

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-07 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r173013094 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala --- @@ -331,4 +330,31 @@ class ComplexTypesSuite

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-07 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r173007679 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala --- @@ -331,4 +330,31 @@ class ComplexTypesSuite

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-07 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r172996211 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala --- @@ -22,32 +22,24 @@ import

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-07 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r172992262 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala --- @@ -331,4 +330,30 @@ class ComplexTypesSuite

[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-06 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/20687 This failing because of SPARK-23606, which seems unrelated (I haven't been able to trigger it in local builds, at least

[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-06 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/20687 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20740: [SPARK-23604][SQL] Change Statistics.isEmpty to !Statist...

2018-03-05 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/20740 argh, wrong PR, sorry for retest-spam. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark issue #20740: [SPARK-23604][SQL] Change Statistics.isEmpty to !Statist...

2018-03-05 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/20740 Retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20740: [SPARK-23604][SQL] Change Statistics.isEmpty to !...

2018-03-05 Thread henryr
GitHub user henryr opened a pull request: https://github.com/apache/spark/pull/20740 [SPARK-23604][SQL] Change Statistics.isEmpty to !Statistics.hasNonNul… …lValue ## What changes were proposed in this pull request? Parquet 1.9 will change the semantics

[GitHub] spark pull request #20687: [SPARK-25000][SQL] Fix complex type simplificatio...

2018-03-05 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r172300096 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala --- @@ -331,4 +331,24 @@ class ComplexTypesSuite

[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

2018-02-28 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/20687 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20687: [SPARK-25000][SQL] Fix complex type simplificatio...

2018-02-27 Thread henryr
GitHub user henryr opened a pull request: https://github.com/apache/spark/pull/20687 [SPARK-25000][SQL] Fix complex type simplification rules to apply to entire plan ## What changes were proposed in this pull request? Complex type simplification optimizer rules were

[GitHub] spark pull request #20443: [SPARK-23157][SQL][FOLLOW-UP] DataFrame -> SparkD...

2018-01-30 Thread henryr
GitHub user henryr opened a pull request: https://github.com/apache/spark/pull/20443 [SPARK-23157][SQL][FOLLOW-UP] DataFrame -> SparkDataFrame in R comment You can merge this pull request into a Git repository by running: $ git pull https://github.com/henryr/spark SPARK-23

[GitHub] spark pull request #20429: [SPARK-23157][SQL] Explain restriction on column ...

2018-01-29 Thread henryr
GitHub user henryr opened a pull request: https://github.com/apache/spark/pull/20429 [SPARK-23157][SQL] Explain restriction on column expression in withColumn() ## What changes were proposed in this pull request? It's not obvious from the comments that any added column must

[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

2018-01-23 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20355#discussion_r163455727 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -78,4 +78,20 @@ class FileBasedDataSourceSuite extends

[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

2018-01-23 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20355#discussion_r163422934 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -68,13 +68,16 @@ class FileBasedDataSourceSuite extends

[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

2018-01-23 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20355#discussion_r163411267 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala --- @@ -172,6 +172,14 @@ class TextSuite extends

[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

2018-01-22 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20355#discussion_r163146077 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala --- @@ -172,6 +172,14 @@ class TextSuite extends

[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

2018-01-22 Thread henryr
GitHub user henryr opened a pull request: https://github.com/apache/spark/pull/20355 SPARK-23148: [SQL] Allow pathnames with special characters for CSV / … …JSON / text ## What changes were proposed in this pull request? Fix for JSON and CSV data sources when

[GitHub] spark issue #20254: [SPARK-23062][SQL] Improve EXCEPT documentation

2018-01-16 Thread henryr
Github user henryr commented on the issue: https://github.com/apache/spark/pull/20254 Thanks all for the pointers and feedback! I've removed the references to the behavior before 2.0, and now the changes just make it explicit that this is `EXCEPT DISTINCT` (I appreciate that that's

  1   2   >