[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-04 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/23171 As @rxin said, if we introduce a separate expression for the switch-based approach, then we will need to modify other places. For example, `DataSourceStrategy$translateFilter`. So, integrating

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-11-29 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/23171 To sum up, I would set the goal of this PR as making `In` expressions as efficient as possible for bytes/shorts/ints. Then we can benchmark `In` vs `InSet` in [SPARK-26203](https

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-11-29 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/23171 @dbtsai @mgaido91 I think we can come back to this question once [SPARK-26203](https://issues.apache.org/jira/browse/SPARK-26203) is resolved. That JIRA will give us enough information about

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-11-29 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/23171 @cloud-fan, yeah, let’s see if this PR is useful. The original idea wasn’t to avoid fixing autoboxing in `InSet`. `In` was tested on 250 numbers to prove O(1) time complexity

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-11-28 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/23171 @gatorsmile @cloud-fan @dongjoon-hyun @viirya --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts,...

2018-11-28 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/23171 [SPARK-26205][SQL] Optimize In for bytes, shorts, ints ## What changes were proposed in this pull request? This PR optimizes `In` expressions for byte, short, integer types
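
To illustrate the idea discussed in this thread, here is a minimal sketch of switch-based membership for integers next to a set-based lookup. It is not the code generated by the PR. The object name, the method names, and the literal list `(1, 3, 5, 7)` are assumptions made for illustration only.

```scala
import scala.annotation.switch

object InSwitchSketch {
  // Hypothetical predicate: value IN (1, 3, 5, 7).
  // Matching on Int literals lets the compiler emit a JVM switch instruction;
  // the @switch annotation asks it to warn if it cannot do so.
  def inViaSwitch(value: Int): Boolean = (value: @switch) match {
    case 1 => true
    case 3 => true
    case 5 => true
    case 7 => true
    case _ => false
  }

  // Set-based alternative for comparison. A Set[Any] stores boxed values,
  // so every probe with a primitive Int has to box it first.
  private val literals: Set[Any] = Set(1, 3, 5, 7)

  def inViaSet(value: Int): Boolean = literals.contains(value)
}
```

Both methods return the same answers; the difference is only in how the lookup is executed, which is what the benchmarks mentioned in the thread are meant to measure.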

[GitHub] spark pull request #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule Repla...

2018-11-26 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23139#discussion_r236466962 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala --- @@ -0,0 +1,110

[GitHub] spark pull request #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule Repla...

2018-11-26 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23139#discussion_r236467423 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala --- @@ -0,0 +1,110

[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

2018-11-19 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/23079 LGTM as well.

[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...

2018-11-18 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23079#discussion_r234467085 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala --- @@ -298,6 +299,45

[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

2018-11-18 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/23079 @rednaxelafx I am glad the rule gets more adoption. Renaming also makes sense to me. Shall we extend `ReplaceNullWithFalseEndToEndSuite` as well

[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

2018-11-08 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/22966 I also think having a performance trend would be useful. I'll be glad to help with this effort.

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-31 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229705741 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -2585,4 +2585,45 @@ class DataFrameSuite extends QueryTest

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-30 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229449496 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,60 @@ object CombineConcats extends

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-30 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229449194 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -2578,4 +2578,45 @@ class DataFrameSuite extends QueryTest

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-30 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229445682 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,60 @@ object CombineConcats extends

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-30 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229445313 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,60 @@ object CombineConcats extends

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-30 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229442843 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,60 @@ object CombineConcats extends

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-29 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229133793 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -2578,4 +2578,45 @@ class DataFrameSuite extends QueryTest

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-29 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229133550 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,65 @@ object CombineConcats extends

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-28 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r228741884 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,65 @@ object CombineConcats extends

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-28 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r228741800 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,65 @@ object CombineConcats extends

[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

2018-10-27 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/22857 @dbtsai @gatorsmile @cloud-fan could you guys, please, take a look?

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-27 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/22857 [SPARK-25860][SQL] Replace Literal(null, _) with FalseLiteral whenever possible ## What changes were proposed in this pull request? This PR proposes a new optimization rule
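
The reasoning behind such a rule can be sketched with a small model of SQL three-valued logic. This is not the Catalyst rule itself. `SqlBool`, `keepRow`, and `and` are hypothetical names, and `None` stands for SQL `NULL`.

```scala
object NullAsFalseSketch {
  // SQL boolean modelled as Option[Boolean]; None represents NULL.
  type SqlBool = Option[Boolean]

  // A filter keeps a row only when its predicate evaluates to TRUE.
  def keepRow(predicate: SqlBool): Boolean = predicate.contains(true)

  // SQL AND: FALSE dominates, TRUE AND TRUE is TRUE, anything else is NULL.
  def and(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // Every value a SQL boolean can take.
  val allValues: Seq[SqlBool] = Seq(Some(true), Some(false), None)

  // True when swapping NULL for FALSE inside an AND never changes
  // which rows the filter keeps.
  val safeUnderAnd: Boolean =
    allValues.forall(x => keepRow(and(None, x)) == keepRow(and(Some(false), x)))
}
```

The swap is not safe everywhere, which is presumably why the title says "whenever possible": under `NOT`, `NULL` stays `NULL` and the row is dropped, while `NOT FALSE` is `TRUE` and the row is kept.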

[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...

2018-09-13 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/19193 Hi @dongjoon-hyun. Yep, I'll close this one.

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2018-09-13 Thread aokolnychyi
Github user aokolnychyi closed the pull request at: https://github.com/apache/spark/pull/19193

[GitHub] spark pull request #21580: [SPARK-24575][SQL] Prohibit window expressions in...

2018-06-17 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/21580 [SPARK-24575][SQL] Prohibit window expressions inside WHERE and HAVING clauses ## What changes were proposed in this pull request? As discussed [before](https://github.com/apache

[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...

2018-05-31 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/19193 @cloud-fan @hvanhovell I created PR #21473 that fixes StackOverflow. Apart from that, I think we might have other potential problems. **1. Window functions inside WHERE

[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...

2018-05-31 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/21473#discussion_r192234621 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1744,11 +1744,14 @@ class Analyzer

[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...

2018-05-31 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/21473 [SPARK-21896][SQL] Fix StackOverflow caused by window functions inside aggregate functions ## What changes were proposed in this pull request? This PR explicitly prohibits window

[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...

2018-05-20 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/19193 @hvanhovell @cloud-fan I think it would be safer to be consistent with other databases and what Spark does for nested aggregate functions. It is really simple to write a subquery to work around

[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...

2018-05-19 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/19193 I checked PostgreSQL(10.3), MySQL(8.0), Hive(2.1.0). **1. PostgreSQL** ``` postgres=# CREATE TABLE t1 (c1 integer, c2 integer); postgres=# INSERT INTO t1 VALUES (1, 2

[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...

2018-04-06 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/19193 Let me check other databases and come up with a summary.

[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...

2017-12-17 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/19193 @hvanhovell here is a summary of tried scenarios: ``` val df = Seq((1, 2), (1, 3), (2, 4), (5, 5)).toDF("a", "b") val window1 = Window.orderBy

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-14 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 I am not sure we can infer ``a == b`` if ``a in (0, 2, 3, 4)`` and ``b in (0, 2, 3, 4)``. table 'a' ``` a1 a2 1 2 3 3 4 5 ``` table 'b
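
The doubt raised here can be checked with a tiny enumeration. This is a sketch that uses only the value list from the comment; the object and field names are hypothetical. It collects the pairs that satisfy both `IN` predicates without being equal.

```scala
object InferEqualityCounterexample {
  // The shared value list from the comment: a IN (0, 2, 3, 4), b IN (0, 2, 3, 4).
  val allowed: Seq[Int] = Seq(0, 2, 3, 4)

  // Pairs (a, b) where both IN predicates hold but a != b.
  val unequalPairs: Seq[(Int, Int)] =
    for {
      a <- allowed
      b <- allowed
      if a != b
    } yield (a, b)
}
```

Any pair in `unequalPairs`, such as `(0, 2)`, satisfies both filters, so `a == b` does not follow from the two `IN` conditions alone.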

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 Yeah, correct. So, we should revert then.

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 I took a look at ``JoinSelection``. It seems we will not get ``BroadcastHashJoin`` or ``ShuffledHashJoin`` if we revert this rule

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 Sure, if you guys think it does not give any performance benefits, then let's revert it. I also had similar concerns but my understanding was that having an inner join with some

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2017-12-12 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/19193#discussion_r156495899 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1920,7 +1927,34 @@ class Analyzer

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2017-12-12 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/19193#discussion_r156493072 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1920,7 +1927,34 @@ class Analyzer

[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...

2017-12-08 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/19193 @gatorsmile @cloud-fan could you provide any input?

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-11-30 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r154164912 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala --- @@ -27,6 +27,8 @@ object ExpressionSet

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-11-28 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r153420088 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,99 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-11-27 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r153329031 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,99 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-11-26 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r153066992 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,99 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-11-22 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r152660385 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,71 @@ object EliminateOuterJoin extends

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-21 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 @SimonBin The initial solution handled your case but then there was a decision to restrict the proposed rule to cross joins only. You can find the reason in this [comment](https://github.com

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-10-18 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r145498671 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,79 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-10-15 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r144722742 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,71 @@ object EliminateOuterJoin extends

[GitHub] spark issue #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats should ...

2017-09-17 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/19252 @gatorsmile thanks for the feedback. I also covered ``TruncateTableCommand`` with additional tests. However, I see somewhat strange behavior while creating a test

[GitHub] spark pull request #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats ...

2017-09-16 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/19252 [SPARK-21969][SQL] CommandUtils.updateTableStats should call refreshTable ## What changes were proposed in this pull request? Tables in the catalog cache are not invalidated once

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2017-09-11 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/19193 [WIP][SPARK-21896][SQL] Fix Stack Overflow when window function is nested inside an aggregate function ## What changes were proposed in this pull request? This WIP PR contains

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-09-06 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r137343500 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,71 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-09-06 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r137343433 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,71 @@ object EliminateOuterJoin extends

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-31 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 @gatorsmile what is our decision here? Shall we wait until SPARK-21652 is resolved? In the meantime, I can add some tests and see how the proposed rule works together with all others

[GitHub] spark issue #18909: [MINOR][SQL] Additional test case for CheckCartesianProd...

2017-08-13 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18909 @gatorsmile sure, this PR is only about tests, I was just wondering what is planned regarding cross joins with inequality conditions. I borrowed several tests from PR #16762 and added

[GitHub] spark issue #18909: [MINOR][SQL] Additional test case for CheckCartesianProd...

2017-08-11 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18909 @gatorsmile I took a look at both PRs. I quickly scanned PR #14866 and did not find tests for existence joins. Also, `SQLConf.CROSS_JOINS_ENABLED = true` is checked only

[GitHub] spark pull request #18909: [MINOR][SQL] Additional test case for CheckCartes...

2017-08-10 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/18909 [MINOR][SQL] Additional test case for CheckCartesianProducts rule ## What changes were proposed in this pull request? While discovering optimization rules and their test coverage, I

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-08 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 @gatorsmile I updated the rule to cover cross join cases. Regarding the case with the redundant condition mentioned by you, I opened [SPARK-21652](https://issues.apache.org/jira/browse/SPARK

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-08-01 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r130662925 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,72 @@ object EliminateOuterJoin extends

[GitHub] spark issue #18692: [SPARK-21417][SQL] Detect joind conditions via filter ex...

2017-07-31 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 @gatorsmile I took a look at the case above. Indeed, the proposed rule triggers this issue but only indirectly. In the example above, the optimizer will never reach a fixed point. Please, find

[GitHub] spark pull request #18740: [SPARK-21538][SQL] Attribute resolution inconsist...

2017-07-27 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18740#discussion_r129911780 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -1304,6 +1304,15 @@ class DatasetSuite extends QueryTest

[GitHub] spark pull request #18740: [SPARK-21538][SQL] Attribute resolution inconsist...

2017-07-26 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/18740 [SPARK-21538][SQL] Attribute resolution inconsistency in the Dataset API ## What changes were proposed in this pull request? This PR contains a tiny update that removes an attribute

[GitHub] spark issue #18692: [SPARK-21417][SQL] Detect joind conditions via filter ex...

2017-07-24 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 @gatorsmile thanks for the input. Let me check that I understood everything correctly. So, I keep it as a separate rule that is applied only if constraint propagation is enabled. Inside the rule

[GitHub] spark issue #18692: [SPARK-21417][SQL] Detect joind conditions via filter ex...

2017-07-21 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 @cloud-fan which rule do you mean? `PushPredicateThroughJoin` seems to be the closest by logic but it has a slightly different purpose and does not cover this use case. In fact, I used

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Detect joind conditions via fi...

2017-07-20 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/18692 [SPARK-21417][SQL] Detect joind conditions via filter expressions ## What changes were proposed in this pull request? This PR adds an optimization rule that infers join conditions

[GitHub] spark issue #18583: [SPARK-21332][SQL] Incorrect result type inferred for so...

2017-07-17 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18583 Can we, please, trigger this one more time? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request #18583: [SPARK-21332][SQL] Incorrect result type inferred...

2017-07-10 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/18583 [SPARK-21332][SQL] Incorrect result type inferred for some decimal expressions ## What changes were proposed in this pull request? This PR changes the direction of expression

[GitHub] spark issue #18368: [SPARK-21102][SQL] Refresh command is too aggressive in ...

2017-07-03 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18368 @gatorsmile should be fixed now.

[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...

2017-06-27 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18368 @shaneknapp It seems that the build fails with an exception unrelated to the PR. Therefore, I will just close this one and open a new one.

[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...

2017-06-21 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18368 @shaneknapp can we trigger this one more time, please?

[GitHub] spark pull request #18368: [SPARK-21102][SQL] Make refresh resource command ...

2017-06-20 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/18368 [SPARK-21102][SQL] Make refresh resource command less aggressive in p… ### Idea This PR adds validation to REFRESH sql statements. Currently, users can specify whatever they want

[GitHub] spark issue #18252: [SPARK-17914][SQL] Fix parsing of timestamp strings with...

2017-06-12 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18252 @wzhfy @rxin @ueshin can someone, please, merge this?

[GitHub] spark pull request #18252: [SPARK-17914][SQL] Fix parsing of timestamp strin...

2017-06-10 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18252#discussion_r121252397 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -399,13 +399,13 @@ object DateTimeUtils

[GitHub] spark pull request #18252: [SPARK-17914][SQL] Fix parsing of timestamp strin...

2017-06-10 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18252#discussion_r121251811 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -32,7 +32,7 @@ import

[GitHub] spark issue #18252: [SPARK-17914][SQL] Fix parsing of timestamp strings with...

2017-06-09 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18252 @ueshin good point, thanks.

[GitHub] spark pull request #18252: [SPARK-17914][SQL] Fix parsing of timestamp strin...

2017-06-09 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/18252 [SPARK-17914][SQL] Fix parsing of timestamp strings with nanoseconds The PR contains a tiny change to fix the way Spark parses string literals into timestamps. Currently, some timestamps

[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2017-01-24 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r97589440 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedTypedAggregation.java --- @@ -0,0 +1,160 @@ +/* + * Licensed

[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-22 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r93606051 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedTypedAggregation.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed

[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-21 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r93395745 --- Diff: docs/sql-programming-guide.md --- @@ -382,6 +382,52 @@ For example: +## Aggregations + +The [built-in DataFrames

[GitHub] spark issue #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL progra...

2016-12-20 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/16329 @marmbrus I have updated the pull request. The compiled docs can be found [here](https://aokolnychyi.github.io/spark-docs/sql-programming-guide.html). I did not manage to build

[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-19 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r93019316 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedUntypedAggregation.scala --- @@ -0,0 +1,97 @@ +/* + * Licensed

[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-19 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r93019035 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedUntypedAggregation.scala --- @@ -0,0 +1,97 @@ +/* + * Licensed

[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-18 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/16329 [SPARK-16046][DOCS] Aggregations in the Spark SQL programming guide ## What changes were proposed in this pull request? - A separate subsection for Aggregations under “Getting

[GitHub] spark pull request #16024: [MINOR][DOCS] Updates to the Accumulator example ...

2016-11-28 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/16024#discussion_r89791975 --- Diff: docs/programming-guide.md --- @@ -1378,29 +1378,36 @@ res2: Long = 10 While this code used the built-in support for accumulators

[GitHub] spark pull request #16024: [MINOR][DOCS] Updates to the Accumulator example ...

2016-11-28 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/16024#discussion_r89788527 --- Diff: docs/programming-guide.md --- @@ -1424,29 +1431,38 @@ accum.value(); // returns 10 {% endhighlight %} -Programmers can also

[GitHub] spark pull request #16024: [MINOR][DOCS] Updates to the Accumulator example ...

2016-11-27 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/16024 [MINOR][DOCS] Updates to the Accumulator example in the programming guide. Fixed typos, AccumulatorV2 in Java ## What changes were proposed in this pull request? This pull request

[GitHub] spark pull request #14050: [MINOR][EXAMPLES] Window function examples

2016-10-22 Thread aokolnychyi
Github user aokolnychyi closed the pull request at: https://github.com/apache/spark/pull/14050 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

2016-07-12 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/14119 **Summary of the updates** - `JavaSparkSQL.java` file was removed. I kept it initially since the file itself was quite old (2+ years) and it was present in your original WIP branch

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-09 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70173180 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSqlExample.java --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-09 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70173131 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala --- @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-09 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70173058 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala --- @@ -41,43 +35,47 @@ object HiveFromSpark

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-09 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/14119#discussion_r70173035 --- Diff: docs/sql-programming-guide.md --- @@ -1380,17 +949,17 @@ metadata. {% highlight scala %} -// spark is an existing

[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

2016-07-09 Thread aokolnychyi
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/14119 @liancheng could you, please, review this PR?

[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-09 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/14119 [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programming guide and examples ## What changes were proposed in this pull request? - Hard-coded Spark SQL sample snippets were moved

[GitHub] spark pull request #14050: [MINOR][EXAMPLES] Window function examples

2016-07-04 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/14050 [MINOR][EXAMPLES] Window function examples ## What changes were proposed in this pull request? An example that explains the usage of window functions. It shows the difference