[GitHub] spark pull request #23141: [SPARK-26021][SQL][followup] add test for special...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23141#discussion_r239333919 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java --- @@ -165,10 +165,14 @@ public void

[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23239 Yes, the 3 cases I pointed that need to handle NaN and -0.0 do not change the value in `UnsafeRow`. --- - To unsubscribe, e

[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23239 cc @adoron @kiszk @viirya @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

2018-12-05 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/23239 [SPARK-26021][SQL][followup] only deal with NaN and -0.0 in UnsafeWriter ## What changes were proposed in this pull request? A followup of https://github.com/apache/spark/pull/23043

[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23235 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23227 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239090244 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -95,3 +96,59 @@ private[spark] object

[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23213 If we look at test coverage, `wholeStage=false, factoryMode=CODE_ONLY` will go through code paths that wholeStageCodegen doesn't cover. Or did I miss something

[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23213 But whole stage codegen will not test `GenerateUnsafeProject`, `GenerateMutableProject`, etc., right? --- - To unsubscribe, e

[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23222 Jenkins passes, which means the previously added end-to-end test can't not show the benefit of this rule. We should update

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239060606 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -95,3 +96,59 @@ private[spark] object

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239059162 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -163,6 +171,8 @@ object SQLMetrics

[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23213 how about `wholeStage=false, factoryMode=CODE_ONLY`? I think it's different from `wholeStage=false, factoryMode=NO_CODEGEN

[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23213 that's a lot of time... Can we think more about the combination of codegen and wholeStage? When we turn on whole stage codegen but turn off codegen, what will happen

[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23222 That PR also added an end-to-end test, does this mean that test is not valid? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to transp...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r238950241 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -734,6 +734,28 @@ object CollapseWindow extends

[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23213 do you know how long `SQLQueryTestSuite` takes? We are making it longer by 4 times here, so better to know the overhead

[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238949362 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -181,62 +180,39 @@ case class RelationConversions( conf

[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23120 Hi @MaxGekk , since this changes the result(although makes it better), do you mind adding a migration guide? thanks

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238909822 --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala --- @@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter

[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238908877 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -181,62 +180,39 @@ case class RelationConversions( conf

[GitHub] spark issue #23210: [SPARK-26233][SQL] CheckOverflow when encoding a decimal...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23210 a late LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23217 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238899698 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -181,62 +180,39 @@ case class RelationConversions( conf

[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238706120 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -95,9 +77,127 @@ case class

[GitHub] spark pull request #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat i...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23217#discussion_r238698103 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala --- @@ -47,13 +48,17 @@ class ArrayBasedMapBuilder

[GitHub] spark pull request #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat i...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23217#discussion_r238651534 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala --- @@ -47,13 +48,17 @@ class ArrayBasedMapBuilder

[GitHub] spark pull request #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat i...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23217#discussion_r238651421 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala --- @@ -47,13 +48,17 @@ class ArrayBasedMapBuilder

[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23217 thanks for the cleanup! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238650207 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -145,9 +145,14 @@ case class

[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22468 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23213 We should create such a framework when we need to have per-file config settings for testing. --- - To unsubscribe, e-mail

[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238634730 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -145,9 +145,14 @@ case class

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238633725 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala --- @@ -92,6 +92,12 @@ private[spark] class ShuffleMapTask

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238630996 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala --- @@ -92,6 +92,12 @@ private[spark] class ShuffleMapTask

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238630981 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala --- @@ -92,6 +92,12 @@ private[spark] class ShuffleMapTask

[GitHub] spark issue #23215: [SPARK-26263][SQL] Throw exception when Partition column...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23215 I think this new behavior makes more sense, but we need to add a migration guide. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238627633 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2899,6 +2899,144 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238626367 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -145,9 +145,14 @@ case class

[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238625581 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -144,9 +144,10 @@ class SQLQueryTestSuite extends QueryTest

[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238625268 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2899,6 +2899,144 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 I think there is a problem, but no one found out because it's only about metrics. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23207 Can you share some ideas about it? IMO shuffle write metrics is hard, as an RDD can have shuffle dependencies with multiple upstream RDDs. That said, in general the shuffle write metrics should

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 It's easy to track `numKeyLookups` at `HashedRelation`, but it's hard to track `numProbes`. One idea is, we pass a `MutableInt` to `LongToUnsafeRowMap.getValue` as a parameter, and in the method

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 I might know the root cause: `LongToUnsafeRowMap` is acutally accessed by multiple threads. For broadcast hash join, we will copy the broadcasted hash relation to avoid multi-thread

[GitHub] spark pull request #23214: [SPARK-26155] Optimizing the performance of LongT...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23214#discussion_r238549645 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -398,8 +399,8 @@ private[execution] final class

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23171 How about, we create an `OptimizedIn`, and convert `In` to `OptimizedIn` if the list is all literals? `OptimizedIn` will pick `switch` or hash set based on the length of the list

[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238534101 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala --- @@ -535,4 +535,98 @@ class

[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238533700 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala --- @@ -535,4 +535,98 @@ class

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23171 I think `InSet` is not an optimized version of `In`, but just a way to separate the implementation for different conditions (the length of the list). Maybe we should do the same thing here

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23171 @rxin I proposed the same thing before, but one problem is that, we only convert `In` to `InSet` when the length of list reaches the threshold. If the `switch` way is faster than hash set when

[GitHub] spark issue #23212: [SPARK-25498][SQL][FOLLOW-UP] Return an empty config set...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23212 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r238524973 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchWrite.java --- @@ -25,14 +25,14 @@ import

[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238524763 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -145,9 +145,14 @@ case class

[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238520267 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala --- @@ -535,4 +535,100 @@ class

[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238515544 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGeneratorWithInterpretedFallbackSuite.scala --- @@ -106,4 +106,19

[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238515444 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGeneratorWithInterpretedFallbackSuite.scala --- @@ -106,4 +106,19

[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238515227 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Projection.scala --- @@ -166,29 +166,40 @@ object UnsafeProjection

[GitHub] spark issue #23212: [SPARK-25498][SQL][FOLLOW-UP] Return an empty config set...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23212 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238330712 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala --- @@ -157,4 +157,22 @@ object InternalRow

[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22512 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r238328405 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -148,12 +156,25 @@ class SQLQueryTestSuite extends QueryTest

[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23152 thanks, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23204#discussion_r238323331 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -483,8 +470,6 @@ private[execution] final class

[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23204#discussion_r238322699 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -483,8 +470,6 @@ private[execution] final class

[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238321460 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -145,9 +145,14 @@ case class

[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238318256 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -145,9 +145,14 @@ case class

[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23208 cc @rdblue @rxin @jose-torres @gatorsmile @HyukjinKwon @gengliangwang --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r238313454 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,52 +17,49 @@ package

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r238313221 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -241,32 +241,28 @@ final class DataFrameWriter[T] private[sql](ds

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-03 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/23208 [SPARK-25530][SQL] data source v2 API refactor (batch write) ## What changes were proposed in this pull request? Adjust the batch write API to match the read API refactor after https

[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23204#discussion_r238264122 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -57,12 +57,6 @@ class SQLMetric(val metricType: String

[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 I'm fine to revert it if it caused a significant performance regression, we should revisit it later, with different ideas, like updating the metrics for each batch instead of each record

[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23204#discussion_r23825 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala --- @@ -374,22 +374,6 @@ class

[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23204#discussion_r238257371 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -63,7 +63,7 @@ case class HashAggregateExec

[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23186 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23203 I used to run pyspark test via `python python/pyspark/sql/dataframe.py`, after setting `export PYTHONPATH="$(find "${SPARK_HOME}"/python/lib/ -name 'py4j-*-src.zip' -type f | uniq)

[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23120 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #23010: [SPARK-26012][SQL]Null and '' values should not cause dy...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23010 The root cause is, `DynamicPartitionDataWriter` treats null and empty string as different partition values, and creates new files. However, null and empty string are converted

[GitHub] spark issue #23190: [SPARK-26117][FOLLOW-UP][SQL]throw SparkOutOfMemoryError...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23190 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r238172470 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -148,12 +156,21 @@ class SQLQueryTestSuite extends QueryTest

[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23186 retest this pleae --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23120 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark pull request #23120: [SPARK-26151][SQL] Return partial results for bad...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23120#discussion_r238083349 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala --- @@ -243,21 +243,27 @@ class UnivocityParser

[GitHub] spark issue #23154: [SPARK-26195][SQL] Correct exception messages in some cl...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23154 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23178#discussion_r238083055 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -38,114 +38,106 @@ import

[GitHub] spark issue #23178: [SPARK-26216][SQL] Do not use case class as public API (...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23178 thanks for the review, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23130 We don't need to block it, but @MaxGekk if you have time, it would great to answer https://github.com/apache/spark/pull/23130#issuecomment-442491582 thanks, merging to master

[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23190 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23190 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark issue #23187: [SPARK-26211][SQL][TEST][FOLLOW-UP] Combine test cases f...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23187 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #23153: [SPARK-26147][SQL] only pull out unevaluable pyth...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23153#discussion_r238082589 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -155,19 +155,20 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #23176: [SPARK-26211][SQL] Fix InSet for binary, and stru...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23176#discussion_r237770687 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala --- @@ -293,6 +293,54 @@ class PredicateSuite

[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r237756348 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -95,9 +77,98 @@ case class

[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r237756394 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -95,9 +77,98 @@ case class

[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r237753623 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -95,9 +77,98 @@ case class

[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r237753433 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -95,9 +77,98 @@ case class

[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22957 LGTM, cc @viirya as well --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

<    1   2   3   4   5   6   7   8   9   10   >