[GitHub] spark issue #20750: [SPARK-23581][SQL] Add interpreted unsafe projection

2018-03-06 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20750 cc @cloud-fan @mgaido91 @kiszk @maropu --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20749: [SPARK-23590][SQL] Add interpreted execution to CreateEx...

2018-03-06 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20749 LGTM - pending jenkins. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20748: [SPARK-23611][SQL] Add a helper function to check...

2018-03-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20748#discussion_r172514416 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -93,6 +95,40 @@ trait

[GitHub] spark pull request #20748: [SPARK-23611][SQL] Add a helper function to check...

2018-03-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20748#discussion_r172513927 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -93,6 +95,40 @@ trait

spark git commit: [SPARK-23594][SQL] GetExternalRowField should support interpreted execution

2018-03-06 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master ad640a5af -> e8a259d66 [SPARK-23594][SQL] GetExternalRowField should support interpreted execution ## What changes were proposed in this pull request? This pr added interpreted execution for `GetExternalRowField`. ## How was this patch

[GitHub] spark issue #20746: [SPARK-23594][SQL] GetExternalRowField should support in...

2018-03-06 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20746 Merging to master. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20746: [SPARK-23594][SQL] GetExternalRowField should sup...

2018-03-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20746#discussion_r172477800 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala --- @@ -84,4 +85,23 @@ class

[GitHub] spark pull request #20746: [SPARK-23594][SQL] GetExternalRowField should sup...

2018-03-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20746#discussion_r172477430 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala --- @@ -84,4 +85,23 @@ class

[GitHub] spark issue #20746: [SPARK-23594][SQL] GetExternalRowField should support in...

2018-03-06 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20746 @maropu look pretty good. I left one comment, otherwise this is gtg. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #20746: [SPARK-23594][SQL] GetExternalRowField should sup...

2018-03-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20746#discussion_r172455421 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala --- @@ -84,4 +85,23 @@ class

spark git commit: [SPARK-23586][SQL] Add interpreted execution to WrapOption

2018-03-05 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master 7706eea6a -> f6b49f9d1 [SPARK-23586][SQL] Add interpreted execution to WrapOption ## What changes were proposed in this pull request? The PR adds interpreted execution to WrapOption. ## How was this patch tested? added UT Author: Marco

[GitHub] spark issue #20741: [SPARK-23586][SQL] Add interpreted execution to WrapOpti...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20741 Merging to master. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20741: [SPARK-23586][SQL] Add interpreted execution to W...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20741#discussion_r172337853 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala --- @@ -75,4 +75,14 @@ class

[GitHub] spark pull request #20741: [SPARK-23586][SQL] Add interpreted execution to W...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20741#discussion_r172333065 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala --- @@ -75,4 +75,14 @@ class

spark git commit: [SPARK-23585][SQL] Add interpreted execution to UnwrapOption

2018-03-05 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master 98a5c0a35 -> ba622f45c [SPARK-23585][SQL] Add interpreted execution to UnwrapOption ## What changes were proposed in this pull request? The PR adds interpreted execution to UnwrapOption. ## How was this patch tested? added UT Author:

[GitHub] spark issue #20736: [SPARK-23585][SQL] Add interpreted execution to UnwrapOp...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20736 Merging to master. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #20723: [SPARK-23538][core] Remove custom configuration for SSL ...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20723 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

spark git commit: [SPARK-23510][DOC][FOLLOW-UP] Update spark.sql.hive.metastore.version

2018-03-05 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master a366b950b -> 947b4e6f0 [SPARK-23510][DOC][FOLLOW-UP] Update spark.sql.hive.metastore.version ## What changes were proposed in this pull request? Update `spark.sql.hive.metastore.version` to 2.3.2, same as HiveUtils.scala:

[GitHub] spark issue #20734: [SPARK-23510][DOC][FOLLOW-UP] Update spark.sql.hive.meta...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20734 Merging to master. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #20736: [SPARK-23586][SQL] Add interpreted execution to UnwrapOp...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20736 LGTM - pending jenkins --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20736: [SPARK-23586][SQL] Add interpreted execution to U...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20736#discussion_r172194194 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -382,8 +382,14 @@ case class UnwrapOption

[GitHub] spark issue #20736: [SPARK-23586][SQL] Add interpreted execution to UnwrapOp...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20736 @mgaido91 you picked this up phenomenally fast! It looks pretty good, let two minor comments. --- - To unsubscribe, e-mail

[GitHub] spark pull request #20736: [SPARK-23586][SQL] Add interpreted execution to U...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20736#discussion_r172191654 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala --- @@ -66,4 +66,16 @@ class

[GitHub] spark pull request #20736: [SPARK-23586][SQL] Add interpreted execution to U...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20736#discussion_r172188241 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -382,8 +382,14 @@ case class UnwrapOption

spark git commit: [SPARK-23496][CORE] Locality of coalesced partitions can be severely skewed by the order of input partitions

2018-03-05 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master 2ce37b50f -> 42cf48e20 [SPARK-23496][CORE] Locality of coalesced partitions can be severely skewed by the order of input partitions ## What changes were proposed in this pull request? The algorithm in

[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20664 Merging to master. Thanks! @mgaido91 if you feel this should be different, feel free to open a follow-up

[GitHub] spark issue #20734: [SPARK-23510][DOC][FOLLOW-UP] Update spark.sql.hive.meta...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20734 LGTM :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20043 @viirya big fan of this change! More structure will make code gen easier & safer to implement. I think we should merge this as is, and then I think it might be good to start adding t

[GitHub] spark issue #20712: [SPARK-23563] config codegen compile cache size

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20712 @passionke can you elaborate on why we need this? A problem with increasing the cache is that we keep more classes around and those classes can blow out the JVM's internal code caches

[GitHub] spark issue #20699: [SPARK-23544][SQL]Remove redundancy ShuffleExchange in t...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20699 A very valid use cases for repartitioning is limiting concurrency. For example when you have a UDF that calls some service you don't want to DDOS that service because you make requests from too

[1/2] spark git commit: [SPARK-23546][SQL] Refactor stateless methods/values in CodegenContext

2018-03-05 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master 269cd5359 -> 2ce37b50f http://git-wip-us.apache.org/repos/asf/spark/blob/2ce37b50/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

[2/2] spark git commit: [SPARK-23546][SQL] Refactor stateless methods/values in CodegenContext

2018-03-05 Thread hvanhovell
[SPARK-23546][SQL] Refactor stateless methods/values in CodegenContext ## What changes were proposed in this pull request? A current `CodegenContext` class has immutable value or method without mutable state, too. This refactoring moves them to `CodeGenerator` object class which can be

[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20700 I am having some problems with the merge script, give me a little bit of time. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-05 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20700 Merging to master. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-04 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20700 @kiszk can you update the PR description? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20706: [SPARK-23550][core] Cleanup `Utils`.

2018-03-03 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20706 @jiangxb1987 keeping dead code around will inevitably lead to code rot. If there is not use for something then we should remove it. Adding things back is not a problem

[GitHub] spark issue #20712: [SPARK-23563] config codegen compile cache size

2018-03-02 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20712 @mgaido91 Yeah, you are right. This is pretty pointless unless you either set this as a static spark conf or pass it to the executor using local properties. Both aren't very attractive IMO

[GitHub] spark pull request #20706: [SPARK-23550][core] Cleanup `Utils`.

2018-03-01 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20706#discussion_r171683202 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -267,44 +264,20 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request #20706: [SPARK-23550][core] Cleanup `Utils`.

2018-03-01 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20706#discussion_r171676609 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -267,44 +264,20 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request #20706: [SPARK-23550][core] Cleanup `Utils`.

2018-03-01 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20706#discussion_r171674429 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -267,44 +264,20 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request #20700: [SPARK-23546][SQL] Refactor stateless methods/val...

2018-03-01 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20700#discussion_r171667563 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1524,4 +1348,215 @@ object

[GitHub] spark issue #20691: [SPARK-18161] [Python] Allow pickle to serialize >4 GB o...

2018-02-28 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20691 I am not sure that adding such a test is very good for test stability, but we could disable it by default

[GitHub] spark issue #20691: [SPARK-18161] [Python] Allow pickle to serialize >4 GB o...

2018-02-28 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20691 Have you tried serializing an array larger than 2GB? There is a pretty big chance that we do not support on the Spark side

[GitHub] spark issue #20691: [SPARK-18161] [Python] Allow pickle to serialize >4 GB o...

2018-02-28 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20691 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20691: [SPARK-18161] [Python] Allow pickle to serialize >4 GB o...

2018-02-28 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20691 hmmm - jenkins seems not to be playing ball --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20693: [SPARK-23523] [SQL] [FOLLOWUP] Minor refactor of Optimiz...

2018-02-28 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20693 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20642: i m not able to open Spark UI on local using localhost:4...

2018-02-20 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20642 @AtulKumVerma please close this PR. You can ask questions in the user list or on stack overflow. --- - To unsubscribe, e

[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...

2018-02-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20630#discussion_r168885323 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/hash/Murmur3_x86_32Suite.java --- @@ -51,6 +53,23 @@ public void testKnownLongInputs

[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...

2018-02-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20630#discussion_r168883881 --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java --- @@ -60,6 +60,8 @@ public static int hashUnsafeWords(Object

[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20568 @mrkm4ntr this is legitimate failure. Can you fix the python tests? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20626 You are going to need to 'type' null values for this work, I think casting would be enough. --- - To unsubscribe, e-mail

[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-14 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20568 @mrkm4ntr I see your point. Adding a method to Murmur3 would work. The problem is that we are now going to release a `FeatureHasher` in Spark 2.3 that uses the current Murmur3

[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-12 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20568 @mrkm4ntr The change itself looks pretty reasonable. However I am very hesitant to merge this because this will probably break bucketing (it uses murmur3 to create the buckets); for example

[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-02-12 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20579 @dilipbiswal this is a nice improvement. I left a few comments. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

2018-02-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r167625448 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends

[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

2018-02-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r167623896 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -68,6 +68,16 @@ class

[GitHub] spark pull request #20525: [SPARK-23271[SQL] Parquet output contains only _S...

2018-02-07 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r166767068 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -190,9 +190,13 @@ object FileFormatWriter

[GitHub] spark issue #20530: [SPARK-23349][SQL]ShuffleExchangeExec: Duplicate and red...

2018-02-07 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20530 @wujianping10043419 this change is too trivial to merge, I am sorry. The main reason for not merging is because this will mess up the git blame for the given code, which is sometimes very useful

[GitHub] spark issue #20504: [SPARK-23332][SQL] Update SQLQueryTestSuite to support a...

2018-02-04 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20504 @wangyum this is a PR of 24K lines! How are we going to review this properly? Can you provide some guidelines (major changes etc

[GitHub] spark issue #20505: [SPARK-23251][SQL] Add checks for collection element Enc...

2018-02-04 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20505 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20505: [SPARK-23251][SQL] Add checks for collection elem...

2018-02-04 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20505#discussion_r165889177 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala --- @@ -165,11 +165,15 @@ abstract class SQLImplicits extends

[GitHub] spark issue #18136: [SPARK-20910][SQL] Add build-in SQL function - UUID

2018-02-01 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18136 I just came across this expression and I have a few concerns: 1. A row will not get the same UUID assigned when a task fails. This might cause some really weird problems when the UUID

[GitHub] spark issue #20463: [SQL][MINOR] Inline SpecifiedWindowFrame.defaultWindowFr...

2018-01-31 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20463 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

spark git commit: [SPARK-23223][SQL] Make stacking dataset transforms more performant

2018-01-29 Thread hvanhovell
ver this. Author: Herman van Hovell <hvanhov...@databricks.com> Closes #20402 from hvanhovell/SPARK-23223. (cherry picked from commit 2d903cf9d3a827e54217dfc9f1e4be99d8204387) Signed-off-by: Herman van Hovell <hvanhov...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-23223][SQL] Make stacking dataset transforms more performant

2018-01-29 Thread hvanhovell
hor: Herman van Hovell <hvanhov...@databricks.com> Closes #20402 from hvanhovell/SPARK-23223. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2d903cf9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2d903cf9 D

[GitHub] spark issue #20402: [SPARK-23223][SQL] Make stacking dataset transforms more...

2018-01-29 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20402 Ok, merging this to master/4.0. Thanks for all the reviews! --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #20402: [SPARK-23223][SQL] Make stacking dataset transfor...

2018-01-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20402#discussion_r164338267 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -62,7 +62,11 @@ import org.apache.spark.util.Utils private[sql

[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-27 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20415 @heary-cao have you benchmarked this? The reason I am asking is because Spark SQL chains iterators, these are pipelined and only materialized when we need to. Your PR effectively removes two

[GitHub] spark pull request #20402: [SPARK-23223][SQL] Make stacking dataset transfor...

2018-01-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20402#discussion_r164030038 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -192,7 +192,7 @@ class Dataset[T] private[sql

[GitHub] spark issue #20402: [SPARK-23223][SQL] Make stacking dataset transforms more...

2018-01-25 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20402 cc @cloud-fan @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20402: [SPARK-23223][SQL] Make stacking dataset transfor...

2018-01-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20402#discussion_r164029516 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -74,10 +74,36 @@ trait CheckAnalysis extends

[GitHub] spark pull request #20402: [SPARK-23223][SQL] Make stacking dataset transfor...

2018-01-25 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/20402 [SPARK-23223][SQL] Make stacking dataset transforms more performant ## What changes were proposed in this pull request? It is a common pattern to apply multiple transforms to a `Dataset

[GitHub] spark pull request #20394: [SPARK-23214][SQL] cached data should not carry e...

2018-01-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20394#discussion_r163943771 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -73,11 +73,16 @@ case class InMemoryRelation

[GitHub] spark pull request #20391: [SPARK-23208][SQL] Fix code generation for comple...

2018-01-24 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/20391 [SPARK-23208][SQL] Fix code generation for complex create array (related) expressions ## What changes were proposed in this pull request? The `GenArrayData.genCodeToCreateArrayData

[GitHub] spark issue #20391: [SPARK-23208][SQL] Fix code generation for complex creat...

2018-01-24 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20391 cc @cloud-fan @kiszk @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20350#discussion_r163013386 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala --- @@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal

[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20350#discussion_r163007885 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala --- @@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal

[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20350#discussion_r162957968 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala --- @@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal

[GitHub] spark issue #20302: [SPARK-23094] Fix invalid character handling in JsonData...

2018-01-18 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20302 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-16 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20277 @cloud-fan did you do some benchmarks? I'd like to make sure that the abstract class to interface change does not negatively impact performance

[GitHub] spark pull request #20200: [SPARK-23005][Core] Improve RDD.take on small num...

2018-01-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20200#discussion_r160391487 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1345,13 +1346,12 @@ abstract class RDD[T: ClassTag]( if (buf.isEmpty

[GitHub] spark pull request #20200: [SPARK-23005][Core] Improve RDD.take on small num...

2018-01-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20200#discussion_r160390893 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -985,7 +985,7 @@ abstract class RDD[T: ClassTag]( def subtract

[GitHub] spark issue #19764: [SPARK-22539][SQL] Add second order for rangepartitioner...

2018-01-03 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19764 @caneGuy adding the second ordering makes the sort much more fine grained, this means that the range partitioner is probably going to use different range boundaries using all the ordering

[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...

2018-01-01 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159164626 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -408,9 +417,17 @@ class SparkSqlAstBuilder(conf: SQLConf

[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...

2017-12-20 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20023 I don't fully agree :)... 1. You can use `SQLConf.get` for this. Or you can wire up the rules using the `SessionStateBuilder`. 2. I am reluctant to change this for a minor version. I

[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...

2017-12-20 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20023 In am generally in favor of following the SQL standard. How about we do this. Let's make the standard behavior the default, and add a flag to revert to the old behavior. This allows us to ease

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r156718225 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -71,9 +74,10 @@ case class InMemoryRelation

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r156610279 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -71,9 +74,10 @@ case class InMemoryRelation

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r156609849 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -60,7 +62,8 @@ case class InMemoryRelation

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r156609277 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -80,6 +80,14 @@ class CacheManager extends Logging

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2017-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19193#discussion_r156060138 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1920,7 +1927,34 @@ class Analyzer

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2017-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19193#discussion_r156059425 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1920,7 +1927,34 @@ class Analyzer

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2017-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19193#discussion_r156059043 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1920,7 +1927,34 @@ class Analyzer

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2017-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19193#discussion_r156058341 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1920,7 +1927,34 @@ class Analyzer

[GitHub] spark issue #19923: [SPARK-22721] BytesToBytesMap peak memory not updated.

2017-12-08 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19923 Merging to master. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

spark git commit: [SPARK-22721] BytesToBytesMap peak memory not updated.

2017-12-08 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master f88a67bf0 -> f28b1a4c4 [SPARK-22721] BytesToBytesMap peak memory not updated. ## What changes were proposed in this pull request? Follow-up to earlier commit. The peak memory of BytesToBytesMap is not updated in more places - spill() and

[GitHub] spark issue #19923: [SPARK-22721] BytesToBytesMap peak memory not updated.

2017-12-07 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19923 yay! LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

spark git commit: [SPARK-22721] BytesToBytesMap peak memory usage not accurate after reset()

2017-12-07 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master 8ae004b46 -> d32337b1e [SPARK-22721] BytesToBytesMap peak memory usage not accurate after reset() ## What changes were proposed in this pull request? BytesToBytesMap doesn't update peak memory usage before shrinking back to initial

[GitHub] spark issue #19915: [SPARK-22721] BytesToBytesMap peak memory usage not accu...

2017-12-07 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19915 Merging to master. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19915: [SPARK-22721] BytesToBytesMap peak memory usage not accu...

2017-12-06 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19915 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

<    5   6   7   8   9   10   11   12   13   14   >