[GitHub] spark pull request #15951: [SPARK-18510] Fix data corruption from inferred p...

2016-11-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15951#discussion_r89263076 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -84,30 +88,106 @@ case class DataSource

[GitHub] spark pull request #15951: [SPARK-18510] Fix data corruption from inferred p...

2016-11-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15951#discussion_r89242805 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -84,30 +88,106 @@ case class DataSource

[GitHub] spark pull request #15951: [SPARK-18510] Fix data corruption from inferred p...

2016-11-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15951#discussion_r89249380 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -84,30 +88,106 @@ case class DataSource

[GitHub] spark pull request #15951: [SPARK-18510] Fix data corruption from inferred p...

2016-11-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15951#discussion_r89252556 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -84,30 +88,106 @@ case class DataSource

[GitHub] spark pull request #15951: [SPARK-18510] Fix data corruption from inferred p...

2016-11-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15951#discussion_r89262935 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -274,7 +274,7 @@ class DDLSuite extends QueryTest

[GitHub] spark pull request #15951: [SPARK-18510] Fix data corruption from inferred p...

2016-11-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15951#discussion_r89249078 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -84,30 +88,106 @@ case class DataSource

[GitHub] spark pull request #15951: [SPARK-18510] Fix data corruption from inferred p...

2016-11-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15951#discussion_r89248592 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -84,30 +88,106 @@ case class DataSource

[GitHub] spark pull request #15951: [SPARK-18510] Fix data corruption from inferred p...

2016-11-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15951#discussion_r89242786 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -84,30 +88,106 @@ case class DataSource

[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-22 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15976 A similar alternative fix @yhuai proposed is to convert the underlying `UnsafeRow` into a safe row (i.e. `GenericInternalRow` in this case) using a projection instead of simply adding a `.copy

[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-22 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15976 Also cc @davies and @sameeragarwal. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-22 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15976 The last build failure was caused by YARN tests.

[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-22 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15976 retest this please

[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-22 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15976 cc @yhuai @cloud-fan

[GitHub] spark pull request #15976: [SPARK-18403][SQL] Fix unsafe data false sharing ...

2016-11-22 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15976#discussion_r89178617 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala --- @@ -325,70 +320,67 @@ class

[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-22 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15976 The last build failure was caused by a logical conflict with #15703. We don't really have any aggregate functions that don't support partial aggregation now after merging #15703, while the re

[GitHub] spark pull request #15976: [SPARK-18403][SQL] Fix unsafe data false sharing ...

2016-11-21 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/15976 [SPARK-18403][SQL] Fix unsafe data false sharing issue in ObjectHashAggregateExec ## What changes were

[GitHub] spark pull request #15813: [SPARK-18362][SQL] Use TextFileFormat in JsonFile...

2016-11-18 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15813#discussion_r88728867 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala --- @@ -173,35 +178,17 @@ class CSVFileFormat

[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-11-16 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15703 Thanks everyone for the review!

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88312625 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDAFSuite.scala --- @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88312643 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDAFSuite.scala --- @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88312590 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88311998 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88311961 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88311780 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88311737 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88311655 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88310719 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -289,73 +302,75 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88310296 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -289,73 +302,75 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r88310092 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -263,8 +265,19 @@ private[hive] case class HiveGenericUDTF

[GitHub] spark pull request #15845: [SPARK-18403][SQL] Temporarily disable flaky Obje...

2016-11-10 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/15845 [SPARK-18403][SQL] Temporarily disable flaky ObjectHashAggregateSuite ## What changes were proposed in this pull request? Randomized tests in `ObjectHashAggregateSuite` are being flaky

[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-11-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15703 The last build failure was because of a logical conflict between this PR and the master branch. Resolving it.

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-09 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r87309805 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -289,73 +302,77 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-09 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r87309760 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -365,4 +382,66 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark issue #15802: [SPARK-18338][SQL][test-maven] Fix test case initializat...

2016-11-08 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15802 The last build failure was caused by an irrelevant flaky test. BTW, I've reproduced the OOM issue locally by running `ObjectHashAggregateSuite` 200 times within a single SBT REPL session

[GitHub] spark issue #15802: [SPARK-18338][SQL][test-maven] Fix test case initializat...

2016-11-08 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15802 retest this please.

[GitHub] spark issue #15802: [SPARK-18338][SQL][test-maven] Fix test case initializat...

2016-11-08 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15802 retest this please

[GitHub] spark issue #15802: [SPARK-18338][SQL][test-maven] Fix test case initializat...

2016-11-08 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15802 @cloud-fan already reported the OOM issue. I'm trying to reproduce it locally. Added the `[test-maven]` tag to trigger Maven tests.

[GitHub] spark pull request #15802: [SPARK-18338][SQL] Fix test case initialization o...

2016-11-07 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/15802 [SPARK-18338][SQL] Fix test case initialization order under Maven builds ## What changes were proposed in this pull request? Test case initialization order under Maven and SBT

[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-11-01 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15703 OK, now it's ready for review and merge. cc @yhuai @JoshRosen @cloud-fan

[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-11-01 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15703 It turned out that I didn't initialize Hive UDAF evaluators properly. Quoted from commit message of my previous commit: > Hive UDAFs are sensitive to aggregation mode, and m

[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-11-01 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15703 @tejasapatil Another point that I'd like to add is that even if the performance for a single UDAF like `GenericUDAFCollectList` regresses, you still have performance gains if such UDAFs are used

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-01 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r85987778 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -293,69 +307,57 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-11-01 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15703#discussion_r85981118 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -293,69 +307,57 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-11-01 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15703 I found that I'm not handling bridged UDAFs properly, which caused a few test failures. Working on it.

[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-11-01 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15703 @tejasapatil For `collect_set` and `collect_list`, we'll simply migrate them to `TypedImperativeAggregate` and so that they become Spark native aggregate functions. We can also handle other built

[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15703 I can't reproduce those test failures when executing failed test cases individually. Seems that it's related to execution order. Still investigating.

[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15703 Will add more details in the PR description soon.

[GitHub] spark pull request #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedIm...

2016-10-31 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/15703 [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativeAggregate for partial aggregation support ## What changes were proposed in this pull request? This PR migrates `HiveUDAFFunction

[GitHub] spark pull request #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to tr...

2016-10-31 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15651#discussion_r8567 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -130,17 +130,40 @@ case class ExternalRDDScanExec[T

[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...

2016-10-28 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15651 Also cc @JoshRosen

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-28 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r85611295 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -126,4 +140,59 @@ object FileSourceStrategy

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-28 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r85610578 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -97,7 +99,19 @@ object FileSourceStrategy

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-28 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84562093 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -126,4 +136,52 @@ object FileSourceStrategy

[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...

2016-10-27 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15651 @viirya `Dataset.localCheckpoint()` also makes sense. Would like to add it as a follow-up though. Thanks for the suggestion!

[GitHub] spark pull request #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to tr...

2016-10-27 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15651#discussion_r85421484 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -130,17 +130,23 @@ case class ExternalRDDScanExec[T

[GitHub] spark pull request #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to tr...

2016-10-27 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15651#discussion_r85411204 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -482,6 +483,33 @@ class Dataset[T] private[sql

[GitHub] spark pull request #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to tr...

2016-10-27 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15651#discussion_r85408917 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -130,17 +130,23 @@ case class ExternalRDDScanExec[T

[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...

2016-10-27 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15651 retest this please

[GitHub] spark pull request #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to tr...

2016-10-26 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15651#discussion_r85264291 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -919,6 +922,44 @@ class DatasetSuite extends QueryTest

[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...

2016-10-26 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15651 cc @mengxr @jkbradley @yhuai

[GitHub] spark issue #15565: [DO NOT MERGE][17972][SQL] Another try of PR #15517

2016-10-26 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15565 Closing this in favor of #15651.

[GitHub] spark pull request #15565: [DO NOT MERGE][17972][SQL] Another try of PR #155...

2016-10-26 Thread liancheng
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/15565

[GitHub] spark pull request #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to tr...

2016-10-26 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/15651 [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate large query plans ## What changes were proposed in this pull request? ### Problem Iterative ML code may easily create

[GitHub] spark issue #15590: [SPARK-17949][SQL] A JVM object based aggregate operator

2016-10-26 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15590 @hvanhovell That's a great point. This is actually one of my pain points while writing this new operator. These problems are: 1. `HashAggregateExec` and `SortAggregateExec` have

[GitHub] spark pull request #15590: [SPARK-17949][SQL] A JVM object based aggregate o...

2016-10-24 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15590#discussion_r84760919 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala --- @@ -0,0 +1,323 @@ +/* + * Licensed

[GitHub] spark pull request #15517: [SPARK-17972][SQL] Build Datasets upon `withCache...

2016-10-21 Thread liancheng
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/15517

[GitHub] spark issue #15517: [SPARK-17972][SQL] Build Datasets upon `withCachedData` ...

2016-10-21 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15517 I'm closing this since caching is not the ultimate solution for this problem anyway. Caching is too memory-consuming when you are, say, computing connected components in an iterative way over a graph

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84422346 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -126,4 +136,52 @@ object FileSourceStrategy

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84422485 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -212,6 +212,11 @@ object SQLConf { .booleanConf

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84422636 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -661,6 +666,8 @@ private[sql] class SQLConf extends Serializable

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84558190 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -126,4 +136,52 @@ object FileSourceStrategy

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84422606 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -212,6 +212,11 @@ object SQLConf { .booleanConf

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84422353 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -126,4 +136,52 @@ object FileSourceStrategy

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84406104 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -97,7 +99,15 @@ object FileSourceStrategy

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84559521 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -126,4 +136,52 @@ object FileSourceStrategy

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84436528 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -97,7 +99,15 @@ object FileSourceStrategy

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84422762 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -571,6 +571,37 @@ class

[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-10-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r84422376 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -126,4 +136,52 @@ object FileSourceStrategy

[GitHub] spark pull request #15590: [SPARK-17949][SQL] A Java object based aggregate ...

2016-10-21 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/15590 [SPARK-17949][SQL] A Java object based aggregate operator ## What changes were proposed in this pull request? This PR adds a new hash-based aggregate operator named

[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...

2016-10-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14957 add to whitelist

[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...

2016-10-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14957 test this please

[GitHub] spark pull request #15517: [SPARK-17972][SQL] Build Datasets upon `withCache...

2016-10-20 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15517#discussion_r84395740 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala --- @@ -66,11 +66,13 @@ class QueryExecution(val sparkSession

[GitHub] spark pull request #15565: [DO NOT MERGE][17972][SQL] Another try of PR #155...

2016-10-20 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/15565 [DO NOT MERGE][17972][SQL] Another try of PR #15517 ## What changes were proposed in this pull request? This is another try of PR #15517, which aims to solve the exponential slowdown

[GitHub] spark pull request #15562: [SPARK-18021][SQL] Refactor file name specificati...

2016-10-20 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15562#discussion_r84220219 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala --- @@ -408,17 +416,6 @@ object WriteOutput extends

[GitHub] spark issue #15551: [SPARK-18012][SQL] Simplify WriterContainer

2016-10-19 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15551 LGTM

[GitHub] spark issue #15517: [SPARK-17972][SQL] Build Datasets upon `withCachedData` ...

2016-10-19 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15517 The most recent version still breaks some test cases related to caching. Investigating it.

[GitHub] spark pull request #15551: [SPARK-18012][SQL] Simplify WriterContainer - WIP

2016-10-19 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15551#discussion_r84191061 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala --- @@ -0,0 +1,512 @@ +/* + * Licensed

[GitHub] spark pull request #15551: [SPARK-18012][SQL] Simplify WriterContainer - WIP

2016-10-19 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15551#discussion_r84187462 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala --- @@ -0,0 +1,514 @@ +/* + * Licensed

[GitHub] spark pull request #15551: [SPARK-18012][SQL] Simplify WriterContainer - WIP

2016-10-19 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15551#discussion_r84187214 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala --- @@ -0,0 +1,514 @@ +/* + * Licensed

[GitHub] spark issue #15517: [SPARK-17972][SQL] Build Datasets upon `withCachedData` ...

2016-10-18 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15517 The previous test failure occurred because we replaced the analyzed plan with `withCachedData`, while the cache manager uses the original analyzed plan as keys. Force-pushed a new and much simpler

[GitHub] spark pull request #15517: [SPARK-17972][SQL] Cache analyzed plan instead of...

2016-10-17 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15517#discussion_r83703038 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -142,7 +142,7 @@ case class InMemoryRelation

[GitHub] spark pull request #15517: [SPARK-17972][SQL] Cache analyzed plan instead of...

2016-10-17 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/15517 [SPARK-17972][SQL] Cache analyzed plan instead of optimized plan to avoid slow query planning ## What changes were proposed in this pull request? Iterative ML code may easily create

[GitHub] spark pull request #15072: [SPARK-17123][SQL] Use type-widened encoder for D...

2016-10-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/15072#discussion_r82704021 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -53,7 +53,15 @@ import org.apache.spark.util.Utils private[sql

[GitHub] spark issue #15332: [SPARK-10634][SQL] Support Parquet logical type TIMESTAM...

2016-10-04 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15332 @davies Unfortunately, parquet-mr 1.8.1, which is used by the current master, doesn't include `TIMESTAMP_MICROS` yet. To be more specific, `OriginalType` in parquet-mr 1.8.1 doesn't include

[GitHub] spark issue #15333: [SPARK-17761][SQL] Remove MutableRow

2016-10-03 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15333 Would be nice to add a simple example to illustrate why we can't ensure that a `GenericInternalRow` is immutable. For example, for a `GenericInternalRow` with a `StructType` field, it's legal

[GitHub] spark issue #14649: [SPARK-17059][SQL] Allow FileFormat to specify partition...

2016-09-27 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14649 @andreweduffy Thanks for the explanations! This makes much more sense to me now. Although `_metadata` can be neat for the read path, it's a troublemaker for the write

[GitHub] spark issue #14172: [SPARK-16516][SQL] Support for pushing down filters for ...

2016-09-27 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14172 LGTM, merging to master. Thanks!

[GitHub] spark issue #14399: [SPARK-16777][SQL] Do not use deprecated listType API in...

2016-09-27 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14399 Sorry for the late review! LGTM, merging to master, thanks!

[GitHub] spark issue #14649: [SPARK-17059][SQL] Allow FileFormat to specify partition...

2016-09-27 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14649 Sorry for the late reply. Firstly, Spark SQL only reads footers of all Parquet files in case of schema merging, which can be controlled by SQL option `spark.sql.parquet.mergeSchema

[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...

2016-09-27 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14537 LGTM. Thanks!
