Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15951#discussion_r89263076
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -84,30 +88,106 @@ case class DataSource
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15951#discussion_r89242805
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -84,30 +88,106 @@ case class DataSource
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15951#discussion_r89249380
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -84,30 +88,106 @@ case class DataSource
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15951#discussion_r89252556
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -84,30 +88,106 @@ case class DataSource
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15951#discussion_r89262935
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
---
@@ -274,7 +274,7 @@ class DDLSuite extends QueryTest
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15951#discussion_r89249078
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -84,30 +88,106 @@ case class DataSource
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15951#discussion_r89248592
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -84,30 +88,106 @@ case class DataSource
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15951#discussion_r89242786
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -84,30 +88,106 @@ case class DataSource
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15976
A similar alternative fix @yhuai proposed is to convert the underlying
`UnsafeRow` into a safe row (i.e. `GenericInternalRow` in this case) using a
projection instead of simply adding a `.copy
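A hedged sketch of that alternative (not the actual patch; `bufferRows` is an illustrative helper, and the catalyst names follow the Spark 2.x source):

```scala
// Sketch only. The aliasing bug: operators reuse one UnsafeRow buffer across
// input rows, so buffering the row itself makes every buffered entry silently
// track the latest row. A safe projection breaks the aliasing by materializing
// each row into a fresh GenericInternalRow.
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.FromUnsafeProjection
import org.apache.spark.sql.types.StructType

def bufferRows(schema: StructType, rows: Iterator[InternalRow]): Seq[InternalRow] = {
  val toSafe = FromUnsafeProjection(schema)
  // copy() guards against the projection itself reusing its output row.
  rows.map(row => toSafe(row).copy()).toVector
}
```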
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15976
Also cc @davies and @sameeragarwal.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15976
The last build failure was caused by YARN tests.
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15976
retest this please
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15976
cc @yhuai @cloud-fan
---
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15976#discussion_r89178617
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala
---
@@ -325,70 +320,67 @@ class
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15976
The last build failure was caused by a logical conflict with #15703. We
don't really have any aggregate functions that don't support partial
aggregation now after merging #15703, while the re
GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/15976
[SPARK-18403][SQL] Fix unsafe data false sharing issue in
ObjectHashAggregateExec
## What changes were
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15813#discussion_r88728867
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
---
@@ -173,35 +178,17 @@ class CSVFileFormat
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15703
Thanks everyone for the review!
---
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r88312625
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDAFSuite.scala
---
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r88312643
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDAFSuite.scala
---
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r88312590
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r88311998
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r88311961
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r88311780
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r88311737
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r88311655
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -365,4 +380,66 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r88310719
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -289,73 +302,75 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r88310296
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -289,73 +302,75 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r88310092
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -263,8 +265,19 @@ private[hive] case class HiveGenericUDTF
GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/15845
[SPARK-18403][SQL] Temporarily disable flaky ObjectHashAggregateSuite
## What changes were proposed in this pull request?
Randomized tests in `ObjectHashAggregateSuite` are being flaky
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15703
The last build failure was because of a logical conflict between this PR
and the master branch. Resolving it.
---
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r87309805
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -289,73 +302,77 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r87309760
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -365,4 +382,66 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15802
The last build failure was caused by an irrelevant flaky test.
BTW, I've reproduced the OOM issue locally by running
`ObjectHashAggregateSuite` 200 times within a single SBT REPL session
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15802
retest this please.
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15802
retest this please
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15802
@cloud-fan already reported the OOM issue. I'm trying to reproduce it
locally.
Added the `[test-maven]` tag to trigger Maven tests.
---
GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/15802
[SPARK-18338][SQL] Fix test case initialization order under Maven builds
## What changes were proposed in this pull request?
Test case initialization order under Maven and SBT
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15703
OK, now it's ready for review and merge.
cc @yhuai @JoshRosen @cloud-fan
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15703
It turned out that I didn't initialize Hive UDAF evaluators properly.
Quoting the commit message of my previous commit:
> Hive UDAFs are sensitive to aggregation mode, and m
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15703
@tejasapatil Another point that I'd like to add is that even if the
performance for a single UDAF like `GenericUDAFCollectList` regresses, you
still have performance gains if such UDAFs are used
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r85987778
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -293,69 +307,57 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15703#discussion_r85981118
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -293,69 +307,57 @@ private[hive] case class HiveUDAFFunction
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15703
I found that I'm not handling bridged UDAFs properly, which caused a few test
failures. Working on it.
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15703
@tejasapatil For `collect_set` and `collect_list`, we'll simply migrate
them to `TypedImperativeAggregate` so that they become Spark-native
aggregate functions. We can also handle other built
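For context, the contract that makes this migration possible (abridged from the Spark 2.x source; signatures approximate):

```scala
// TypedImperativeAggregate keeps an arbitrary JVM object as the per-group
// aggregation buffer, serializing it to bytes only when partial results are
// shuffled -- which is exactly what gives object-based functions such as
// collect_list partial-aggregation support.
abstract class TypedImperativeAggregate[T] extends ImperativeAggregate {
  def createAggregationBuffer(): T                 // fresh buffer per group
  def update(buffer: T, input: InternalRow): T     // fold one input row in
  def merge(buffer: T, input: T): T                // combine partial buffers
  def eval(buffer: T): Any                         // produce the final value
  def serialize(buffer: T): Array[Byte]            // for shuffle exchange
  def deserialize(storageFormat: Array[Byte]): T
}
```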
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15703
I can't reproduce those test failures when executing the failed test cases
individually. It seems related to test execution order. Still investigating.
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15703
Will add more details in the PR description soon.
---
GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/15703
[SPARK-18186] Migrate HiveUDAFFunction to TypedImperativeAggregate for
partial aggregation support
## What changes were proposed in this pull request?
This PR migrates `HiveUDAFFunction
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15651#discussion_r8567
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -130,17 +130,40 @@ case class ExternalRDDScanExec[T
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15651
Also cc @JoshRosen
---
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r85611295
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +140,59 @@ object FileSourceStrategy
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r85610578
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -97,7 +99,19 @@ object FileSourceStrategy
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84562093
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +136,52 @@ object FileSourceStrategy
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15651
@viirya `Dataset.localCheckpoint()` also makes sense. I'd like to add it
as a follow-up, though. Thanks for the suggestion!
---
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15651#discussion_r85421484
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -130,17 +130,23 @@ case class ExternalRDDScanExec[T
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15651#discussion_r85411204
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -482,6 +483,33 @@ class Dataset[T] private[sql
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15651#discussion_r85408917
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -130,17 +130,23 @@ case class ExternalRDDScanExec[T
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15651
retest this please
---
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15651#discussion_r85264291
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
---
@@ -919,6 +922,44 @@ class DatasetSuite extends QueryTest
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15651
cc @mengxr @jkbradley @yhuai
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15565
Closing this in favor of #15651.
---
Github user liancheng closed the pull request at:
https://github.com/apache/spark/pull/15565
---
GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/15651
[SPARK-17972][SQL] Add Dataset.checkpoint() to truncate large query plans
## What changes were proposed in this pull request?
### Problem
Iterative ML code may easily create
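The kind of iterative loop this PR targets might look like the following (`transformStep` is a placeholder for one ML round, not from the PR):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

def transformStep(df: DataFrame): DataFrame = df  // stand-in for one iteration

var df = spark.range(1000).toDF("id")
for (i <- 1 to 100) {
  df = transformStep(df)        // each round grows the logical plan
  if (i % 10 == 0) {
    // checkpoint() materializes the data and replaces the accumulated plan
    // with a scan over the checkpointed RDD, so analysis and optimization
    // time stays bounded instead of growing with every iteration.
    df = df.checkpoint()
  }
}
```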
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15590
@hvanhovell That's a great point.
This is actually one of my pain points while writing this new operator.
These problems are:
1. `HashAggregateExec` and `SortAggregateExec` have
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15590#discussion_r84760919
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala
---
@@ -0,0 +1,323 @@
+/*
+ * Licensed
Github user liancheng closed the pull request at:
https://github.com/apache/spark/pull/15517
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15517
I'm closing this since caching is not the ultimate solution for this
problem anyway. Caching consumes too much memory when, say, computing
connected components iteratively over a graph
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84422346
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +136,52 @@ object FileSourceStrategy
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84422485
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -212,6 +212,11 @@ object SQLConf {
.booleanConf
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84422636
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -661,6 +666,8 @@ private[sql] class SQLConf extends Serializable
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84558190
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +136,52 @@ object FileSourceStrategy
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84422606
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -212,6 +212,11 @@ object SQLConf {
.booleanConf
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84422353
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +136,52 @@ object FileSourceStrategy
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84406104
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -97,7 +99,15 @@ object FileSourceStrategy
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84559521
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +136,52 @@ object FileSourceStrategy
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84436528
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -97,7 +99,15 @@ object FileSourceStrategy
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84422762
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
---
@@ -571,6 +571,37 @@ class
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84422376
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +136,52 @@ object FileSourceStrategy
GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/15590
[SPARK-17949][SQL] A Java object based aggregate operator
## What changes were proposed in this pull request?
This PR adds a new hash-based aggregate operator named
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/14957
add to whitelist
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/14957
test this please
---
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15517#discussion_r84395740
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -66,11 +66,13 @@ class QueryExecution(val sparkSession
GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/15565
[DO NOT MERGE][17972][SQL] Another try of PR #15517
## What changes were proposed in this pull request?
This is another try of PR #15517, which aims to solve the exponential slow
down
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15562#discussion_r84220219
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala
---
@@ -408,17 +416,6 @@ object WriteOutput extends
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15551
LGTM
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15517
The most recent version still breaks some test cases related to caching.
Investigating it.
---
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15551#discussion_r84191061
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala
---
@@ -0,0 +1,512 @@
+/*
+ * Licensed
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15551#discussion_r84187462
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala
---
@@ -0,0 +1,514 @@
+/*
+ * Licensed
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15551#discussion_r84187214
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala
---
@@ -0,0 +1,514 @@
+/*
+ * Licensed
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15517
The previous test failure was because we replaced the analyzed plan with
`withCachedData`, while the cache manager uses the original analyzed plan as keys.
Force-pushed a new and much simpler
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15517#discussion_r83703038
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
---
@@ -142,7 +142,7 @@ case class InMemoryRelation
GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/15517
[SPARK-17972][SQL] Cache analyzed plan instead of optimized plan to avoid
slow query planning
## What changes were proposed in this pull request?
Iterative ML code may easily create
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15072#discussion_r82704021
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -53,7 +53,15 @@ import org.apache.spark.util.Utils
private[sql
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15332
@davies Unfortunately parquet-mr 1.8.1, which is used by the current
master, doesn't include `TIMESTAMP_MICROS` yet. To be more specific,
`OriginalType` in parquet-mr 1.8.1 doesn't include
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15333
It would be nice to add a simple example illustrating why we can't ensure
that a `GenericInternalRow` is immutable. For example, for a
`GenericInternalRow` with a `StructType` field, it's legal
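One possible shape for such an example (illustrative sketch, not from the PR):

```scala
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow

// GenericInternalRow stores references, not copies. With a StructType field,
// the nested value is itself a row that other code may still mutate:
val inner = new GenericInternalRow(Array[Any](1L))
val outer = new GenericInternalRow(Array[Any](inner))

inner.update(0, 2L)  // mutate the shared nested row

// `outer` itself was never touched, yet its contents changed:
assert(outer.getStruct(0, 1).getLong(0) == 2L)
```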
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/14649
@andreweduffy Thanks for the explanations! This makes much more sense to me now.
Although `_metadata` can be neat for the read path, it's a troublemaker
for the write
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/14172
LGTM, merging to master. Thanks!
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/14399
Sorry for the late review! LGTM, merging to master, thanks!
---
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/14649
Sorry for the late reply.
First, Spark SQL reads the footers of all Parquet files only when schema
merging is enabled, which is controlled by the SQL option
`spark.sql.parquet.mergeSchema
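That option can be toggled globally or per read (assuming an active `SparkSession` named `spark`; the path is illustrative):

```scala
// Global default (schema merging is off unless enabled):
spark.conf.set("spark.sql.parquet.mergeSchema", "true")

// Or per read, overriding the global setting for this one scan:
val merged = spark.read
  .option("mergeSchema", "true")
  .parquet("hdfs://path/to/partitioned/table")
```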
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/14537
LGTM. Thanks!
---