Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21889#discussion_r207719862
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
---
@@ -0,0 +1,205
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21889#discussion_r207718734
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
---
@@ -0,0 +1,205
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21889
> Are there any other blockers to enabling this by default now that
@mallman fixed the currently known broken queries?
The functionality exercised by the ignored t
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21889
Success! Now where do we stand, @gatorsmile @HyukjinKwon ?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21889
Oh dear. I don't know why we're getting all these sigkills. I think we're
going to need another retest...
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21889
> Anybody else able to reproduce this failure? It succeeded on my developer
machine.
It worked for me, too. Let's see what a retest d
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21889
> I was able to run the first failing test successfully. Can we get a
retest, please?
@ajacques I just rebased and pushed my branch off of master. Perhaps the
easiest thing to do wo
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21889
> These test failures are in Spark streaming. Is this just an intermittent
test failure or actually caused by this PR?
I was able to run the first failing test successfully. Can we
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21889
> This patch fails Scala style tests.
Hi @ajacques. I'm not sure if you're aware of this, but you can run the
scalastyle checks locally with
```
sbt scalast
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21889
> Where does that leave both of these PRs? Do we still want this one with
the code refactoring or to go back to the original? Are there any comments for
this PR that would block merging? I've
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21889
Hi @gatorsmile. Where do you see us at this point? Do you still want to get
this into Spark 2.4?
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> @gatorsmile @HyukjinKwon @ajacques I'm seeing incorrect results from the
following simple projection query, given @jainaks test file:
>
>```
>select page.url, page fr
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21889
> @mallman, if we are all happy here, mind taking a look
https://github.com/apache/spark/pull/21320#issuecomment-408271470 and
https://github.com/apache/spark/pull/21320#issuecomment-406765
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> Thanks @jainaks for the sample file and instructions to reproduce the
problem. I will investigate and reply.
@gatorsmile @HyukjinKwon @ajacques I'm seeing incorrect results f
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
Thanks @jainaks for the sample file and instructions to reproduce the
problem. I will investigate and reply
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
Sorry it's taken me a couple of days to respond. I needed the time to
ruminate (and not). I could write voluminously, but I just want to reply to a
couple of points and move on.
> I th
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> Few comments like #21320 (comment) or ^ are not minor or nits. I leave
hard -1 if they are not addressed.
I'm sorry to say I'm very close to hanging up this PR. I put a lot of care,
t
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> gentle ping @mallman since the code freeze is close
Outside of my primary occupation, my top priority on this PR right now is
investigating
https://github.com/apache/spark/pull/21
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> Regarding #21320 (comment), can you at least set this enable by default
and see if some existing tests are broken or not?
I have no intention to at this po
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r205022974
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala
---
@@ -0,0 +1,388 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r205022799
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
---
@@ -0,0 +1,156
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r205022895
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -0,0 +1,153
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r205021712
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala
---
@@ -0,0 +1,388 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r205021469
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -0,0 +1,153
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r205021282
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala
---
@@ -0,0 +1,62 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r205020970
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -71,9 +80,22 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r205021140
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
---
@@ -0,0 +1,156
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
>> Hi @jainaks. Thanks for your report. Do you have the same problem
running your test with this PR?
> @mallman Yes, the issue with window functions is reproducible even with
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r204209033
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1298,8 +1298,18 @@ object SQLConf {
"issues.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
The test failure is unrelated to this patch. Shall we retest?
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> @mallman I still think we need to split it to two PRs. To resolve the
issues you mentioned above, how about creating a separate PR? Only 10 days left
before the code freeze of Spark 2.4. We p
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r204208518
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala
---
@@ -0,0 +1,387 @@
+/*
+ * Licensed
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> BTW, I am trying to take a look closely. I would appreciate if there are
some concrete examples so that I (or other reviewers) can double check along.
Parquet is pretty core fix and let's be v
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> I think if the tests are few, you can make them ignored for now here, and
make another PR enabling it back with the changes in ParquetReadSupport.scala.
That's the approach I've ta
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r204206072
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
---
@@ -0,0 +1,156
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> Could we move the changes made in ParquetReadSupport.scala to a separate
PR? Then, we can merge this PR very quickly.
If I remove the changes to `ParquetReadSupport.scala`, then f
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r201863353
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -0,0 +1,153
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r201863463
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala
---
@@ -0,0 +1,134 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r201863251
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -71,9 +80,22 @@ private[parquet
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/19410
Hi @szhem.
I understand you've put a lot of work into this implementation, however I
think you should try a simpler approach before we consider something more
complicated. I believe
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/19410
Hi @szhem.
Thanks for the information regarding disk use for your scenario. What do
you think about my second point, using the `ContextCleaner
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r199648692
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -182,18 +182,20 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r199643803
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -71,9 +80,22 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r199631341
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -47,16 +47,25 @@ import
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/19410
Hi @szhem. I dug deeper and think I understand the problem better.
To state the obvious, the periodic checkpointer deletes checkpoint files of
RDDs that are potentially still accessible
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/19410
Hi @szhem. Thanks for the kind reminder and thanks for your contribution.
I'm sorry I did not respond sooner.
I no longer work where I regularly used the checkpointing code with large
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
@gatorsmile I've removed the changes to the files as you requested. This
removes support for schema pruning on filters of queries. I've pushed the
previous revision to a new branch in our `spark
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
@gatorsmile The last build was killed by SIGKILL. Can you start a new
build, please?
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> @mallman Yes, the issue with window functions is reproducible even with
this PR.
Can you attach a (small) parquet file I can use to test this scenario
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
> @mallman It does work fine with "name.First".
@jainaks What is the value of the Spark SQL configuration setting
`spark.sql.caseSensitive` when you run this query? Also, are
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
@gatorsmile The last couple of build test failures appear to be entirely
unrelated to this PR. The error message in the one failed test reads
`org.scalatest.exceptions.TestFailedException: Unable
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
Hi @jainaks. Thanks for your report. Do you have the same problem running
your test with this PR?
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
Hi @jainaks. I can see why your query would not work. In the example you
provide, if you refer to the column as `name.First`, does your query succeed
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
@gatorsmile I've addressed many of your points in today's commits. Can you
please take a look at what I've done so far? I'm still working on the PRs you
requested
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r190494243
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala
---
@@ -0,0 +1,432 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r190493689
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -0,0 +1,154
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r190492424
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -0,0 +1,154
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r190491050
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
---
@@ -879,6 +879,15 @@ class
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r190486220
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1256,8 +1256,18 @@ object SQLConf {
"issues.
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r190485768
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -286,7 +286,19 @@ case class FileSourceScanExec
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r190485713
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala
---
@@ -0,0 +1,32 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r190484386
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala
---
@@ -99,27 +100,28 @@ trait
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r190479026
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
---
@@ -162,7 +162,9 @@ case class FilterExec(condition
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/21320
@gatorsmile I believe this is the PR you requested for review.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/16578
I'm closing this PR in favor of #21320. That PR deals with simple
projection and filter queries only. I will submit subsequent PRs for
aggregation and join queries following the acceptance
Github user mallman closed the pull request at:
https://github.com/apache/spark/pull/16578
GitHub user mallman opened a pull request:
https://github.com/apache/spark/pull/21320
[SPARK-4502][SQL] Parquet nested column pruning - foundation
(Link to Jira: https://issues.apache.org/jira/browse/SPARK-4502)
_N.B. This is a restart of PR #16578 which includes everything
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/16578
BTW I've been and am currently traveling with a busy itinerary. I
haven't started work on this and probably won't get to work on it until
Monday at the very earliest.
> On Ma
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/16578
> To ensure the PR and review quality, we normally avoid doing everything
in a single huge PR. It would be much better if you can cut it to a few smaller
PRs.
I'll have a
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r181575614
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -151,6 +151,9 @@ abstract class Optimizer
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/16578
I'd just suggest trying it. Since this PR is a patch for master, please
message me personally at m...@allman.ms to discuss progress and questions
on a backport to 2.2. If we get it working
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/16578
> However, I am -1 on merging a change this large after branch cut.
It's disappointing, but I agree we can't merge a change this large into a
branch cut. It will have to wait for 2.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/16578
> But I think we still need other eyes on this too.
Agreed.
@rxin can you help rope anyone else in on this? It's a big PR with a bigger
history, but absent some savaging by anot
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/16578
> Can you give an example it would fail? We didn't change
clipParquetSchema, so I think even when pruning happens, why we read a super
set of the file's schema and cause the exception, accord
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r151477529
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -961,6 +961,15 @@ object SQLConf {
.booleanConf
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r151474342
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -961,6 +961,15 @@ object SQLConf {
.booleanConf
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r151026919
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -961,6 +961,15 @@ object SQLConf {
.booleanConf
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/19682
Thanks for the fix!
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/16578
@viirya Can you please take a look at my latest revisions and replies to
your comments? Cheers.
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148866084
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -63,9 +74,22 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148863673
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -63,9 +74,22 @@ private[parquet
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/16578
I can't tell what's causing the build to fail:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83390/console
Any ideas
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/16578
> Yeah, I think with a config for this optimization is good.
I added a config switch, `spark.sql.nestedSchemaPruning.enabled`, which
disables the optimizations if set to `false`. By defa
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148731634
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala
---
@@ -0,0 +1,440 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148731266
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala
---
@@ -0,0 +1,440 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148725914
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -63,9 +74,22 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148725482
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/FileSchemaPruningTest.scala
---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148725325
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/FileSchemaPruningTest.scala
---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148725152
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -0,0 +1,147
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148723190
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
---
@@ -0,0 +1,147
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148722822
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -127,8 +127,8 @@ private[parquet
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148719962
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala
---
@@ -0,0 +1,61 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148718059
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala
---
@@ -0,0 +1,61 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148717122
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/JoinFieldExtractionPushdown.scala
---
@@ -0,0 +1,66
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148717085
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/FieldExtractionPushdown.scala
---
@@ -0,0 +1,53 @@
+/*
+ * Licensed
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148716702
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/AggregateFieldExtractionPushdown.scala
---
@@ -0,0 +1,77
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148716194
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/AggregateFieldExtractionPushdown.scala
---
@@ -0,0 +1,77
Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/16578#discussion_r148687395
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/FieldExtractionPushdown.scala
---
@@ -0,0 +1,53 @@
+/*
+ * Licensed
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/16578
> I'm reluctant to generalize this PR without practical experience applying
it to other column-oriented file formats. The only format I'm familiar with and
have production experie