[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r208992737 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r208993025 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r208994951 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209014458 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209015928 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209015906 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209016101 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209016303 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209016366 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -367,6 +367,7 @@ case class AppendData

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209016955 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209020054 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchWriteSupportProvider.java --- @@ -21,33 +21,39 @@ import

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209022127 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java --- @@ -27,10 +27,10 @@ @InterfaceStability.Evolving

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209022769 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ContinuousReadSupportProvider.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209023367 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java --- @@ -23,8 +23,9 @@ * The base interface for data source v2

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209038600 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/MicroBatchReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209039505 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java --- @@ -0,0 +1,79 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209041367 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ScanConfig.java --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209041811 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209042148 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209042348 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala --- @@ -51,18 +58,19 @@ class DataSourceRDD[T: ClassTag

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209042604 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala --- @@ -51,18 +58,19 @@ class DataSourceRDD[T: ClassTag

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209042787 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -39,52 +36,43 @@ case class

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209044995 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousPartitionReaderFactory.java --- @@ -0,0 +1,71

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209093885 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/MicroBatchReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209094259 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #21503: [SPARK-24478][SQL] Move projection and filter pus...

2018-06-06 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/21503 [SPARK-24478][SQL] Move projection and filter push down to physical conversion ## What changes were proposed in this pull request? This removes the v2 optimizer rule for push-down and

[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21319 Here's the commit with my changes to support v2 stats in the visitor, sorry it took so long for me to find the time! https://github.com/apache/spark/pull/21503/co

[GitHub] spark issue #21503: [SPARK-24478][SQL] Move projection and filter push down ...

2018-06-12 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21503 @cloud-fan, this is the PR for moving push-down to the physical plan conversion and reporting the stats correctly. Sorry for the confusion because I sent a link to just the second commit

[GitHub] spark pull request #21503: [SPARK-24478][SQL] Move projection and filter pus...

2018-06-12 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21503#discussion_r194829647 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -17,15 +17,56 @@ package

[GitHub] spark pull request #21503: [SPARK-24478][SQL] Move projection and filter pus...

2018-06-12 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21503#discussion_r194841328 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -17,15 +17,56 @@ package

[GitHub] spark pull request #21503: [SPARK-24478][SQL] Move projection and filter pus...

2018-06-12 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21503#discussion_r194861645 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -17,15 +17,56 @@ package

[GitHub] spark pull request #21503: [SPARK-24478][SQL] Move projection and filter pus...

2018-06-12 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21503#discussion_r194875888 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -17,15 +17,56 @@ package

[GitHub] spark pull request #21503: [SPARK-24478][SQL] Move projection and filter pus...

2018-06-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21503#discussion_r195138932 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -17,15 +17,56 @@ package

[GitHub] spark issue #21503: [SPARK-24478][SQL] Move projection and filter push down ...

2018-06-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21503 Retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21503: [SPARK-24478][SQL] Move projection and filter pus...

2018-06-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21503#discussion_r195173700 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -32,79 +31,35 @@ import

[GitHub] spark issue #21503: [SPARK-24478][SQL] Move projection and filter push down ...

2018-06-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21503 Updated the stats interface.

[GitHub] spark pull request #21558: [SPARK-24552][SQL] Use task ID instead of attempt...

2018-06-13 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/21558 [SPARK-24552][SQL] Use task ID instead of attempt number for v2 writes. ## What changes were proposed in this pull request? This passes the unique task attempt id instead of attempt number

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 @cloud-fan, this is a work-around for SPARK-24552. I'm not sure the right way to fix this besides fixing the scheduler so that it doesn't use task attempt numbers twice, but I think

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 > So the problem here is, when we retry a stage, Spark doesn't kill the tasks of the old stage and just launch tasks for the new stage I think that's something that should be

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 > IMO your change is the right fix, not just a workaround @squito, part of the problem is that the output commit coordinator -- that ensures only one attempt of a task commits -- relies

[GitHub] spark issue #21503: [SPARK-24478][SQL] Move projection and filter push down ...

2018-06-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21503 @cloud-fan, tests are passing for c8517e145b1a460a8be07164c17ce20b1db86659, which has all of the functional changes. The Jenkins job ran out of memory for the last commit, but the only change in it

[GitHub] spark issue #21574: [SPARK-24478][SQL][followup] Move projection and filter ...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21574 @rxin, we can also add the second pushdown (in the stats visitor) to get better stats with a property to turn it on or off. We're going to add it back in our branch a

[GitHub] spark pull request #21574: [SPARK-24478][SQL][followup] Move projection and ...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21574#discussion_r196172875 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -105,117 +105,57 @@ case class

[GitHub] spark pull request #21574: [SPARK-24478][SQL][followup] Move projection and ...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21574#discussion_r196173289 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -17,51 +17,115 @@ package

[GitHub] spark pull request #21574: [SPARK-24478][SQL][followup] Move projection and ...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21574#discussion_r196173414 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -23,17 +23,24 @@ import

[GitHub] spark issue #21503: [SPARK-24478][SQL] Move projection and filter push down ...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21503 Thank you for reviewing this, @cloud-fan!

[GitHub] spark pull request #21574: [SPARK-24478][SQL][followup] Move projection and ...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21574#discussion_r196192774 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -23,17 +23,24 @@ import

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 @tgravescs, that's exactly what we're seeing. I think it might just be misleading to have a stage-local attempt ID although it is more friendly for users and matches wh

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 I think the right thing to do for this commit is to use the task ID instead of the stage-local attempt number. I've updated the PR with the change so I think this is ready to commit. @vanzin

[GitHub] spark pull request #21577: [SPARK-24552][core] Correctly identify tasks in o...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21577#discussion_r196214788 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -131,16 +139,17 @@ private[spark] class

[GitHub] spark pull request #21577: [SPARK-24552][core] Correctly identify tasks in o...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21577#discussion_r196214944 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -131,16 +139,17 @@ private[spark] class

[GitHub] spark pull request #21577: [SPARK-24552][core] Correctly identify tasks in o...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21577#discussion_r196215961 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -399,7 +399,8 @@ private[spark] object JsonProtocol { ("Full

[GitHub] spark issue #21577: [SPARK-24552][core] Correctly identify tasks in output c...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21577 +1. This fixes the commit coordinator problem where two separate tasks can be authorized. That case could lead to duplicate data (if, for example, both tasks generated unique file names using a

[GitHub] spark pull request #21577: [SPARK-24552][core] Correctly identify tasks in o...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21577#discussion_r196217742 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -399,7 +399,8 @@ private[spark] object JsonProtocol { ("Full

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 @vanzin, the ID that this uses is the TID, which I thought was always unique. It appears to be a one-up counter. Also, I noted on your PR that both are needed because even if we only commit one of
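
A minimal sketch of the collision discussed in the comments on #21558, assuming (as those comments describe) that a retried stage can run alongside tasks from the old stage attempt and that the TID is a one-up counter. The class name, the file-name format, and both helper methods are hypothetical and exist only for illustration; this is not Spark's writer code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only: none of these names come from Spark.
class TaskIdSketch {
    // Output name keyed by partition and the stage-local attempt number.
    static String nameByAttemptNumber(int partitionId, int attemptNumber) {
        return "part-" + partitionId + "-attempt-" + attemptNumber;
    }

    // Output name keyed by partition and a task ID that is never reused.
    static String nameByTaskId(int partitionId, long taskId) {
        return "part-" + partitionId + "-tid-" + taskId;
    }

    public static void main(String[] args) {
        // Partition 0 runs once in the original stage attempt and once in the
        // retried stage. Each task is the first attempt within its own stage
        // attempt, so both observe attempt number 0.
        Set<String> byAttemptNumber = new HashSet<>();
        byAttemptNumber.add(nameByAttemptNumber(0, 0));
        byAttemptNumber.add(nameByAttemptNumber(0, 0));

        // The task ID is modeled here as a one-up counter (an assumption taken
        // from the comment above), so the two tasks receive different values.
        long nextTaskId = 0;
        Set<String> byTaskId = new HashSet<>();
        byTaskId.add(nameByTaskId(0, nextTaskId++));
        byTaskId.add(nameByTaskId(0, nextTaskId++));

        // The first set collapses to a single name; the second keeps two.
        System.out.println("distinct names by attempt number: " + byAttemptNumber.size());
        System.out.println("distinct names by task ID: " + byTaskId.size());
    }
}
```

Under these assumptions, two writers for the same partition would target the same name when keyed by attempt number, while keying by task ID keeps their outputs apart.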

[GitHub] spark pull request #21574: [SPARK-24478][SQL][followup] Move projection and ...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21574#discussion_r196222500 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -23,17 +23,24 @@ import

[GitHub] spark pull request #21574: [SPARK-24478][SQL][followup] Move projection and ...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21574#discussion_r196223209 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -106,7 +106,7 @@ case class

[GitHub] spark issue #21574: [SPARK-24478][SQL][followup] Move projection and filter ...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21574 +1 (non-binding) assuming that tests pass.

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 Retest this please.

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 Yes, I just checked and speculative attempts do get a different TID. Just turn on speculation, run a large stage, and sort tasks in a stage by TID. There aren't dupli

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 @vanzin, thanks for working on this. I was out most of this week at a conference and I'm still on just half time, which is why I was delayed. Sorry to leave you all waiting. I'll make c

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197540970 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -76,13 +76,17 @@ object SparkHadoopWriter extends Logging

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197541490 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -125,11 +124,11 @@ object

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197542014 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java --- @@ -42,15 +42,12 @@ *Usually

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197542704 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -104,12 +104,12 @@ object SparkHadoopWriter extends Logging

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197542830 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -125,11 +124,11 @@ object

[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21577 Thanks for fixing this, @vanzin!

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197543585 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -104,12 +104,12 @@ object SparkHadoopWriter extends Logging

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197547079 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -76,13 +76,17 @@ object SparkHadoopWriter extends Logging

[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...

2018-06-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21606 +1

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197552309 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -125,11 +124,11 @@ object

[GitHub] spark issue #21615: [SPARK-24552][core][sql] Use unique id instead of attemp...

2018-06-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21615 +1

[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...

2018-06-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21623#discussion_r198230713 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -270,6 +277,29 @@ private[parquet] class

[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...

2018-06-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21623#discussion_r198244664 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -270,6 +277,29 @@ private[parquet] class

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

2018-06-26 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @cloud-fan, what needs to change to get this in? I'd like to start making more PRs based on these changes.

[GitHub] spark pull request #21262: [SPARK-24172][SQL]: Push projection and filters o...

2018-06-26 Thread rdblue
Github user rdblue closed the pull request at: https://github.com/apache/spark/pull/21262

[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...

2018-06-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21623#discussion_r198551889 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -660,6 +661,56 @@ class

[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...

2018-06-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21623#discussion_r198553569 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -22,16 +22,23 @@ import java.sql.Date

[GitHub] spark issue #21623: [SPARK-24638][SQL] StringStartsWith support push down

2018-06-27 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21623 Overall, I think this is close. The tests need to cover the row group stats case and we should update how configuration is passed to the filters. Thanks for working on this, @wangyum

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-06-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r198904089 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -378,6 +378,22 @@ object SQLConf { .booleanConf

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-06-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r198904504 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -62,6 +98,30 @@ private[parquet] class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-06-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r198904779 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -62,6 +98,30 @@ private[parquet] class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-06-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r198906232 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -359,6 +369,70 @@ class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-06-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r198907669 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -359,6 +369,70 @@ class

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

2018-07-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @cloud-fan, thanks for the thorough feedback! > What catalog operations we want to forward to the data source catalog? Currently it's create/drop/alter table, I think it's go

[GitHub] spark pull request #21682: [SPARK-24706][SQL] ByteType and ShortType support...

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21682#discussion_r199977420 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -42,6 +42,14 @@ private[parquet] class

[GitHub] spark pull request #21682: [SPARK-24706][SQL] ByteType and ShortType support...

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21682#discussion_r199977784 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -69,6 +77,14 @@ private[parquet] class

[GitHub] spark issue #21682: [SPARK-24706][SQL] ByteType and ShortType support pushdo...

2018-07-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21682 +1 I agree with some of the minor refactoring suggestions, but overall this looks correct to me. --- - To unsubscribe

[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21696#discussion_r199979463 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -379,14 +366,29 @@ class

[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21696#discussion_r199980632 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -19,166 +19,186 @@ package

[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21696#discussion_r199980897 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -19,166 +19,186 @@ package

[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21696#discussion_r199980993 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -19,166 +19,186 @@ package

[GitHub] spark issue #21696: [SPARK-24716][SQL] Refactor ParquetFilters

2018-07-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21696 Thanks, @wangyum! I think this refactor was a good idea.

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200170491 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/CatalogSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200170480 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200170560 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200171424 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200171696 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200173138 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200174526 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache
