[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-12 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202093665 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -225,12 +316,44 @@ private[parquet] class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-12 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202093508 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 1

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-12 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202090024 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -248,29 +371,29 @@ private[parquet] class

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-11 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 @cloud-fan, I'd like to get this PR in by 2.4.0. Now that the change to push predicates and projections happens when converting to the physical plan, this can go in. I've rebased this on master

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r201763831 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -248,29 +371,29 @@ private[parquet] class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r201763489 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -37,41 +39,64 @@ import

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r201761849 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 1

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-11 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 I don't think we need (or want) `SaveMode` passed to writers after standardizing. Uses of `WriteSupport` will always append data to an existing table, which makes it simpler for writers

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r201756667 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -37,41 +39,64 @@ import

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r201755545 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -225,12 +316,44 @@ private[parquet] class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r201755353 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -225,12 +316,44 @@ private[parquet] class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r201754882 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -225,12 +316,44 @@ private[parquet] class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r201754998 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -202,6 +283,16 @@ private[parquet] class

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-11 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, yes. There is an open PR, #21308, that adds `DeleteSupport`. I'm not pushing for that just yet because I think `DeleteSupport` should be applied to `Table` after #21306 makes

[GitHub] spark pull request #20933: [SPARK-23817][SQL]Migrate ORC file format read pa...

2018-07-10 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20933#discussion_r201427780 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -241,39 +240,47 @@ final class DataFrameWriter[T] private[sql](ds

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-10 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r201399506 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -240,21 +238,27 @@ final class DataFrameWriter[T] private[sql](ds

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-09 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, I think we can ignore that last test failure because tests are passing on the last commit that made real changes. The latest commit only changed a comment

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-09 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 Retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-09 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, I've updated this and the tests are passing, so I think it is ready for another look. I just pushed a comments-only commit to fix the Javadoc for AppendData that @viirya pointed

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200825129 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -240,21 +238,27 @@ final class DataFrameWriter[T] private[sql](ds

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200824906 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java --- @@ -38,15 +38,16 @@ * If this method fails (by throwing

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200824639 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2120,6 +2122,99 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200824599 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2120,6 +2122,99 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200824602 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2120,6 +2122,99 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200824532 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2120,6 +2122,99 @@ class Analyzer

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200824504 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java --- @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, I've updated this with the requested changes. Thanks for looking at it! --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200711421 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -40,17 +44,24 @@ case class

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200710273 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java --- @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200709816 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java --- @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200428354 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -172,6 +173,7 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200423733 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -40,17 +44,24 @@ case class

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200423206 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2120,6 +2122,99 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200421789 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala --- @@ -203,33 +203,33 @@ class DataSourceV2Suite extends

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r200419939 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -82,6 +120,30 @@ private[parquet] class

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200418152 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java --- @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, can you also review this PR for DataSourceV2? This adds the first of the logical plans proposed in [SPIP: Standardize Logical Plans](https://docs.google.com/document/d

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r200181894 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -82,6 +120,30 @@ private[parquet] class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r200181749 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -82,6 +120,30 @@ private[parquet] class

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue closed the pull request at: https://github.com/apache/spark/pull/21306 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
GitHub user rdblue reopened a pull request: https://github.com/apache/spark/pull/21306 [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog support. ## What changes were proposed in this pull request? This adds a mix-in to `DataSourceV2` that allows implementations

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

2018-07-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @cloud-fan, I've updated this to address your comments. Thanks for the reviews! --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200178463 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200177371 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200174526 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200173138 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200171696 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200171424 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200170560 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200170480 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

2018-07-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200170491 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/CatalogSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #21696: [SPARK-24716][SQL] Refactor ParquetFilters

2018-07-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21696 Thanks, @wangyum! I think this is refactor was a good idea. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21696#discussion_r199980993 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -19,166 +19,186 @@ package

[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21696#discussion_r199980897 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -19,166 +19,186 @@ package

[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21696#discussion_r199980632 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -19,166 +19,186 @@ package

[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21696#discussion_r199979463 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -379,14 +366,29 @@ class

[GitHub] spark issue #21682: [SPARK-24706][SQL] ByteType and ShortType support pushdo...

2018-07-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21682 +1 I agree with some of the minor refactoring suggestions, but overall this looks correct to me. --- - To unsubscribe

[GitHub] spark pull request #21682: [SPARK-24706][SQL] ByteType and ShortType support...

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21682#discussion_r199977784 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -69,6 +77,14 @@ private[parquet] class

[GitHub] spark pull request #21682: [SPARK-24706][SQL] ByteType and ShortType support...

2018-07-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21682#discussion_r199977420 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -42,6 +42,14 @@ private[parquet] class

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

2018-07-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @cloud-fan, thanks for the thorough feedback! > What catalog operations we want to forward to the data source catalog? Currently it's create/drop/alter table, I think it's good eno

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-06-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r198907669 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -359,6 +369,70 @@ class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-06-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r198906232 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -359,6 +369,70 @@ class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-06-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r198904779 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -62,6 +98,30 @@ private[parquet] class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-06-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r198904504 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -62,6 +98,30 @@ private[parquet] class

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-06-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r198904089 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -378,6 +378,22 @@ object SQLConf { .booleanConf

[GitHub] spark issue #21623: [SPARK-24638][SQL] StringStartsWith support push down

2018-06-27 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21623 Overall, I think this is close. The tests need to cover the row group stats case and we should update how configuration is passed to the filters. Thanks for working on this, @wangyum

[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...

2018-06-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21623#discussion_r198553569 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -22,16 +22,23 @@ import java.sql.Date

[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...

2018-06-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21623#discussion_r198551889 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -660,6 +661,56 @@ class

[GitHub] spark pull request #21262: [SPARK-24172][SQL]: Push projection and filters o...

2018-06-26 Thread rdblue
Github user rdblue closed the pull request at: https://github.com/apache/spark/pull/21262 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

2018-06-26 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @cloud-fan, what needs to change to get this in? I'd like to start making more PRs based on these changes. --- - To unsubscribe

[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...

2018-06-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21623#discussion_r198244664 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -270,6 +277,29 @@ private[parquet] class

[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...

2018-06-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21623#discussion_r198230713 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -270,6 +277,29 @@ private[parquet] class

[GitHub] spark issue #21615: [SPARK-24552][core][sql] Use unique id instead of attemp...

2018-06-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21615 +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197552309 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -125,11 +124,11 @@ object

[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...

2018-06-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21606 +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197547079 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -76,13 +76,17 @@ object SparkHadoopWriter extends Logging

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197543585 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -104,12 +104,12 @@ object SparkHadoopWriter extends Logging

[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21577 Thanks for fixing this, @vanzin! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197542830 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -125,11 +124,11 @@ object

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197542704 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -104,12 +104,12 @@ object SparkHadoopWriter extends Logging

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197542014 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java --- @@ -42,15 +42,12 @@ *Usually

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197541490 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -125,11 +124,11 @@ object

[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...

2018-06-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197540970 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -76,13 +76,17 @@ object SparkHadoopWriter extends Logging

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 @vanzin, thanks for working on this. I was out most of this week at a conference and I'm still on just half time, which is why I was delayed. Sorry to leave you all waiting. I'll make comments

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 Yes, I just checked and speculative attempts do get a different TID. Just turn on speculation, run a large stage, and sort tasks in a stage by TID. There aren't duplicates

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 Retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21574: [SPARK-24478][SQL][followup] Move projection and filter ...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21574 +1 (non-binding) assuming that tests pass. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #21574: [SPARK-24478][SQL][followup] Move projection and ...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21574#discussion_r196223209 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -106,7 +106,7 @@ case class

[GitHub] spark pull request #21574: [SPARK-24478][SQL][followup] Move projection and ...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21574#discussion_r196222500 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -23,17 +23,24 @@ import

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 @vanzin, the ID that this uses is the TID, which I thought was always unique. It appears to be a one-up counter. Also, I noted on your PR that both are needed because even if we only commit one

[GitHub] spark pull request #21577: [SPARK-24552][core] Correctly identify tasks in o...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21577#discussion_r196217742 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -399,7 +399,8 @@ private[spark] object JsonProtocol { ("Full

[GitHub] spark issue #21577: [SPARK-24552][core] Correctly identify tasks in output c...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21577 +1. This fixes the commit coordinator problem where two separate tasks can be authorized. That case could lead to duplicate data (if, for example, both tasks generated unique file names using

[GitHub] spark pull request #21577: [SPARK-24552][core] Correctly identify tasks in o...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21577#discussion_r196215961 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -399,7 +399,8 @@ private[spark] object JsonProtocol { ("Full

[GitHub] spark pull request #21577: [SPARK-24552][core] Correctly identify tasks in o...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21577#discussion_r196214944 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -131,16 +139,17 @@ private[spark] class

[GitHub] spark pull request #21577: [SPARK-24552][core] Correctly identify tasks in o...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21577#discussion_r196214788 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -131,16 +139,17 @@ private[spark] class

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 I think the right thing to do for this commit is to use the task ID instead of the stage-local attempt number. I've updated the PR with the change so I think this is ready to commit. @vanzin

[GitHub] spark issue #21558: [SPARK-24552][SQL] Use task ID instead of attempt number...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21558 @tgravescs, that's exactly what we're seeing. I think it might just be misleading to have a stage-local attempt ID although it is more friendly for users and matches what MR does

[GitHub] spark pull request #21574: [SPARK-24478][SQL][followup] Move projection and ...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21574#discussion_r196192774 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -23,17 +23,24 @@ import

[GitHub] spark issue #21503: [SPARK-24478][SQL] Move projection and filter push down ...

2018-06-18 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21503 Thank you for reviewing this, @cloud-fan! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #21574: [SPARK-24478][SQL][followup] Move projection and ...

2018-06-18 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21574#discussion_r196173414 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -23,17 +23,24 @@ import

<    1   2   3   4   5   6   7   8   9   10   >