[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-07 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23208 @cloud-fan, what are you suggesting to use as a design? If you think this shouldn't mirror the read side, then let's be clear on what it should look like. Maybe that's a design doc, or maybe that's

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239889152 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,52 +17,49 @@ package

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239888975 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchWrite.java --- @@ -25,14 +25,14 @@ import

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239888795 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -25,7 +25,10 @@ * The base interface for v2 data sources

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239613722 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchWrite.java --- @@ -25,14 +25,14 @@ import

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239613088 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,52 +17,49 @@ package

[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23208 @cloud-fan, I see that this adds `Table` and uses `TableProvider`, but I was expecting this to also update the write side to mirror the read side, like PR #22190 for [SPARK-25188](https

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239598346 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -241,32 +241,28 @@ final class DataFrameWriter[T] private[sql](ds

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239596456 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -241,32 +241,28 @@ final class DataFrameWriter[T] private[sql](ds

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239581374 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -25,7 +25,10 @@ * The base interface for v2 data sources

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239578059 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -25,7 +25,10 @@ * The base interface for v2 data sources

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239559037 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -25,7 +25,10 @@ * The base interface for v2 data sources

[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Skips Python resource limit on Win...

2018-12-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23055 @HyukjinKwon, for the future, I should note that I'm not a committer so my +1 for a PR is not binding. I'm fairly sure @vanzin would +1 this commit as well, but it's best not to merge based on my

[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23208 Thanks for posting this PR @cloud-fan! I'll have a look in the next day or so. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Skips Python resource limit on Win...

2018-11-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23055 +1 with the latest changes. Thanks for taking care of this, @HyukjinKwon! Functionality is in two parts: changing the resource requests (which doesn't change) and limiting memory use

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r238046730 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/PartitionTransforms.java --- @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r237995405 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r237984050 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableCatalog.java --- @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-11-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @stczwd, I agree with @mccheah. Tables are basically named data sets. Whether they support batch, micro-batch streaming, or continuous streaming is determined by checking whether they implement
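The comment above is cut off, so the following is only a rough sketch of one way such a check could look, assuming that capabilities are expressed as mix-in interfaces on a table. `SupportsBatchRead` is a name that appears elsewhere in this thread list; `SupportsMicroBatchRead`, `SupportsContinuousRead`, `EventsTable`, and `readModes` are hypothetical names introduced for illustration. None of these types are Spark's actual interfaces.

```java
import java.util.ArrayList;
import java.util.List;

public class TableCapabilities {
    // Simplified stand-in for a named data set; NOT Spark's real Table interface.
    interface Table {
        String name();
    }

    // Marker (mix-in) interfaces. The last two names are hypothetical.
    interface SupportsBatchRead extends Table {}
    interface SupportsMicroBatchRead extends Table {}
    interface SupportsContinuousRead extends Table {}

    // Hypothetical table that opts in to batch and micro-batch reads only.
    static final class EventsTable implements SupportsBatchRead, SupportsMicroBatchRead {
        private final String name;

        EventsTable(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }
    }

    // Collects the read modes a table supports by testing which
    // mix-in interfaces its class implements.
    static List<String> readModes(Table table) {
        List<String> modes = new ArrayList<>();
        if (table instanceof SupportsBatchRead) {
            modes.add("batch");
        }
        if (table instanceof SupportsMicroBatchRead) {
            modes.add("micro-batch");
        }
        if (table instanceof SupportsContinuousRead) {
            modes.add("continuous");
        }
        return modes;
    }

    public static void main(String[] args) {
        Table events = new EventsTable("events");
        System.out.println(events.name() + " supports: " + readModes(events));
    }
}
```

With this shape, a caller never needs a separate "batch table" or "stream table" type: the same named table is asked what it can do at the point of use.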

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r237975013 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/CatalogProvider.java --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r237974718 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/PartitionTransforms.java --- @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r237974410 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r237973548 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala --- @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r237972742 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala --- @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r237972182 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableCatalog.java --- @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r237971241 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r237971288 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r237971092 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/PartitionTransforms.java --- @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...

2018-11-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23086 @cloud-fan, thanks for getting this done! I'll wait for the equivalent write-side PR.

[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...

2018-11-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23086 > I still do not think we should mix the catalog support with the data source APIs We are trying to keep these separate. `Table` is the only overlap between the two. If you prefer m

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r237966188 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Scan.java --- @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Skips Python resource limit on Win...

2018-11-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23055 +1 once the docs are updated to note that resource requests still include python memory, even in Windows.

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Skips Python resource limit...

2018-11-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237963488 --- Diff: docs/configuration.md --- @@ -190,6 +190,8 @@ of the most common options to set are: and it is up to the application to avoid exceeding

[GitHub] spark issue #21978: [SPARK-25006][SQL] Add CatalogTableIdentifier.

2018-11-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21978 Rebased on master.

[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...

2018-11-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23086 +1 There are only minor suggestions left from me. I'd like to see the default implementation of `Table.name` removed, but I don't think that should block committing

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-29 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r237670228 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -22,86 +22,56 @@ import

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-29 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r237670099 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Scan.java --- @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-29 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r237668483 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #21978: [SPARK-25006][SQL] Add CatalogTableIdentifier.

2018-11-29 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21978#discussion_r237660050 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala --- @@ -18,48 +18,106 @@ package org.apache.spark.sql.catalyst

[GitHub] spark pull request #21978: [SPARK-25006][SQL] Add CatalogTableIdentifier.

2018-11-29 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21978#discussion_r237585203 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala --- @@ -18,48 +18,106 @@ package org.apache.spark.sql.catalyst

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-11-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @stczwd, thanks for taking a look at this. What are the differences between batch and stream DDL that you think will come up

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r237179854 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -54,27 +53,17 @@ case class

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r237178976 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -23,29 +23,28 @@ import

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r237176552 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -38,7 +38,7 @@ import org.apache.spark.sql.execution.datasources.jdbc

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r237176100 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r237172065 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237169532 --- Diff: python/pyspark/worker.py --- @@ -22,7 +22,12 @@ import os import sys import time -import resource +# 'resource' is a Unix

[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...

2018-11-27 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23086 @cloud-fan, sorry to spread review comments over two days, but I've finished the first pass. Overall, it looks great. I think we can simplify a couple of areas, like all of the args passed

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236859358 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala --- @@ -396,87 +392,66 @@ object SimpleReaderFactory extends

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236858793 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -116,16 +116,20 @@ object

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236858449 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -54,27 +53,17 @@ case class

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236858107 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -23,29 +23,28 @@ import

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236857220 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -23,29 +23,28 @@ import

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236856960 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -23,29 +23,28 @@ import

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236852153 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -170,15 +157,24 @@ object

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236850263 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236849290 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -40,8 +40,8 @@ import

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236844174 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -38,7 +38,7 @@ import org.apache.spark.sql.execution.datasources.jdbc

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236823417 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Scan.java --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236820896 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Batch.java --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236820065 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236819758 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236818511 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236816739 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236796331 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236491385 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236487464 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236480711 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236345625 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-20 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r235082191 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234691652 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT

[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-11-16 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22547 I agree that there is consensus for the proposal in the design doc and I don't think there are any blockers. If there's something I can do to help, please let me know. Otherwise ping me to review

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-16 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234286173 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-15 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234084002 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT

[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-15 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23055 Thanks for fixing this so quickly, @HyukjinKwon! I'd like a couple of changes, but overall it is going in the right direction. We should also plan on porting this to the 2.4 branch when

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-15 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234080578 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-15 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234080290 --- Diff: python/pyspark/worker.py --- @@ -268,9 +272,11 @@ def main(infile, outfile): # set up memory limits memory_limit_mb

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r231707076 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-11-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r231706583 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-11-02 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r230528510 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousInputStream.scala --- @@ -46,17 +45,22 @@ import

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-10-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @felixcheung, we're waiting on more reviews and a community decision about how to pass partition transforms. For passing transforms, I think the most reasonable compromise is to go

[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-10-19 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22547 @jose-torres, I don't mean that the primary purpose of the v2 API is for catalog integration, I mean that the primary use of v2 is with tables that are stored in some catalog. So we should make sure

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226798538 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Format.java --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226798213 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Format.java --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-10-19 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22547 After looking at the changes, I want to reiterate that request for a design doc. I think that code is a great way to prototype a design, but that we need to step back and make sure that the design

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226796934 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -173,12 +185,17 @@ object

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226790252 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/InputStream.java --- @@ -17,14 +17,18 @@ package

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226789748 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -15,37 +15,43 @@ * limitations under the License

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226789610 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java --- @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226785695 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/BatchScan.java --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226784919 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java --- @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226783272 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -106,85 +107,96 @@ private[kafka010] class

[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-10-19 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22547 @cloud-fan, is there a design doc that outlines these changes and the new API structure? --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226782371 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousInputStream.scala --- @@ -46,17 +45,22 @@ import

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r226780862 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -169,15 +174,16 @@ object

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22501#discussion_r226765772 --- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt --- @@ -1,117 +1,145 @@ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6

[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-10-17 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22547 @cloud-fan, sorry to look at this so late, I was out on vacation for a little while. Is this about ready for review

[GitHub] spark issue #22573: [SPARK-25558][SQL] Pushdown predicates for nested fields...

2018-10-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22573 @dongjoon-hyun, Iceberg schema evolution is based on the field IDs, not on names. The current table schema's names are the runtime names for columns in that table, and all reads happen by first
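
To illustrate the point about field IDs, here is a minimal sketch in Python. It is not Iceberg's implementation: `read_row`, the dict-based schemas, and the sample values are all hypothetical, and the comment above is cut off before it describes the actual read path. The sketch only shows why a rename is harmless when stored data is keyed by ID rather than by name.

```python
from typing import Any, Dict


def read_row(stored: Dict[int, Any], schema: Dict[str, int]) -> Dict[str, Any]:
    """Project a stored row (keyed by field ID) onto the current column names.

    Hypothetical helper: IDs missing from the stored row read as None.
    """
    return {name: stored.get(field_id) for name, field_id in schema.items()}


# Data written under the original schema, keyed by field ID (assumed layout).
stored_row = {1: 42, 2: "a@example.com"}

# Current names are just labels for IDs; renaming or adding touches only this map.
schema_v1 = {"id": 1, "email": 2}
schema_v2 = {"id": 1, "contact_email": 2, "phone": 3}  # rename field 2, add field 3

row_v1 = read_row(stored_row, schema_v1)
row_v2 = read_row(stored_row, schema_v2)
```

Under these assumptions the renamed column still resolves to the old data, and the newly added column reads as missing rather than colliding with an existing name.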

[GitHub] spark issue #22573: [SPARK-25558][SQL] Pushdown predicates for nested fields...

2018-10-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22573 The approach we've taken in Iceberg is to allow `.` in names by using an index in the top-level schema. The full path of every leaf in the schema is produced and added to a map from the full field
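
The index described above can be sketched roughly as follows. This is an illustration in Python, not Iceberg's code: the `Field` class, `index_leaves`, the choice of field IDs as map values, and the collision check are all assumptions, since the comment is truncated before it says what the map points to.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    """Hypothetical schema field; it is a leaf when it has no children."""
    field_id: int
    name: str
    children: List["Field"] = field(default_factory=list)


def index_leaves(fields: List[Field], prefix: str = "") -> Dict[str, int]:
    """Map the full dotted path of every leaf to its field ID."""
    index: Dict[str, int] = {}
    for f in fields:
        path = f"{prefix}.{f.name}" if prefix else f.name
        if f.children:
            nested = index_leaves(f.children, path)
        else:
            nested = {path: f.field_id}
        for full_path, field_id in nested.items():
            if full_path in index:
                # Assumption: ambiguous paths are rejected; the source does not say.
                raise ValueError(f"ambiguous path: {full_path}")
            index[full_path] = field_id
    return index


schema = [
    Field(1, "id"),
    Field(2, "location", [Field(3, "lat"), Field(4, "long")]),
    Field(5, "user.name"),  # a top-level name that itself contains a dot
]
leaf_index = index_leaves(schema)
```

Because lookups use the whole path string as the key, a reference such as `user.name` never has to be split on `.`, so a dotted top-level name and a nested field resolve through the same map.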

[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options should override session...

2018-09-19 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22413 Thanks @MaxGekk, sorry for the original omission!
