[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209041811 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209041367 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ScanConfig.java --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209039505 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java --- @@ -0,0 +1,79 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209038600 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/MicroBatchReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209023367 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java --- @@ -23,8 +23,9 @@ * The base interface for data source v2

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209022769 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ContinuousReadSupportProvider.java --- @@ -0,0 +1,73 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209022127 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java --- @@ -27,10 +27,10 @@ @InterfaceStability.Evolving

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209020054 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchWriteSupportProvider.java --- @@ -21,33 +21,39 @@ import

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209016955 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209016366 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -367,6 +367,7 @@ case class AppendData

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209016303 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209016101 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209015906 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209015928 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r209014458 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r208994951 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r208993025 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r208992737 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r208991938 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22043#discussion_r208989784 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DataSourceV2AnalysisSuite.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-08 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 > So if users don't set this conf, the behavior is the same as before, right? Yes. > My concern is that meaning of the overhead parameter becomes pretty confusing. I

[GitHub] spark issue #22043: [SPARK-24251][SQL] Add analysis tests for AppendData.

2018-08-08 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22043 @cloud-fan, here are tests to validate the analysis of AppendData logical plans.

[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-08 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/22043 [SPARK-24251][SQL] Add analysis tests for AppendData. ## What changes were proposed in this pull request? This is a follow-up to #21305 that adds a test suite for AppendData analysis

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-08 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 @squito, this is much clearer for our user base. Right now, they can control the YARN container allocation to make room for Python by increasing the overhead, but that does nothing to actually
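As a rough illustration of the sizing question in the comment above, here is a minimal sketch of how a container request might be computed. It assumes the PySpark limit is simply added on top of executor memory and overhead; the function name and the numbers are hypothetical and are not taken from the pull request.

```python
def yarn_container_request_mb(executor_mb, overhead_mb, pyspark_mb=0):
    """Hypothetical helper: total memory requested for one executor container.

    Assumption (not from the thread): the container request is the plain sum
    of executor memory, memory overhead, and the optional PySpark limit.
    """
    return executor_mb + overhead_mb + pyspark_mb


# Without a dedicated PySpark setting, room for Python has to come from overhead.
only_overhead = yarn_container_request_mb(executor_mb=4096, overhead_mb=2458)

# With a dedicated setting, overhead can stay small and Python gets its own share.
with_pyspark = yarn_container_request_mb(executor_mb=4096, overhead_mb=410, pyspark_mb=2048)
```

Both calls ask for the same total here; the difference is that the second makes the Python share explicit, so it can also be enforced separately.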

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-08 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208642771 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider.java --- @@ -29,24 +28,24 @@ * provide data writing

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-08 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208642275 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider.java --- @@ -29,24 +28,24 @@ * provide data writing

[GitHub] spark issue #21978: SPARK-25006: Add CatalogTableIdentifier.

2018-08-08 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21978 @cloud-fan, when do you think we can get this in? It doesn't need to go in 2.4 because it doesn't change any read or write paths -- nothing uses CatalogTableIdentifier yet -- but it would be great

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-08 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 Thanks for reviewing, @cloud-fan!

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-08 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208638600 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -93,21 +81,17 @@ case class

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-08 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208638252 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -39,52 +36,43 @@ case class

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-08 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208637663 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala --- @@ -51,18 +58,19 @@ class DataSourceRDD[T: ClassTag

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-08 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208636636 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala --- @@ -51,18 +58,19 @@ class DataSourceRDD[T: ClassTag

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-08 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208635572 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/MicroBatchReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-08 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208635227 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java --- @@ -0,0 +1,72 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-08 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208634657 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208423440 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-07 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 Yes, this is for YARN only. I've also opened follow-up issues for Mesos and Kubernetes integration.

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208390359 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208390264 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208389947 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java --- @@ -0,0 +1,72 @@ +/* + * Licensed

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-07 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 @gatorsmile, I started [YarnPySparkSuite](https://gist.github.com/rdblue/9848a00f49eaad6126fbbcfa1b039e19) but the YARN tests don't create python worker processes so the tests don't work. I need

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208384141 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -80,17 +80,17 @@ object

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208383579 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -93,21 +81,17 @@ case class

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208383098 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -39,52 +36,43 @@ case class

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208380370 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala --- @@ -51,18 +58,19 @@ class DataSourceRDD[T: ClassTag

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208380139 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala --- @@ -51,18 +58,19 @@ class DataSourceRDD[T: ClassTag

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208373784 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java --- @@ -0,0 +1,72 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208373089 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208372512 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/MicroBatchReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208371927 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java --- @@ -0,0 +1,72 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208370532 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208370391 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208368798 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java --- @@ -0,0 +1,72 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208348226 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsReportStatistics.java --- @@ -20,18 +20,18 @@ import

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208347697 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsPushDownRequiredColumns.java --- @@ -21,22 +21,25 @@ import

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208345467 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ScanConfig.java --- @@ -18,22 +18,16 @@ package

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208344984 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ScanConfig.java --- @@ -18,22 +18,16 @@ package

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208344510 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ScanConfig.java --- @@ -18,22 +18,16 @@ package

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208343665 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadSupport.java --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208342404 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/PartitionReaderFactory.java --- @@ -0,0 +1,66 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208341897 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/PartitionReaderFactory.java --- @@ -0,0 +1,66 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208340913 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/BatchReadSupport.java --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208340329 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/BatchReadSupport.java --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r208339172 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are corr

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208338263 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider.java --- @@ -29,24 +28,24 @@ * provide data writing

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208337966 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchWriteSupportProvider.java --- @@ -21,32 +21,32 @@ import

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208337773 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider.java --- @@ -29,24 +28,24 @@ * provide data writing

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208337008 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ContinuousReadSupportProvider.java --- @@ -39,7 +52,7 @@ * @param options

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208337291 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/MicroBatchReadSupportProvider.java --- @@ -45,8 +51,8 @@ * @param options

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208336624 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/MicroBatchReadSupportProvider.java --- @@ -20,23 +20,29 @@ import java.util.Optional

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208336499 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java --- @@ -23,8 +23,9 @@ * The base interface for data source v2

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208335865 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ContinuousReadSupportProvider.java --- @@ -20,17 +20,30 @@ import java.util.Optional

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208335523 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ContinuousReadSupportProvider.java --- @@ -20,17 +20,30 @@ import java.util.Optional

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208334273 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchReadSupportProvider.java --- @@ -19,18 +19,18 @@ import

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208333413 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ContinuousReadSupportProvider.java --- @@ -20,17 +20,30 @@ import java.util.Optional

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208333129 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchWriteSupportProvider.java --- @@ -21,32 +21,32 @@ import

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-07 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 Okay, workers are tracked in `PythonWorkerFactory`. Its `create` method returns an idle worker if one is available. When a task finishes, it calls `SparkEnv.releasePythonWorker` that calls
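The reuse pattern described in the comment above (hand out an idle worker if one exists, otherwise start a new one, and return workers to the pool when a task finishes) can be sketched roughly as follows. This is a simplified stand-in, not Spark's `PythonWorkerFactory`; the class `WorkerPool`, its `release` method, and the `spawn` callback are hypothetical names chosen for illustration.

```python
from collections import deque
from itertools import count


class WorkerPool:
    """Hypothetical idle-worker pool; not the actual Spark implementation."""

    def __init__(self, spawn):
        self._spawn = spawn   # callable that starts a brand-new worker
        self._idle = deque()  # workers released by finished tasks

    def create(self):
        # Return an idle worker if one is available, otherwise spawn a new one.
        if self._idle:
            return self._idle.popleft()
        return self._spawn()

    def release(self, worker):
        # Called when a task finishes so the worker can be reused later.
        self._idle.append(worker)


# Usage: workers are represented by integer ids for simplicity.
ids = count(1)
pool = WorkerPool(spawn=lambda: next(ids))
first = pool.create()    # no idle worker yet, so a new one is spawned
pool.release(first)      # task finished, worker becomes idle
reused = pool.create()   # the idle worker is handed out again
second = pool.create()   # pool is empty again, so another worker is spawned
```

One consequence of this pattern is that a worker outlives any single task, so per-worker settings applied at startup carry over to every task that later reuses it.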

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-07 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 Posting this as a comment instead of in a thread so it doesn't get lost. In response to @holdenk's question about memory allocation to workers: That `useDaemon` flag controls whether

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r208310790 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -60,14 +61,26 @@ private[spark] object PythonEvalType

[GitHub] spark pull request #21948: [SPARK-24991][SQL] use InternalRow in DataSourceW...

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21948#discussion_r208305810 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/MemorySinkV2Suite.scala --- @@ -44,16 +46,16 @@ class MemorySinkV2Suite extends

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-07 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r208302473 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -60,14 +61,26 @@ private[spark] object PythonEvalType

[GitHub] spark issue #21978: SPARK-25006: Add CatalogTableIdentifier.

2018-08-07 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21978 @cloud-fan, that's fine with me since #17185 is already merged. Would this conflict with #17185? We can just add a case that detects whether the first identifier in the seq is a catalog when

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, I've rebased and updated with the requested change to disallow missing columns, even if they're optional. Thanks for reviewing

[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22009 There must not be one. I thought you'd already started a PR, my mistake.

[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22009 Does this replace the other PR? I haven't looked at that one yet. If this is ready to review and follows the doc, I can review

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r208091782 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -60,14 +61,26 @@ private[spark] object PythonEvalType

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r208090428 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala --- @@ -336,4 +337,124 @@ object DataType { case (fromDataType

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r208090280 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeWriteCompatibilitySuite.scala --- @@ -0,0 +1,395 @@ +/* + * Licensed

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, I've added a suite for the `DataType.canWrite`. I still need to add tests for the analyzer rule to make sure it catches any problems and so to validate that AppendData's `resolved` check

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207752632 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala --- @@ -336,4 +337,97 @@ object DataType { case (fromDataType

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207747975 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are corr

[GitHub] spark pull request #21948: [SPARK-24991][SQL] use InternalRow in DataSourceW...

2018-08-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21948#discussion_r207725400 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/MemorySinkV2Suite.scala --- @@ -44,16 +46,16 @@ class MemorySinkV2Suite extends

[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter

2018-08-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21948 +1 when tests are passing.

[GitHub] spark pull request #21948: [SPARK-24991][SQL] use InternalRow in DataSourceW...

2018-08-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21948#discussion_r207725389 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala --- @@ -89,7 +89,8 @@ class

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 Anyone able to reproduce the YARN test failures? When I run the tests, a different set fails and seems unrelated to python. This shouldn't change the behavior of any tests because the environment

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207723244 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala --- @@ -336,4 +337,97 @@ object DataType { case (fromDataType

[GitHub] spark issue #21978: SPARK-25006: Add CatalogTableIdentifier.

2018-08-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21978 Retest this please.

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207722434 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are corr

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207722412 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are corr

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207722402 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -133,10 +133,17 @@ private[yarn] class
