[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter

2018-08-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21948 @cloud-fan, thanks for documenting the behavior and removing the default copy. I had a couple of questions, but I think it is close

[GitHub] spark pull request #21948: [SPARK-24991][SQL] use InternalRow in DataSourceW...

2018-08-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21948#discussion_r207722363 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/MemorySinkV2Suite.scala --- @@ -44,16 +46,16 @@ class MemorySinkV2Suite extends

[GitHub] spark pull request #21948: [SPARK-24991][SQL] use InternalRow in DataSourceW...

2018-08-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21948#discussion_r207722340 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala --- @@ -89,7 +89,8 @@ class

[GitHub] spark issue #21978: SPARK-25006: Add CatalogTableIdentifier.

2018-08-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21978 FYI @jzhuge --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 @holdenk, I attempted to write a YARN unit test for this, but evidently the MiniYARNCluster doesn't run python workers. The task is run, but a worker is never started. If you have any idea how

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207690569 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala --- @@ -81,6 +82,17 @@ case class AggregateInPandasExec

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207637946 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are corr

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207636504 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -51,6 +52,17 @@ private[spark] class PythonRDD( val bufferSize

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207635841 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala --- @@ -81,6 +82,17 @@ case class AggregateInPandasExec

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 @ifilonenko, I opened follow-up SPARK-25021 for adding the PySpark memory allocation to Kubernetes. @mccheah, I opened follow-up SPARK-25022 for Mesos

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207631295 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -333,7 +340,7 @@ private[spark] class Client( val

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207630342 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -114,6 +114,10 @@ package object config { .checkValue(_ >

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r207629997 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are corr

[GitHub] spark issue #21978: SPARK-25006: Add CatalogTableIdentifier.

2018-08-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21978 Retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207624290 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2217,6 +2218,98 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207623469 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -352,6 +351,36 @@ case class Join

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207623351 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -352,6 +351,36 @@ case class Join

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207591742 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala --- @@ -336,4 +337,97 @@ object DataType { case (fromDataType

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207590794 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2217,6 +2218,98 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-02 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207395603 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -352,6 +351,36 @@ case class Join

[GitHub] spark issue #21978: SPARK-25006: Add CatalogTableIdentifier.

2018-08-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21978 @gatorsmile and @cloud-fan, this adds catalog to `TableIdentifier` in preparation for multi-catalog support. `TableIdentifier` continues to work as-is to ensure that there are no behavior changes

[GitHub] spark pull request #21978: SPARK-25006: Add CatalogTableIdentifier.

2018-08-02 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/21978 SPARK-25006: Add CatalogTableIdentifier. ## What changes were proposed in this pull request? This adds `CatalogTableIdentifier`, which is an identifier that consists of a triple: catalog

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 Retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 @holdenk, can you help review this since it is related to PySpark? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-02 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/21977 SPARK-25004: Add spark.executor.pyspark.memory limit. ## What changes were proposed in this pull request? This adds `spark.executor.pyspark.memory` to configure Python's address space

[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter

2018-08-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21948 I'm changing my +1 to -1 because read-side changes are mixed in and because copies are the responsibility of data sources if they buffer and hold references to earlier rows

[GitHub] spark pull request #21948: [SPARK-24991][SQL] use InternalRow in DataSourceW...

2018-08-02 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21948#discussion_r207295461 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala --- @@ -89,8 +89,7 @@ class

[GitHub] spark pull request #21948: [SPARK-24991][SQL] use InternalRow in DataSourceW...

2018-08-02 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21948#discussion_r207294283 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java --- @@ -50,4 +50,15 @@ *this ID

[GitHub] spark pull request #21948: [SPARK-24991][SQL] use InternalRow in DataSourceW...

2018-08-02 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21948#discussion_r207293465 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java --- @@ -50,4 +50,15 @@ *this ID

[GitHub] spark issue #21911: [SPARK-24940][SQL] Coalesce and Repartition Hint for SQL...

2018-08-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21911 I'd like to fix the AnalysisError message and I noted one small nit in the tests. +1 when the AnalysisError message is fixed

[GitHub] spark pull request #21911: [SPARK-24940][SQL] Coalesce and Repartition Hint ...

2018-08-02 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21911#discussion_r207285870 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveHintsSuite.scala --- @@ -17,15 +17,25 @@ package

[GitHub] spark pull request #21911: [SPARK-24940][SQL] Coalesce and Repartition Hint ...

2018-08-02 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21911#discussion_r207285266 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala --- @@ -102,6 +104,35 @@ object ResolveHints

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, I'll fix the conflicts and re-run tests. Yesterday's tests passed after I updated for your feedback. I'd like to try to get this in soon because it is taking so much time to resolve

[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21946 +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207043490 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2217,6 +2218,100 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207043344 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2217,6 +2218,100 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207043203 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2217,6 +2218,100 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207015951 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2217,6 +2218,100 @@ class Analyzer

[GitHub] spark issue #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanRow

2018-08-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21921 Yeah, I'd say that if it isn't documented then lets go with the usually RTC conventions. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter

2018-08-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21948 +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21948: [SPARK-24991][SQL] use InternalRow in DataSourceW...

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21948#discussion_r207006871 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala --- @@ -89,8 +89,7 @@ class

[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21946 Yeah, I'm fine with this, then. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r206979097 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java --- @@ -38,15 +38,16 @@ * If this method fails (by throwing

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r206978690 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java --- @@ -38,15 +38,16 @@ * If this method fails (by throwing

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r206978506 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -352,6 +351,36 @@ case class Join

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r206978261 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2217,6 +2218,100 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r206977289 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2217,6 +2218,100 @@ class Analyzer

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r206976856 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2217,6 +2218,100 @@ class Analyzer

[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21946 @cloud-fan, from your comment around the same time as mine, it sounds like the confusion may just be in how you're updating the current API to the proposed one. Can you post a migration plan

[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21946 Isn't this unnecessary after the API redesign? For the redesign, the `DataSourceV2` or a `ReadSupportProvider` will supply a `create` method (or `anonymousTable`) to return a `Table

[GitHub] spark issue #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanRow

2018-08-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21921 @cloud-fan, @gatorsmile, I'm fine with that if it's documented somewhere. I wasn't aware of that convention and no one brought it up the last time I pointed out commits without a committer +1

[GitHub] spark issue #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanRow

2018-08-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21921 This looks fine other than the possibly unnecessary cast. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanR...

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21921#discussion_r206946453 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala --- @@ -416,12 +405,12 @@ class AdvancedDataSourceV2 extends

[GitHub] spark pull request #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanR...

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21921#discussion_r206946335 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala --- @@ -121,17 +121,6 @@ class DataSourceV2Suite extends

[GitHub] spark pull request #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanR...

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21921#discussion_r206946259 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RateStreamMicroBatchReader.scala --- @@ -169,7 +170,7 @@ class

[GitHub] spark pull request #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanR...

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21921#discussion_r206946076 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala --- @@ -91,7 +90,7 @@ class

[GitHub] spark pull request #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanR...

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21921#discussion_r206936701 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala --- @@ -456,57 +445,20 @@ class AdvancedInputPartition(start: Int

[GitHub] spark pull request #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanR...

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21921#discussion_r206936422 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala --- @@ -416,12 +405,12 @@ class AdvancedDataSourceV2 extends

[GitHub] spark pull request #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanR...

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21921#discussion_r206936202 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala --- @@ -121,17 +121,6 @@ class DataSourceV2Suite extends

[GitHub] spark pull request #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanR...

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21921#discussion_r206935616 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RateStreamMicroBatchReader.scala --- @@ -169,7 +170,7 @@ class

[GitHub] spark pull request #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanR...

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21921#discussion_r206935176 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala --- @@ -91,7 +90,7 @@ class

[GitHub] spark issue #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanRow

2018-08-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21921 @cloud-fan, I thought it was a requirement to have a committer +1 before merging. Or is this [list of committers](https://spark.apache.org/committers.html) out of date

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, I'll look into the test failures tomorrow, but this has been passing tests for weeks so I think it is still safe to review when you have time. We can fix both in parallel so that we can

[GitHub] spark pull request #21911: [SPARK-24940][SQL] Coalesce Hint for SQL Queries

2018-07-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21911#discussion_r206741523 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala --- @@ -660,6 +660,62 @@ class PlanParserSuite extends

[GitHub] spark pull request #21911: [SPARK-24940][SQL] Coalesce Hint for SQL Queries

2018-07-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21911#discussion_r206741178 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -332,7 +332,7 @@ resource ; queryNoWith

[GitHub] spark pull request #21911: [SPARK-24940][SQL] Coalesce Hint for SQL Queries

2018-07-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21911#discussion_r206740826 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala --- @@ -102,6 +104,39 @@ object ResolveHints

[GitHub] spark pull request #21911: [SPARK-24940][SQL] Coalesce Hint for SQL Queries

2018-07-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21911#discussion_r206689783 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala --- @@ -102,6 +104,39 @@ object ResolveHints

[GitHub] spark issue #21911: [SPARK-24940][SQL] Coalesce Hint for SQL Queries

2018-07-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21911 @jzhuge, I think it is confusing that this hint exposes the `shuffle` boolean flag. The Spark API makes a clear distinction between `repartition` and `coalesce` where `coalesce` means that Spark

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, @gatorsmile, this has been ready for final review for a while. Do you think you'll have some time to look

[GitHub] spark issue #21308: SPARK-24253: Add DeleteSupport mix-in for DataSourceV2.

2018-07-26 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21308 #21888 shows how this is used to implement DELETE FROM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-26 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 #21888 demonstrates how to add a `TableIdentifier` with a catalog element, `CatalogTableIdentifier` and how to safely migrate from the old identifier to the new one with catalog

[GitHub] spark pull request #21888: [SPARK-24253][SQL][WIP] Implement DeleteFrom for ...

2018-07-26 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/21888 [SPARK-24253][SQL][WIP] Implement DeleteFrom for v2 tables ## What changes were proposed in this pull request? This adds support for DELETE FROM in SQL using the new DeleteFrom logical

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-07-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r205571827 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/CatalogProvider.java --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #12313: [SPARK-14543] [SQL] Improve InsertIntoTable column resol...

2018-07-26 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/12313 This is addressed by #21305. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #12313: [SPARK-14543] [SQL] Improve InsertIntoTable colum...

2018-07-26 Thread rdblue
Github user rdblue closed the pull request at: https://github.com/apache/spark/pull/12313 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21237: [SPARK-23325][WIP] Test parquet returning internal row

2018-07-26 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21237 This is no longer needed. #21118 fixes the copy problem by always inserting a projection that copies, but delaying until after filters are run

[GitHub] spark pull request #21237: [SPARK-23325][WIP] Test parquet returning interna...

2018-07-26 Thread rdblue
Github user rdblue closed the pull request at: https://github.com/apache/spark/pull/21237 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...

2018-07-25 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21877 @cloud-fan, @gatorsmile, @marmbrus, this PR demonstrates how plans would use the catalog changes introduced in #21306. To see the changes, you may want to look at just the last commit because

[GitHub] spark pull request #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS an...

2018-07-25 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/21877 [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS support for DataSourceV2 ## What changes were proposed in this pull request? * Remove extends from `ReadSupport` and `WriteSupport

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-07-25 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r205237682 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -609,6 +611,12 @@ class SparkSession private( */ @transient

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @marmbrus, @cloud-fan, @gatorsmile, I've updated this PR to use reflection to instantiate catalogs. This allows implementations to provide named catalogs (and reuse implementations) and configure

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-24 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, @gatorsmile, is it possible to get this in for 2.4? This validates writes to data source tables so I think it is a good one to have

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-24 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 Retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-24 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 Thanks for reviewing and merging @cloud-fan, @gatorsmile, @felixcheung! --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for table cat...

2018-07-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @cloud-fan, @gatorsmile, I don't think this should be merged yet. I've been implementing CTAS and RTAS based on this commit and I don't think it makes sense to get a `TableCatalog` instance

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 @cloud-fan, any update on merging this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-07-20 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 Rebased on master to fix conflicts. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21118: SPARK-23325: Use InternalRow when reading with Da...

2018-07-20 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21118#discussion_r204150008 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala --- @@ -53,7 +54,7 @@ class

[GitHub] spark pull request #21118: SPARK-23325: Use InternalRow when reading with Da...

2018-07-20 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21118#discussion_r204149889 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -125,16 +125,13 @@ object

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-17 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, I've rebased this so it is ready for final review when you get a chance. Thanks! --- - To unsubscribe, e-mail

[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-14 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21556 I misunderstood how it was safe as well. It was Yuming's clarification that helped. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #21741: [SPARK-24718][SQL] Timestamp support pushdown to parquet...

2018-07-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21741 +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...

2018-07-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r202451540 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -378,6 +378,15 @@ object SQLConf { .booleanConf

[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...

2018-07-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r202451374 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -378,6 +378,15 @@ object SQLConf { .booleanConf

[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21556 +1, I think this looks ready to go. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21556 @HyukjinKwon, even if the values are null, the makeEq function only casts null to Java Integer so the handling is still safe. It just looks odd that `null.asInstanceOf[JInt]` is safe. Thanks

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202448032 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -37,41 +39,64 @@ import

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202447544 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 1

[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-12 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21556 @wangyum, can you explain what was happening with the `decimal(9,2)` benchmark more clearly? I asked additional questions, but the thread is on a line that changed so it's collapsed by default

[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-12 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202093955 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -225,12 +316,44 @@ private[parquet] class

<    1   2   3   4   5   6   7   8   9   10   >