[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...

2018-02-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r166394747 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,17 +17,151 @@ package

[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...

2018-02-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r166386661 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,17 +17,151 @@ package

[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @cloud-fan, this is a single commit on purpose because predicate push-down makes plan changes. I think it's best to do these at once to avoid unnecessary work. That's why I started looking more

[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...

2018-02-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r166381800 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -117,20 +118,43 @@ object

[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...

2018-02-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r166360278 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -117,20 +118,43 @@ object

[GitHub] spark pull request #20495: [SPARK-23327] [SQL] Update the description and te...

2018-02-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20495#discussion_r166358420 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1655,15 +1655,17 @@ case class Left(str

[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...

2018-02-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r166165255 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,17 +17,151 @@ package

[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...

2018-02-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r166165005 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,17 +17,151 @@ package

[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...

2018-02-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20490 I've updated this to no longer require #20387. It wasn't relying on those changes at all. @gatorsmile, @cloud-fan, what do you think about getting this into 2.3.0

[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...

2018-02-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r166081367 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,17 +17,151 @@ package

[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > For safety, I wanna keep it unchanged, and start something new for data source v2 only. I disagree. * **#20476 addresses a bug caused by the new implementat

[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...

2018-02-05 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r166030958 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,17 +17,151 @@ package

[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > Why pushdown is happening in logical optimization and not during query planning. My first instinct would be to have the optimizer get operators as close to the leaves as possible and then f

[GitHub] spark issue #20490: [SPARK-23323][SQL]: Add support for commit coordinator f...

2018-02-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20490 @dongjoon-hyun, @cloud-fan, @gatorsmile. Once the immutable plan PR is in, this can be reviewed. @steveloughran, I think this is what you were asking

[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Add support for commit coordi...

2018-02-02 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/20490 [SPARK-23323][SQL]: Add support for commit coordinator for DataSourceV2 writes ## What changes were proposed in this pull request? DataSourceV2 batch writes should use the output commit

[GitHub] spark pull request #20488: [SPARK-23321][SQL]: Validate datasource v2 writes

2018-02-02 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/20488 [SPARK-23321][SQL]: Validate datasource v2 writes ## What changes were proposed in this pull request? DataSourceV2 does not currently apply any validation rules when writing. Other write

[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @cloud-fan, @dongjoon-hyun, @gatorsmile, I've rebased this on master and removed the support for SPARK-23204 that parses table identifiers. If you need other changes to get this in, let me know

[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...

2018-02-02 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20477#discussion_r165728696 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -36,11 +38,14 @@ import

[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

2018-02-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20485 Sounds fine to me, then. My focus is on the long-term design issues. I still think that the changes to make plans immutable and to use the existing push-down code as much as possible

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @cloud-fan, I'll update this PR and we can talk about passing configuration on the dev list. And as a reminder, please close #20445

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-02-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > I tried and can't figure out how to do it with PhysicalOperation, that's why I build something new for data source v2 pushdown. The problem is that we should get DSv2 working independen

[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

2018-02-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20485 To be clear, the purpose of this commit, like #20476, is just to get something working for the 2.3.0 release? I just want to make sure since I think we should be approaching these problems

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20476 Yeah, I did review it, but at the time I wasn't familiar with how the other code paths worked and assumed that it was necessary to introduce this. I wasn't very familiar with how it *should* work

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20476 @gatorsmile, Do you mean this? > Extensibility is not good and operator push-down capabilities are limited. If so, that's very open to interpretation. I would assume it me

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20476 @gatorsmile, thanks for the context. If we need to redesign push-down, then I think we should do that separately and with a design plan. **I don't think it's a good idea to bundle

[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20466 +1 Good to get this in before changes to the relation. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20476 @cloud-fan, @gatorsmile, this PR demonstrates why we should use PhysicalOperation. I ported the tests from this PR over to our branch and they pass without modifying the push-down code. That's

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @dongjoon-hyun, @gatorsmile, could you guys weigh in on some this discussion? I'd like to get additional perspectives on the changes I'm proposing

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > Let's keep it general and let the data source to interprete it. I think this is the wrong approach. The reason why we are using a special `DataSourceOptions` object is to ensure that d

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @cloud-fan, to your point about push-down order, I'm not saying that order doesn't matter at all, I'm saying that the push-down can run more than once and it should push the closest operators

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > `spark.read.format("iceberg").table("db.table").load()` I'm fine with this if you think it is confusing to parse the path as a table name in load. I think it is r

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @felixcheung, yes, we do already have a `table` option. That creates an `UnresolvedRelation` with the parsed table name as a `TableIdentifier`, which is not currently compatible with `DataSourceV2

[GitHub] spark issue #20454: [SPARK-23202][SQL] Add new API in DataSourceWriter: onDa...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20454 +1 I'd rather not add features without a known use case, but this implementation looks good to me. --- - To unsubscribe

[GitHub] spark pull request #20454: [SPARK-23202][SQL] Add new API in DataSourceWrite...

2018-01-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20454#discussion_r165141464 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java --- @@ -62,6 +62,15 @@ */ DataWriterFactory

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r165138574 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java --- @@ -32,40 +32,44

[GitHub] spark issue #20448: [SPARK-23203][SQL] make DataSourceV2Relation immutable

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20448 @cloud-fan, **please close this PR**. There is already a pull request for these changes, #20387, and ongoing discussion there. If you want the proposed implementation to change, please ask

[GitHub] spark pull request #20448: [SPARK-23203][SQL] make DataSourceV2Relation immu...

2018-01-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20448#discussion_r165132718 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,36 +17,84 @@ package

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r165131292 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java --- @@ -32,40 +32,44

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r165121965 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java --- @@ -32,40 +32,44

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r165119560 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java --- @@ -63,32 +68,42 @@ DataWriterFactory

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r165119427 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java --- @@ -32,40 +32,44

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-31 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r165117779 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java --- @@ -32,40 +32,44

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20386 > I assume this API is necessary . . . it sounds reasonable to provide a callback for task commit. I agree it sounds reasonable, but we shouldn't add methods to a new API blin

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20386 @gengliangwang, what is the use case supported by this? In other words, how is `onTaskCommit(taskCommit: TaskCommitMessage)` currently used that requires this change? In general, I'm more

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164909970 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java --- @@ -63,32 +68,42 @@ DataWriterFactory

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164909653 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java --- @@ -32,40 +32,44

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164909225 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java --- @@ -32,40 +32,44

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164908529 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java --- @@ -63,32 +68,42 @@ DataWriterFactory

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164907626 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java --- @@ -32,40 +32,44

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164907397 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java --- @@ -32,40 +32,44

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20386 @cloud-fan, is the intent to get this into 2.3.0? If so, I'll make time to review it today. --- - To unsubscribe, e-mail

[GitHub] spark pull request #20427: [SPARK-23260][SPARK-23262][SQL] several data sour...

2018-01-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20427#discussion_r164807449 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java --- @@ -25,7 +25,7 @@ * session

[GitHub] spark pull request #20427: [SPARK-23260][SPARK-23262][SQL] several data sour...

2018-01-30 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20427#discussion_r164806676 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java --- @@ -25,7 +25,7 @@ * session

[GitHub] spark pull request #20427: [SPARK-23260][SQL] remove V2 from the class name ...

2018-01-29 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20427#discussion_r164621094 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java --- @@ -25,7 +25,7 @@ * session

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > It's hard to improve PhysicalOperation to support more operators and specific push down orders, so I created the new one I'm concerned about the new one. The projection support se

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > [The push-down rule may be run more than once if filters are not pushed through projections] looks weird, do you have a query to reproduce this issue? One of the DataSourceV2 tests

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > I'd suggest that we just propogate the paths parameter to options, and data source implementations are free to interprete the path option to whatever they want, e.g. table and database na

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 > I'm ok to make it immutable if there is an significant benefit. Mutable nodes violate a basic assumption of catalyst, that trees are immutable. Here's a good quote from the SIGMOD pa

[GitHub] spark pull request #20427: [SPARK-23260][SQL] remove V2 from the class name ...

2018-01-29 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20427#discussion_r164544288 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java --- @@ -25,7 +25,7 @@ * session

[GitHub] spark pull request #20427: [SPARK-23260][SQL] remove V2 from the class name ...

2018-01-29 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20427#discussion_r164543216 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupport.java --- @@ -18,23 +18,23 @@ package org.apache.spark.sql.sources.v2

[GitHub] spark issue #20397: [SPARK-23219][SQL]Rename ReadTask to DataReaderFactory i...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20397 This is more confusing, not less. Look at @jiangxb1987's comment above: "We shall create only one DataReaderFactory, and have that create multiple data readers." It is not clear why the AP

[GitHub] spark issue #20397: [SPARK-23219][SQL]Rename ReadTask to DataReaderFactory i...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20397 > I think the renaming is worth to remove future confusions. What future confusion? I understand that the difference isn't obvious, but making the names less accurate isn't a g

[GitHub] spark issue #20397: [SPARK-23219][SQL]Rename ReadTask to DataReaderFactory i...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20397 One last point: should significant changes to public APIs like this go in just before or just after a release? 2.3.0 candidates have used ReadTask up to now

[GitHub] spark issue #20397: [SPARK-23219][SQL]Rename ReadTask to DataReaderFactory i...

2018-01-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20397 @cloud-fan, thanks for pinging me on this. -1: I don't think there's a compelling benefit to justify this change, and I think it makes the API more confusing. I think we should revert

[GitHub] spark pull request #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Us...

2018-01-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r164169060 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,15 +17,149 @@ package

[GitHub] spark pull request #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Us...

2018-01-25 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r163913161 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,15 +17,149 @@ package

[GitHub] spark pull request #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Us...

2018-01-25 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r163909751 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,15 +17,149 @@ package

[GitHub] spark pull request #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Us...

2018-01-25 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r163908263 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,15 +17,149 @@ package

[GitHub] spark issue #20387: [SPARK-23203][SPARK-23204][SQL]: DataSourceV2: Use immut...

2018-01-24 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 @cloud-fan, please have a look at these changes. This will require follow-up for the Streaming side. I have yet to review the streaming interfaces for `DataSourceV2`, so I haven't made any changes

[GitHub] spark pull request #20387: SPARK-22386: DataSourceV2: Use immutable logical ...

2018-01-24 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/20387 SPARK-22386: DataSourceV2: Use immutable logical plans. ## What changes were proposed in this pull request? DataSourceV2 should use immutable catalyst trees instead of wrapping a mutable

[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2018-01-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19861 Thanks for the example, @cloud-fan. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2018-01-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19861 @jiangxb1987, I understand what this does. I just wanted an example use case where it was necessary. What was the motivating *use case

[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2018-01-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19861 @jiangxb1987, @cloud-fan, what was the use case you needed to add this for? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #20201: [SPARK-22389][SQL] data source v2 partitioning reporting...

2018-01-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20201 @cloud-fan, please ping me to review PRs for DataSourceV2. Our new table format uses it and we're preparing some changes, so I want to make sure we're heading in the same direction

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

2017-11-28 Thread rdblue
Github user rdblue closed the pull request at: https://github.com/apache/spark/pull/19568 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

2017-11-28 Thread rdblue
GitHub user rdblue reopened a pull request: https://github.com/apache/spark/pull/19568 SPARK-22345: Fix sort-merge joins with conditions and codegen. ## What changes were proposed in this pull request? This adds a joined row to sort-merge join codegen. That joined row

[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

2017-11-28 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19568 @gatorsmile, I think it would be better to fix codegen than to prevent it from happening with an assertion. If `CodegenFallback` can produce fallback code, why not allow it to when necessary

[GitHub] incubator-toree pull request #147: Support for --spark-context-initializatio...

2017-11-20 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/incubator-toree/pull/147#discussion_r152119598 --- Diff: kernel/src/main/scala/org/apache/toree/boot/CommandLineOptions.scala --- @@ -193,6 +199,19 @@ class CommandLineOptions(args: Seq[String

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148875607 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34 @@ /** * Creates

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148849054 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java --- @@ -36,14 +36,24 @@ /** * The preferred locations

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148848790 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java --- @@ -34,11 +35,17 @@ /** * Proceed to next

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-03 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148848619 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupport.java --- @@ -30,6 +30,9 @@ /** * Creates a {@link

[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

2017-10-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19568 @DonnyZone, I don't know of any cases that use codgen after the fix for `CodegenFallback`, but I think this is still a good idea. If Spark is going to generate code, it should generate

[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...

2017-10-27 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13206 We still maintain a version of this for our Spark builds to avoid an extra sort in Hive. If someone is willing to review it, I can probably find the time to rebase it on master. I think the year

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

2017-10-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19568#discussion_r147471530 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala --- @@ -124,7 +125,8 @@ class InnerJoinSuite extends

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

2017-10-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19568#discussion_r147471054 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala --- @@ -228,6 +230,27 @@ class InnerJoinSuite extends

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

2017-10-26 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19568#discussion_r14733 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -585,21 +585,26 @@ case class SortMergeJoinExec

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

2017-10-25 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19568#discussion_r146975643 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -585,21 +585,26 @@ case class SortMergeJoinExec

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

2017-10-25 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19568#discussion_r146974005 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -585,21 +585,26 @@ case class SortMergeJoinExec

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

2017-10-25 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19568#discussion_r146970055 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -615,6 +620,7 @@ case class SortMergeJoinExec

[GitHub] spark pull request #17813: [SPARK-20540][CORE] Fix unstable executor request...

2017-10-25 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/17813#discussion_r146939350 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -589,8 +605,18 @@ class

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

2017-10-25 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19568#discussion_r146928894 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -585,21 +585,26 @@ case class SortMergeJoinExec

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

2017-10-25 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19568#discussion_r146928318 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -615,6 +620,7 @@ case class SortMergeJoinExec

[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

2017-10-24 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19568 @dongjoon-hyun, yes, I'm currently working on it. I just wanted to get the rest up. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

2017-10-24 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/19568 SPARK-22345: Fix sort-merge joins with conditions and codegen. ## What changes were proposed in this pull request? This adds a joined row to sort-merge join codegen. That joined row is used

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19448 I have a lot of sympathy for the argument that infrastructure software shouldn't have too many backports and that those should be generally bug fixes. But, if I were working on a Spark distribution

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-12 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144388430 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,10 @@ class

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-12 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19448 Still +1 from me as well. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-12 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144331909 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,10 @@ class

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r144097061 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache

<    4   5   6   7   8   9   10   11   12   13   >