[GitHub] spark pull request #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupp...

2018-08-01 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/21946

 [SPARK-24990][SQL] merge ReadSupport and ReadSupportWithSchema

## What changes were proposed in this pull request?

With respect to a user-specified schema, a data source may behave in one of 
3 ways:
1. it must have a user-specified schema
2. it can't have a user-specified schema
3. it can accept a user-specified schema if one is given, or infer the schema otherwise.

I added `ReadSupportWithSchema` to support these behaviors, following data 
source v1. But it turns out we don't need this extra interface. We can just add 
a `createReader(schema, options)` to `ReadSupport` and make it call 
`createReader(options)` by default.
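
A minimal sketch of how the three behaviors could fit into a single interface with a defaulted overload. It is not the actual Spark code: `Schema` and `Reader` are hypothetical stand-ins for `StructType` and the v2 reader type, options are modeled as a plain `Map`, and the three source classes are invented for illustration. The default body simply delegates, as described above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ReadSupportSketch {

    /** Hypothetical stand-in for StructType: an ordered list of column names. */
    static final class Schema {
        final List<String> columns;
        Schema(String... columns) { this.columns = Arrays.asList(columns); }
    }

    /** Hypothetical stand-in for the reader returned by a data source. */
    interface Reader {
        Schema readSchema();
    }

    /** Sketch of the merged interface: one required method, one defaulted overload. */
    interface ReadSupport {
        Reader createReader(Map<String, String> options);

        // Default: ignore the user-specified schema and delegate to createReader(options).
        default Reader createReader(Schema schema, Map<String, String> options) {
            return createReader(options);
        }
    }

    /** Behavior 2: the source always reports its own schema and inherits the default overload. */
    static final class FixedSchemaSource implements ReadSupport {
        @Override
        public Reader createReader(Map<String, String> options) {
            return () -> new Schema("id", "value");
        }
    }

    /** Behavior 3: use the user-specified schema when given, otherwise infer one. */
    static final class FlexibleSource implements ReadSupport {
        @Override
        public Reader createReader(Map<String, String> options) {
            return () -> new Schema("inferred");
        }

        @Override
        public Reader createReader(Schema schema, Map<String, String> options) {
            return () -> schema;
        }
    }

    /** Behavior 1: a schema is mandatory, so the schema-less method fails. */
    static final class SchemaRequiredSource implements ReadSupport {
        @Override
        public Reader createReader(Map<String, String> options) {
            throw new UnsupportedOperationException("a user-specified schema is required");
        }

        @Override
        public Reader createReader(Schema schema, Map<String, String> options) {
            return () -> schema;
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = Collections.emptyMap();
        Schema user = new Schema("a", "b");

        System.out.println(new FixedSchemaSource().createReader(user, options).readSchema().columns);
        System.out.println(new FlexibleSource().createReader(user, options).readSchema().columns);
        System.out.println(new FlexibleSource().createReader(options).readSchema().columns);
        System.out.println(new SchemaRequiredSource().createReader(user, options).readSchema().columns);
    }
}
```

With a delegating default, a source of kind 2 silently ignores a schema the user passes in; whether it should reject the schema instead is a design choice this sketch leaves open.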

GitHub currently has a problem syncing with the Apache git repo, so please 
review the second commit.

TODO: also fix the streaming API in follow-up PRs.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark ds-schema

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21946.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21946


commit defc54c69aadc510c6f77e13e57f003646c461bc
Author: Wenchen Fan 
Date:   2018-08-01T13:39:35Z

[SPARK-24971][SQL] remove SupportsDeprecatedScanRow

## What changes were proposed in this pull request?

This is a follow-up to https://github.com/apache/spark/pull/21118 .

In https://github.com/apache/spark/pull/21118 we added 
`SupportsDeprecatedScanRow`. Ideally, data sources should produce `InternalRow` 
instead of `Row` for better performance. We should remove 
`SupportsDeprecatedScanRow` and encourage data sources to produce 
`InternalRow`, which is also very easy to build.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan 

Closes #21921 from cloud-fan/row.

commit 19808d500a869114d84383f23056483316e52a33
Author: Wenchen Fan 
Date:   2018-07-31T18:06:16Z

merge ReadSupport and ReadSupportWithSchema




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupp...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21946

