[ 
https://issues.apache.org/jira/browse/SPARK-27269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801393#comment-16801393
 ] 

Dongjoon Hyun commented on SPARK-27269:
---------------------------------------

Hi, [~Gengliang.Wang].
 - If this is a regression, its issue type should be `BUG`.
{quote}This is actually a regression.
{quote}
 - If you want this under your umbrella JIRA, please move it there. It will 
then have the `subtask` type, as you want.

> File source v2 should validate data schema only
> -----------------------------------------------
>
>                 Key: SPARK-27269
>                 URL: https://issues.apache.org/jira/browse/SPARK-27269
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Priority: Major
>
> Currently, File source v2 allows each data source to specify the supported 
> data types by implementing the method `supportsDataType` in `FileScan` and 
> `FileWriteBuilder`.
> However, in the read path, the validation checks every data type in 
> `readSchema`, which might contain partition columns.  This is actually a 
> regression. For example, the Text data source supports only the String data 
> type, yet its partition columns can still be of Integer type, because 
> partition columns are processed by Spark.
> This PR is to:
> 1. Refactor schema validation so that it checks the data schema only
> 2. Filter the partition columns out of the data schema if a user-specified 
> schema is provided.
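
To illustrate the behavior the description asks for, here is a minimal Python sketch of validating only the data columns of a read schema. It does not use Spark's actual API: `Field`, `split_schema`, `validate_data_schema`, and `text_supports` are hypothetical names, data types are modeled as plain strings, and column names are matched case-sensitively for simplicity.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a schema field; not Spark's real StructField.
@dataclass(frozen=True)
class Field:
    name: str
    data_type: str


def split_schema(read_schema, partition_columns):
    """Split a read schema into (data fields, partition fields)."""
    partition_names = set(partition_columns)
    data = [f for f in read_schema if f.name not in partition_names]
    partition = [f for f in read_schema if f.name in partition_names]
    return data, partition


def validate_data_schema(read_schema, partition_columns, supports_data_type, format_name):
    """Check only the data fields; partition fields are skipped on purpose."""
    data_fields, _ = split_schema(read_schema, partition_columns)
    for f in data_fields:
        if not supports_data_type(f.data_type):
            raise ValueError(
                f"{format_name} data source does not support "
                f"{f.data_type} data type (column {f.name!r})"
            )


# Assumed rule for a text-like source: only string data columns are allowed.
def text_supports(data_type):
    return data_type == "string"


read_schema = [Field("value", "string"), Field("year", "integer")]

# Passes: "year" is a partition column, so its integer type is not checked.
validate_data_schema(read_schema, ["year"], text_supports, "Text")
```

If `year` were not declared as a partition column, the same call would raise `ValueError`, which corresponds to the over-strict validation described as the regression.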



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

