[jira] [Commented] (SPARK-10301) For struct type, if parquet's global schema has less fields than a file's schema, data reading will fail

Yin Huai (JIRA) Wed, 26 Aug 2015 22:56:49 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716125#comment-14716125
 ]


Yin Huai commented on SPARK-10301:
----------------------------------

This one seems hard to fix because, on the executor side, we actually use each 
Parquet file's own schema to read data, and that file schema can contain struct 
fields that do not appear in the global schema. 

For now, the workaround is to enable schema merging (set {{mergeSchema}} to true 
when loading a Parquet dataset), so that the global schema is always a superset 
of each file's local schema. 
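
As a rough illustration of that workaround, here is a minimal PySpark-style 
sketch. It assumes a {{SQLContext}} is already available and uses the 
{{mergeSchema}} data source option on the DataFrame reader; the helper name 
{{read_parquet_with_merged_schema}} and the path in the usage comment are 
hypothetical, chosen only for this example.

```python
def read_parquet_with_merged_schema(sql_context, path):
    """Load a Parquet dataset with schema merging enabled.

    `sql_context` is assumed to be an existing pyspark.sql.SQLContext.
    With mergeSchema on, the global schema is built by merging the
    schemas of the individual files, so it should be a superset of
    every file's local schema.
    """
    return (
        sql_context.read
        .option("mergeSchema", "true")  # merge per-file schemas into one
        .parquet(path)
    )


# Usage (hypothetical path):
# df = read_parquet_with_merged_schema(sqlContext, "/path/to/parquet/table")
# df.printSchema()
```

Schema merging has to read the footers of the files in the dataset, so it may 
make loading slower on large tables.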

> For struct type, if parquet's global schema has less fields than a file's 
> schema, data reading will fail
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10301
>                 URL: https://issues.apache.org/jira/browse/SPARK-10301
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>            Priority: Critical
>
> When Parquet's global schema has fewer fields than the local schema of a 
> file, the data reading path will fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

