[ https://issues.apache.org/jira/browse/SPARK-16842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon closed SPARK-16842.
--------------------------------
    Resolution: Not A Problem

> Concern about disallowing user-given schema for Parquet and ORC
> ---------------------------------------------------------------
>
>                 Key: SPARK-16842
>                 URL: https://issues.apache.org/jira/browse/SPARK-16842
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>
> If my understanding is correct, when the user-given schema differs from the
> inferred schema, it is handled differently by each datasource.
> - For JSON and CSV
>   it is generally permissive (for example, it allows compatibility among
>   numeric types).
> - For ORC and Parquet
>   it is generally strict about types, so such compatibility is not allowed
>   (except in very few cases, e.g. for Parquet,
>   https://github.com/apache/spark/pull/14272 and
>   https://github.com/apache/spark/pull/14278).
> - For Text
>   it only supports {{StringType}}.
> - For JDBC
>   it does not take a user-given schema, since it does not implement
>   {{SchemaRelationProvider}}.
> By allowing a user-given schema, we can use types such as {{DateType}} and
> {{TimestampType}} for JSON and CSV. CSV and JSON arguably allow a permissive
> schema.
> To cut this short, JSON and CSV do not have the complete schema information
> written in the data, whereas ORC and Parquet do.
> So, we might have to just disallow a user-given schema for Parquet and ORC.
> In fact, if my understanding is correct, we can almost never give a different
> schema for ORC and Parquet.
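
As a rough illustration of the per-datasource behaviour listed in the issue, the following is a toy model in plain Python. It is not Spark's implementation: the function name, the policy labels, and the use of type names as plain strings are all hypothetical, and the "very few cases" that the linked Parquet pull requests cover are deliberately left out.

```python
# Toy model of the behaviour described in the issue text.
# ASSUMPTION: all names here are hypothetical; this is not Spark code.

NUMERIC_TYPES = {
    "ByteType", "ShortType", "IntegerType",
    "LongType", "FloatType", "DoubleType",
}
TEMPORAL_TYPES = {"DateType", "TimestampType"}

# One policy label per datasource, following the list in the issue.
POLICY_BY_SOURCE = {
    "json": "permissive",
    "csv": "permissive",
    "orc": "strict",
    "parquet": "strict",
    "text": "string_only",
    "jdbc": "no_user_schema",
}


def accepts_user_type(source, inferred_type, user_type):
    """Return True if this toy model lets user_type replace inferred_type."""
    policy = POLICY_BY_SOURCE[source.lower()]
    if policy == "no_user_schema":
        # JDBC: a user-given schema is not taken at all.
        return False
    if policy == "string_only":
        # Text: only StringType is supported.
        return user_type == "StringType"
    if policy == "strict":
        # ORC / Parquet: the schema is written in the data, so types must match.
        # (The few exceptions mentioned in the issue are not modelled.)
        return user_type == inferred_type
    # Permissive (JSON / CSV): identical types, numeric-to-numeric, or reading
    # a string column as a date or timestamp.
    if user_type == inferred_type:
        return True
    if inferred_type in NUMERIC_TYPES and user_type in NUMERIC_TYPES:
        return True
    return inferred_type == "StringType" and user_type in TEMPORAL_TYPES


if __name__ == "__main__":
    for source in ("json", "csv", "orc", "parquet", "text", "jdbc"):
        print(source, accepts_user_type(source, "LongType", "DoubleType"))
```

The model makes the asymmetry in the argument visible: for sources whose files carry no complete schema, a user-given type can add information, while for ORC and Parquet it can only agree with what is already stored.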