[ https://issues.apache.org/jira/browse/SPARK-23786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li resolved SPARK-23786.
-----------------------------
       Resolution: Fixed
         Assignee: Maxim Gekk
    Fix Version/s: 2.4.0

> CSV schema validation - column names are not checked
> ----------------------------------------------------
>
>                 Key: SPARK-23786
>                 URL: https://issues.apache.org/jira/browse/SPARK-23786
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 2.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Here is a CSV file that contains two columns of the same type:
> {code}
> $ cat marina.csv
> depth, temperature
> 10.2, 9.0
> 5.5, 12.3
> {code}
> If we define the schema with the correct types but the wrong column names
> (reversed order):
> {code:scala}
> import org.apache.spark.sql.types.{DoubleType, StructType}
> val schema = new StructType().add("temperature", DoubleType).add("depth", DoubleType)
> {code}
> Spark reads the CSV file without any errors:
> {code:scala}
> val ds = spark.read.schema(schema).option("header", "true").csv("marina.csv")
> ds.show
> {code}
> and outputs the wrong result:
> {code}
> +-----------+-----+
> |temperature|depth|
> +-----------+-----+
> |       10.2|  9.0|
> |        5.5| 12.3|
> +-----------+-----+
> {code}
> The correct behavior would be either to report an error or to read the columns
> according to their names in the schema.
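
The report closes by naming two acceptable behaviors: fail on a header/schema name mismatch, or map columns by name. The following is a minimal sketch of both in plain Scala over strings, only to illustrate the idea. It is not Spark's CSV reader and not the change that resolved this issue; `HeaderCheck`, `checkHeader`, and `reorderByName` are hypothetical names introduced here.

```scala
// Hypothetical helpers illustrating the two behaviors proposed in the report.
// Plain Scala over already-split CSV fields; this is NOT Spark's implementation.
object HeaderCheck {

  // Option 1: report an error when the header names differ from the
  // schema's field names, compared position by position.
  def checkHeader(schemaNames: Seq[String], header: Seq[String]): Either[String, Unit] = {
    val trimmed = header.map(_.trim)
    if (trimmed.length != schemaNames.length) {
      Left(s"expected ${schemaNames.length} columns, found ${trimmed.length}")
    } else {
      val mismatches = schemaNames.zip(trimmed).zipWithIndex.collect {
        case ((expected, actual), i) if expected != actual =>
          s"column $i: schema has '$expected', header has '$actual'"
      }
      if (mismatches.isEmpty) Right(()) else Left(mismatches.mkString("; "))
    }
  }

  // Option 2: reorder a row so its values follow the schema's column order.
  // Returns None if a schema column is missing from the header.
  // Assumes the row has at least as many fields as the header.
  def reorderByName(
      schemaNames: Seq[String],
      header: Seq[String],
      row: Seq[String]): Option[Seq[String]] = {
    val indexByName = header.map(_.trim).zipWithIndex.toMap
    val positions = schemaNames.map(indexByName.get)
    if (positions.forall(_.isDefined)) Some(positions.flatten.map(i => row(i).trim))
    else None
  }
}
```

With the `marina.csv` header from the report and the reversed schema names `Seq("temperature", "depth")`, `checkHeader` returns a `Left` describing both mismatched positions, while `reorderByName` swaps the fields of each row so that the temperature value comes first.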