[
https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun closed SPARK-48241.
---------------------------------
> CSV parsing failure with char/varchar type columns
> --------------------------------------------------
>
> Key: SPARK-48241
> URL: https://issues.apache.org/jira/browse/SPARK-48241
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Jiayi Liu
> Assignee: Jiayi Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
>
> CSV table containing char and varchar columns will result in the following
> error when selecting from the CSV table:
> {code:java}
> java.lang.IllegalArgumentException: requirement failed: requiredSchema
> (struct<id:int,name:string>) should be the subset of dataSchema
> (struct<id:int,name:string>).
> at scala.Predef$.require(Predef.scala:281)
> at
> org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
> at
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
> at
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
> at
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
> at
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
> at
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
> at
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code}
> The reason for the error is that the StringType columns in the dataSchema and
> requiredSchema of UnivocityParser are not consistent. It is due to the
> metadata contained in the StringType StructField of the dataSchema, which is
> missing in the requiredSchema. We need to retain the metadata when resolving
> schema.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]