Jiayi Liu created SPARK-48241:
---------------------------------

             Summary: CSV parsing failure with char/varchar type columns
                 Key: SPARK-48241
                 URL: https://issues.apache.org/jira/browse/SPARK-48241
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.1
            Reporter: Jiayi Liu
             Fix For: 4.0.0

Selecting from a CSV table that contains char or varchar columns fails with the following error:

{code:java}
java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct<id:int,name:string>) should be the subset of dataSchema (struct<id:int,name:string>).
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
{code}

The error occurs because the StringType columns in the dataSchema and the requiredSchema passed to UnivocityParser are not consistent. The StringType StructField in the dataSchema carries metadata that is missing from the corresponding field in the requiredSchema, so the two fields do not compare equal even though both schemas print identically in the error message. We need to retain the metadata when resolving the schema.
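
The mismatch described above can be illustrated with a small standalone model. This is only a sketch in plain Python, not Spark code: the `Field` class, the helpers `schema_string`, `is_subset`, and `retain_metadata`, the `varchar(10)` length, and the metadata key name are all assumptions made for illustration. It relies on the assumption that field equality includes metadata while the printed schema omits it, which is what would make the two schemas in the error message look identical.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

# Assumed name of the metadata key that records the original char/varchar type.
CHAR_VARCHAR_KEY = "__CHAR_VARCHAR_TYPE_STRING"


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field: equality covers metadata too."""
    name: str
    data_type: str
    metadata: Tuple[Tuple[str, str], ...] = ()

    def simple_string(self) -> str:
        # The printed form deliberately leaves the metadata out.
        return f"{self.name}:{self.data_type}"


def schema_string(fields: List[Field]) -> str:
    return "struct<" + ",".join(f.simple_string() for f in fields) + ">"


def is_subset(required: List[Field], data: List[Field]) -> bool:
    # Models a set-based "required is a subset of data" requirement.
    return set(required) <= set(data)


def retain_metadata(required: List[Field], data: List[Field]) -> List[Field]:
    # One possible shape of the fix: copy metadata from the matching data field.
    by_name = {f.name: f for f in data}
    return [
        replace(f, metadata=by_name[f.name].metadata) if f.name in by_name else f
        for f in required
    ]


data_schema = [
    Field("id", "int"),
    Field("name", "string", ((CHAR_VARCHAR_KEY, "varchar(10)"),)),
]
required_schema = [Field("id", "int"), Field("name", "string")]

print(schema_string(required_schema) == schema_string(data_schema))
print(is_subset(required_schema, data_schema))
print(is_subset(retain_metadata(required_schema, data_schema), data_schema))
```

In this model the two schemas render to the same string, the subset check fails for the metadata-free `required_schema`, and it passes once the metadata is carried over. How the actual fix propagates the metadata during schema resolution is not shown here.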