[ https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-48241:
-----------------------------------
    Labels: pull-request-available  (was: )

> CSV parsing failure with char/varchar type columns
> --------------------------------------------------
>
>                 Key: SPARK-48241
>                 URL: https://issues.apache.org/jira/browse/SPARK-48241
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.1
>            Reporter: Jiayi Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Selecting from a CSV table that contains char or varchar columns fails with the following error:
> {code:java}
> java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct<id:int,name:string>) should be the subset of dataSchema (struct<id:int,name:string>).
>     at scala.Predef$.require(Predef.scala:281)
>     at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
>     at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
>     at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
>     at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
> {code}
> The error occurs because the StringType columns in the dataSchema and requiredSchema of UnivocityParser are inconsistent: the StringType StructField in dataSchema carries metadata that is missing from the corresponding field in requiredSchema. The message prints both schemas without metadata, so they look identical (struct<id:int,name:string>) even though the subset check fails. We need to retain the metadata when resolving the schema.
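The failing check can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Spark code: `Field` stands in for `StructField`, `RAW_TYPE_KEY` is an invented metadata key, and `isSubset` assumes the requirement compares fields by full equality (name, type and metadata), which the error message suggests. `render` mimics the `struct<...>` format in the message, which leaves metadata out. `retainMetadata` only sketches the fix direction described in the issue ("retain the metadata when resolving schema"); it is not the actual patch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class SchemaSubsetDemo {

    // Hypothetical, simplified stand-in for a schema field (NOT Spark's StructField).
    // A record's equals()/hashCode() cover every component, metadata included.
    record Field(String name, String dataType, Map<String, String> metadata) {}

    // Illustrative metadata key; Spark's real key name is not assumed here.
    static final String RAW_TYPE_KEY = "rawType";

    // Models "requiredSchema should be the subset of dataSchema": every required
    // field must be equal (name, type AND metadata) to some data-schema field.
    static boolean isSubset(List<Field> required, List<Field> data) {
        return new HashSet<>(data).containsAll(required);
    }

    // Renders "struct<name:type,...>" without metadata, so two unequal
    // schemas can print identically, as in the reported error message.
    static String render(List<Field> schema) {
        return schema.stream()
                .map(f -> f.name() + ":" + f.dataType())
                .collect(Collectors.joining(",", "struct<", ">"));
    }

    // Sketch of the fix direction: when resolving the required schema,
    // copy the metadata of the data-schema field with the same name.
    static List<Field> retainMetadata(List<Field> required, List<Field> data) {
        Map<String, Field> byName = data.stream()
                .collect(Collectors.toMap(Field::name, Function.identity()));
        return required.stream()
                .map(f -> {
                    Field source = byName.get(f.name());
                    return source == null
                            ? f
                            : new Field(f.name(), f.dataType(), source.metadata());
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Data schema: "name" was declared as varchar(10), so its string field carries metadata.
        List<Field> dataSchema = List.of(
                new Field("id", "int", Map.of()),
                new Field("name", "string", Map.of(RAW_TYPE_KEY, "varchar(10)")));
        // Required schema resolved without metadata: same names and types, empty metadata.
        List<Field> requiredSchema = List.of(
                new Field("id", "int", Map.of()),
                new Field("name", "string", Map.of()));

        System.out.println(render(requiredSchema) + " vs " + render(dataSchema));
        System.out.println("subset without metadata: " + isSubset(requiredSchema, dataSchema));
        System.out.println("subset with metadata retained: "
                + isSubset(retainMetadata(requiredSchema, dataSchema), dataSchema));
    }
}
```

Running `main` prints the same `struct<id:int,name:string>` string for both schemas, then `false` for the plain subset check and `true` once the metadata is carried over, which is the same "identical-looking but unequal" situation shown in the stack trace.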