[jira] [Created] (SPARK-48241) CSV parsing failure with char/varchar type columns

Jiayi Liu (Jira) Sat, 11 May 2024 00:47:03 -0700

Jiayi Liu created SPARK-48241:
---------------------------------

             Summary: CSV parsing failure with char/varchar type columns
                 Key: SPARK-48241
                 URL: https://issues.apache.org/jira/browse/SPARK-48241
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.1
            Reporter: Jiayi Liu
             Fix For: 4.0.0



Selecting from a CSV table that contains char or varchar columns fails with 
the following error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct<id:int,name:string>) should be the subset of dataSchema (struct<id:int,name:string>).
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
{code}
The error occurs because the StringType columns in the dataSchema and 
requiredSchema passed to UnivocityParser are not consistent: the StringType 
StructField in the dataSchema carries metadata that is missing from the same 
field in the requiredSchema. The two schemas look identical in the error 
message because it shows only field names and types. We need to retain the 
metadata when resolving the schema.
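
To illustrate the mismatch, here is a minimal, self-contained sketch in plain 
Java. It is a simplified model, not Spark's code: the Field record, the 
"rawType" metadata key, and the describe, isSubset and retainMetadata helpers 
are all hypothetical names chosen for illustration. It assumes the subset check 
compares whole fields, metadata included, while the message prints only names 
and types.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SchemaSubsetDemo {
    // Hypothetical stand-in for a schema field. Record equality covers
    // name, type AND metadata, which is the property that matters here.
    record Field(String name, String type, Map<String, String> metadata) {}

    // Renders a schema with names and types only, like the error message.
    static String describe(List<Field> schema) {
        return schema.stream()
                .map(f -> f.name() + ":" + f.type())
                .collect(Collectors.joining(",", "struct<", ">"));
    }

    // Subset check on whole fields, so a metadata difference makes it fail.
    static boolean isSubset(List<Field> required, List<Field> data) {
        return new HashSet<>(data).containsAll(required);
    }

    // Copies metadata from the data schema onto same-named required fields.
    static List<Field> retainMetadata(List<Field> required, List<Field> data) {
        Map<String, Field> byName = data.stream()
                .collect(Collectors.toMap(Field::name, f -> f));
        return required.stream()
                .map(f -> byName.containsKey(f.name())
                        ? new Field(f.name(), f.type(), byName.get(f.name()).metadata())
                        : f)
                .collect(Collectors.toList());
    }

    // Data schema: the string column carries (assumed) varchar metadata.
    static List<Field> sampleData() {
        return List.of(
                new Field("id", "int", Map.of()),
                new Field("name", "string", Map.of("rawType", "varchar(10)")));
    }

    // Required schema: same names and types, but the metadata was dropped.
    static List<Field> sampleRequired() {
        return List.of(
                new Field("id", "int", Map.of()),
                new Field("name", "string", Map.of()));
    }

    public static void main(String[] args) {
        List<Field> data = sampleData();
        List<Field> required = sampleRequired();
        // Both schemas print identically.
        System.out.println(describe(required).equals(describe(data)));
        // The subset check fails because of the dropped metadata.
        System.out.println(isSubset(required, data));
        // With metadata retained, the check passes.
        System.out.println(isSubset(retainMetadata(required, data), data));
    }
}
```

In this model the two schemas print the same string, yet the subset check 
only passes once the metadata is carried over, which mirrors the fix proposed 
above.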
 



