[ https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812059#comment-17812059 ]
Daniel commented on SPARK-46890:
--------------------------------

I think this `tokenIndexArr` within Spark's `UnivocityParser` class has different values in the passing and failing cases:
{code:java}
// This index is used to reorder parsed tokens
private val tokenIndexArr =
  requiredSchema.map(f => java.lang.Integer.valueOf(dataSchema.indexOf(f))).toArray
{code}

> CSV fails on a column with default and without enforcing schema
> ---------------------------------------------------------------
>
>                 Key: SPARK-46890
>                 URL: https://issues.apache.org/jira/browse/SPARK-46890
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Max Gekk
>            Priority: Major
>         Attachments: image-2024-01-29-13-22-05-326.png
>
>
> When we create a table using CSV on an existing file with a header and:
> - a column has an default +
> - enforceSchema is false - taking into account CSV header
> then query a column with a default.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
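
For anyone following the `tokenIndexArr` remark, here is a minimal sketch of what that expression computes. It is not Spark code and not a diagnosis of this bug: `Field` is a hypothetical stand-in for `StructField`, the `default` member is an assumed simplification of how a column default might be attached to a field, and `TokenIndexSketch` and `tokenIndex` are names made up for illustration. It only shows that the resulting array depends on which schema is passed as `requiredSchema`, and that `indexOf` yields -1 when a required field is not equal to any field of the data schema.

```scala
// Hypothetical stand-in for Spark's StructField; NOT the real class.
// `default` is an assumed simplification of a column default.
final case class Field(name: String, dataType: String, default: Option[String] = None)

object TokenIndexSketch {
  // Same shape as the expression quoted from UnivocityParser,
  // but over plain Seq[Field] instead of Spark's StructType.
  def tokenIndex(dataSchema: Seq[Field], requiredSchema: Seq[Field]): Array[Integer] =
    requiredSchema.map(f => java.lang.Integer.valueOf(dataSchema.indexOf(f))).toArray

  def main(args: Array[String]): Unit = {
    // Mirrors the columns of the `products` table from the issue.
    val dataSchema = Seq(
      Field("product_id", "INT"),
      Field("name", "STRING"),
      Field("price", "FLOAT", Some("0.0")),
      Field("quantity", "INT", Some("0")))

    // Required schema pruned to the single queried column.
    val pruned = Seq(Field("price", "FLOAT", Some("0.0")))
    println(tokenIndex(dataSchema, pruned).mkString(","))      // 2

    // Required schema equal to the full data schema (no pruning).
    println(tokenIndex(dataSchema, dataSchema).mkString(","))  // 0,1,2,3

    // Assumed mismatch: same column name, but the field is not equal
    // to the one in dataSchema, so indexOf cannot find it.
    val mismatched = Seq(Field("price", "FLOAT"))
    println(tokenIndex(dataSchema, mismatched).mkString(","))  // -1
  }
}
```

Comparing the real array in the passing and failing runs (for example in a debugger or with a temporary log line) would show which of these shapes, if any, applies.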