[ https://issues.apache.org/jira/browse/SPARK-26372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-26372:
------------------------------------

    Assignee:     (was: Apache Spark)

> CSV parsing uses previous good value for bad input field
> --------------------------------------------------------
>
>                 Key: SPARK-26372
>                 URL: https://issues.apache.org/jira/browse/SPARK-26372
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> For example:
> {noformat}
> bash-3.2$ cat test.csv
> "hello",1999-08-01
> "there","bad date"
> "again","2017-11-22"
> bash-3.2$ bin/spark-shell
> ..etc..
> scala> import org.apache.spark.sql.types._
> scala> import org.apache.spark.sql.SaveMode
> scala> var schema = StructType(StructField("col1", StringType) ::
>      | StructField("col2", DateType) ::
>      | Nil)
> schema: org.apache.spark.sql.types.StructType = StructType(StructField(col1,StringType,true), StructField(col2,DateType,true))
> scala> val df = spark.read.schema(schema).csv("test.csv")
> df: org.apache.spark.sql.DataFrame = [col1: string, col2: date]
> scala> df.show
> +-----+----------+
> | col1|      col2|
> +-----+----------+
> |hello|1999-08-01|
> |there|1999-08-01|
> |again|2017-11-22|
> +-----+----------+
> scala>
> {noformat}
> col2 from the second row contains "1999-08-01", when it should contain null.
> This is because UnivocityParser reuses the same Row object for each input record. If there is an exception converting an input field, the code simply skips over that field, leaving the existing value in the Row object.
> The simple fix is to set the column to null in the Row object whenever there is a badRecordException while converting the input field.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
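The stale-value behavior described above, and the proposed fix, can be sketched outside Spark. This is a minimal standalone illustration, not Spark's actual UnivocityParser code: the object names and the `parse` helper are hypothetical, but the pattern is the same one the report describes (one row buffer reused across records, so a failed conversion must explicitly null the column rather than skip it).

```scala
import java.time.LocalDate
import scala.util.Try

// Hypothetical sketch of the bug and fix: a parser that reuses a single
// row buffer across input records, as UnivocityParser reuses one Row.
object RowReuseDemo {
  // One shared buffer for all records; without an explicit reset, a
  // conversion failure would leave the previous record's value behind.
  private val row = new Array[Any](2)

  def parse(fields: Array[String]): Array[Any] = {
    row(0) = fields(0)
    // The fix from the report: on a failed conversion, set the column
    // to null instead of leaving the existing value in the buffer.
    row(1) = Try(LocalDate.parse(fields(1))).getOrElse(null)
    row
  }
}
```

With this reset in place, parsing `("hello", "1999-08-01")` and then `("there", "bad date")` leaves `row(1)` null for the second record instead of carrying over `1999-08-01`, matching the expected `df.show` output in the report.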