[ https://issues.apache.org/jira/browse/SPARK-19228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-19228:
---------------------------------
    Labels:   (was: easyfix)

> inferSchema function processed csv date column as string and "dateFormat" DataSource option is ignored
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19228
>                 URL: https://issues.apache.org/jira/browse/SPARK-19228
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output, SQL
>    Affects Versions: 2.1.0
>            Reporter: Sergey Rubtsov
>            Priority: Major
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> The current FastDateFormat parser cannot properly parse dates and timestamps and does not conform to ISO 8601.
> For example, I need to process a user.csv file like this:
> {code:java}
> id,project,started,ended
> sergey.rubtsov,project0,12/12/2012,10/10/2015
> {code}
> When I add the date format option:
> {code:java}
> Dataset<Row> users = spark.read().format("csv")
>     .option("mode", "PERMISSIVE")
>     .option("header", "true")
>     .option("inferSchema", "true")
>     .option("dateFormat", "dd/MM/yyyy")
>     .load("src/main/resources/user.csv");
> users.printSchema();
> {code}
> the expected schema is:
> {code:java}
> root
>  |-- id: string (nullable = true)
>  |-- project: string (nullable = true)
>  |-- started: date (nullable = true)
>  |-- ended: date (nullable = true)
> {code}
> but the actual result is:
> {code:java}
> root
>  |-- id: string (nullable = true)
>  |-- project: string (nullable = true)
>  |-- started: string (nullable = true)
>  |-- ended: string (nullable = true)
> {code}
> This means that the dates are processed as strings and the "dateFormat" option is ignored.
> If I add the option
> {code:java}
> .option("timestampFormat", "dd/MM/yyyy")
> {code}
> the result is:
> {code:java}
> root
>  |-- id: string (nullable = true)
>  |-- project: string (nullable = true)
>  |-- started: timestamp (nullable = true)
>  |-- ended: timestamp (nullable = true)
> {code}
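
To make the expected behavior concrete, here is a minimal plain-Java sketch of date-aware type inference for a single column. It is not Spark's CSV inference code: the class name DateInferenceSketch and the method inferColumnType are hypothetical, and it uses java.time.DateTimeFormatter rather than FastDateFormat. It only illustrates the idea that a column whose values all parse with the user-supplied "dateFormat" pattern could be reported as date instead of string.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT how Spark implements inferSchema.
class DateInferenceSketch {

    // Returns "date" if the column is non-empty and every value parses with
    // the given pattern; otherwise falls back to "string".
    static String inferColumnType(List<String> values, String dateFormat) {
        if (values.isEmpty()) {
            return "string";
        }
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(dateFormat);
        for (String value : values) {
            try {
                LocalDate.parse(value, formatter);
            } catch (DateTimeParseException e) {
                // At least one value is not a date in this format.
                return "string";
            }
        }
        return "date";
    }

    public static void main(String[] args) {
        String dateFormat = "dd/MM/yyyy"; // same pattern as in the report
        // Column values taken from the user.csv sample in the report.
        System.out.println("id: " + inferColumnType(Arrays.asList("sergey.rubtsov"), dateFormat));
        System.out.println("project: " + inferColumnType(Arrays.asList("project0"), dateFormat));
        System.out.println("started: " + inferColumnType(Arrays.asList("12/12/2012"), dateFormat));
        System.out.println("ended: " + inferColumnType(Arrays.asList("10/10/2015"), dateFormat));
    }
}
```

Run against the sample row, this sketch reports id and project as string and started and ended as date, which matches the expected schema in the report. A real fix would also need to decide how date inference interacts with "timestampFormat" when both patterns match the same values.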