[ https://issues.apache.org/jira/browse/SPARK-19228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309509#comment-16309509 ]
Apache Spark commented on SPARK-19228:
--------------------------------------

User 'sergey-rubtsov' has created a pull request for this issue:
https://github.com/apache/spark/pull/20140

> inferSchema function processed csv date column as string and "dateFormat" DataSource option is ignored
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19228
>                 URL: https://issues.apache.org/jira/browse/SPARK-19228
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output, SQL
>    Affects Versions: 2.1.0
>            Reporter: Sergey Rubtsov
>              Labels: easyfix
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> I need to process user.csv like this:
> {code}
> id,project,started,ended
> sergey.rubtsov,project0,12/12/2012,10/10/2015
> {code}
> When I add date format options:
> {code}
> Dataset<Row> users = spark.read().format("csv")
>     .option("mode", "PERMISSIVE")
>     .option("header", "true")
>     .option("inferSchema", "true")
>     .option("dateFormat", "dd/MM/yyyy")
>     .load("src/main/resources/user.csv");
> users.printSchema();
> {code}
> the expected schema is:
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- project: string (nullable = true)
>  |-- started: date (nullable = true)
>  |-- ended: date (nullable = true)
> {code}
> but the actual result is:
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- project: string (nullable = true)
>  |-- started: string (nullable = true)
>  |-- ended: string (nullable = true)
> {code}
> This means that the dates are processed as strings and the "dateFormat" option is ignored.
> If I add the option
> {code}
> .option("timestampFormat", "dd/MM/yyyy")
> {code}
> the result is:
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- project: string (nullable = true)
>  |-- started: timestamp (nullable = true)
>  |-- ended: timestamp (nullable = true)
> {code}
> I think the issue is somewhere in object CSVInferSchema, function inferField, lines 80-97, and a method "tryParseDate" needs to be added before/after "tryParseTimestamp", or the date/timestamp processing logic needs to be changed.
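
The ordering proposed in the report (try a date parse in addition to the timestamp parse) can be illustrated with a small standalone sketch. This is not Spark's CSVInferSchema code and does not use any Spark API: the class DateInferenceSketch, the enum InferredType, and the inferColumn helper are hypothetical names, the method names tryParseDate, tryParseTimestamp and inferField are borrowed from the report only as labels, and the timestamp pattern is an assumed default. It uses java.time to show how a column could come out as date, timestamp, or string depending on which parse succeeds first.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: this is NOT Spark's CSVInferSchema code.
// All names below are hypothetical and exist only for this example.
final class DateInferenceSketch {

    enum InferredType { DATE, TIMESTAMP, STRING }

    // True if the whole value parses as a date under the given pattern.
    static boolean tryParseDate(String value, DateTimeFormatter dateFormat) {
        try {
            LocalDate.parse(value, dateFormat);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // True if the whole value parses as a date-time under the given pattern.
    static boolean tryParseTimestamp(String value, DateTimeFormatter timestampFormat) {
        try {
            LocalDateTime.parse(value, timestampFormat);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Date is attempted first, then timestamp, then the string fallback.
    static InferredType inferField(String value,
                                   DateTimeFormatter dateFormat,
                                   DateTimeFormatter timestampFormat) {
        if (tryParseDate(value, dateFormat)) {
            return InferredType.DATE;
        }
        if (tryParseTimestamp(value, timestampFormat)) {
            return InferredType.TIMESTAMP;
        }
        return InferredType.STRING;
    }

    // A column keeps a type only if every value agrees; otherwise it is a string.
    static InferredType inferColumn(List<String> values,
                                    DateTimeFormatter dateFormat,
                                    DateTimeFormatter timestampFormat) {
        InferredType result = null;
        for (String value : values) {
            InferredType current = inferField(value, dateFormat, timestampFormat);
            if (result == null) {
                result = current;
            } else if (result != current) {
                return InferredType.STRING;
            }
        }
        return result == null ? InferredType.STRING : result;
    }

    public static void main(String[] args) {
        // "dd/MM/yyyy" is the dateFormat from the report; the timestamp
        // pattern is an assumption made for this sketch.
        DateTimeFormatter dateFormat = DateTimeFormatter.ofPattern("dd/MM/yyyy");
        DateTimeFormatter timestampFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

        System.out.println("id: "
                + inferColumn(Arrays.asList("sergey.rubtsov"), dateFormat, timestampFormat));
        System.out.println("started: "
                + inferColumn(Arrays.asList("12/12/2012"), dateFormat, timestampFormat));
        System.out.println("ended: "
                + inferColumn(Arrays.asList("10/10/2015"), dateFormat, timestampFormat));
    }
}
```

Under these assumptions the started and ended values from user.csv match the date pattern and are classified as DATE, while id falls through to STRING. Whether the date attempt belongs before or after the timestamp attempt in Spark itself is the open question raised in the report; the sketch only shows one of the two orderings.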