[ https://issues.apache.org/jira/browse/SPARK-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538051#comment-14538051 ]
Kalle Jepsen commented on SPARK-7278:
-------------------------------------

Shouldn't {{DateType}} at least find {{datetime.datetime}} acceptable?

> Inconsistent handling of dates in PySpark's Row object
> ------------------------------------------------------
>
>                 Key: SPARK-7278
>                 URL: https://issues.apache.org/jira/browse/SPARK-7278
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.3.1
>            Reporter: Kalle Jepsen
>
> Consider the following Python code:
> {code:none}
> import datetime
> rdd = sc.parallelize([[0, datetime.date(2014, 11, 11)], [1, datetime.date(2015, 6, 4)]])
> df = rdd.toDF(schema=['rid', 'date'])
> row = df.first()
> {code}
> Accessing the {{date}} column via {{\_\_getitem\_\_}} returns a {{datetime.datetime}} instance:
> {code:none}
> >>> row[1]
> datetime.datetime(2014, 11, 11, 0, 0)
> {code}
> while access via {{getattr}} returns a {{datetime.date}} instance:
> {code:none}
> >>> row.date
> datetime.date(2014, 11, 11)
> {code}
> The problem seems to be that Java deserializes the {{datetime.date}} objects to {{datetime.datetime}}. This is taken care of [here|https://github.com/apache/spark/blob/master/python/pyspark/sql/_types.py#L1027] when using {{getattr}}, but is overlooked when directly accessing the tuple by index.
> Is there an easy way to fix this?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
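Until this is fixed in PySpark itself, the asymmetry can be worked around on the caller's side by normalizing values obtained via {{\_\_getitem\_\_}}. A minimal sketch in plain Python (the {{to_date}} helper is hypothetical, not part of PySpark; it mirrors the conversion that {{getattr}} access already performs):

```python
import datetime

def to_date(value):
    """Collapse a datetime.datetime back to datetime.date, mirroring what
    Row.__getattr__ does but Row.__getitem__ currently does not."""
    # datetime.datetime is a subclass of datetime.date, so this check must
    # come first; plain dates and non-date values pass through unchanged.
    if isinstance(value, datetime.datetime):
        return value.date()
    return value

# Java-side deserialization hands back a datetime with a zeroed time part,
# as seen in the row[1] output above.
deserialized = datetime.datetime(2014, 11, 11, 0, 0)
print(to_date(deserialized))                 # 2014-11-11
print(type(to_date(deserialized)).__name__)  # date
```

Applying this helper to every indexed access restores consistency with attribute access, at the cost of discarding the (always-zero) time component.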