[ https://issues.apache.org/jira/browse/SPARK-32547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171879#comment-17171879 ]
Hyukjin Kwon commented on SPARK-32547:
--------------------------------------

I think this is more due to a limitation of the Python library itself; see e.g. https://bugs.python.org/issue31212. When the value has to be converted back to a local date, it seems the timezone adjustment can internally set the year to 0 in the case of {{0001-01-01}} specifically.

> Cant able to process Timestamp 0001-01-01T00:00:00.000+0000 with TimestampType
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-32547
>                 URL: https://issues.apache.org/jira/browse/SPARK-32547
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Manjunath H
>            Priority: Major
>
> Spark Version : 3.0.0
> Below is the sample code to reproduce the problem with TimestampType.
> {code:java}
> from pyspark.sql.functions import lit
> from pyspark.sql.types import TimestampType
>
> df = spark.createDataFrame([(1, 'foo'), (2, 'bar')], ['id', 'txt'])
> new_df = df.withColumn("test_timestamp", lit("0001-01-01T00:00:00.000+0000").cast(TimestampType()))
>
> new_df.printSchema()
> root
>  |-- id: long (nullable = true)
>  |-- txt: string (nullable = true)
>  |-- test_timestamp: timestamp (nullable = true)
>
> new_df.show()
> +---+---+-------------------+
> | id|txt|     test_timestamp|
> +---+---+-------------------+
> |  1|foo|0001-01-01 00:00:00|
> |  2|bar|0001-01-01 00:00:00|
> +---+---+-------------------+
> {code}
>
> The new_df.rdd.isEmpty() operation fails with *year 0 is out of range*:
>
> {code:java}
> new_df.rdd.isEmpty()
> Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure:
> Traceback (most recent call last):
>   File "/databricks/spark/python/pyspark/serializers.py", line 177, in _read_with_length
>     return self.loads(obj)
>   File "/databricks/spark/python/pyspark/serializers.py", line 466, in loads
>     return pickle.loads(obj, encoding=encoding)
>   File "/databricks/spark/python/pyspark/sql/types.py", line 1415, in <lambda>
>     return lambda *a: dataType.fromInternal(a)
>   File "/databricks/spark/python/pyspark/sql/types.py", line 635, in fromInternal
>     for f, v, c in zip(self.fields, obj, self._needConversion)]
>   File "/databricks/spark/python/pyspark/sql/types.py", line 635, in <listcomp>
>     for f, v, c in zip(self.fields, obj, self._needConversion)]
>   File "/databricks/spark/python/pyspark/sql/types.py", line 447, in fromInternal
>     return self.dataType.fromInternal(obj)
>   File "/databricks/spark/python/pyspark/sql/types.py", line 201, in fromInternal
>     return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
> ValueError: year 0 is out of range
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
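As a minimal sketch of the Python limitation referenced in the comment above (this is plain stdlib code, not Spark code; the claim about how PySpark's local-time adjustment underflows is an assumption based on the traceback, which ends in datetime.datetime.fromtimestamp):

```python
import datetime

# Python's datetime cannot represent any year before 1 (datetime.MINYEAR is 1).
# fromtimestamp() applies the local timezone offset, so converting the internal
# value for 0001-01-01T00:00:00Z back to a local date can push the result
# below datetime.min, i.e. into year 0, which is unrepresentable.
print(datetime.MINYEAR)  # 1

# Simulating an adjustment that steps below the minimum representable date:
try:
    datetime.datetime.min - datetime.timedelta(hours=1)
except OverflowError as e:
    print(e)  # date value out of range
```

Whether the actual failure surfaces as this OverflowError or as the ValueError in the traceback depends on the code path and local timezone, but either way the root cause is that year 0 is outside datetime's representable range.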