[jira] [Commented] (SPARK-32547) Can't process Timestamp 0001-01-01T00:00:00.000+0000 with TimestampType
[ https://issues.apache.org/jira/browse/SPARK-32547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172199#comment-17172199 ]

Manjunath H commented on SPARK-32547:
--------------------------------------

[~Qin Yao] The timezone is UTC:

{code:java}
spark.conf.get('spark.sql.session.timeZone')
Out[2]: 'Etc/UTC'

import datetime
LOCAL_TIMEZONE = datetime.datetime.now(datetime.timezone(datetime.timedelta(0))).astimezone().tzinfo
print(LOCAL_TIMEZONE)
UTC
{code}

> Can't process Timestamp 0001-01-01T00:00:00.000+0000 with TimestampType
> ------------------------------------------------------------------------
>
>                 Key: SPARK-32547
>                 URL: https://issues.apache.org/jira/browse/SPARK-32547
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Manjunath H
>            Priority: Major
>
> Spark Version: 3.0.0
> Below is the sample code to reproduce the problem with TimestampType.
> {code:java}
> from pyspark.sql.functions import lit
> from pyspark.sql.types import TimestampType
>
> df = spark.createDataFrame([(1, 'foo'), (2, 'bar')], ['id', 'txt'])
> new_df = df.withColumn("test_timestamp", lit("0001-01-01T00:00:00.000+0000").cast(TimestampType()))
>
> new_df.printSchema()
> root
>  |-- id: long (nullable = true)
>  |-- txt: string (nullable = true)
>  |-- test_timestamp: timestamp (nullable = true)
>
> new_df.show()
> +---+---+-------------------+
> | id|txt|     test_timestamp|
> +---+---+-------------------+
> |  1|foo|0001-01-01 00:00:00|
> |  2|bar|0001-01-01 00:00:00|
> +---+---+-------------------+
> {code}
> The new_df.rdd.isEmpty() operation fails with *year 0 is out of range*:
> {code:java}
> new_df.rdd.isEmpty()
>
> Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure:
> Traceback (most recent call last):
>   File "/databricks/spark/python/pyspark/serializers.py", line 177, in _read_with_length
>     return self.loads(obj)
>   File "/databricks/spark/python/pyspark/serializers.py", line 466, in loads
>     return pickle.loads(obj, encoding=encoding)
>   File "/databricks/spark/python/pyspark/sql/types.py", line 1415, in <lambda>
>     return lambda *a: dataType.fromInternal(a)
>   File "/databricks/spark/python/pyspark/sql/types.py", line 635, in fromInternal
>     for f, v, c in zip(self.fields, obj, self._needConversion)]
>   File "/databricks/spark/python/pyspark/sql/types.py", line 635, in <listcomp>
>     for f, v, c in zip(self.fields, obj, self._needConversion)]
>   File "/databricks/spark/python/pyspark/sql/types.py", line 447, in fromInternal
>     return self.dataType.fromInternal(obj)
>   File "/databricks/spark/python/pyspark/sql/types.py", line 201, in fromInternal
>     return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
> ValueError: year 0 is out of range
> {code}
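The traceback in the quoted description points at TimestampType.fromInternal in pyspark/sql/types.py, which converts Spark's internal microseconds-since-epoch value through datetime.datetime.fromtimestamp. A minimal pure-Python sketch of that mechanism, not taken from the ticket (the constant is the epoch offset of 0001-01-01T00:00:00Z):

{code:java}
import datetime

# TimestampType stores 0001-01-01T00:00:00Z internally as microseconds
# since the Unix epoch (62135596800 seconds before 1970-01-01).
ts = -62135596800 * 1000000

# TimestampType.fromInternal does essentially this conversion:
dt = datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
print(dt)  # 0001-01-01 00:00:00 when the process's local timezone is UTC
           # (platform permitting; some C libraries reject pre-epoch values)

# fromtimestamp() interprets the value in the *local* timezone, so on a host
# whose offset is west of UTC the instant falls before datetime.MINYEAR (1)
# and Python raises: ValueError: year 0 is out of range.
{code}

Since this conversion depends on the worker process's local timezone rather than only on spark.sql.session.timeZone, the driver-side check above does not rule out a timezone mismatch on the executors.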
[jira] [Updated] (SPARK-32547) Can't process Timestamp 0001-01-01T00:00:00.000+0000 with TimestampType
[ https://issues.apache.org/jira/browse/SPARK-32547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manjunath H updated SPARK-32547:
--------------------------------
    Description:
Spark Version: 3.0.0

Below is the sample code to reproduce the problem with TimestampType.

{code:java}
from pyspark.sql.functions import lit
from pyspark.sql.types import TimestampType

df = spark.createDataFrame([(1, 'foo'), (2, 'bar')], ['id', 'txt'])
new_df = df.withColumn("test_timestamp", lit("0001-01-01T00:00:00.000+0000").cast(TimestampType()))

new_df.printSchema()
root
 |-- id: long (nullable = true)
 |-- txt: string (nullable = true)
 |-- test_timestamp: timestamp (nullable = true)

new_df.show()
+---+---+-------------------+
| id|txt|     test_timestamp|
+---+---+-------------------+
|  1|foo|0001-01-01 00:00:00|
|  2|bar|0001-01-01 00:00:00|
+---+---+-------------------+
{code}

The new_df.rdd.isEmpty() operation fails with *year 0 is out of range*:

{code:java}
new_df.rdd.isEmpty()

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure:
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 177, in _read_with_length
    return self.loads(obj)
  File "/databricks/spark/python/pyspark/serializers.py", line 466, in loads
    return pickle.loads(obj, encoding=encoding)
  File "/databricks/spark/python/pyspark/sql/types.py", line 1415, in <lambda>
    return lambda *a: dataType.fromInternal(a)
  File "/databricks/spark/python/pyspark/sql/types.py", line 635, in fromInternal
    for f, v, c in zip(self.fields, obj, self._needConversion)]
  File "/databricks/spark/python/pyspark/sql/types.py", line 635, in <listcomp>
    for f, v, c in zip(self.fields, obj, self._needConversion)]
  File "/databricks/spark/python/pyspark/sql/types.py", line 447, in fromInternal
    return self.dataType.fromInternal(obj)
  File "/databricks/spark/python/pyspark/sql/types.py", line 201, in fromInternal
    return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
ValueError: year 0 is out of range
{code}

  was: the same description, except that the final code block called df.rdd.isEmpty() instead of new_df.rdd.isEmpty().
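A note on sidestepping the failure while the bug is open (not from the ticket): .rdd forces every row through the Python deserializer, and hence through TimestampType.fromInternal, whereas an equivalent emptiness check in the DataFrame API stays in the JVM and never builds Python datetime objects. A hedged sketch:

{code:java}
# Hypothetical workaround: keep the check in the DataFrame API so rows are
# never deserialized into Python datetime objects.
is_empty = new_df.limit(1).count() == 0   # instead of new_df.rdd.isEmpty()
{code}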