[ https://issues.apache.org/jira/browse/SPARK-47202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-47202:
------------------------------------

    Assignee: Arzav Jain

> AttributeError: module 'pandas' has no attribute 'Timstamp'
> -----------------------------------------------------------
>
>                 Key: SPARK-47202
>                 URL: https://issues.apache.org/jira/browse/SPARK-47202
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.1
>            Reporter: Arzav Jain
>            Assignee: Arzav Jain
>            Priority: Minor
>              Labels: pull-request-available
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> When using pyspark.sql.types.TimestampType, if the value is a
> datetime.datetime object with tzinfo set, [this
> typo|https://github.com/apache/spark/blob/master/python/pyspark/sql/pandas/types.py#L996]
> (pd.Timstamp instead of pd.Timestamp) raises the AttributeError below.
>
> I believe [this
> commit|https://github.com/apache/spark/commit/46949e692e863992f4c50bdd482d5216d4fd9221]
> introduced the bug 9 months ago.
>
> Full stack trace below:
>
> {code:python}
>   File "/databricks/spark/python/pyspark/worker.py", line 1490, in main
>     process()
>   File "/databricks/spark/python/pyspark/worker.py", line 1482, in process
>     serializer.dump_stream(out_iter, outfile)
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 531, in dump_stream
>     return ArrowStreamSerializer.dump_stream(
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 107, in dump_stream
>     for batch in iterator:
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 525, in init_stream_yield_batches
>     batch = self._create_batch(series)
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 511, in _create_batch
>     arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast))
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 284, in _create_array
>     series = conv(series)
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1060, in <lambda>
>     return lambda pser: pser.apply(  # type: ignore[return-value]
>   File "/databricks/python/lib/python3.10/site-packages/pandas/core/series.py", line 4771, in apply
>     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
>   File "/databricks/python/lib/python3.10/site-packages/pandas/core/apply.py", line 1123, in apply
>     return self.apply_standard()
>   File "/databricks/python/lib/python3.10/site-packages/pandas/core/apply.py", line 1174, in apply_standard
>     mapped = lib.map_infer(
>   File "pandas/_libs/lib.pyx", line 2924, in pandas._libs.lib.map_infer
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1061, in <lambda>
>     lambda x: conv(x) if x is not None else None  # type: ignore[misc]
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 889, in convert_array
>     return [
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 890, in <listcomp>
>     _element_conv(v) if v is not None else None  # type: ignore[misc]
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1010, in convert_struct
>     return {
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1011, in <dictcomp>
>     name: conv(v) if conv is not None and v is not None else v
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1032, in convert_timestamp
>     ts = pd.Timstamp(value)
>   File "/databricks/python/lib/python3.10/site-packages/pandas/__init__.py", line 264, in __getattr__
>     raise AttributeError(f"module 'pandas' has no attribute '{name}'")
> AttributeError: module 'pandas' has no attribute 'Timstamp'
> {code}
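For readers tracing the failure, here is a minimal, hypothetical sketch of the tz-aware branch that the traceback passes through (convert_timestamp in pyspark/sql/pandas/types.py). It is not the Spark source: the function body, the `timezone` parameter with its "UTC" default, and the naive-value branch are assumptions made for illustration. Only the pd.Timestamp call on a tz-aware datetime.datetime comes from the report.

```python
import datetime

import pandas as pd


def convert_timestamp(value, timezone="UTC"):
    """Hypothetical, simplified stand-in for the converter named in the traceback.

    Assumption: this is NOT the actual Spark implementation; it only mirrors
    the branch the report describes (a tz-aware datetime.datetime).
    """
    if isinstance(value, datetime.datetime) and value.tzinfo is not None:
        # Correct spelling. The reported code called pd.Timstamp(value),
        # which fails because pandas has no attribute 'Timstamp'.
        ts = pd.Timestamp(value)
    else:
        # Assumption: naive values are interpreted in a session timezone,
        # passed here as the hypothetical `timezone` argument.
        ts = pd.Timestamp(value).tz_localize(timezone)
    # Hand back a tz-aware datetime.datetime for the same instant.
    return ts.to_pydatetime()


aware = datetime.datetime(2024, 2, 28, 12, 0, tzinfo=datetime.timezone.utc)
print(convert_timestamp(aware))

# The misspelled attribute from the traceback does not exist on the module.
print(hasattr(pd, "Timstamp"))  # False
```

The fix itself is a one-character spelling correction: with pd.Timestamp, a tz-aware value converts to the same instant instead of raising.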