[
https://issues.apache.org/jira/browse/SPARK-47202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun closed SPARK-47202.
---------------------------------
> AttributeError: module 'pandas' has no attribute 'Timstamp'
> -----------------------------------------------------------
>
> Key: SPARK-47202
> URL: https://issues.apache.org/jira/browse/SPARK-47202
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.5.1
> Reporter: Arzav Jain
> Assignee: Arzav Jain
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> When using pyspark.sql.types.TimestampType, if the value is a
> datetime.datetime object with a tzinfo, [this
> typo|https://github.com/apache/spark/blob/master/python/pyspark/sql/pandas/types.py#L996]
> (pd.Timstamp instead of pd.Timestamp) raises an AttributeError.
>
> I believe [this
> commit|https://github.com/apache/spark/commit/46949e692e863992f4c50bdd482d5216d4fd9221]
> introduced the bug 9 months ago.
>
> Full stack trace below:
>
> {code:java}
>   File "/databricks/spark/python/pyspark/worker.py", line 1490, in main
>     process()
>   File "/databricks/spark/python/pyspark/worker.py", line 1482, in process
>     serializer.dump_stream(out_iter, outfile)
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 531, in dump_stream
>     return ArrowStreamSerializer.dump_stream(
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 107, in dump_stream
>     for batch in iterator:
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 525, in init_stream_yield_batches
>     batch = self._create_batch(series)
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 511, in _create_batch
>     arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast))
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 284, in _create_array
>     series = conv(series)
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1060, in <lambda>
>     return lambda pser: pser.apply( # type: ignore[return-value]
>   File "/databricks/python/lib/python3.10/site-packages/pandas/core/series.py", line 4771, in apply
>     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
>   File "/databricks/python/lib/python3.10/site-packages/pandas/core/apply.py", line 1123, in apply
>     return self.apply_standard()
>   File "/databricks/python/lib/python3.10/site-packages/pandas/core/apply.py", line 1174, in apply_standard
>     mapped = lib.map_infer(
>   File "pandas/_libs/lib.pyx", line 2924, in pandas._libs.lib.map_infer
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1061, in <lambda>
>     lambda x: conv(x) if x is not None else None # type: ignore[misc]
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 889, in convert_array
>     return [
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 890, in <listcomp>
>     _element_conv(v) if v is not None else None # type: ignore[misc]
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1010, in convert_struct
>     return {
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1011, in <dictcomp>
>     name: conv(v) if conv is not None and v is not None else v
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1032, in convert_timestamp
>     ts = pd.Timstamp(value)
>   File "/databricks/python/lib/python3.10/site-packages/pandas/__init__.py", line 264, in __getattr__
>     raise AttributeError(f"module 'pandas' has no attribute '{name}'")
> AttributeError: module 'pandas' has no attribute 'Timstamp'
> {code}
>
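The failure described in the issue can probably be reproduced without a Spark cluster. The sketch below is a hypothetical reconstruction based only on the stack trace and the issue text, not a copy of the Spark source: the function names convert_timestamp_buggy and convert_timestamp_fixed, the default timezone "UTC", and the exact branch structure are assumptions. It illustrates why the typo stays hidden for naive datetimes and only surfaces once a value carries a tzinfo.

```python
import datetime

import pandas as pd


def convert_timestamp_buggy(value, timezone="UTC"):
    """Hypothetical stand-in for the failing converter (not the Spark source)."""
    if isinstance(value, datetime.datetime) and value.tzinfo is not None:
        # Misspelled attribute: Python only looks it up when this line runs,
        # so the error appears only for tz-aware values.
        ts = pd.Timstamp(value)
    else:
        ts = pd.Timestamp(value).tz_localize(timezone)
    return ts.to_pydatetime()


def convert_timestamp_fixed(value, timezone="UTC"):
    """Same sketch with the attribute name spelled correctly."""
    if isinstance(value, datetime.datetime) and value.tzinfo is not None:
        ts = pd.Timestamp(value)
    else:
        ts = pd.Timestamp(value).tz_localize(timezone)
    return ts.to_pydatetime()


aware = datetime.datetime(2024, 2, 27, 12, 0, tzinfo=datetime.timezone.utc)
naive = datetime.datetime(2024, 2, 27, 12, 0)

# Naive values never reach the misspelled line, so they convert without error.
convert_timestamp_buggy(naive)

# tz-aware values hit the typo and raise AttributeError at call time.
try:
    convert_timestamp_buggy(aware)
except AttributeError as exc:
    print(f"buggy branch failed: {exc}")

# With the spelling corrected, the tz-aware value converts as expected.
print(convert_timestamp_fixed(aware))
```

Because the misspelled name is only resolved when the tz-aware branch executes, a test suite that covers naive timestamps alone would not catch it; a test that passes a datetime with tzinfo through the converter would.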