[ 
https://issues.apache.org/jira/browse/SPARK-47202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47202:
------------------------------------

    Assignee: Arzav Jain

> AttributeError: module 'pandas' has no attribute 'Timstamp'
> -----------------------------------------------------------
>
>                 Key: SPARK-47202
>                 URL: https://issues.apache.org/jira/browse/SPARK-47202
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.1
>            Reporter: Arzav Jain
>            Assignee: Arzav Jain
>            Priority: Minor
>              Labels: pull-request-available
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> When using pyspark.sql.types.TimestampType, if a value is a 
> datetime.datetime object with a tzinfo, [this 
> typo|https://github.com/apache/spark/blob/master/python/pyspark/sql/pandas/types.py#L996]
>  (pd.Timstamp instead of pd.Timestamp) raises the AttributeError above.
>  
> I believe [this 
> commit|https://github.com/apache/spark/commit/46949e692e863992f4c50bdd482d5216d4fd9221]
>  introduced the bug 9 months ago.
>  
> Full stack trace below:
>  
> {code:java}
>   File "/databricks/spark/python/pyspark/worker.py", line 1490, in main
>     process()
>   File "/databricks/spark/python/pyspark/worker.py", line 1482, in process
>     serializer.dump_stream(out_iter, outfile)
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 531, in dump_stream
>     return ArrowStreamSerializer.dump_stream(
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 107, in dump_stream
>     for batch in iterator:
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 525, in init_stream_yield_batches
>     batch = self._create_batch(series)
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 511, in _create_batch
>     arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast))
>   File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 284, in _create_array
>     series = conv(series)
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1060, in <lambda>
>     return lambda pser: pser.apply( # type: ignore[return-value]
>   File "/databricks/python/lib/python3.10/site-packages/pandas/core/series.py", line 4771, in apply
>     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
>   File "/databricks/python/lib/python3.10/site-packages/pandas/core/apply.py", line 1123, in apply
>     return self.apply_standard()
>   File "/databricks/python/lib/python3.10/site-packages/pandas/core/apply.py", line 1174, in apply_standard
>     mapped = lib.map_infer(
>   File "pandas/_libs/lib.pyx", line 2924, in pandas._libs.lib.map_infer
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1061, in <lambda>
>     lambda x: conv(x) if x is not None else None # type: ignore[misc]
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 889, in convert_array
>     return [
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 890, in <listcomp>
>     _element_conv(v) if v is not None else None # type: ignore[misc]
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1010, in convert_struct
>     return {
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1011, in <dictcomp>
>     name: conv(v) if conv is not None and v is not None else v
>   File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1032, in convert_timestamp
>     ts = pd.Timstamp(value)
>   File "/databricks/python/lib/python3.10/site-packages/pandas/__init__.py", line 264, in __getattr__
>     raise AttributeError(f"module 'pandas' has no attribute '{name}'")
> AttributeError: module 'pandas' has no attribute 'Timstamp'
>  {code}
>  
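
The failure in the trace can be illustrated outside Spark with a minimal sketch. The function below is a hypothetical stand-in for the convert_timestamp frame in the trace, not the PySpark source; its body, the `timezone` parameter, and the sample values are assumptions made for illustration. It shows why only tz-aware datetime.datetime values are affected: the misspelled attribute is looked up only on the branch taken for values with a tzinfo, so naive values never reach it.

```python
import datetime

import pandas as pd


def convert_timestamp(value, timezone="UTC"):
    """Hypothetical converter, loosely modelled on the frame named in the trace."""
    if isinstance(value, datetime.datetime) and value.tzinfo is not None:
        # Correct spelling; the reported line called pd.Timstamp on this branch.
        ts = pd.Timestamp(value)
    else:
        # Assumption of this sketch: naive values are interpreted in `timezone`.
        ts = pd.Timestamp(value).tz_localize(timezone)
    return ts.to_pydatetime()


aware = datetime.datetime(2024, 2, 28, 12, 0, tzinfo=datetime.timezone.utc)
naive = datetime.datetime(2024, 2, 28, 12, 0)

# The misspelled name fails only when it is actually looked up at runtime.
try:
    pd.Timstamp(aware)
    typo_error = None
except AttributeError as exc:
    typo_error = str(exc)

converted_aware = convert_timestamp(aware)
converted_naive = convert_timestamp(naive)
```

With the spelling corrected, both the tz-aware and the naive value convert to the same instant in this sketch, while the misspelled call raises AttributeError as in the report.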



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

