DAAworld commented on PR #37232: URL: https://github.com/apache/spark/pull/37232#issuecomment-2378324786
> With the release of pandas 2.0, I think this PR should be re-opened, right?
>
> I can recreate the issue originally described with
>
> ```python
> Python 3.9.16 (main, May 3 2023, 09:54:39)
> [GCC 10.2.1 20210110] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyspark
> >>> pyspark.__version__
> '3.4.0'
> >>> import pandas
> >>> pandas.__version__
> '2.0.1'
> >>> import pyspark.pandas as ps
> >>> ps.DatetimeIndex(["1970-01-01", "1970-01-02", "1970-01-03"])
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 23/05/18 21:07:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 23/05/18 21:07:31 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/indexes/base.py", line 2705, in __repr__
>     pindex = self._psdf._get_or_create_repr_pandas_cache(max_display_count).index
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/frame.py", line 13347, in _get_or_create_repr_pandas_cache
>     self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/frame.py", line 13342, in _to_internal_pandas
>     return self._internal.to_pandas_frame
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/utils.py", line 588, in wrapped_lazy_property
>     setattr(self, attr_name, fn(self))
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/internal.py", line 1056, in to_pandas_frame
>     pdf = sdf.toPandas()
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/sql/pandas/conversion.py", line 251, in toPandas
>     if (t is not None and not all([is_timedelta64_dtype(t),is_datetime64_dtype(t)])) or should_check_timedelta:
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/generic.py", line 6324, in astype
>     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 451, in astype
>     return self.apply(
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 352, in apply
>     applied = getattr(b, f)(**kwargs)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 511, in astype
>     new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/dtypes/astype.py", line 242, in astype_array_safe
>     new_values = astype_array(values, dtype, copy=copy)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/dtypes/astype.py", line 184, in astype_array
>     values = values.astype(dtype, copy=copy)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 694, in astype
>     raise TypeError(
> TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.
> ```

With pandas == 2.2.2 and pyspark == 3.4.3 I also get `TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.`
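For context, the `TypeError` at the bottom of the traceback is raised by pandas itself, not by Spark: pandas 2.0 stopped accepting casts to the unit-less `datetime64` dtype, which the non-Arrow `toPandas()` conversion path still passes to `astype`. A minimal pure-pandas sketch of that behavior change (no Spark involved):

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-02", "1970-01-03"]))

# pandas < 2.0 accepted the unit-less alias and assumed nanoseconds;
# pandas >= 2.0 raises TypeError instead.
try:
    s.astype("datetime64")
except TypeError as e:
    print(e)  # Casting to unit-less dtype 'datetime64' is not supported. ...

# Spelling out the unit works on both pandas 1.x and 2.x.
print(s.astype("datetime64[ns]").dtype)  # datetime64[ns]
```

So the fix on the Spark side is presumably to pass an explicit unit such as `datetime64[ns]` wherever the conversion code builds these dtype strings.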