[ https://issues.apache.org/jira/browse/SPARK-39821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-39821:
------------------------------------

    Assignee: Apache Spark

> DatetimeIndex error during pyspark session
> ------------------------------------------
>
>                 Key: SPARK-39821
>                 URL: https://issues.apache.org/jira/browse/SPARK-39821
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.2
>         Environment: OS: ubuntu
>                      Python version: 3.8.13
>            Reporter: bo zhao
>            Assignee: Apache Spark
>            Priority: Minor
>
> {code:java}
> Using Python version 3.8.13 (default, Jun 29 2022 11:50:19)
> Spark context Web UI available at http://172.25.179.45:4042
> Spark context available as 'sc' (master = local[*], app id = local-1658283215853).
> SparkSession available as 'spark'.
> >>> from pyspark import pandas as ps
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
> >>> ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01'])
> /home/spark/spark/python/pyspark/pandas/internal.py:1573: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   fields = [
> /home/spark/spark/python/pyspark/sql/pandas/conversion.py:486: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   for column, series in pdf.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   for item in s.iteritems():
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/spark/spark/python/pyspark/pandas/indexes/base.py", line 2770, in __repr__
>     pindex = self._psdf._get_or_create_repr_pandas_cache(max_display_count).index
>   File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12780, in _get_or_create_repr_pandas_cache
>     self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
>   File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12775, in _to_internal_pandas
>     return self._internal.to_pandas_frame
>   File "/home/spark/spark/python/pyspark/pandas/utils.py", line 589, in wrapped_lazy_property
>     setattr(self, attr_name, fn(self))
>   File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1056, in to_pandas_frame
>     pdf = sdf.toPandas()
>   File "/home/spark/spark/python/pyspark/sql/pandas/conversion.py", line 248, in toPandas
>     series = series.astype(t, copy=False)
>   File "/home/spark/upstream/pandas/pandas/core/generic.py", line 6095, in astype
>     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
>   File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 386, in astype
>     return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
>   File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 308, in apply
>     applied = getattr(b, f)(**kwargs)
>   File "/home/spark/upstream/pandas/pandas/core/internals/blocks.py", line 526, in astype
>     new_values =
>       astype_array_safe(values, dtype, copy=copy, errors=errors)
>   File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
>     new_values = astype_array(values, dtype, copy=copy)
>   File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 227, in astype_array
>     values = values.astype(dtype, copy=copy)
>   File "/home/spark/upstream/pandas/pandas/core/arrays/datetimes.py", line 631, in astype
>     return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
>   File "/home/spark/upstream/pandas/pandas/core/arrays/datetimelike.py", line 504, in astype
>     raise TypeError(msg)
> TypeError: Cannot cast DatetimeArray to dtype datetime64
> {code}
> I ran pyspark and entered ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01']) in the session.
> Note that the assignment below does not raise an error:
> {code:java}
> a = ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01']) {code}
> The error is only raised when I evaluate a in the session:
> {code:java}
> >>> a {code}
> So the trouble appears to be in the __repr__ function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
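For reference, the failing step in the traceback above can be sketched without Spark at all. `DataFrame.toPandas` ends in `series.astype(t, copy=False)`, and the `TypeError` comes from pandas rejecting a cast of a datetime Series to the unitless `datetime64` dtype on recent pandas builds (older pandas versions accept it). A minimal pandas-only sketch, assuming only that the dtype being cast to is the unitless `datetime64` (the helper name is mine, not from pyspark, and both outcomes are handled since the behaviour depends on the installed pandas version):

```python
import pandas as pd

def cast_to_unitless_datetime64(values):
    # Hypothetical helper mirroring the failing step in pyspark's
    # conversion.py: the collected Series is cast with
    # series.astype(t, copy=False), where t is the unitless
    # numpy dtype "datetime64" (no [ns] resolution).
    s = pd.Series(pd.to_datetime(values))  # dtype: datetime64[ns]
    try:
        s.astype("datetime64")
        return "cast succeeded"        # older pandas allows the cast
    except TypeError:
        return "TypeError raised"      # recent pandas rejects it

print(cast_to_unitless_datetime64(["1970-01-01", "1970-01-01", "1970-01-01"]))
```

This also shows why the error only surfaces on `>>> a`: constructing the pandas-on-Spark index is lazy, and the cast happens inside `toPandas()`, which `__repr__` triggers.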