[ https://issues.apache.org/jira/browse/SPARK-30961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Cutler resolved SPARK-30961.
----------------------------------
    Resolution: Won't Fix

Thanks [~KevinAppel] and [~nicornk] for the info, I'll go ahead and close this then.

> Arrow enabled: to_pandas with date column fails
> -----------------------------------------------
>
>                 Key: SPARK-30961
>                 URL: https://issues.apache.org/jira/browse/SPARK-30961
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.5
>         Environment: Apache Spark 2.4.5
>            Reporter: Nicolas Renkamp
>            Priority: Major
>              Labels: ready-to-commit
>
> Hi,
> there seems to be a bug in the Arrow-enabled toPandas conversion from a Spark
> DataFrame to a pandas DataFrame when the DataFrame has a column of type
> DateType. Here is a minimal example to reproduce the issue:
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as F
>
> spark = SparkSession.builder.getOrCreate()
> is_arrow_enabled = spark.conf.get("spark.sql.execution.arrow.enabled")
> print("Arrow optimization is enabled: " + is_arrow_enabled)
> spark_df = spark.createDataFrame(
>     [['2019-12-06']], 'created_at: string') \
>     .withColumn('created_at', F.to_date('created_at'))
> # works
> spark_df.toPandas()
> spark.conf.set("spark.sql.execution.arrow.enabled", 'true')
> is_arrow_enabled = spark.conf.get("spark.sql.execution.arrow.enabled")
> print("Arrow optimization is enabled: " + is_arrow_enabled)
> # raises AttributeError: Can only use .dt accessor with datetimelike values
> # series is still of type object, .dt does not exist
> spark_df.toPandas(){code}
> A fix would be to modify the _check_series_convert_date function in
> pyspark.sql.types to:
> {code:java}
> def _check_series_convert_date(series, data_type):
>     """
>     Cast the series to datetime.date if it's a date type, otherwise returns
>     the original series.
>
>     :param series: pandas.Series
>     :param data_type: a Spark data type for the series
>     """
>     from pyspark.sql.utils import require_minimum_pandas_version
>     require_minimum_pandas_version()
>     from pandas import to_datetime
>     if type(data_type) == DateType:
>         return to_datetime(series).dt.date
>     else:
>         return series
> {code}
> Let me know if I should prepare a Pull Request for the 2.4.5 branch.
> I have not tested the behavior on the master branch.
>
> Thanks,
> Nicolas
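
For readers without a Spark cluster at hand, the failure mode and the proposed workaround described in the report can be sketched with pandas alone. This is only an illustration under stated assumptions: it assumes the date column reaches pandas as an object-dtype series (holding datetime.date values or date strings), and convert_date_series is a hypothetical stand-in for _check_series_convert_date, taking a plain boolean instead of a Spark DateType. It is not the code that ships in PySpark.

```python
import datetime

import pandas as pd


def convert_date_series(series, is_date_type):
    """Hypothetical, Spark-free stand-in for _check_series_convert_date.

    is_date_type replaces the `type(data_type) == DateType` check so the
    sketch runs with pandas only.
    """
    if is_date_type:
        # Parse to datetime64 first so the .dt accessor is available,
        # then narrow to datetime.date objects.
        return pd.to_datetime(series).dt.date
    return series


# Assumed shapes of the incoming column: object dtype, not datetime64.
as_dates = pd.Series([datetime.date(2019, 12, 6)], dtype=object)
as_strings = pd.Series(["2019-12-06"], dtype=object)

# Using .dt directly on an object-dtype series fails with AttributeError,
# which is the error quoted in the report.
raised = False
try:
    as_dates.dt.date
except AttributeError as exc:
    raised = True
    print("AttributeError:", exc)

# Going through to_datetime first works for both assumed inputs.
fixed_from_dates = convert_date_series(as_dates, True)
fixed_from_strings = convert_date_series(as_strings, True)
print(fixed_from_dates.iloc[0], fixed_from_strings.iloc[0])
```

The sketch only shows why calling to_datetime before .dt avoids the AttributeError; it says nothing about why the column arrives as object dtype in the first place, which is the part the issue was closed on.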