[ https://issues.apache.org/jira/browse/SPARK-32137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179562#comment-17179562 ]
rahul bhatia commented on SPARK-32137:
--------------------------------------

Hi [~WaterKnight], I have been trying to resolve this issue for hours without any progress. Did you find a solution?

> AttributeError: Can only use .dt accessor with datetimelike values
> ------------------------------------------------------------------
>
>                 Key: SPARK-32137
>                 URL: https://issues.apache.org/jira/browse/SPARK-32137
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.4.5
>            Reporter: David Lacalle Castillo
>            Priority: Major
>
> I was using a pandas UDF with a DataFrame containing a date object, and the latest version of pyarrow, 0.17.0.
> I set this variable on the Zeppelin Spark interpreter:
> ARROW_PRE_0_15_IPC_FORMAT=1
>
> However, I was getting the following error:
> Job aborted due to stage failure: Task 0 in stage 19.0 failed 4 times, most recent failure: Lost task 0.3 in stage 19.0 (TID 1619, 10.20.0.5, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 377, in main
>     process()
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 372, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 290, in dump_stream
>     for series in iterator:
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 311, in load_stream
>     yield [self.arrow_to_pandas(c) for c in pa.Table.from_batches([batch]).itercolumns()]
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 311, in <listcomp>
>     yield [self.arrow_to_pandas(c) for c in pa.Table.from_batches([batch]).itercolumns()]
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 278, in arrow_to_pandas
>     s = _check_series_convert_date(s, from_arrow_type(arrow_column.type))
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1692, in _check_series_convert_date
>     return series.dt.date
>   File "/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py", line 5270, in __getattr__
>     return object.__getattribute__(self, name)
>   File "/usr/local/lib/python3.7/dist-packages/pandas/core/accessor.py", line 187, in __get__
>     accessor_obj = self._accessor(obj)
>   File "/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/accessors.py", line 338, in __new__
>     raise AttributeError("Can only use .dt accessor with datetimelike values")
> AttributeError: Can only use .dt accessor with datetimelike values
> 	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:456)
> 	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:172)
> 	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:122)
> 	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:410)
> 	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
> 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> 	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:123)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
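For anyone landing here: the setup described above is a scalar pandas UDF applied to a DateType column on Spark 2.4.x with pyarrow 0.17.0, and the traceback shows the failure happening in the worker's Arrow-to-pandas conversion (_check_series_convert_date calling series.dt.date). A minimal sketch of that combination follows; the column name "d" and identity_udf are illustrative, not taken from the report:

{code:python}
# Reproduction sketch, assuming Spark 2.4.5 with pyarrow 0.17.0 installed
# on the workers. Names below are made up for illustration only.
import datetime

from pyspark.sql import SparkSession
from pyspark.sql.functions import PandasUDFType, pandas_udf
from pyspark.sql.types import DateType

spark = SparkSession.builder.getOrCreate()

# A DataFrame with a DateType column; the Arrow-to-pandas conversion of
# this column is where the traceback above raises the .dt error.
df = spark.createDataFrame(
    [(datetime.date(2020, 6, 30),), (datetime.date(2020, 7, 1),)], ["d"]
)

# Spark 2.4 scalar pandas UDF syntax (PandasUDFType was still required).
@pandas_udf(DateType(), PandasUDFType.SCALAR)
def identity_udf(s):
    return s

# On the affected pyarrow/Spark combination this fails on the executors
# with "AttributeError: Can only use .dt accessor with datetimelike
# values"; with a supported pyarrow (< 0.15) it returns the rows.
df.select(identity_udf(df["d"])).collect()
{code}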
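Also worth checking: ARROW_PRE_0_15_IPC_FORMAT=1 has to be visible to the Python workers on every executor as well as to the driver, not only to the Zeppelin interpreter process. A sketch of one way to pass it through standard Spark configuration (spark.executorEnv.* is documented Spark config; whether this alone fixes the error on pyarrow 0.17.0 is exactly what this ticket questions):

{code:python}
# Sketch: propagate the env var to driver and executors. This variable
# only works around the Arrow 0.15 IPC format change; it does not cover
# other behavior changes in later pyarrow releases.
import os

from pyspark.sql import SparkSession

# Make the variable visible to the driver-side Python process.
os.environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"

spark = (
    SparkSession.builder
    .config("spark.executorEnv.ARROW_PRE_0_15_IPC_FORMAT", "1")
    .getOrCreate()
)
{code}

That said, since Spark 2.4.x was developed against pyarrow < 0.15, pinning pyarrow below 0.15 (or moving to Spark 3.0, which supports newer pyarrow) may be a more reliable route than the environment variable.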