There was a change in the Arrow binary format in 0.15.0, and there is an
environment variable you can set to make pyarrow >= 0.15.0 compatible
with current Spark, which looks to be your problem. Please see the doc
below for the instructions added in SPARK-29367. Note that this will not
be required for the upcoming Spark 3.0.0 release.
https://github.com/apache/spark/blob/master/docs/sql-pyspark-pandas-with-arrow.md#compatibiliy-setting-for-pyarrow--0150-and-spark-23x-24x
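
For example, a minimal sketch of setting that variable from PySpark
(setting it in conf/spark-env.sh as the doc describes works too;
forwarding it via spark.executorEnv is just one way to make sure the
executors see it):

    import os

    # Tell pyarrow >= 0.15.0 to use the legacy Arrow IPC format that the
    # Arrow Java version bundled with Spark 2.3.x / 2.4.x expects.
    os.environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"

    from pyspark.sql import SparkSession

    # Forward the same variable to the executor processes.
    spark = (
        SparkSession.builder
        .config("spark.executorEnv.ARROW_PRE_0_15_IPC_FORMAT", "1")
        .getOrCreate()
    )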

On Tue, Nov 12, 2019 at 7:53 AM Holden Karau <hol...@pigscanfly.ca> wrote:

> Thanks for sharing that. I think we should maybe add some checks around
> this so it’s easier to debug. I’m CCing Bryan who might have some thoughts.
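>
> Something like the following is one shape such a check could take (a
> purely hypothetical sketch, not anything in Spark today):
>
>     import os
>     from distutils.version import LooseVersion
>
>     import pyarrow
>
>     # Hypothetical guard: pyarrow >= 0.15.0 changed the IPC format, so
>     # fail fast unless the legacy-format variable is set for old Spark.
>     if (LooseVersion(pyarrow.__version__) >= LooseVersion("0.15.0")
>             and os.environ.get("ARROW_PRE_0_15_IPC_FORMAT") != "1"):
>         raise RuntimeError(
>             "pyarrow >= 0.15.0 requires ARROW_PRE_0_15_IPC_FORMAT=1 "
>             "with Spark 2.3.x / 2.4.x")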
>
> On Tue, Nov 12, 2019 at 7:42 AM gal.benshlomo <gal.benshl...@startapp.com>
> wrote:
>
>> SOLVED!
>> Thanks for the help - I found the issue: it was the version of pyarrow
>> (0.15.1), which apparently isn't compatible with current Spark.
>> Downgrading it solved the issue for me.
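>>
>> (For anyone hitting the same thing: the format change landed in pyarrow
>> 0.15.0, so pinning below it, e.g. pip install "pyarrow<0.15.0", is one
>> way to do the downgrade.)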
>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
