Re: PySpark Pandas UDF

2019-11-17 Thread Gourav Sengupta
Hi,

Sorry, a completely unrelated question.

When is the upcoming release of Spark 3.0? Several parallel distributed
deep learning frameworks are being developed; do you think we could use
Spark 3.0 for distributed deep learning with PyTorch or TensorFlow?

Is there anywhere we can find the details?

Regards,
Gourav Sengupta

On Mon, Nov 18, 2019 at 7:09 AM Bryan Cutler  wrote:

> Arrow changed its binary IPC format in 0.15.1, and that looks to be your
> problem; there is an environment variable you can set to make pyarrow
> 0.15.1 compatible with current Spark. Please see the doc below for the
> instructions added in SPARK-2936. Note that this will not be required for
> the upcoming release of Spark 3.0.0.
>
> https://github.com/apache/spark/blob/master/docs/sql-pyspark-pandas-with-arrow.md#compatibiliy-setting-for-pyarrow--0150-and-spark-23x-24x
>
> On Tue, Nov 12, 2019 at 7:53 AM Holden Karau  wrote:
>
>> Thanks for sharing that. I think we should maybe add some checks around
>> this so it’s easier to debug. I’m CCing Bryan who might have some thoughts.
>>
>> On Tue, Nov 12, 2019 at 7:42 AM gal.benshlomo 
>> wrote:
>>
>>> SOLVED!
>>> Thanks for the help. I found the issue: it was the pyarrow version
>>> (0.15.1), which apparently isn't currently stable. Downgrading it
>>> solved the issue for me.
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>>
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>


Re: PySpark Pandas UDF

2019-11-17 Thread Bryan Cutler
Arrow changed its binary IPC format in 0.15.1, and that looks to be your
problem; there is an environment variable you can set to make pyarrow
0.15.1 compatible with current Spark. Please see the doc below for the
instructions added in SPARK-2936. Note that this will not be required for
the upcoming release of Spark 3.0.0.
https://github.com/apache/spark/blob/master/docs/sql-pyspark-pandas-with-arrow.md#compatibiliy-setting-for-pyarrow--0150-and-spark-23x-24x
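A minimal driver-side sketch of that compatibility setting (the helper name is mine; the `ARROW_PRE_0_15_IPC_FORMAT` variable and the 0.15.0 threshold are from the linked doc, and the variable must also reach the executors, e.g. via conf/spark-env.sh):

```python
import os

def needs_legacy_ipc_format(pyarrow_version: str) -> bool:
    """True when pyarrow writes the new (0.15+) Arrow IPC format,
    which Spark 2.3.x/2.4.x cannot read without the flag below."""
    major, minor = (int(part) for part in pyarrow_version.split(".")[:2])
    return (major, minor) >= (0, 15)

# Must be set before the SparkSession (and its Python workers) start.
if needs_legacy_ipc_format("0.15.1"):
    os.environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"
```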

On Tue, Nov 12, 2019 at 7:53 AM Holden Karau  wrote:

> Thanks for sharing that. I think we should maybe add some checks around
> this so it’s easier to debug. I’m CCing Bryan who might have some thoughts.
>
> On Tue, Nov 12, 2019 at 7:42 AM gal.benshlomo 
> wrote:
>
>> SOLVED!
>> Thanks for the help. I found the issue: it was the pyarrow version
>> (0.15.1), which apparently isn't currently stable. Downgrading it
>> solved the issue for me.
>>
>>
>>
>
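For context, the UDFs this thread is about are scalar Pandas UDFs: Spark ships each column batch to the Python worker as an Arrow record batch, the function receives it as a pandas Series and returns a Series of the same length, and the result goes back through Arrow, which is why an Arrow IPC format mismatch breaks them. The Spark-free core of such a UDF is just a Series-to-Series function; a sketch (in a real job, Spark would wrap `plus_one` with `pyspark.sql.functions.pandas_udf`):

```python
import pandas as pd

# Body of a scalar Pandas UDF: one pandas Series per Arrow batch in,
# one Series of equal length out.
def plus_one(batch: pd.Series) -> pd.Series:
    return batch + 1.0

# Simulate what the Python worker does with a single batch:
out = plus_one(pd.Series([1.0, 2.0, 3.0]))
```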