Here is a link to the JIRA for adding StructType support for scalar pandas_udf https://issues.apache.org/jira/browse/SPARK-24579
On Wed, Jul 25, 2018 at 3:36 PM, Hichame El Khalfi <hich...@elkhalfi.com> wrote: > Hey Holden, > Thanks for your reply, > > We currently using a python function that produces a Row(TS=LongType(), > bin=BinaryType()). > We use this function like this dataframe.rdd.map(my_function) > .toDF().write.parquet() > > To reuse it in pandas_udf, we changes the return type to > StructType(StructField(Long), StructField(BinaryType). > > 1)But we face an issue that StructType is not supported by pandas_udf. > > So I was wondering to still continue to reuse dataftame.rdd.map but get an > improvement in serialization by using ArrowFormat instead of Pickle. > > *From:* hol...@pigscanfly.ca > *Sent:* July 25, 2018 4:41 PM > *To:* hich...@elkhalfi.com > *Cc:* user@spark.apache.org > *Subject:* Re: Use Arrow instead of Pickle without pandas_udf > > Not currently. What's the problem with pandas_udf for your use case? > > On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi <hich...@elkhalfi.com> > wrote: > >> Hi There, >> >> >> Is there a way to use Arrow format instead of Pickle but without using >> pandas_udf ? >> >> >> Thank for your help, >> >> >> Hichame >> > > > > -- > Twitter: https://twitter.com/holdenkarau >