Re: pySpark - pandas UDF and binaryType

Bryan Cutler Thu, 02 May 2019 13:32:47 -0700

Hi,

BinaryType support for pandas UDFs was not added until Spark 2.4.0; see
https://issues.apache.org/jira/browse/SPARK-23555. The error you get on
2.3.0 is therefore expected. In addition, pyarrow 0.10.0 or later is
required, as you saw in the docs.
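
As a rough sketch (not taken from the Spark docs), the two minimums above
can be checked before defining the UDF. The helper names parse_version and
binary_pandas_udf_supported are hypothetical, and the version parsing is
deliberately simple: it only looks at the first three numeric components.

```python
import re

# Minimums from the reply above: Spark 2.4.0 (SPARK-23555) and pyarrow 0.10.0.
MIN_SPARK = (2, 4, 0)
MIN_PYARROW = (0, 10, 0)


def parse_version(version):
    """Turn '2.3.0' or '2.4.0.dev0' into a tuple of three integers."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        # Non-numeric components are treated as 0 (an assumption of this sketch).
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def binary_pandas_udf_supported(spark_version, pyarrow_version):
    """Return True only if both versions meet the minimums listed above."""
    return (parse_version(spark_version) >= MIN_SPARK
            and parse_version(pyarrow_version) >= MIN_PYARROW)


# In a real session the arguments would come from pyspark.__version__
# and pyarrow.__version__; literals are used here to stay self-contained.
print(binary_pandas_udf_supported("2.3.0", "0.10.0"))
print(binary_pandas_udf_supported("2.4.0", "0.10.0"))
```

With the versions from the original question (2.3.0 and 0.10.0) the check
fails on the Spark side only, which matches the diagnosis above.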


Bryan

On Thu, May 2, 2019 at 4:26 AM Nicolas Paris <nicolas.pa...@riseup.net>
wrote:

> Hi all
>
> I am using pySpark 2.3.0 and pyArrow 0.10.0.
>
> I want to apply a pandas UDF to a DataFrame with <String, BinaryType>
> columns, but I get the error below:
>
> > Invalid returnType with grouped map Pandas UDFs:
> > StructType(List(StructField(filename,StringType,true),StructField(contents,BinaryType,true)))
> > is not supported
>
>
> Am I missing something? The docs at
> https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html#supported-sql-types
> say pyArrow 0.10 is the minimum needed to handle BinaryType.
>
> Here is the code:
>
> > from pyspark.sql.functions import pandas_udf, PandasUDFType
> >
> > df = sql("select filename, contents from test_binary")
> >
> > @pandas_udf("filename String, contents binary", PandasUDFType.GROUPED_MAP)
> > def transform_binary(pdf):
> >     contents = pdf.contents
> >     return pdf.assign(contents=contents)
> >
> > df.groupby("filename").apply(transform_binary).count()
>
> Thanks
> --
> nicolas
>
