[ https://issues.apache.org/jira/browse/SPARK-31704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106642#comment-17106642 ]
Dongjoon Hyun commented on SPARK-31704:
---------------------------------------

+1 for [~bryanc]'s advice. You may see the Apache Spark 3.0.0 RC1 documentation:
- https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-docs/_site/index.html

> PandasUDFType.GROUPED_AGG with Java 11
> --------------------------------------
>
>                 Key: SPARK-31704
>                 URL: https://issues.apache.org/jira/browse/SPARK-31704
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>         Environment: java jdk: 11
>                      python: 3.7
>            Reporter: Markus Tretzmüller
>            Priority: Minor
>              Labels: newbie
>
> Running the example from the [docs|https://spark.apache.org/docs/3.0.0-preview2/api/python/pyspark.sql.html#module-pyspark.sql.functions] raises an error on Java 11. It works with Java 8.
> {code:python}
> import findspark
> findspark.init('/usr/local/lib/spark-3.0.0-preview2-bin-hadoop2.7')
>
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> from pyspark.sql import Window
> from pyspark.sql import SparkSession
>
> if __name__ == '__main__':
>     spark = SparkSession \
>         .builder \
>         .appName('test') \
>         .getOrCreate()
>
>     df = spark.createDataFrame(
>         [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
>         ("id", "v"))
>
>     @pandas_udf("double", PandasUDFType.GROUPED_AGG)
>     def mean_udf(v):
>         return v.mean()
>
>     w = (Window.partitionBy('id')
>          .orderBy('v')
>          .rowsBetween(-1, 0))
>
>     df.withColumn('mean_v', mean_udf(df['v']).over(w)).show()
> {code}
> {noformat}
>   File "/usr/local/lib/spark-3.0.0-preview2-bin-hadoop2.7/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o81.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 7.0 failed 1 times, most recent failure: Lost task 44.0 in stage 7.0 (TID 37, 131.130.32.15, executor driver): java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
> 	at io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:473)
> 	at io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)
> 	at io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)
> 	at io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)
> 	at org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:222)
> 	at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:240)
> 	at org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:132)
> 	at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:120)
> 	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.$anonfun$writeIteratorToStream$1(ArrowPythonRunner.scala:94)
> 	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
> 	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.writeIteratorToStream(ArrowPythonRunner.scala:101)
> 	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:373)
> 	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1932)
> 	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:213)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
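[Editor's note: for context, the Spark 3.0 documentation referenced in the comment states that running Arrow-based features such as Pandas UDFs on Java 11 additionally requires the JVM flag `-Dio.netty.tryReflectionSetAccessible=true`, which is what prevents the `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available` seen in the trace above. A minimal sketch of applying it; the script name and local invocation are illustrative, only the flag itself comes from the Spark docs:]

```shell
# Pass the Netty reflection flag to both driver and executor JVMs.
# The flag name -Dio.netty.tryReflectionSetAccessible=true is from the
# Spark 3.0 docs; "your_script.py" is a placeholder for the reproducer.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true" \
  --conf "spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true" \
  your_script.py
```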