Without seeing the code and the whole stack trace, this is just a wild guess: did you set the config parameter that enables Arrow (spark.sql.execution.arrow.pyspark.enabled)? If it is not set in your code, you would have to set it in spark-defaults.conf. Please note that the parameter spark.sql.execution.arrow.enabled is deprecated since Spark 3.0...
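To check quickly which of the two Arrow parameters a given spark-defaults.conf actually carries, something like the following could help. It is only a minimal sketch: the helper names parse_spark_defaults and arrow_status are made up for illustration, and it assumes the file uses the usual "key, whitespace, value" layout.

```python
ARROW_KEY = "spark.sql.execution.arrow.pyspark.enabled"
DEPRECATED_ARROW_KEY = "spark.sql.execution.arrow.enabled"


def parse_spark_defaults(text):
    """Parse spark-defaults.conf style text into a dict (hypothetical helper).

    Assumes each property line is 'key<whitespace>value'.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split the key from the value on the first run of whitespace.
        parts = line.split(None, 1)
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def arrow_status(props):
    """Report how Arrow is configured in the parsed properties."""
    if props.get(ARROW_KEY, "").lower() == "true":
        return "enabled"
    if DEPRECATED_ARROW_KEY in props:
        # Only the deprecated name is present.
        return "deprecated-key"
    return "not-set"


if __name__ == "__main__":
    # Illustrative content, not the poster's actual file.
    sample = "# sample\nspark.sql.execution.arrow.enabled true\n"
    print(arrow_status(parse_spark_defaults(sample)))
```

Run against the text of the real conf file, "not-set" or "deprecated-key" would point to the parameter mentioned above.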

I encounter the following error:

"java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available"

when reading from a Google BigQuery (GBQ) table using a Kubernetes cluster built on Debian buster.

The Debian release in the Docker image is:

root@ccf3ac45d0ed:/opt/spark/work-dir# cat /etc/*-release

PRETTY_NAME="Debian GNU/Linux 10 (buster)"

And the Java version is



Now, according to the Spark 3.1.2 documentation <https://spark.apache.org/docs/latest/>:

"For Java 11, -Dio.netty.tryReflectionSetAccessible=true is required additionally for Apache Arrow library. This prevents java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available when Apache Arrow uses Netty internally."
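Since the note is tied to the Java version, it may be worth confirming which major version the image reports. The sketch below parses the first line of `java -version` output. The function names are hypothetical, the sample strings are illustrative rather than taken from my image, and treating every version from 11 upward as needing the flag is an assumption that goes beyond the Java 11 wording in the doc.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    second = match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_292).
    if first == 1 and second is not None:
        return int(second)
    return first


def needs_netty_reflection_flag(version_output):
    """Assumption: Java 11 and later need -Dio.netty.tryReflectionSetAccessible=true."""
    return java_major_version(version_output) >= 11


if __name__ == "__main__":
    # Illustrative version strings only.
    for line in ('openjdk version "11.0.11" 2021-04-20',
                 'openjdk version "1.8.0_292"'):
        print(java_major_version(line), needs_netty_reflection_flag(line))
```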

So I have used it as follows:

spark-submit --verbose \
 --properties-file ${property_file} \
 --master k8s://https://$KUBERNETES_MASTER_IP:443 \
 --deploy-mode cluster \
 --name pytest \
 --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./pyspark_venv/bin/python \
 --py-files $CODE_DIRECTORY/DSBQ.zip \
 --conf spark.kubernetes.namespace=$NAMESPACE \
 --conf spark.executor.memory=5000m \
 --conf spark.network.timeout=300 \
 --conf spark.executor.instances=2 \
 --conf spark.kubernetes.driver.limit.cores=1 \
 --conf spark.driver.cores=1 \
 --conf spark.executor.cores=1 \
 --conf spark.executor.memory=2000m \
 --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
 --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
 --conf spark.kubernetes.container.image=${IMAGEGCP} \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
 --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
 --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
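With this many --conf options, the same key can be passed twice without anyone noticing (spark.executor.memory appears twice in the command above, with 5000m and 2000m). A small sketch for spotting that in an argument list follows. The function name duplicate_conf_keys is hypothetical, and it assumes every setting is passed as a separate "--conf key=value" pair.

```python
from collections import Counter


def duplicate_conf_keys(args):
    """Return the sorted --conf keys that appear more than once in args."""
    keys = []
    it = iter(args)
    for arg in it:
        if arg == "--conf":
            # The element right after --conf is expected to be key=value.
            pair = next(it, "")
            keys.append(pair.split("=", 1)[0])
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    # A shortened argument list based on the command above.
    args = [
        "--conf", "spark.executor.memory=5000m",
        "--conf", "spark.network.timeout=300",
        "--conf", "spark.executor.memory=2000m",
    ]
    print(duplicate_conf_keys(args))
```

This only reports repeated keys. It does not say which of the values takes effect.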


However, some comments <https://stackoverflow.com/questions/62109276/errorjava-lang-unsupportedoperationexception-for-pyspark-pandas-udf-documenta> mention that these parameters need to be in place before spark-submit runs, so I added them to $SPARK_HOME/conf/spark-defaults.conf:

185@b272bbf663e6:/opt/spark/conf$ cat spark-defaults.conf



But I'm still getting the same error!

Any ideas will be appreciated.


