Without seeing the code and the whole stack trace, just a wild guess: did
you set the config parameter for enabling Arrow
(spark.sql.execution.arrow.pyspark.enabled)? If it is not set in your code,
you would have to set it in spark-defaults.conf. Please note that the
parameter spark.sql.execution.arrow.enabled has been deprecated since Spark 3.0.
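For example, a minimal sketch of enabling it in PySpark (the app name is
just a placeholder):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pytest").getOrCreate()
    # Spark 3.x name for the Arrow switch; the old
    # spark.sql.execution.arrow.enabled still works but is deprecated.
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")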
-- ND
On 8/7/21 2:08 PM, Mich Talebzadeh wrote:
Hi,
I am getting the error:
"java.lang.UnsupportedOperationException: sun.misc.Unsafe or
java.nio.DirectByteBuffer.<init>(long, int) not available"
when reading from a Google BigQuery (GBQ) table using a Kubernetes cluster
built on Debian Buster.
The current Debian Buster release from the Docker image is:
root@ccf3ac45d0ed:/opt/spark/work-dir# cat /etc/*-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
And the Java version is:
echo $JAVA_HOME
/usr/local/openjdk-11
Now, according to the Spark 3.1.2 docs <https://spark.apache.org/docs/latest/>:
"For Java 11, -Dio.netty.tryReflectionSetAccessible=true is
required additionally for Apache Arrow library. This prevents
java.lang.UnsupportedOperationException: sun.misc.Unsafe or
java.nio.DirectByteBuffer.(long, int) not available when Apache
Arrow uses Netty internally."
So I have used it as follows:
spark-submit --verbose \
--properties-file ${property_file} \
--master k8s://https://$KUBERNETES_MASTER_IP:443 \
--deploy-mode cluster \
--name pytest \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./pyspark_venv/bin/python \
--py-files $CODE_DIRECTORY/DSBQ.zip \
--conf spark.kubernetes.namespace=$NAMESPACE \
--conf spark.network.timeout=300 \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.driver.limit.cores=1 \
--conf spark.driver.cores=1 \
--conf spark.executor.cores=1 \
--conf spark.executor.memory=2000m \
--conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
--conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
--conf spark.kubernetes.container.image=${IMAGEGCP} \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
--conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
--conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
$CODE_DIRECTORY/${APPLICATION}
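To double-check that the flags actually reach the application, the conf can
be read back from inside PySpark (a minimal sketch, assuming a running
session):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    conf = spark.sparkContext.getConf()
    # If either of these prints "not set", the option never made it
    # into the submitted application.
    print(conf.get("spark.driver.extraJavaOptions", "not set"))
    print(conf.get("spark.executor.extraJavaOptions", "not set"))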
However, some comments
<https://stackoverflow.com/questions/62109276/errorjava-lang-unsupportedoperationexception-for-pyspark-pandas-udf-documenta>
mentioned that these parameters need to be supplied before spark-submit
starts, so I added them to $SPARK_HOME/conf/spark-defaults.conf:
185@b272bbf663e6:/opt/spark/conf$ cat spark-defaults.conf
spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true"
spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true"
But I'm still getting the same error!
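For what it's worth, the same Arrow/Netty code path can be exercised
without BigQuery at all (a minimal sketch; the DataFrame content is
arbitrary):

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    df = spark.createDataFrame(pd.DataFrame({"a": [1, 2, 3]}))
    # On Java 11, if -Dio.netty.tryReflectionSetAccessible=true has not
    # reached the JVMs, this toPandas() raises the same
    # sun.misc.Unsafe / DirectByteBuffer error.
    print(df.toPandas())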
Any ideas will be appreciated.
Mich