Hello, I am trying to load the Hadoop-Azure driver in Apache Spark, but so far I have failed. The plan is to include the required JARs in the Docker image, since we plan on using a client-mode SparkSession.
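For context, the session is created roughly like this (a minimal sketch; the master URL and app name are placeholders, not our actual values):

```python
# Minimal sketch of how the client-mode session is created.
# The master URL and app name below are placeholders, not our real values.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://example-cluster:6443")  # placeholder master URL
    .appName("azure-read-test")                    # placeholder app name
    .getOrCreate()
)
```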
My current Dockerfile looks like this:

```dockerfile
FROM spark:latest

COPY *.jar $SPARK_HOME/jars/

ENV SPARK_EXTRA_CLASSPATH="$SPARK_HOME/jars/hadoop-azure-3.2.0.jar:$SPARK_HOME/jars/azure-keyvault-core-1.2.4.jar:$SPARK_HOME/jars/azure-storage-8.6.6.jar:$SPARK_HOME/jars/jetty-util-ajax-9.3.24.v20180605.jar:$SPARK_HOME/jars/wildfly-openssl-2.1.3.Final.jar"
ENV HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-azure-datalake"
```

In the build directory I have the following dependencies:

- hadoop-azure-3.2.0.jar
- azure-storage-8.6.6.jar
- azure-keyvault-core-1.2.4.jar
- jetty-util-ajax-9.3.24.v20180605.jar
- wildfly-openssl-2.1.3.Final.jar

I have verified that these files are part of the image and located where I expect them (`/opt/spark/jars`).

When looking at the Environment tab in the Spark UI, I can't see any sign that hadoop-azure is loaded. In addition, when I try to read a file using the `wasb://` scheme, I get the following error:

```
java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
```
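For reference, the read that triggers the error looks roughly like this (a minimal sketch; the container name, storage account, and file path are placeholders, not our real values):

```python
# Minimal sketch of the read that fails with ClassNotFoundException.
# The container, storage account, and path are placeholders.
df = spark.read.text(
    "wasb://mycontainer@myaccount.blob.core.windows.net/path/to/file.txt"
)
df.show()
```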