You are trying to access a Google Cloud Storage bucket (gs://) from your local host.

spark-submit does not see it because it assumes the path is on the host's local file system, which it is not.

You need to mount the gs:// bucket as a local file system.

You can use a tool called gcsfuse
https://cloud.google.com/storage/docs/gcs-fuse . Cloud Storage FUSE is an
open-source FUSE <http://fuse.sourceforge.net/> adapter that allows you to
mount Cloud Storage buckets as file systems on Linux or macOS systems. You
can download gcsfuse from here
<https://github.com/GoogleCloudPlatform/gcsfuse>
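On Debian/Ubuntu the install can be done from Google's apt repository. A
rough sketch (the repo URL and key location follow Google's published
instructions at the time of writing; check the gcsfuse docs for your
distribution and release):

```shell
# Add Google's gcsfuse apt repo for this Ubuntu/Debian release
export GCSFUSE_REPO=gcsfuse-$(lsb_release -c -s)
echo "deb https://packages.cloud.google.com/apt $GCSFUSE_REPO main" \
  | sudo tee /etc/apt/sources.list.d/gcsfuse.list
# Trust Google's package signing key
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
# Install the gcsfuse package
sudo apt-get update && sudo apt-get install -y gcsfuse
```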


Pretty simple.


It will be installed as /usr/bin/gcsfuse. Create a local mount point such
as /mnt/gs as root and give other users permission to use it.


As the normal user that needs to access the gs:// bucket (not as root), use
gcsfuse to mount it. For example, here I am mounting a GCS bucket called
spark-jars-karan


Just use the bucket name itself, without the gs:// prefix


gcsfuse spark-jars-karan /mnt/gs
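Putting the steps above together, an end-to-end sketch looks like this
(the bucket name spark-jars-karan comes from the example; --implicit-dirs
makes nested bucket "directories" visible without explicit placeholder
objects):

```shell
sudo mkdir -p /mnt/gs                 # create the mount point once, as root
sudo chown "$USER" /mnt/gs            # let the non-root user own it
gcsfuse --implicit-dirs spark-jars-karan /mnt/gs   # bucket name only, no gs://
ls /mnt/gs                            # the bucket's objects should appear as files
# When finished:
fusermount -u /mnt/gs                 # Linux; on macOS use: umount /mnt/gs
```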


Then you can refer to it as /mnt/gs in spark-submit from the on-premise host

spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0 \
  --jars /mnt/gs/spark-bigquery-with-dependencies_2.12-0.23.2.jar
</spark-submit>
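For completeness, a full invocation would end with the application itself;
the script name my_app.py below is a placeholder I have added, not part of
the original command:

```shell
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0 \
  --jars /mnt/gs/spark-bigquery-with-dependencies_2.12-0.23.2.jar \
  my_app.py
```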

HTH

   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 12 Feb 2022 at 04:31, karan alang <karan.al...@gmail.com> wrote:

> Hello All,
>
> I'm trying to access gcp buckets while running spark-submit from local,
> and running into issues.
>
> I'm getting error :
> ```
>
> 22/02/11 20:06:59 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Exception in thread "main" 
> org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme 
> "gs"
>
> ```
> I tried adding the --conf
> spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
>
> to the spark-submit command, but getting ClassNotFoundException
>
> Details are in stackoverflow :
>
> https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit
>
> Any ideas on how to fix this ?
> tia !
>
>
