Hi Gaurav, All,
I'm doing a spark-submit from my local system to a GCP Dataproc cluster.
This is mostly for dev/testing.
I can run a 'gcloud dataproc jobs submit' command as well, which is what
will be done in production.
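For reference, the production-side submission could look roughly like this (the cluster, region, and file names below are placeholders, not taken from this thread):

```shell
# Hypothetical sketch of submitting a PySpark job to Dataproc;
# cluster/region/bucket names are placeholders.
gcloud dataproc jobs submit pyspark gs://my-bucket/jobs/my_job.py \
    --cluster=my-cluster \
    --region=us-central1
```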

Hope that clarifies.

regds,
Karan Alang


On Sat, Feb 12, 2022 at 10:31 PM Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Hi,
>
> agree with Holden, have faced quite a few issues with FUSE.
>
> Also trying to understand "spark-submit from local" . Are you submitting
> your SPARK jobs from a local laptop or in local mode from a GCP dataproc /
> system?
>
> If you are submitting the job from your local laptop, there will be
> performance bottlenecks I guess based on the internet bandwidth and volume
> of data.
>
> Regards,
> Gourav
>
>
> On Sat, Feb 12, 2022 at 7:12 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> You can also put the GS access jar with your Spark jars — that’s what the
>> class not found exception is pointing you towards.
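>>
>> A minimal sketch of that approach, assuming the shaded GCS connector jar
>> from the GoogleCloudDataproc/hadoop-connectors releases (the jar path and
>> version below are placeholders, not from this thread):
>>
>> ```shell
>> # Hypothetical: put the GCS connector jar on the driver/executor classpath
>> # so the "gs" scheme can be resolved. Jar path/version are placeholders.
>> spark-submit \
>>     --jars /path/to/gcs-connector-hadoop3-2.2.5-shaded.jar \
>>     --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
>>     your_job.py
>> ```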
>>
>> On Fri, Feb 11, 2022 at 11:58 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> BTW I also answered you on Stack Overflow:
>>>
>>>
>>> https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit
>>>
>>> HTH
>>>
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Sat, 12 Feb 2022 at 08:24, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> You are trying to access a Google storage bucket gs:// from your local
>>>> host.
>>>>
>>>> spark-submit does not see it because it assumes the path is on the local
>>>> file system of the host, which it is not.
>>>>
>>>> You need to mount gs:// bucket as a local file system.
>>>>
>>>> You can use the tool called gcsfuse
>>>> https://cloud.google.com/storage/docs/gcs-fuse . Cloud Storage FUSE is
>>>> an open source FUSE <http://fuse.sourceforge.net/> adapter that allows
>>>> you to mount Cloud Storage buckets as file systems on Linux or macOS
>>>> systems. You can download gcsfuse from here
>>>> <https://github.com/GoogleCloudPlatform/gcsfuse>
>>>>
>>>>
>>>> Pretty simple.
>>>>
>>>>
>>>> It will be installed as /usr/bin/gcsfuse. You can mount a bucket by
>>>> creating a local mount point such as /mnt/gs as root and giving other
>>>> users permission to use it.
>>>>
>>>>
>>>> As a normal user that needs to access the gs:// bucket (not as root), use
>>>> gcsfuse to mount it. For example, I am mounting a GCS bucket called
>>>> spark-jars-karan here.
>>>>
>>>>
>>>> Just use the bucket name itself
>>>>
>>>>
>>>> gcsfuse spark-jars-karan /mnt/gs
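>>>>
>>>> Putting the steps above together, a sketch (bucket and mount-point names
>>>> as in this example; the unmount command assumes Linux):
>>>>
>>>> ```shell
>>>> # Run once as root: create the mount point and make it usable by others
>>>> sudo mkdir -p /mnt/gs
>>>> sudo chmod a+rw /mnt/gs
>>>> # As the normal user: mount the bucket and check its contents
>>>> gcsfuse spark-jars-karan /mnt/gs
>>>> ls /mnt/gs
>>>> # When finished, unmount (Linux)
>>>> fusermount -u /mnt/gs
>>>> ```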
>>>>
>>>>
>>>> Then you can refer to it as /mnt/gs in spark-submit from the on-premises host:
>>>>
>>>> spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0 \
>>>>   --jars /mnt/gs/spark-bigquery-with-dependencies_2.12-0.23.2.jar
>>>>
>>>> HTH
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, 12 Feb 2022 at 04:31, karan alang <karan.al...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I'm trying to access GCP buckets while running spark-submit from my
>>>>> local machine, and I'm running into issues.
>>>>>
>>>>> I'm getting this error:
>>>>> ```
>>>>>
>>>>> 22/02/11 20:06:59 WARN NativeCodeLoader: Unable to load native-hadoop 
>>>>> library for your platform... using builtin-java classes where applicable
>>>>> Exception in thread "main" 
>>>>> org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for 
>>>>> scheme "gs"
>>>>>
>>>>> ```
>>>>> I tried adding
>>>>> --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
>>>>> to the spark-submit command, but I'm now getting a ClassNotFoundException.
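>>>>>
>>>>> i.e. the command looks roughly like this (the job script name is a
>>>>> placeholder):
>>>>>
>>>>> ```shell
>>>>> # Sketch of what I tried; job script name is a placeholder
>>>>> spark-submit \
>>>>>     --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
>>>>>     my_job.py
>>>>> ```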
>>>>>
>>>>> Details are in stackoverflow :
>>>>>
>>>>> https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit
>>>>>
>>>>> Any ideas on how to fix this?
>>>>> TIA!
>>>>>
>>>>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>
