Re: Deploying Spark on Google Kubernetes (GKE) autopilot, preliminary findings

2022-02-13 Thread Gourav Sengupta
Hi, maybe this is useful in case someone is testing Spark in containers for Spark development. *From a production-scale point of view:* if I am in AWS, I will just use Glue if I want to use containers for Spark, without unnecessarily increasing my operational costs. Also,

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread karan alang
Hi Gourav, All, I'm doing a spark-submit from my local system to a GCP Dataproc cluster. This is more for dev/testing. I can run a 'gcloud dataproc jobs submit' command as well, which is what will be done in production. Hope that clarifies. regds, Karan Alang On Sat, Feb 12, 2022 at 10:31
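The production path Karan describes can be sketched roughly as below; the bucket, cluster, and region names are hypothetical placeholders, not values from the thread:

```shell
# Submit a PySpark job to an existing Dataproc cluster (hypothetical names).
gcloud dataproc jobs submit pyspark \
  gs://my-bucket/jobs/my_job.py \
  --cluster=my-cluster \
  --region=us-central1
```

Dataproc ships with the Cloud Storage connector preinstalled, which is why jobs submitted this way can read gs:// paths without extra configuration.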

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread karan alang
Hi Holden, when you mention the GS access jar, which jar is this? Can you please clarify? thanks, Karan Alang On Sat, Feb 12, 2022 at 11:10 AM Holden Karau wrote: > You can also put the GS access jar with your Spark jars — that’s what the > class not found exception is pointing you towards. >
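Holden's suggestion, putting the GCS connector jar (the "GS access jar") on Spark's classpath, could look like this sketch; the jar path and script name are assumptions, while the filesystem class names come from the Cloud Storage connector for Hadoop:

```shell
# Option 1: pass the GCS connector jar explicitly at submit time.
spark-submit \
  --jars /path/to/gcs-connector-hadoop3-latest.jar \
  --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  --conf spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
  my_job.py

# Option 2: copy the jar into Spark's jars directory so it is always on the classpath.
cp /path/to/gcs-connector-hadoop3-latest.jar "$SPARK_HOME/jars/"
```

Without the connector on the classpath, any gs:// access fails with the ClassNotFoundException Holden refers to.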

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread karan alang
Thanks, Mich - will check this and update. regds, Karan Alang On Sat, Feb 12, 2022 at 1:57 AM Mich Talebzadeh wrote: > BTW I also answered you on Stack Overflow: > > > https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit > > HTH > > >view my

Re: Help With unstructured text file with spark scala

2022-02-13 Thread Rafael Mendes
Hi, Danilo. Do you have a single large file, only? If so, I guess you can use tools like sed/awk to split it into more files based on layout, so you can read these files into Spark. On Wed, Feb 9, 2022 at 09:30, Bitfox wrote: > Hi > > I am not sure about the total situation. > But if you
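Rafael's sed/awk idea can be sketched as follows; the "HEADER" marker is a hypothetical stand-in for whatever layout boundary the real file uses:

```shell
# Build a small sample of an unstructured file with repeating sections.
cat > big_input.txt <<'EOF'
HEADER alpha
line 1
line 2
HEADER beta
line 3
EOF

# Each time a line starts with "HEADER", close the current part file and
# start writing to a new one; every line goes to the current part.
awk '/^HEADER/ { close(out); out = sprintf("part_%03d.txt", ++n) } { print > out }' big_input.txt

ls part_*.txt
```

Each resulting part file then starts at a section boundary and can be read into Spark as a separate input.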

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread Mich Talebzadeh
Putting the GS access jar with the Spark jars may technically resolve the spark-submit issue, but creating local copies of jar files is not a recommended practice. The approach the thread owner adopted, putting the files in a Google Cloud Storage bucket, is correct. Indeed this is what he states
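Mich's recommended approach, keeping dependencies in a Cloud Storage bucket rather than in local copies, might look like the sketch below; the bucket, jar, and script names are hypothetical, and it assumes the GCS connector is already on the launcher's classpath so spark-submit can resolve gs:// URIs:

```shell
# Reference the job and its dependencies directly from the bucket
# instead of maintaining local copies of the jar files.
spark-submit \
  --jars gs://my-bucket/jars/extra-dependency.jar \
  gs://my-bucket/jobs/my_job.py
```

This keeps a single authoritative copy of each dependency that every cluster and developer pulls from.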