Re: Spark 3.1 with spark AVRO

2022-03-10 Thread Yong Zhang
Thank you so much, you absolutely nailed it. There was a stupid "SPARK_HOME" env variable pointing to Spark 2.4 lingering in my zsh config, and that was the troublemaker. I totally forgot about it and didn't realize this environment variable could cause me days of frustration. Yong
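
A minimal sketch of the kind of check that would catch this, assuming a local pip/conda pyspark install (paths and names are illustrative, not from the thread):

    # Compare the SPARK_HOME the shell exports with the pyspark actually installed.
    import os
    import pyspark

    print("SPARK_HOME      =", os.environ.get("SPARK_HOME"))  # e.g. an old Spark 2.4 directory
    print("pyspark version =", pyspark.__version__)           # the version installed locally

    # If SPARK_HOME points at a stale distribution, unset it (or fix ~/.zshrc)
    # so spark-shell / pyspark launch the intended Spark 3.1 build.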

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-10 Thread Artemis User
I guess there are several misconceptions here: 1. The worker doesn't create the driver; the client does. 2. Regardless of job scheduling, all jobs of the same task/application run under the same SparkContext, which is created by the driver. Therefore, you need to specify ALL dependency jars for ALL
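
A short pyspark sketch of that shared-context point, assuming the default builder behaviour (the package coordinate is only an example):

    from pyspark.sql import SparkSession

    # The first builder call creates the driver-side context.
    spark1 = SparkSession.builder.appName("first-job").getOrCreate()

    # A later builder call with extra static config does NOT create a new context;
    # it returns the existing session, so spark.jars.packages set here is ignored.
    spark2 = (SparkSession.builder
              .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.1.2")
              .getOrCreate())

    print(spark1.sparkContext is spark2.sparkContext)  # True -- one shared SparkContext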

Re: Spark 3.1 with spark AVRO

2022-03-10 Thread Artemis User
It must be some misconfiguration in your environment.  Do you perhaps have a hardwired $SPARK_HOME env variable in your shell?  An easy test would be to place the spark-avro jar file you downloaded in the jars directory of Spark and run spark-shell again without the packages option.  This will
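
In the same spirit, a quick sanity check from inside a pyspark shell (where the spark session object is predefined) to confirm which build is actually running before blaming the packages mechanism:

    # Should print 3.1.x if the intended distribution started; 2.4.x points to a stale SPARK_HOME.
    print(spark.version)
    # Shows whether the --packages option actually reached the context.
    print(spark.sparkContext.getConf().get("spark.jars.packages", "<not set>"))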

Spark 3.1 with spark AVRO

2022-03-10 Thread Yong Zhang
Hi, I am puzzled by an issue reading an Avro file with Spark 3.1. Everything is done on my local Mac laptop so far, and I really don't know where the issue comes from; I googled a lot and cannot find any clue. I have always been using Spark 2.4, as it is really mature. But for a
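
For reference, a minimal pyspark sketch of reading Avro with Spark 3.1: spark-avro is an external module, so it has to be supplied via --packages or spark.jars.packages (the file path below is hypothetical):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("avro-read")
             .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.1.2")
             .getOrCreate())

    df = spark.read.format("avro").load("/tmp/example.avro")
    df.printSchema()
    df.show(5)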

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-10 Thread Sean Owen
Wouldn't these be separately submitted jobs for separate workloads? You can of course dynamically change each job submitted to have whatever packages you like, from whatever is orchestrating. A single job doing everything doesn't sound right. On Thu, Mar 10, 2022, 12:05 PM Rafał Wojdyła wrote: >
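
One way to sketch that arrangement, assuming the orchestrator can shell out to spark-submit (script names and coordinates are placeholders):

    import subprocess

    # Each workload is its own spark-submit, so each can carry its own --packages.
    jobs = [
        ("avro_job.py",  "org.apache.spark:spark-avro_2.12:3.1.2"),
        ("kafka_job.py", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2"),
    ]

    for script, packages in jobs:
        subprocess.run(["spark-submit", "--packages", packages, script], check=True)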

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-10 Thread Rafał Wojdyła
Because I can't (and should not) know ahead of time which jobs will be executed; that's the job of the orchestration layer (and it can be dynamic). I know I can specify multiple packages. I'm also not worried about memory. On Thu, 10 Mar 2022 at 13:54, Artemis User wrote: > If changing packages or

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-10 Thread Artemis User
If changing packages or jars isn't your concern, why not just specify ALL the packages that you would need for the Spark environment? You know you can define multiple packages under the packages option. This shouldn't cause memory issues, since the JVM uses dynamic class loading... On 3/9/22 10:03
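
A hedged sketch of that suggestion; the comma-separated coordinates are just examples of what a job mix might need:

    from pyspark.sql import SparkSession

    # Declare every package any downstream job may need when the session is first built;
    # spark.jars.packages accepts a comma-separated list of Maven coordinates.
    spark = (SparkSession.builder
             .appName("all-packages-up-front")
             .config("spark.jars.packages",
                     "org.apache.spark:spark-avro_2.12:3.1.2,"
                     "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2")
             .getOrCreate())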

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-03-10 Thread Saurabh Gulati
Hi Gourav, We use auto-scaling containers in GKE for running the Spark thriftserver. From: Gourav Sengupta Sent: 07 March 2022 14:36 To: Saurabh Gulati Cc: Mich Talebzadeh ; Kidong Lee ; user@spark.apache.org Subject: Re: [EXTERNAL] Re: Need to make WHERE