Wouldn't these be separately submitted jobs for separate workloads? You can of course dynamically change each job submitted to have whatever packages you like, from whatever is orchestrating. A single job doing everything doesn't sound right.
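For illustration, a rough sketch of that pattern, assuming the orchestrator can shell out to spark-submit; the job script names and the Maven coordinate below are made-up placeholders, not anything from the issue:

```python
# Rough sketch: every task becomes its own spark-submit invocation, so each job
# gets exactly the packages it needs on a fresh JVM. Script names and the
# Delta coordinate are placeholders.
import subprocess

def run_job(script, packages=None):
    cmd = ["spark-submit"]
    if packages:
        # --packages takes a comma-separated list of Maven coordinates
        cmd += ["--packages", ",".join(packages)]
    cmd.append(script)
    subprocess.run(cmd, check=True)

run_job("plain_etl_job.py")
run_job("delta_job.py", packages=["io.delta:delta-core_2.12:1.1.0"])
```

The cost is a fresh driver JVM per job, but the classpath never has to change while a JVM is running.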
On Thu, Mar 10, 2022, 12:05 PM Rafał Wojdyła <ravwojd...@gmail.com> wrote:

> Because I can't (and should not) know ahead of time which jobs will be
> executed, that's the job of the orchestration layer (and it can be dynamic).
> I know I can specify multiple packages. Also not worried about memory.
>
> On Thu, 10 Mar 2022 at 13:54, Artemis User <arte...@dtechspace.com> wrote:
>
>> If changing packages or jars isn't your concern, why not just specify ALL
>> packages that you would need for the Spark environment? You know you can
>> define multiple packages under the packages option. This shouldn't cause
>> memory issues since the JVM uses dynamic class loading...
>>
>> On 3/9/22 10:03 PM, Rafał Wojdyła wrote:
>>
>> Hi Artemis,
>> Thanks for your input, to answer your questions:
>>
>> > You may want to ask yourself why it is necessary to change the jar
>> > packages during runtime.
>>
>> I have a long running orchestrator process which executes multiple spark
>> jobs, currently on a single VM/driver; some of those jobs might
>> require extra packages/jars (please see the example in the issue).
>>
>> > Changing packages doesn't mean reloading the classes.
>>
>> AFAIU this is unrelated.
>>
>> > There is no way to reload the same class unless you customize the
>> > classloader of Spark.
>>
>> AFAIU this is an implementation detail.
>>
>> > I also don't think it is necessary to implement a warning or error
>> > message when changing the configuration since it doesn't do any harm
>>
>> To reiterate: right now the API allows changing the configuration of the
>> context without that configuration taking effect. See examples of confused
>> users here:
>> * https://stackoverflow.com/questions/41886346/spark-2-1-0-session-config-settings-pyspark
>> * https://stackoverflow.com/questions/53606756/how-to-set-spark-driver-memory-in-client-mode-pyspark-version-2-3-1
>>
>> I'm curious if you have any opinion about the "hard-reset" workaround,
>> copy-pasting from the issue:
>>
>> ```
>> s: SparkSession = ...
>>
>> # Hard reset:
>> s.stop()
>> s._sc._gateway.shutdown()
>> s._sc._gateway.proc.stdin.close()
>> SparkContext._gateway = None
>> SparkContext._jvm = None
>> ```
>>
>> Cheers - Rafal
>>
>> On 2022/03/09 15:39:58 Artemis User wrote:
>> > This is indeed a JVM issue, not a Spark issue. You may want to ask
>> > yourself why it is necessary to change the jar packages during runtime.
>> > Changing packages doesn't mean reloading the classes. There is no way to
>> > reload the same class unless you customize the classloader of Spark. I
>> > also don't think it is necessary to implement a warning or error message
>> > when changing the configuration since it doesn't do any harm. Spark
>> > uses lazy binding so you can do a lot of such "unharmful" things.
>> > Developers will have to understand the behavior of each API before
>> > using it.
>> >
>> >
>> > On 3/9/22 9:31 AM, Rafał Wojdyła wrote:
>> > > Sean,
>> > > I understand you might be sceptical about adding this functionality
>> > > into (py)spark, I'm curious:
>> > > * would an error/warning on updating configuration that currently
>> > > cannot take effect (requires a restart of the JVM) be reasonable?
>> > > * what do you think about the workaround in the issue?
>> > > Cheers - Rafal
>> > >
>> > > On Wed, 9 Mar 2022 at 14:24, Sean Owen <sr...@gmail.com> wrote:
>> > >
>> > > Unfortunately this opens a lot more questions and problems than it
>> > > solves. What if you take something off the classpath, for example?
>> > > Change a class?
>> > >
>> > > On Wed, Mar 9, 2022 at 8:22 AM Rafał Wojdyła <ra...@gmail.com> wrote:
>> > >
>> > > Thanks Sean,
>> > > To be clear, if you prefer to change the label on this issue
>> > > from bug to something else, feel free to do so, no strong opinions
>> > > on my end. What happens to the classpath, whether spark uses
>> > > some classloader magic, is probably an implementation detail.
>> > > That said, it's definitely not intuitive that you can change
>> > > the configuration and get the context (with the updated
>> > > config) without any warnings/errors. Also, what would you
>> > > recommend as a workaround or solution to this problem? Any
>> > > comments about the workaround in the issue? Keep in mind that
>> > > I can't restart the long running orchestration process (a Python
>> > > process, if that matters).
>> > > Cheers - Rafal
>> > >
>> > > On Wed, 9 Mar 2022 at 13:15, Sean Owen <sr...@gmail.com> wrote:
>> > >
>> > > That isn't a bug - you can't change the classpath once the
>> > > JVM is executing.
>> > >
>> > > On Wed, Mar 9, 2022 at 7:11 AM Rafał Wojdyła <ra...@gmail.com> wrote:
>> > >
>> > > Hi,
>> > > My use case is that I have a long running process
>> > > (orchestrator) with multiple tasks, some of which might
>> > > require extra spark dependencies. It seems once the
>> > > spark context is started it's not possible to update
>> > > `spark.jars.packages`? I have reported an issue at
>> > > https://issues.apache.org/jira/browse/SPARK-38438,
>> > > together with a workaround ("hard reset of the
>> > > cluster"). I wonder if anyone has a solution for this?
>> > > Cheers - Rafal
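For anyone skimming the thread, a minimal sketch of the behaviour Rafał describes, assuming a local Spark 3.x session; the Delta coordinate is only a hypothetical stand-in for "some extra package":

```python
# Minimal sketch of the issue: once a session exists, asking for a new static
# setting such as spark.jars.packages returns the existing session, and the
# package never makes it onto the classpath. The coordinate is a placeholder.
from pyspark.sql import SparkSession

s1 = SparkSession.builder.master("local[1]").getOrCreate()

s2 = (SparkSession.builder
      .config("spark.jars.packages", "io.delta:delta-core_2.12:1.1.0")
      .getOrCreate())

assert s1 is s2  # same session object; the new package never takes effect
```

Whether any warning is logged here depends on the Spark version, which is exactly the confusion raised above.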