Because I can't (and should not) know ahead of time which jobs will be
executed - that's the job of the orchestration layer, and it can be dynamic.
I know I can specify multiple packages (see the sketch below). I'm also not
worried about memory.
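
For reference, pre-declaring everything up front would look something like the
sketch below (the Maven coordinates are only placeholders) - which is exactly
what I can't do here, because the full package list is only known at
orchestration time:

```
from pyspark.sql import SparkSession

# All packages are declared once, before the driver JVM starts; the value is
# a comma-separated list of Maven coordinates.
spark = (
    SparkSession.builder
    .config(
        "spark.jars.packages",
        "org.apache.spark:spark-avro_2.12:3.2.1,"
        "io.delta:delta-core_2.12:1.1.0",
    )
    .getOrCreate()
)
```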

On Thu, 10 Mar 2022 at 13:54, Artemis User <arte...@dtechspace.com> wrote:

> If changing packages or jars isn't your concern, why not just specify ALL
> packages that you would need for the Spark environment?  You know you can
> define multiple packages under the packages option.  This shouldn't cause
> memory issues since the JVM uses dynamic class loading...
>
> On 3/9/22 10:03 PM, Rafał Wojdyła wrote:
>
> Hi Artemis,
> Thanks for your input; to answer your questions:
>
> > You may want to ask yourself why it is necessary to change the jar
> > packages during runtime.
>
> I have a long-running orchestrator process which executes multiple Spark
> jobs, currently on a single VM/driver. Some of those jobs might
> require extra packages/jars (please see the example in the issue).
>
> > Changing packages doesn't mean the classes are reloaded.
>
> AFAIU this is unrelated.
>
> > There is no way to reload the same class unless you customize the
> > classloader of Spark.
>
> AFAIU this is an implementation detail.
>
> > I also don't think it is necessary to implement a warning or error
> > message when changing the configuration since it doesn't do any harm.
>
> To reiterate: right now the API allows you to change the configuration of
> the context without that configuration taking effect. See examples of
> confused users here:
> * https://stackoverflow.com/questions/41886346/spark-2-1-0-session-config-settings-pyspark
> * https://stackoverflow.com/questions/53606756/how-to-set-spark-driver-memory-in-client-mode-pyspark-version-2-3-1
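>
> A minimal sketch of the behaviour that confuses people (the package
> coordinate is just a placeholder):
>
> ```
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.getOrCreate()
>
> # Asking for a "new" session with extra packages succeeds silently, but it
> # returns the already-running session; spark.jars.packages is a static
> # option resolved when the JVM starts, so the change has no effect.
> spark2 = (
>     SparkSession.builder
>     .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.1")
>     .getOrCreate()
> )
> assert spark2 is spark  # same context, old classpath
> ```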
>
> I'm curious if you have any opinion about the "hard-reset" workaround,
> copy-pasting from the issue:
>
> ```
> from pyspark import SparkContext
> from pyspark.sql import SparkSession
>
> s: SparkSession = ...
>
> # Hard reset: stop the session, shut down the Py4J gateway (and with it
> # the driver JVM), then clear the cached handles so that the next context
> # creation launches a fresh JVM.
> s.stop()
> s._sc._gateway.shutdown()
> s._sc._gateway.proc.stdin.close()
> SparkContext._gateway = None
> SparkContext._jvm = None
> ```
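>
> If that workaround behaves as described in the issue, the next
> getOrCreate() launches a fresh gateway/JVM, so static options such as
> spark.jars.packages are honoured again, e.g. (a sketch; the coordinate is
> only a placeholder):
>
> ```
> s2 = (
>     SparkSession.builder
>     .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.1")
>     .getOrCreate()
> )
> ```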
>
> Cheers - Rafal
>
> On 2022/03/09 15:39:58 Artemis User wrote:
> > This is indeed a JVM issue, not a Spark issue.  You may want to ask
> > yourself why it is necessary to change the jar packages during runtime.
> > Changing packages doesn't mean the classes are reloaded. There is no way
> > to reload the same class unless you customize the classloader of Spark.
> > I also don't think it is necessary to implement a warning or error message
> > when changing the configuration since it doesn't do any harm.  Spark
> > uses lazy binding, so you can do a lot of such "unharmful" things.
> > Developers will have to understand the behavior of each API before
> > using them.
> >
> >
> > On 3/9/22 9:31 AM, Rafał Wojdyła wrote:
> > > Sean,
> > > I understand you might be sceptical about adding this functionality
> > > into (py)spark. I'm curious:
> > > * would an error/warning on updating configuration that currently
> > > cannot take effect (it requires a JVM restart) be reasonable?
> > > * what do you think about the workaround in the issue?
> > > Cheers - Rafal
> > >
> > > On Wed, 9 Mar 2022 at 14:24, Sean Owen <sr...@gmail.com> wrote:
> > >
> > >     Unfortunately this opens a lot more questions and problems than it
> > >     solves. What if you take something off the classpath, for example?
> > >     Change a class?
> > >
> > >     On Wed, Mar 9, 2022 at 8:22 AM Rafał Wojdyła
> > >     <ra...@gmail.com> wrote:
> > >
> > >         Thanks Sean,
> > >         To be clear, if you prefer to change the label on this issue
> > >         from bug to something else, feel free to do so, no strong
> > >         opinions on my end. What happens to the classpath, and whether
> > >         Spark uses some classloader magic, is probably an implementation
> > >         detail. That said, it's definitely not intuitive that you can
> > >         change the configuration and get the context back without any
> > >         warnings/errors, even though the new config doesn't take effect.
> > >         Also, what would you recommend as a workaround or solution to
> > >         this problem? Any comments about the workaround in the issue?
> > >         Keep in mind that I can't restart the long-running orchestration
> > >         process (a Python process, if that matters).
> > >         Cheers - Rafal
> > >
> > >         On Wed, 9 Mar 2022 at 13:15, Sean Owen <sr...@gmail.com> wrote:
> > >
> > >             That isn't a bug - you can't change the classpath once the
> > >             JVM is executing.
> > >
> > >             On Wed, Mar 9, 2022 at 7:11 AM Rafał Wojdyła
> > >             <ra...@gmail.com> wrote:
> > >
> > >                 Hi,
> > >                 My use case is that I have a long-running process
> > >                 (orchestrator) with multiple tasks, some of which might
> > >                 require extra Spark dependencies. It seems that once the
> > >                 Spark context is started it's not possible to update
> > >                 `spark.jars.packages`? I have reported an issue at
> > >                 https://issues.apache.org/jira/browse/SPARK-38438,
> > >                 together with a workaround ("hard reset of the
> > >                 cluster"). I wonder if anyone has a solution for this?
> > >                 Cheers - Rafal
> > >
> >
>
