This is indeed a JVM issue, not a Spark issue. You may want to ask
yourself why it is necessary to change the jar packages at runtime.
Changing the package configuration doesn't reload the classes. There is
no way to reload the same class unless you customize Spark's
classloader. I also don't think it is necessary to implement a warning
or error message when changing the configuration, since it doesn't do
any harm. Spark uses lazy binding, so you can do a lot of such
"harmless" things. Developers have to understand the behavior of each
API before using it.
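For example, in PySpark something like the sketch below is legal, but the
second value never loads any new jars, because package resolution only
happens when the driver JVM is launched (the coordinates are arbitrary
placeholders):

    from pyspark.sql import SparkSession

    # First session: the driver JVM is launched here and resolves the
    # packages configured at this point.
    spark = (
        SparkSession.builder
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.1")
        .getOrCreate()
    )
    spark.stop()  # stops the SparkContext; the driver JVM keeps running

    # Second session in the same process: the new value shows up in the
    # conf, yet no new jars are resolved or loaded.
    spark = (
        SparkSession.builder
        .config("spark.jars.packages", "org.example:placeholder-lib_2.12:1.0.0")
        .getOrCreate()
    )
    print(spark.conf.get("spark.jars.packages"))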
On 3/9/22 9:31 AM, Rafał Wojdyła wrote:
Sean,
I understand you might be sceptical about adding this functionality
to (py)spark. I'm curious:
* would an error/warning on updating a configuration setting that
currently cannot take effect (it requires a restart of the JVM) be
reasonable?
* what do you think about the workaround in the issue?
Cheers - Rafal
On Wed, 9 Mar 2022 at 14:24, Sean Owen <sro...@gmail.com> wrote:
Unfortunately this opens a lot more questions and problems than it
solves. What if you take something off the classpath, for example?
Change a class?
On Wed, Mar 9, 2022 at 8:22 AM Rafał Wojdyła
<ravwojd...@gmail.com> wrote:
Thanks Sean,
To be clear, if you prefer to change the label on this issue
from bug to something else, feel free to do so, no strong opinions
on my end. What happens to the classpath, and whether Spark uses
some classloader magic, is probably an implementation detail.
That said, it's definitely not intuitive that you can change
the configuration and get the context (with the updated
config) without any warnings/errors. Also, what would you
recommend as a workaround or solution to this problem? Any
comments about the workaround in the issue? Keep in mind that
I can't restart the long-running orchestration process (a Python
process, if that matters).
Cheers - Rafal
On Wed, 9 Mar 2022 at 13:15, Sean Owen <sro...@gmail.com> wrote:
That isn't a bug - you can't change the classpath once the
JVM is executing.
On Wed, Mar 9, 2022 at 7:11 AM Rafał Wojdyła
<ravwojd...@gmail.com> wrote:
Hi,
My use case is that I have a long-running process
(an orchestrator) with multiple tasks, and some tasks might
require extra Spark dependencies. It seems that once the
Spark context is started it's not possible to update
`spark.jars.packages`? I have reported an issue at
https://issues.apache.org/jira/browse/SPARK-38438,
together with a workaround ("hard reset of the
cluster"). I wonder if anyone has a solution for this?
Cheers - Rafal