GitHub user mccheah commented on the issue:

    https://github.com/apache/spark/pull/20669
  
    @vanzin I have a question regarding how this will interact with files that 
need to exist before the JVM starts.
    
    When we used the init-container approach, we specifically fetched the 
user's files before the user's JVM launched, so all of the user's 
dependencies were present at JVM boot time. Now, however, spark-submit does 
the localization, and here the spark-submit JVM is the same JVM that runs the 
user's code.
    
    Let's take a concrete hypothetical example: the user would like to load a 
YourKit agent binary into the driver container for debugging. The user may not 
want to build an entirely separate Docker image for this, or perhaps they're 
porting a debugging workflow from YARN mode, where they used `--files` to 
distribute this binary from some remote location. In YARN mode, adding the 
YourKit agent to `spark.files` works because SparkSubmit distributes the files 
via `spark.yarn.dist.files` and relies on HDFS-backed localization. The files 
are localized before the driver JVM starts, so the YourKit agent is loaded 
correctly. In Kubernetes mode without an init-container, however, the YourKit 
agent binary will always be localized after the driver JVM starts, which is 
too late.
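A toy model of the ordering problem may help. This is a hypothetical sketch, not Spark code: `Container`, `localize`, `start_jvm`, and the agent path are all illustrative names. It assumes only that an agent passed via `-agentpath` must already be on disk when the JVM boots.

```python
from dataclasses import dataclass, field

# Hypothetical toy model, not Spark code: a container runs steps in order, and a
# JVM agent (e.g. passed via -agentpath) must exist on disk when the JVM boots.

AGENT = "/opt/deps/libyjpagent.so"  # hypothetical agent path


@dataclass
class Container:
    files: set = field(default_factory=set)
    agent_loaded: bool = False

    def localize(self, path: str) -> None:
        # Download a remote dependency into the container's filesystem.
        self.files.add(path)

    def start_jvm(self, agent_path: str) -> None:
        # The agent is attached only if the file is present at boot time.
        self.agent_loaded = agent_path in self.files


def init_container_flow() -> bool:
    c = Container()
    c.localize(AGENT)   # init-container fetches files before the driver starts
    c.start_jvm(AGENT)  # driver JVM boots with the agent already on disk
    return c.agent_loaded


def in_jvm_submit_flow() -> bool:
    c = Container()
    c.start_jvm(AGENT)  # spark-submit JVM (which also runs user code) boots first
    c.localize(AGENT)   # localization happens inside that JVM, too late
    return c.agent_loaded


print(init_container_flow())  # True
print(in_jvm_submit_flow())   # False
```

The only difference between the two flows is the order of the two steps, which is exactly what the init-container provided.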
    
    In other words, without an init-container, applications cannot depend on 
localizing files that must be present _before_ the JVM even launches. This is 
mitigated by Docker being essentially a built-in localization mechanism in and 
of itself, and by the fact that one can use secrets or config maps as an 
alternate mounting mechanism. But I'm curious what we think about users porting 
YARN applications that rely on this specific use case. I'm also curious how 
Mesos handles this, since it uses a similar scheme.

