[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178267#comment-15178267 ]

Marcelo Vanzin commented on SPARK-11157:
----------------------------------------

For the record: there's a potential issue with doing this that is not discussed 
in the document. Namely, when we stop generating the big assembly file, we also 
stop relocating all of Spark's dependencies. This means that Hadoop's guava 
will now leak into Spark's classpath.

Spark itself will still relocate its own guava (14) and will use it; it 
wouldn't work otherwise, since Spark doesn't work with version 11. But now 
applications will see guava 11 in their classpath.

In the end, I don't think this is too bad, for a couple of reasons:

- if you use the "hadoop-provided" package, or distributions that behave 
similarly to it, that's already the case
- it's easy for applications that really need a newer guava to depend on it 
and shade it themselves, if they're using Maven (see the sketch below)
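
As an illustration, a minimal maven-shade-plugin configuration for doing that; 
the shaded package name is just a placeholder:

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Rewrite guava's packages in the app's jar so its copy
                 can't clash with the guava 11 leaking in from Hadoop. -->
            <pattern>com.google.common</pattern>
            <shadedPattern>org.myapp.shaded.guava</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}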

Given that the original driver for shading guava was to allow Spark to be 
embedded in applications that need a different version (namely Hive), that use 
case won't be affected.

> Allow Spark to be built without assemblies
> ------------------------------------------
>
>                 Key: SPARK-11157
>                 URL: https://issues.apache.org/jira/browse/SPARK-11157
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Build, Spark Core, YARN
>            Reporter: Marcelo Vanzin
>         Attachments: no-assemblies.pdf
>
>
> For reasoning, discussion of pros and cons, and other more detailed 
> information, please see the attached doc.
> The idea is to be able to build a Spark distribution that has just a 
> directory full of jars instead of the huge assembly files we currently have.
> Getting there requires changes in a bunch of places; I'll try to list the 
> ones I identified in the document, in the order I think is needed to avoid 
> breaking things:
> * make streaming backends not be assemblies
> Since people may depend on the current assembly artifacts in their 
> deployments, we can't really remove them; but we can turn them into dummy 
> jars and rely on dependency resolution to download all the jars (see the 
> sketch below).
> PySpark tests would also need some tweaking here.
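> As a rough sketch, the pom of such a dummy module could keep its old 
> coordinates while declaring the real backend as a dependency, so resolvers 
> still pull in all the jars (coordinates below are the existing kafka 
> assembly's; the details are illustrative):
> {code:xml}
> <!-- The jar this module builds is (nearly) empty; the dependency in the
>      pom is what makes resolution download the real streaming jars. -->
> <artifactId>spark-streaming-kafka-assembly_2.10</artifactId>
> <dependencies>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-streaming-kafka_2.10</artifactId>
>     <version>${project.version}</version>
>   </dependency>
> </dependencies>
> {code}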
> * make the examples jar not be an assembly
> This probably requires tweaks to the {{run-example}} script. The location of 
> the examples jar would have to change (it won't be able to live in the same 
> place as the main Spark jars anymore).
> * update the YARN backend to handle a directory full of jars when launching 
> apps
> Currently the YARN backend localizes the Spark assembly (depending on the 
> user configuration); it needs to be modified so that it can localize all 
> needed libraries instead of a single jar.
> * modify the launcher library to handle the jars directory
> This should be trivial.
> * modify {{assembly/pom.xml}} to generate an assembly or a {{libs}} 
> directory, depending on which profile is enabled
> We should keep the assembly build enabled by default, for backwards 
> compatibility, to give people time to prepare (see the sketch below).
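> A rough sketch of how that profile switch could look; the profile ids are 
> made up, and {{copy-dependencies}} is just one way to populate the directory:
> {code:xml}
> <profiles>
>   <!-- Default: keep building the big assembly jar, as today. -->
>   <profile>
>     <id>assembly</id>
>     <activation>
>       <activeByDefault>true</activeByDefault>
>     </activation>
>     <!-- existing assembly configuration goes here -->
>   </profile>
>   <!-- Opt-in: copy all runtime jars into a libs/ directory instead. -->
>   <profile>
>     <id>libs</id>
>     <build>
>       <plugins>
>         <plugin>
>           <groupId>org.apache.maven.plugins</groupId>
>           <artifactId>maven-dependency-plugin</artifactId>
>           <executions>
>             <execution>
>               <phase>package</phase>
>               <goals>
>                 <goal>copy-dependencies</goal>
>               </goals>
>               <configuration>
>                 <outputDirectory>${project.build.directory}/libs</outputDirectory>
>               </configuration>
>             </execution>
>           </executions>
>         </plugin>
>       </plugins>
>     </build>
>   </profile>
> </profiles>
> {code}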
> Filing this bug as an umbrella; please file sub-tasks if you plan to work on 
> a specific part of the issue.


