[ https://issues.apache.org/jira/browse/SPARK-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567752#comment-14567752 ]
Marcelo Vanzin commented on SPARK-4048:
---------------------------------------

That is not a regression. The whole point of "hadoop-provided" is that
*you* have to provide the needed jars. So if a jar is missing, you are
failing to provide them.

> Enhance and extend hadoop-provided profile
> ------------------------------------------
>
>                 Key: SPARK-4048
>                 URL: https://issues.apache.org/jira/browse/SPARK-4048
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>             Fix For: 1.3.0
>
>
> The hadoop-provided profile is used to avoid packaging Hadoop
> dependencies inside the Spark assembly. It works, sort of, but it could
> use some enhancements. A quick list:
> - It doesn't include all the things that could be removed from the
>   assembly.
> - It doesn't work well when you're publishing artifacts based on it
>   (SPARK-3812 fixes this).
> - There are other dependencies that could use similar treatment: Hive,
>   HBase (for the examples), Flume, Parquet, maybe others I'm missing at
>   the moment.
> - Unit tests, more specifically those that use local-cluster mode, do
>   not work when the assembly is built with this profile enabled.
> - The scripts that launch Spark jobs do not add the needed "provided"
>   jars to the classpath when this profile is enabled, leaving people to
>   figure that out for themselves.
> - The examples assembly duplicates a lot of things in the main assembly.
> Part of this task is selfish, since we build internally with this
> profile and we'd like to make it easier for us to merge changes without
> having to keep too many patches on top of upstream. But those feel like
> good improvements to me, regardless.
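For readers unfamiliar with the profile the comment above refers to:
building an assembly without bundled Hadoop classes looks roughly like
the following. This is a sketch; -Phadoop-provided is the profile named
in this issue, while the other flags and the Hadoop version are
illustrative and depend on your environment.

    # Build Spark without bundling Hadoop dependencies in the assembly.
    # The resulting jars expect Hadoop classes to be supplied at runtime.
    mvn -Phadoop-provided -Phadoop-2.4 -Dhadoop.version=2.4.0 \
        -DskipTests clean package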
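The runtime side, where *you* provide the jars, is the gap described by
the classpath bullet in the issue. A sketch of the workflow as later
documented by Spark for "Hadoop free" builds; I'm assuming the
SPARK_DIST_CLASSPATH mechanism here, and the example class and jar path
are placeholders:

    # Make the Hadoop jars installed on the machine visible to Spark.
    # 'hadoop classpath' prints the classpath of the local Hadoop
    # installation; Spark's launch scripts append SPARK_DIST_CLASSPATH
    # to the application classpath.
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)

    # Launch as usual; the "provided" Hadoop classes now resolve.
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-client lib/spark-examples-*.jar 10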