Re: Packaging a spark job using maven
Hi Eugen,

Thanks for your help. I'm not familiar with the shade plugin and I was wondering: does it replace the assembly plugin? Also, do I have to specify all the artifacts and sub-artifacts in the artifactSet, or can I just use a *:* wildcard and let the Maven scopes do their work? I get a lot of overlap warnings when I do so.

Thanks for your help.

Regards,
Laurent
Re: Packaging a spark job using maven
2014-05-19 10:35 GMT+02:00 Laurent T <laurent.thou...@ldmobile.net>:

> Hi Eugen, Thanks for your help. I'm not familiar with the shade plugin
> and I was wondering: does it replace the assembly plugin?

Nope, it doesn't replace it. It allows you to make fat jars and other nice things such as relocating classes to some other package. I am using it in combination with assembly and jdeb to build deployable archives (zip and Debian). I find that building fat jars with the shade plugin is more powerful and easier than with assembly.

> Also, do I have to specify all the artifacts and sub-artifacts in the
> artifactSet, or can I just use a *:* wildcard and let the Maven scopes
> do their work? I have a lot of overlap warnings when I do so.

Indeed, you don't have to spell out exactly what must be included; I do so in order to end up with a small archive that we can deploy quickly. Have a look at the doc, there are some examples:

http://maven.apache.org/plugins/maven-shade-plugin/examples/includes-excludes.html

In short, you remove the includes and instead write the excludes (spark, hadoop, etc.). The overlap warnings are due to the same classes being present in different jars; you can exclude those jars to remove the warnings:

http://stackoverflow.com/questions/19987080/maven-shade-plugin-uber-jar-and-overlapping-classes
http://stackoverflow.com/questions/11824633/maven-shade-plugin-warning-we-have-a-duplicate-how-to-fix

Eugen
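A minimal sketch of the excludes-based configuration described above. The org.apache.spark and org.apache.hadoop groupIds are assumptions for illustration; exclude whatever your cluster already provides:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.1</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <artifactSet>
            <!-- No includes list: everything on the runtime classpath is
                 packed unless excluded. These two excludes are examples;
                 adapt them to your own dependency tree. -->
            <excludes>
              <exclude>org.apache.spark:*</exclude>
              <exclude>org.apache.hadoop:*</exclude>
            </excludes>
          </artifactSet>
          <filters>
            <!-- Strip signature files from dependency jars; they trigger
                 duplicate-entry warnings and can break the merged jar. -->
            <filter>
              <artifact>*:*</artifact>
              <excludes>
                <exclude>META-INF/*.SF</exclude>
                <exclude>META-INF/*.DSA</exclude>
                <exclude>META-INF/*.RSA</exclude>
              </excludes>
            </filter>
          </filters>
        </configuration>
      </execution>
    </executions>
  </plugin>

Excluding an artifact this way keeps its classes out of the fat jar entirely, which both shrinks the archive and removes one source of overlapping-classes warnings.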
Re: Packaging a spark job using maven
Laurent, the problem is that the reference.conf embedded in the akka jars is being overridden by some other conf. This happens when multiple files have the same name. I am using Spark with Maven. In order to build the fat jar I use the shade plugin and it works pretty well. The trick here is to use an AppendingTransformer that will merge all the reference.conf files into a single one. Try something like this:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.1</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <minimizeJar>false</minimizeJar>
          <createDependencyReducedPom>false</createDependencyReducedPom>
          <artifactSet>
            <includes>
              <!-- Include here the dependencies you want to be packed in your fat jar -->
              <include>my.package.etc:*</include>
            </includes>
          </artifactSet>
          <filters>
            <filter>
              <artifact>*:*</artifact>
              <excludes>
                <exclude>META-INF/*.SF</exclude>
                <exclude>META-INF/*.DSA</exclude>
                <exclude>META-INF/*.RSA</exclude>
              </excludes>
            </filter>
          </filters>
          <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
              <resource>reference.conf</resource>
            </transformer>
          </transformers>
        </configuration>
      </execution>
    </executions>
  </plugin>

2014-05-14 15:37 GMT+02:00 Laurent T <laurent.thou...@ldmobile.net>:

> Hi,
>
> Thanks François but this didn't change much. I'm not even sure what this
> reference.conf is. It isn't mentioned in any of the Spark documentation.
> Should I have one in my resources?
>
> Thanks,
> Laurent
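A quick sanity check on the transformer setup above (a suggestion only; the jar path is taken from Laurent's original command, and your shaded artifact may be named differently):

  unzip -p target/jobs-1.4.0.0-jar-with-dependencies.jar reference.conf | grep version

If the merged reference.conf inside the jar still contains akka's version setting, the "No configuration setting found for key 'akka.version'" error should disappear.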
Re: Packaging a spark job using maven
Hi,

Thanks François but this didn't change much. I'm not even sure what this reference.conf is. It isn't mentioned in any of the Spark documentation. Should I have one in my resources?

Thanks,
Laurent
Re: Packaging a spark job using maven
I have a similar objective, to use Maven as our build tool, and I ran into the same issue. The idea is that your config file is actually not found: your fat jar assembly does not contain the reference.conf resource. I added the following to the resources section of my pom to make it work:

  <resource>
    <directory>src/main/resources</directory>
    <includes>
      <include>*.conf</include>
    </includes>
    <targetPath>${project.build.directory}/classes</targetPath>
  </resource>

I think Paul's gist also achieves a similar effect by specifying a proper appender in the shading conf.

cheers
François

On Tue, May 13, 2014 at 4:09 AM, Laurent Thoulon <laurent.thou...@ldmobile.net> wrote:

> (I've never actually received my previous mail so I'm resending it.
> Sorry if it creates a duplicate.)
>
> Hi,
>
> I'm quite new to Spark (and Scala) but has anyone ever successfully
> compiled and run a Spark job using Java and Maven? Packaging seems to go
> fine, but when I try to execute the job using
>
>   mvn package
>   java -Xmx4g -cp target/jobs-1.4.0.0-jar-with-dependencies.jar my.jobs.spark.TestJob
>
> I get the following error:
>
>   Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
>       at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:115)
>       at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:136)
>       at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:142)
>       at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:150)
>       at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:155)
>       at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:197)
>       at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:136)
>       at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:470)
>       at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
>       at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
>       at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:96)
>       at org.apache.spark.SparkEnv$.create(SparkEnv.scala:126)
>       at org.apache.spark.SparkContext.<init>(SparkContext.scala:139)
>       at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:47)
>       at my.jobs.spark.TestJob.run(TestJob.java:56)
>
> Here's the code right up to line 56:
>
>   SparkConf conf = new SparkConf()
>       .setMaster("local[" + cpus + "]")
>       .setAppName(this.getClass().getSimpleName())
>       .setSparkHome("/data/spark")
>       .setJars(JavaSparkContext.jarOfClass(this.getClass()))
>       .set("spark.default.parallelism", String.valueOf(cpus * 2))
>       .set("spark.executor.memory", "4g")
>       .set("spark.storage.memoryFraction", "0.6")
>       .set("spark.shuffle.memoryFraction", "0.3");
>   JavaSparkContext sc = new JavaSparkContext(conf);
>
> Thanks
> Regards,
> Laurent

--
François /fly Le Lay
Data Infra Chapter Lead NYC
+1 (646)-656-0075