Re: Packaging a spark job using maven

2014-05-19 Thread Laurent T
Hi Eugen,

Thanks for your help. I'm not familiar with the shade plugin and I was
wondering: does it replace the assembly plugin? Also, do I have to specify
all the artifacts and sub-artifacts in the artifactSet, or can I just use a
*:* wildcard and let the Maven scopes do their work? I have a lot of
overlap warnings when I do so.

Thanks for your help.
Regards,
Laurent





Re: Packaging a spark job using maven

2014-05-19 Thread Eugen Cepoi
2014-05-19 10:35 GMT+02:00 Laurent T laurent.thou...@ldmobile.net:

 Hi Eugen,

 Thanks for your help. I'm not familiar with the shade plugin and I was
 wondering: does it replace the assembly plugin?


Nope, it doesn't replace it. It allows you to make fat jars and do other nice
things, such as relocating classes to some other package.
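
For example, here is a minimal relocation sketch (the pattern and
shadedPattern values are only placeholders, adapt them to the classes you
actually need to relocate); it goes inside the shade plugin's configuration:

<relocations>
    <relocation>
        <!-- move every class from this package... -->
        <pattern>com.google.common</pattern>
        <!-- ...to this package inside the shaded jar -->
        <shadedPattern>my.shaded.com.google.common</shadedPattern>
    </relocation>
</relocations>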

I am using it in combination with assembly and jdeb to build deployable
archives (zip and debian). I find that building fat jars with the shade plugin
is more powerful and easier than with assembly.


 Also, do I have to specify
 all the artifacts and sub-artifacts in the artifactSet, or can I just use a
 *:* wildcard and let the Maven scopes do their work? I have a lot of
 overlap warnings when I do so.


Indeed, you don't have to list exactly what must be included; I do so in
order to end up with a small archive that we can deploy quickly. Have a
look at the doc, it has some examples:
http://maven.apache.org/plugins/maven-shade-plugin/examples/includes-excludes.html

In short, you remove the includes and instead write the excludes (spark,
hadoop, etc.). The overlap is due to the same classes being present in
different jars. You can exclude those jars to remove the warnings.

http://stackoverflow.com/questions/19987080/maven-shade-plugin-uber-jar-and-overlapping-classes
http://stackoverflow.com/questions/11824633/maven-shade-plugin-warning-we-have-a-duplicate-how-to-fix
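
For example, a minimal artifactSet sketch using excludes instead of includes
(the groupIds below are placeholders; list whatever is already provided at
runtime or causes the overlaps in your build):

<artifactSet>
    <excludes>
        <!-- dependencies the Spark/Hadoop runtime already provides -->
        <exclude>org.apache.spark:*</exclude>
        <exclude>org.apache.hadoop:*</exclude>
    </excludes>
</artifactSet>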

Eugen




 Thanks for your help.
 Regards,
 Laurent






Re: Packaging a spark job using maven

2014-05-16 Thread Eugen Cepoi
Laurent, the problem is that the reference.conf embedded in the Akka
jars is being overridden by some other conf. This happens when multiple
files have the same name.
I am using Spark with Maven. In order to build the fat jar I use the shade
plugin and it works pretty well. The trick here is to use an
AppendingTransformer that will merge all the reference.conf files into a
single one.

Try something like that:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <minimizeJar>false</minimizeJar>
                <createDependencyReducedPom>false</createDependencyReducedPom>
                <artifactSet>
                    <includes>
                        <!-- Include here the dependencies you
                             want to be packed in your fat jar -->
                        <include>my.package.etc:*</include>
                    </includes>
                </artifactSet>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
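
Since the execution above is bound to the package phase, a plain mvn package
will also run the shade goal and produce the fat jar.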


2014-05-14 15:37 GMT+02:00 Laurent T laurent.thou...@ldmobile.net:

 Hi,

 Thanks François, but this didn't change much. I'm not even sure what this
 reference.conf is. It isn't mentioned in any of the Spark documentation.
 Should I have one in my resources?

 Thanks
 Laurent






Re: Packaging a spark job using maven

2014-05-14 Thread Laurent T
Hi,

Thanks François, but this didn't change much. I'm not even sure what this
reference.conf is. It isn't mentioned in any of the Spark documentation.
Should I have one in my resources?

Thanks
Laurent





Re: Packaging a spark job using maven

2014-05-14 Thread François Le Lay
I have a similar objective to use Maven as our build tool and ran into the
same issue.
The problem is that your config file is actually not found: your fat jar
assembly does not contain the reference.conf resource.

I added the following to the resources section of my pom to make it work:
<resource>
  <directory>src/main/resources</directory>
  <includes>
    <include>*.conf</include>
  </includes>
  <targetPath>${project.build.directory}/classes</targetPath>
</resource>
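
(For reference, that resource element lives under the build/resources
section of the pom.)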

I think Paul's gist also achieves a similar effect by specifying a proper
appender in the shading conf.

cheers
François







On Tue, May 13, 2014 at 4:09 AM, Laurent Thoulon 
laurent.thou...@ldmobile.net wrote:

 (I've never actually received my previous mail, so I'm resending it. Sorry
 if it creates a duplicate.)


 Hi,

 I'm quite new to Spark (and Scala), but has anyone ever successfully
 compiled and run a Spark job using Java and Maven?
 Packaging seems to go fine, but when I try to execute the job using

 mvn package
 java -Xmx4g -cp target/jobs-1.4.0.0-jar-with-dependencies.jar
 my.jobs.spark.TestJob

 I get the following error
 Exception in thread "main" com.typesafe.config.ConfigException$Missing: No
 configuration setting found for key 'akka.version'
     at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:115)
     at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:136)
     at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:142)
     at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:150)
     at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:155)
     at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:197)
     at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:136)
     at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:470)
     at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
     at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
     at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:96)
     at org.apache.spark.SparkEnv$.create(SparkEnv.scala:126)
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:139)
     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:47)
     at my.jobs.spark.TestJob.run(TestJob.java:56)


 Here's the code right until line 56

 SparkConf conf = new SparkConf()
     .setMaster("local[" + cpus + "]")
     .setAppName(this.getClass().getSimpleName())
     .setSparkHome("/data/spark")
     .setJars(JavaSparkContext.jarOfClass(this.getClass()))
     .set("spark.default.parallelism", String.valueOf(cpus * 2))
     .set("spark.executor.memory", "4g")
     .set("spark.storage.memoryFraction", "0.6")
     .set("spark.shuffle.memoryFraction", "0.3");
 JavaSparkContext sc = new JavaSparkContext(conf);

 Thanks
 Regards,
 Laurent




-- 
François /fly Le Lay
Data Infra Chapter Lead NYC
+1 (646)-656-0075