SPARK_PREPEND_CLASSES is documented on the Spark Wiki (which could probably be 
easier to find): 
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
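For anyone skimming the archive, here is a minimal sketch of the workflow the
wiki page and this thread describe. The build commands (sbt/sbt, bin/spark-shell)
are assumptions about a contemporary Spark checkout, not quotes from the page:

    # Build the assembly once; SPARK_PREPEND_CLASSES only prepends freshly
    # compiled classes, so an assembly jar is assumed to exist already.
    sbt/sbt assembly

    # Ask the launch scripts to put compiled classes ahead of the assembly.
    export SPARK_PREPEND_CLASSES=true

    # After a change, recompile instead of reassembling ...
    sbt/sbt compile

    # ... and the scripts pick up the new classes without rebuilding the jar.
    bin/spark-shell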


On September 2, 2014 at 11:53:49 AM, Cheng Lian (lian.cs....@gmail.com) wrote:

Yea, SSD + SPARK_PREPEND_CLASSES totally changed my life :)  

Maybe we should add a "developer notes" page to document all this useful
black magic.


On Tue, Sep 2, 2014 at 10:54 AM, Reynold Xin <r...@databricks.com> wrote:  

> Having an SSD helps tremendously with assembly time.
>
> Without that, you can do the following so that Spark picks up the compiled
> classes ahead of the assembly at runtime:
>  
> export SPARK_PREPEND_CLASSES=true  
>  
>  
> On Tue, Sep 2, 2014 at 9:10 AM, Sandy Ryza <sandy.r...@cloudera.com>  
> wrote:  
>  
> > This doesn't help for every dependency, but Spark provides an option to
> > build the assembly jar without Hadoop and its dependencies. We make use
> > of this in CDH packaging.
> >  
> > -Sandy  
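As a rough illustration of the option Sandy mentions, a sketch of a Hadoop-free
assembly build. The profile and flag names are assumptions drawn from the Maven
build of that era rather than from this thread, so check the build docs for
your branch:

    # Sketch: build the assembly with Hadoop and its dependencies treated as
    # "provided", so the cluster supplies its own Hadoop jars at run time.
    # Profile/flag names are assumptions; verify them for your branch.
    mvn -Phadoop-provided -Phadoop-2.4 -Dhadoop.version=2.4.0 \
        -DskipTests clean package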
> >  
> >  
> > On Tue, Sep 2, 2014 at 2:12 AM, scwf <wangf...@huawei.com> wrote:  
> >  
> > > Hi Sean Owen,
> > > here are some problems I ran into when using the assembly jar.
> > > 1. I put spark-assembly-*.jar into the lib directory of my application,
> > > and it threw a compile error:
> > > Error:scalac: Error: class scala.reflect.BeanInfo not found.
> > > scala.tools.nsc.MissingRequirementError: class scala.reflect.BeanInfo not found.
> > >     at scala.tools.nsc.symtab.Definitions$definitions$.getModuleOrClass(Definitions.scala:655)
> > >     at scala.tools.nsc.symtab.Definitions$definitions$.getClass(Definitions.scala:608)
> > >     at scala.tools.nsc.backend.jvm.GenJVM$BytecodeGenerator.<init>(GenJVM.scala:127)
> > >     at scala.tools.nsc.backend.jvm.GenJVM$JvmPhase.run(GenJVM.scala:85)
> > >     at scala.tools.nsc.Global$Run.compileSources(Global.scala:953)
> > >     at scala.tools.nsc.Global$Run.compile(Global.scala:1041)
> > >     at xsbt.CachedCompiler0.run(CompilerInterface.scala:126)
> > >     at xsbt.CachedCompiler0.liftedTree1$1(CompilerInterface.scala:102)
> > >     at xsbt.CachedCompiler0.run(CompilerInterface.scala:102)
> > >     at xsbt.CompilerInterface.run(CompilerInterface.scala:27)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > >     at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102)
> > >     at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:48)
> > >     at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41)
> > >     at org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:28)
> > >     at org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:25)
> > >     at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:58)
> > >     at org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:21)
> > >     at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > >     at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)
> > > 2. I tested my branch, which updates the Hive version to org.apache.hive
> > > 0.13.1. It runs successfully when using a set of individual third-party
> > > jars as dependencies, but throws an error with the assembly jar; the
> > > assembly jar seems to introduce a conflict:
> > > ERROR DDLTask: java.lang.NoSuchFieldError: doubleTypeInfo
> > >     at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getObjectInspector(ArrayWritableObjectInspector.java:66)
> > >     at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.<init>(ArrayWritableObjectInspector.java:59)
> > >     at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:113)
> > >     at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
> > >     at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:283)
> > >     at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:189)
> > >     at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:597)
> > >     at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4194)
> > >     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
> > >     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> > >     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
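A NoSuchFieldError like this usually means classes from two different Hive
versions ended up on the same classpath. As a hedged diagnostic aid, a sketch
that checks which jars bundle the class declaring doubleTypeInfo
(TypeInfoFactory in Hive's serde2 package); the jar paths are placeholders:

    # Sketch: list which jars contain Hive's TypeInfoFactory class; the paths
    # are placeholders for your own assembly and application lib directory.
    for j in assembly/target/scala-2.10/spark-assembly-*.jar lib/*.jar; do
      if unzip -l "$j" 2>/dev/null | grep -q 'hive/serde2/typeinfo/TypeInfoFactory'; then
        echo "$j bundles TypeInfoFactory"
      fi
    done
    # If copies from two Hive versions are present, whichever loads first wins,
    # and a newer field such as doubleTypeInfo can be missing at run time.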
> > >  
> > >  
> > >  
> > >  
> > >  
> > > On 2014/9/2 16:45, Sean Owen wrote:  
> > >  
> > >> Hm, are you suggesting that the Spark distribution be a bag of 100  
> > >> JARs? It doesn't quite seem reasonable. It does not remove version  
> > >> conflicts, just pushes them to run-time, which isn't good. The  
> > >> assembly is also necessary because that's where shading happens. In  
> > >> development, you want to run against exactly what will be used in a  
> > >> real Spark distro.  
> > >>  
> > >> On Tue, Sep 2, 2014 at 9:39 AM, scwf <wangf...@huawei.com> wrote:  
> > >>  
> > >>> Hi all,
> > >>> I suggest that Spark not use the assembly jar as its default run-time
> > >>> dependency (spark-submit/spark-class depend on the assembly jar); using a
> > >>> lib directory of all the third-party dependency jars, such as
> > >>> hadoop/hive/hbase, seems more reasonable.
> > >>>
> > >>> 1. The assembly jar packages all third-party jars into one big jar, so we
> > >>> need to rebuild it whenever we want to update the version of a component
> > >>> (such as Hadoop).
> > >>> 2. In our practice with Spark we sometimes hit jar compatibility issues,
> > >>> and they are hard to diagnose with the assembly jar.
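Purely as an illustration of the "directory of jars" alternative proposed
above: nothing below is an existing Spark script, and the paths are
hypothetical.

    # Hypothetical sketch: assemble a runtime classpath from a lib/ directory
    # of individual third-party jars instead of a single assembly jar.
    SPARK_LIB_DIR=/opt/spark/lib      # hypothetical layout
    CP=""
    for j in "$SPARK_LIB_DIR"/*.jar; do
      CP="$CP:$j"
    done
    export CLASSPATH="${CP#:}"        # strip the leading colon
    # Upgrading one component (say Hadoop) would then mean swapping a jar in
    # the directory rather than rebuilding the whole assembly.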