So, to answer my own question: it is a bug, and there is an unmerged PR for it
already.

https://issues.apache.org/jira/browse/SPARK-2624
https://github.com/apache/spark/pull/3238

Jakub


---------- Original message ----------
From: spark.dubovsky.ja...@seznam.cz
To: spark.dubovsky.ja...@seznam.cz
Date: 12. 12. 2014 15:26:35
Subject: Re: Including data nucleus tools

"
Hi,

  I had time to try it again. I submitted my app with the same command plus
these additional options:

  --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-core-3.2.10.jar,lib/datanucleus-rdbms-3.2.9.jar
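
For reference, the complete invocation might look like the sketch below. The class name and app jar are taken from the quoted message later in this thread; the master, executor count, and paths are illustrative and depend on your cluster layout.

```shell
# Hypothetical full spark-submit command combining the app jar with the
# DataNucleus jars passed via --jars (values other than the jar names
# and class are illustrative).
./bin/spark-submit \
  --num-executors 200 \
  --master yarn-cluster \
  --class org.apache.spark.mllib.CreateGuidDomainDictionary \
  --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-core-3.2.10.jar,lib/datanucleus-rdbms-3.2.9.jar \
  ../spark/root-0.1.jar ${args}
```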

  Now the app successfully creates a hive context. So my question remains: are
the "classpath entries" shown in the Spark UI the same classpath as the one
mentioned in the submit script message?

"Spark assembly has been built with Hive, including Datanucleus jars on 
classpath"

  If so, then why does the script fail to actually include the datanucleus
jars on the classpath? I found no bug about this in JIRA. Or could particular
YARN/OS settings on our cluster override this?

  Thanks in advance

  Jakub


---------- Original message ----------
From: spark.dubovsky.ja...@seznam.cz
To: Michael Armbrust <mich...@databricks.com>
Date: 7. 12. 2014 3:02:33
Subject: Re: Including data nucleus tools

"
Next try. I copied the whole dist directory created by the make-distribution
script to the cluster, not just the assembly jar. Then I used

./bin/spark-submit --num-executors 200 --master yarn-cluster --class org.apache.spark.mllib.CreateGuidDomainDictionary ../spark/root-0.1.jar ${args}

 ...to run the app again. The startup scripts printed this message:

"Spark assembly has been built with Hive, including Datanucleus jars on 
classpath"

  ...so I thought I was finally there. But the job started and failed with the
same ClassNotFoundException as before. Is the "classpath" from the script
message just the classpath of the driver? Or is it the same classpath that is
affected by the --jars option? I tried to find out from the scripts, but I was
not able to find where the --jars option is processed.

  thanks


---------- Original message ----------
From: Michael Armbrust <mich...@databricks.com>
To: spark.dubovsky.ja...@seznam.cz
Date: 6. 12. 2014 20:39:13
Subject: Re: Including data nucleus tools

"



On Sat, Dec 6, 2014 at 5:53 AM, <spark.dubovsky.ja...@seznam.cz> wrote:
"
Bonus question: Should the class
org.datanucleus.api.jdo.JDOPersistenceManagerFactory be part of the assembly?
Because it is not in the jar now.
"



No, these jars cannot be put into the assembly, because they contain extra
metadata files that live at the same location (so if you put them all in an
assembly, they overwrite each other). This metadata is used in discovery.
Instead, they must be manually put on the classpath in their original form
(usually using --jars).
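
To illustrate the clash described above, here is a minimal sketch using plain directories to stand in for the jars (the file name `plugin.xml` is the DataNucleus discovery metadata file; the directory names are made up):

```shell
# Each DataNucleus jar ships its own plugin.xml at the same path inside the
# jar, so flattening them into a single assembly keeps only one copy.
mkdir -p core-jar rdbms-jar assembly
echo "core plugin metadata" > core-jar/plugin.xml
echo "rdbms plugin metadata" > rdbms-jar/plugin.xml
cp core-jar/plugin.xml assembly/   # first jar's metadata lands in the assembly
cp rdbms-jar/plugin.xml assembly/  # ...and is silently overwritten by the next
cat assembly/plugin.xml            # only the last jar's metadata survives
```

Shading or relocating cannot help here either, since the files must keep their original names and locations for discovery to find them.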



 
"
"
"
