Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a serious issue. I successfully updated the Spark Thrift Server from 1.5.2 to 1.6.0, but I have a standalone application which worked fine with 1.5.2 and is now failing on 1.6.0 with:
NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)

Inside this application I work with a Hive table which has its data in JSON format. When I add

<dependency>
    <groupId>org.datanucleus</groupId>
    <artifactId>datanucleus-core</artifactId>
    <version>4.0.0-release</version>
</dependency>
<dependency>
    <groupId>org.datanucleus</groupId>
    <artifactId>datanucleus-api-jdo</artifactId>
    <version>4.0.0-release</version>
</dependency>
<dependency>
    <groupId>org.datanucleus</groupId>
    <artifactId>datanucleus-rdbms</artifactId>
    <version>3.2.9</version>
</dependency>

I get:

Caused by: org.datanucleus.exceptions.NucleusUserException: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLASSPATH and plugin specification.
        at org.datanucleus.AbstractNucleusContext.<init>(AbstractNucleusContext.java:102)
        at org.datanucleus.PersistenceNucleusContextImpl.<init>(PersistenceNucleusContextImpl.java:162)

I have CDH 5.5. I build Spark with

./make-distribution.sh -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.5.0 -Phive -DskipTests

Then I install the fat jar into my local repository:

mvn org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file -Dfile=./spark-assembly.jar -DgroupId=org.spark-project -DartifactId=my-spark-assembly -Dversion=1.6.0-SNAPSHOT -Dpackaging=jar

Then I add a dependency on this fat jar:

<dependency>
    <groupId>org.spark-project</groupId>
    <artifactId>my-spark-assembly</artifactId>
    <version>1.6.0-SNAPSHOT</version>
</dependency>

Then I build my application with the maven-shade-plugin:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
        <artifactSet>
            <includes>
                <include>*:*</include>
            </includes>
        </artifactSet>
        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                </excludes>
            </filter>
        </filters>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
                        <resource>log4j.properties</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer"/>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

The shade plugin configuration is copy-pasted from the Spark assembly POM. This workflow worked for 1.5.2 and broke with 1.6.0.
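My current guess at the root cause: the DataNucleus jars do not survive shading. Each of the three datanucleus artifacts carries its own plugin.xml and OSGi manifest entries, which the DataNucleus plugin mechanism reads at runtime; merging them into one fat jar keeps at most one plugin.xml, which would explain the "ClassLoaderResolver ... has not been found" error. As far as I can tell, this is also why the Spark distribution itself ships the datanucleus jars next to the assembly in lib/ instead of inside it. Mixing datanucleus-core 4.0.0-release with datanucleus-rdbms 3.2.9 looks wrong too, since the Hive version bundled with Spark 1.6 is built against DataNucleus 3.2.x. One workaround I'm considering (untested; the artifact pattern is my assumption) is to keep DataNucleus out of the shaded jar entirely:

<artifactSet>
    <includes>
        <include>*:*</include>
    </includes>
    <excludes>
        <!-- untested: DataNucleus has to stay in separate, intact jars
             so that each jar's plugin.xml and manifest metadata survive -->
        <exclude>org.datanucleus:*</exclude>
    </excludes>
</artifactSet>

and then to ship the three datanucleus-*.jar files from the lib/ directory of the Spark distribution alongside the application.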
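In yarn-client mode the driver also needs those jars on its own classpath, so I would keep the 3.2.x artifacts as plain Maven dependencies (instead of the 4.0.0 ones) and distribute them to the cluster the same way I already distribute the application jar. A minimal sketch of what I mean - the /opt/spark/lib paths are my assumption about where the distribution is unpacked, and the versions are the ones shipped in Spark 1.6's lib/ directory:

import org.apache.spark.SparkConf

// Untested sketch: distribute the unshaded DataNucleus jars together with
// the application jar, so their plugin metadata stays in separate jars.
val appName = "my-app"  // hypothetical name for the sketch
val appJar = this.getClass.getProtectionDomain.getCodeSource.getLocation.toURI.getPath

val datanucleusJars = List(
  "/opt/spark/lib/datanucleus-core-3.2.10.jar",    // assumed install path
  "/opt/spark/lib/datanucleus-api-jdo-3.2.6.jar",
  "/opt/spark/lib/datanucleus-rdbms-3.2.9.jar"
)

lazy val sparkConf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName(appName)
  .setJars(appJar :: datanucleusJars)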
If my approach to building this standalone application is not a good one, please recommend another, but spark-submit does not work for me - it is hard to connect it to Oozie. Any suggestion would be appreciated - I'm stuck. My Spark config:

lazy val sparkConf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName(appName)
  .set("spark.yarn.queue", "jenkins")
  .set("spark.executor.memory", "10g")
  .set("spark.yarn.executor.memoryOverhead", "2000")
  .set("spark.executor.cores", "3")
  .set("spark.driver.memory", "4g")
  .set("spark.shuffle.io.numConnectionsPerPeer", "5")
  .set("spark.sql.autoBroadcastJoinThreshold", "200483647")
  .set("spark.network.timeout", "1000s")
  .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=2g")
  .set("spark.driver.maxResultSize", "2g")
  .set("spark.rpc.lookupTimeout", "1000s")
  .set("spark.sql.hive.convertMetastoreParquet", "false")
  .set("spark.kryoserializer.buffer.max", "200m")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.yarn.driver.memoryOverhead", "1000")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "20")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .set("spark.sql.tungsten.enabled", "false")
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "100s")
  .setJars(List(this.getClass.getProtectionDomain().getCodeSource().getLocation().toURI().getPath()))

--
Sincerely yours,
Egor Pakhomov