Hi all,
I am getting an exception when trying to run a Spark job that uses the new
Phoenix 4.5 Spark connector. The application works fine on my local
machine, but fails when run on a cluster on top of YARN.

The cluster runs Cloudera CDH 5.4.4 with HBase 1.0.0 and Phoenix 4.5
(Phoenix is installed correctly, as sqlline works without errors).

In the pom.xml, only the spark-core jar (version 1.3.0-cdh5.4.4) has scope
"provided"; all other jars have been copied by Maven into the /myapp/lib
folder. I include all the dependent libs using the "--jars" option of the
spark-submit command (among these libraries is the phoenix-core-xx.jar,
which contains the class PhoenixOutputFormat).
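For reference, the relevant part of the pom.xml looks roughly like this (a sketch, not the exact file; the artifactId suffix assumes the Scala 2.10 build that ships with CDH 5.4.4):

```xml
<!-- Only spark-core is "provided" (supplied by the cluster at runtime); -->
<!-- every other dependency is copied by Maven into /myapp/lib. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.3.0-cdh5.4.4</version>
  <scope>provided</scope>
</dependency>
```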

This is the command:
spark-submit --class my.JobRunner \
--master yarn --deploy-mode client \
--jars `ls -dm /myapp/lib/* | tr -d ' \r\n'` \
/myapp/mainjar.jar
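(The backtick-quoted expression just rewrites the directory listing into the single comma-separated list of paths that --jars expects. A trivial sketch of the same joining step, with illustrative paths:)

```java
import java.util.List;

public class JarsArg {
    // spark-submit's --jars option takes ONE comma-separated list of jar paths;
    // this mirrors what `ls -dm /myapp/lib/* | tr -d ' \r\n'` produces.
    static String join(List<String> jars) {
        return String.join(",", jars);
    }

    public static void main(String[] args) {
        // Paths are illustrative, matching the /myapp/lib layout described above.
        System.out.println(join(List.of("/myapp/lib/phoenix-core-xx.jar",
                                        "/myapp/lib/another-dep.jar")));
    }
}
```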
The /myapp/lib folder contains the Phoenix core lib, which contains the
class org.apache.phoenix.mapreduce.PhoenixOutputFormat. But it seems that
the driver/executors cannot see it.

I get the following exception when I try to save an RDD to Phoenix:

Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.phoenix.mapreduce.PhoenixOutputFormat not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:232)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:971)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:903)
    at org.apache.phoenix.spark.ProductRDDFunctions.saveToPhoenix(ProductRDDFunctions.scala:51)
    at com.mypackage.save(DAOImpl.scala:41)
    at com.mypackage.ProtoStreamingJob.execute(ProtoStreamingJob.scala:58)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.mypackage.SparkApplication.sparkRun(SparkApplication.scala:95)
    at com.mypackage.SparkApplication$delayedInit$body.apply(SparkApplication.scala:112)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at com.mypackage.SparkApplication.main(SparkApplication.scala:15)
    at com.mypackage.ProtoStreamingJobRunner.main(ProtoStreamingJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: Class org.apache.phoenix.mapreduce.PhoenixOutputFormat not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2110)
    ... 30 more


The phoenix-core-xxx.jar is included in the classpath. I am sure of this
because instantiating an object of class PhoenixOutputFormat directly in
the main class works.

The problem is that
org.apache.hadoop.conf.Configuration.getClassByName cannot find it. Since
I am using "client" deploy mode, the exception should have been thrown by
the driver on the local machine.
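To narrow this down, here is a small probe I could run: Hadoop's Configuration resolves class names through its own classloader (by default the thread context classloader, falling back to the loader of Configuration itself), which under spark-submit is not necessarily the same loader that loaded my main class. This sketch checks one class name against both loaders (java.util.List stands in for PhoenixOutputFormat so it runs anywhere):

```java
public class LoaderProbe {
    // Returns true if the given loader can resolve the class name,
    // which is roughly what Configuration.getClassByName does internally.
    static boolean visible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader appLoader = LoaderProbe.class.getClassLoader();
        ClassLoader ctxLoader = Thread.currentThread().getContextClassLoader();
        // "java.util.List" is a stand-in for org.apache.phoenix.mapreduce.PhoenixOutputFormat.
        System.out.println("app loader:     " + visible("java.util.List", appLoader));
        System.out.println("context loader: " + visible("java.util.List", ctxLoader));
    }
}
```

If the two loaders report different results for the Phoenix class when run under spark-submit, that would explain why `new PhoenixOutputFormat()` succeeds while getClassByName fails.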

How can this happen?
