Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-09 Thread Jeroen Vlek
Hi,

I posted a question with regards to Phoenix and Spark Streaming on 
StackOverflow [1]. Please find a copy of the question to this email below the 
first stack trace. I also already contacted the Phoenix mailing list and tried 
the suggestion of setting spark.driver.userClassPathFirst. Unfortunately that 
only pushed me further into the dependency hell, which I tried to resolve 
until I hit a wall with an UnsatisfiedLinkError on Snappy.

What I am trying to achieve: to save a stream from Kafka into Phoenix/HBase 
via Spark Streaming. I'm using MapR as a platform, and the original exception 
happens both on a 3-node cluster and on the MapR Sandbox (a VM for 
experimentation), in YARN and stand-alone mode. Further experimentation (like 
the saveAsNewAPIHadoopFile approach below) was done only on the sandbox in 
standalone mode.

Phoenix only supports Spark from 4.4.0 onwards, but I thought that, on 4.3.1, 
I could use a naive implementation that creates a new connection for 
every RDD from the DStream. This resulted in the 
ClassNotFoundException described in [1], so I switched to 4.4.0.
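
For reference, the naive approach boils down to something like the sketch 
below. It is only an illustration, not the actual job: the table, columns and 
key generation are placeholders I'm inventing here; the point is simply that 
every partition of every micro-batch opens its own Phoenix connection.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Iterator;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.streaming.api.java.JavaDStream;

public class NaivePhoenixWriter {

    // Illustrative UPSERT; table and column names are placeholders, not the real schema.
    private static final String UPSERT =
        "UPSERT INTO EVENTS (ID, PAYLOAD) VALUES (?, ?)";

    public static void attach(JavaDStream<String> lines, final String jdbcUrl) {
        lines.foreachRDD(new Function<JavaRDD<String>, Void>() {
            @Override
            public Void call(JavaRDD<String> rdd) throws Exception {
                rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
                    @Override
                    public void call(Iterator<String> events) throws Exception {
                        // A fresh Phoenix connection per partition of every micro-batch.
                        try (Connection conn = DriverManager.getConnection(jdbcUrl);
                             PreparedStatement stmt = conn.prepareStatement(UPSERT)) {
                            while (events.hasNext()) {
                                String payload = events.next();
                                stmt.setLong(1, payload.hashCode()); // placeholder key
                                stmt.setString(2, payload);
                                stmt.executeUpdate();
                            }
                            conn.commit(); // Phoenix connections do not auto-commit by default
                        }
                    }
                });
                return null;
            }
        });
    }
}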

Unfortunately the saveToPhoenix method is only available in Scala. So I 
followed the suggestion to try it via the saveAsNewAPIHadoopFile method [2] and an 
example implementation [3], which I adapted to my own needs. 
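
Roughly, the adaptation looks like the sketch below. To be clear, this is a 
reconstruction rather than my exact code: the EventWritable class, the 
table/column names, and the PhoenixConfigurationUtil calls are assumptions 
written from memory of the Phoenix MapReduce utilities, so the exact 
signatures may differ in 4.4.0.

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.phoenix.mapreduce.PhoenixOutputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;
import org.apache.spark.api.java.JavaPairRDD;

// Hypothetical value type: PhoenixOutputFormat writes values that implement DBWritable.
class EventWritable implements DBWritable {
    private long id;
    private String payload;

    EventWritable(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }

    @Override
    public void write(PreparedStatement stmt) throws SQLException {
        stmt.setLong(1, id);
        stmt.setString(2, payload);
    }

    @Override
    public void readFields(ResultSet rs) throws SQLException {
        // not needed for the output-only path
    }
}

class SaveToPhoenix {
    static void save(JavaPairRDD<NullWritable, EventWritable> pairs, String zkQuorum) {
        Configuration conf = new Configuration();
        conf.set("hbase.zookeeper.quorum", zkQuorum);
        // Assumed helper calls; names/signatures may differ between Phoenix versions.
        PhoenixConfigurationUtil.setOutputTableName(conf, "EVENTS");
        PhoenixConfigurationUtil.setUpsertColumnNames(conf, new String[] { "ID", "PAYLOAD" });
        // PhoenixOutputFormat ignores the path, but saveAsNewAPIHadoopFile requires one.
        pairs.saveAsNewAPIHadoopFile("/tmp/ignored", NullWritable.class,
                EventWritable.class, PhoenixOutputFormat.class, conf);
    }
}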

However, 4.4.0 + saveAsNewAPIHadoopFile raises the same 
ClassNotFoundException, just with a slightly different stack trace:

java.lang.RuntimeException: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
    at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:58)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:995)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
    at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:386)
    at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:288)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:171)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1881)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1860)
    at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1860)
    at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162)
    at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:131)
    at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:187)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:92)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:80)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:68)
    at org.apache.phoenix.mapreduce.PhoenixRecordWriter.<init>(PhoenixRecordWriter.java:49)
    at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:55)
    ... 8 more
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:457)
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:350)
    at org.apache.phoenix.query.HConnectionFactory$HConnectionFactoryImpl.createConnection(HConnectionFactory.java:47)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:286)
    ... 23 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-10 Thread Jeroen Vlek
Hi Josh,

Thank you for your effort. Looking at your code, I feel that mine is 
semantically the same, except written in Java. The dependencies in the pom.xml 
all have the scope 'provided'. The job is submitted as follows:

$ rm spark.log && MASTER=spark://maprdemo:7077 \
  /opt/mapr/spark/spark-1.3.1/bin/spark-submit \
  --jars /home/mapr/projects/customer/lib/spark-streaming-kafka_2.10-1.3.1.jar,/home/mapr/projects/customer/lib/kafka_2.10-0.8.1.1.jar,/home/mapr/projects/customer/lib/zkclient-0.3.jar,/home/mapr/projects/customer/lib/metrics-core-3.1.0.jar,/home/mapr/projects/customer/lib/metrics-core-2.2.0.jar,lib/spark-sql_2.10-1.3.1.jar,/opt/mapr/phoenix/phoenix-4.4.0-HBase-0.98-bin/phoenix-4.4.0-HBase-0.98-client.jar \
  --class nl.work.kafkastreamconsumer.phoenix.KafkaPhoenixConnector \
  KafkaStreamConsumer.jar maprdemo:5181 0 topic jdbc:phoenix:maprdemo:5181 true

The spark-defaults.conf is reverted to its defaults (i.e. no 
userClassPathFirst). In the catch block of the Phoenix connection setup, the 
class path is printed by recursively iterating over the class loaders. The 
first one already lists the phoenix-client JAR [1]. It's also very unlikely to 
be a bug in Spark or Phoenix, given that your proof of concept just works.
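
For completeness, the class-loader walk is essentially this simplified sketch 
(not the exact logging code):

import java.net.URLClassLoader;
import java.util.Arrays;

public class ClassLoaderDump {
    public static void dump() {
        // Walk up the class-loader chain and print the URLs of every URLClassLoader.
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        while (cl != null) {
            if (cl instanceof URLClassLoader) {
                System.err.println(cl + " -> "
                    + Arrays.toString(((URLClassLoader) cl).getURLs()));
            }
            cl = cl.getParent();
        }
    }
}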

So if the JAR that contains the offending class is known to the class loader, 
that might indicate there's a second JAR providing the same class 
but with a different version, right? 
Yet the only Phoenix JAR on the whole class path hierarchy is the 
aforementioned phoenix-client JAR. Furthermore, I googled the class in 
question, ClientRpcControllerFactory, and it really only exists in the Phoenix 
project. We're not talking about some low-level AOP Alliance stuff here ;)
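
One quick way to check for such a duplicate is to ask the JVM where it 
actually resolves the class from, along these lines (the fully-qualified name 
below is my assumption, since Phoenix ships this factory under the HBase 
package namespace):

public class WhereIsIt {
    public static void main(String[] args) throws Exception {
        // Assumed FQCN; prints the JAR the class is actually loaded from.
        Class<?> c = Class.forName(
            "org.apache.hadoop.hbase.ipc.controller.ClientRpcControllerFactory");
        System.err.println(c.getProtectionDomain().getCodeSource().getLocation());
    }
}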

Maybe I'm missing some fundamental class loading knowledge, in that case I'd 
be very happy to be enlightened. This all seems very strange.

Cheers,
Jeroen

[1] [file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./spark-streaming-kafka_2.10-1.3.1.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./kafka_2.10-0.8.1.1.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./zkclient-0.3.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./phoenix-4.4.0-HBase-0.98-client.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./spark-sql_2.10-1.3.1.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./metrics-core-3.1.0.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./KafkaStreamConsumer.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./metrics-core-2.2.0.jar]


On Tuesday, June 09, 2015 11:18:08 AM Josh Mahonin wrote:
> This may or may not be helpful for your classpath issues, but I wanted to
> verify that basic functionality worked, so I made a sample app here:
> 
> https://github.com/jmahonin/spark-streaming-phoenix
> 
> This consumes events off a Kafka topic using spark streaming, and writes
> out event counts to Phoenix using the new phoenix-spark functionality:
> http://phoenix.apache.org/phoenix_spark.html
> 
> It's definitely overkill, and would probably be more efficient to use the
> JDBC driver directly, but it serves as a proof-of-concept.
> 
> I've only tested this in local mode. To convert it to a full jobs JAR, I
> suspect that keeping all of the spark and phoenix dependencies marked as
> 'provided', and including the Phoenix client JAR in the Spark classpath
> would work as well.
> 
> Good luck,
> 
> Josh
> 
> On Tue, Jun 9, 2015 at 4:40 AM, Jeroen Vlek  wrote:
> > Hi,
> > 
> > I posted a question with regards to Phoenix and Spark Streaming on
> > StackOverflow [1]. Please find a copy of the question to this email below
> > the
> > first stack trace. I also already contacted the Phoenix mailing list and
> > tried
> > the suggestion of setting spark.driver.userClassPathFirst. Unfortunately
> > that
> > only pushed me further into the dependency hell, which I tried to resolve
> > until I hit a wall with an UnsatisfiedLinkError on Snappy.
> > 
> > What I am trying to achieve: To save a stream from Kafka into
> > Phoenix/Hbase
> > via Spark Streaming. I'm using MapR as a platform and the original
> > exception
> > happens both on a 3-node cluster, as on the MapR Sandbox (a VM for
> > experimentation), in YARN and stand-alone mode. Further experimentation
> > (like
> > the saveAsNewHadoopApiFile below), was done only on the sandbox in
> > standalone
> > mode.
> > 
> > Phoenix only supports Spark from 4.4.0 onwards, but I thought I could
> > use a naive implementation that creates a new connection for
> > every RDD from the DStream in 4.3.1.  This resulted in the
> > Class

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-10 Thread Jeroen Vlek
Hi Josh,

That worked! Thank you so much! (I can't believe it was something so obvious 
;) )
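
For the record, pointing the driver and executor class paths at the Phoenix 
client JAR (instead of shipping it with --jars) is what did it. With the JAR 
at the location from the spark-submit command above, the spark-defaults.conf 
entries look roughly like this:

  spark.driver.extraClassPath    /opt/mapr/phoenix/phoenix-4.4.0-HBase-0.98-bin/phoenix-4.4.0-HBase-0.98-client.jar
  spark.executor.extraClassPath  /opt/mapr/phoenix/phoenix-4.4.0-HBase-0.98-bin/phoenix-4.4.0-HBase-0.98-client.jar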

If you care about such a thing you could answer my question here for bounty: 
http://stackoverflow.com/questions/30639659/apache-phoenix-4-3-1-and-4-4-0-hbase-0-98-on-spark-1-3-1-classnotfoundexceptio

Have a great day!

Cheers,
Jeroen

On Wednesday 10 June 2015 08:58:02 Josh Mahonin wrote:
> Hi Jeroen,
> 
> Rather than bundle the Phoenix client JAR with your app, are you able to
> include it in a static location either in the SPARK_CLASSPATH, or set the
> conf values below (I use SPARK_CLASSPATH myself, though it's deprecated):
> 
>   spark.driver.extraClassPath
>   spark.executor.extraClassPath
> 
> Josh
> 
> On Wed, Jun 10, 2015 at 4:11 AM, Jeroen Vlek  wrote:
> > Hi Josh,
> > 
> > Thank you for your effort. Looking at your code, I feel that mine is
> > semantically the same, except written in Java. The dependencies in the
> > pom.xml
> > all have the scope provided. The job is submitted as follows:
> > 
> > $ rm spark.log && MASTER=spark://maprdemo:7077 \
> >   /opt/mapr/spark/spark-1.3.1/bin/spark-submit \
> >   --jars /home/mapr/projects/customer/lib/spark-streaming-kafka_2.10-1.3.1.jar,/home/mapr/projects/customer/lib/kafka_2.10-0.8.1.1.jar,/home/mapr/projects/customer/lib/zkclient-0.3.jar,/home/mapr/projects/customer/lib/metrics-core-3.1.0.jar,/home/mapr/projects/customer/lib/metrics-core-2.2.0.jar,lib/spark-sql_2.10-1.3.1.jar,/opt/mapr/phoenix/phoenix-4.4.0-HBase-0.98-bin/phoenix-4.4.0-HBase-0.98-client.jar \
> >   --class nl.work.kafkastreamconsumer.phoenix.KafkaPhoenixConnector \
> >   KafkaStreamConsumer.jar maprdemo:5181 0 topic jdbc:phoenix:maprdemo:5181 true
> > 
> > The spark-defaults.conf is reverted back to its defaults (i.e. no
> > userClassPathFirst). In the catch-block of the Phoenix connection buildup
> > the
> > class path is printed by recursively iterating over the class loaders. The
> > first one already prints the phoenix-client jar [1]. It's also very
> > unlikely to
> > be a bug in Spark or Phoenix, if your proof-of-concept just works.
> > 
> > So if the JAR that contains the offending class is known by the class
> > loader,
> > then that might indicate that there's a second JAR providing the same
> > class
> > but with a different version, right?
> > Yet, the only Phoenix JAR on the whole class path hierarchy is the
> > aforementioned phoenix-client JAR. Furthermore, I googled the class in
> > question, ClientRpcControllerFactory, and it really only exists in the
> > Phoenix
> > project. We're not talking about some low-level AOP Alliance stuff here ;)
> > 
> > Maybe I'm missing some fundamental class loading knowledge, in that case
> > I'd
> > be very happy to be enlightened. This all seems very strange.
> > 
> > Cheers,
> > Jeroen
> > 
> > [1]
> > [file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./spark-streaming-kafka_2.10-1.3.1.jar,
> >  file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./kafka_2.10-0.8.1.1.jar,
> >  file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./zkclient-0.3.jar,
> >  file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./phoenix-4.4.0-HBase-0.98-client.jar,
> >  file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./spark-sql_2.10-1.3.1.jar,
> >  file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./metrics-core-3.1.0.jar,
> >  file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./KafkaStreamConsumer.jar,
> >  file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./metrics-core-2.2.0.jar]
> > 
> > On Tuesday, June 09, 2015 11:18:08 AM Josh Mahonin wrote:
> > > This may or may not be helpful for your classpath issues, but I wanted
> > > to
> > > verify that basic functionality worked, so I made a sample app here:
> > > 
> > > https://github.com/jmahonin/spark-streaming-phoenix
> > > 
> > > This consumes events off a Kafka topic using spark streaming, and writes
> > > out event counts to Phoenix using the new phoenix-spark functionality:
> > > http://phoenix.apache.org/phoenix_spark.html
> > > 
> > > It's definitely overkill, and would probably be more efficient to use
> > > the
> > > JDBC driver directly, but it serves as a proof-of-concept.
> > > 
> > >