Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-09 Thread Jeroen Vlek
Hi,

I posted a question about Phoenix and Spark Streaming on StackOverflow [1]. 
Please find a copy of the question below the first stack trace. I also 
already contacted the Phoenix mailing list and tried the suggestion of 
setting spark.driver.userClassPathFirst. Unfortunately, that only pushed me 
further into dependency hell, which I tried to resolve until I hit a wall 
with an UnsatisfiedLinkError on Snappy.

What I am trying to achieve: save a stream from Kafka into Phoenix/HBase via 
Spark Streaming. I'm using MapR as a platform, and the original exception 
occurs both on a 3-node cluster and on the MapR Sandbox (a VM for 
experimentation), in YARN as well as standalone mode. Further experimentation 
(like the saveAsNewAPIHadoopFile call below) was done only on the sandbox in 
standalone mode.

Phoenix only supports Spark from 4.4.0 onwards, but I thought I could get 
away with 4.3.1 by using a naive implementation that creates a new 
connection for every RDD in the DStream. This resulted in the 
ClassNotFoundException described in [1], so I switched to 4.4.0.
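
In condensed form, the naive approach looked roughly like this (a sketch, not 
the actual code; the EVENTS table, its columns, and the key generation are 
made up for illustration):

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.util.Iterator;

  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.function.Function;
  import org.apache.spark.api.java.function.VoidFunction;

  // messages is a JavaDStream<String> obtained via KafkaUtils.createStream(...).
  // For every RDD in the stream, each partition opens its own short-lived
  // Phoenix connection and upserts its events.
  messages.foreachRDD(new Function<JavaRDD<String>, Void>() {
      public Void call(JavaRDD<String> rdd) {
          rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
              public void call(Iterator<String> events) throws Exception {
                  Connection conn =
                      DriverManager.getConnection("jdbc:phoenix:maprdemo:5181");
                  try {
                      PreparedStatement stmt = conn.prepareStatement(
                          "UPSERT INTO EVENTS (ID, PAYLOAD) VALUES (?, ?)");
                      long key = 0; // illustrative key only, not collision-safe
                      while (events.hasNext()) {
                          stmt.setLong(1, key++);
                          stmt.setString(2, events.next());
                          stmt.executeUpdate();
                      }
                      conn.commit(); // Phoenix buffers upserts until commit
                  } finally {
                      conn.close();
                  }
              }
          });
          return null;
      }
  });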

Unfortunately, the saveToPhoenix method is only available in Scala. I did, 
however, find a suggestion to go through the saveAsNewAPIHadoopFile method 
[2], and an example implementation [3], which I adapted to my own needs.
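
Condensed, the adapted write looks roughly like this (a sketch; EventWritable 
and the EVENTS table are illustrative, and the PhoenixMapReduceUtil call 
should be checked against the 4.4.0 javadoc):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.phoenix.mapreduce.PhoenixOutputFormat;
  import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;

  Configuration conf = new Configuration();
  conf.set("hbase.zookeeper.quorum", "maprdemo:5181");
  Job job = Job.getInstance(conf);
  // EventWritable implements org.apache.hadoop.mapreduce.lib.db.DBWritable
  // and fills in the upsert in its write(PreparedStatement) method.
  PhoenixMapReduceUtil.setOutput(job, "EVENTS", "ID,PAYLOAD");

  // pairs is a JavaPairRDD<NullWritable, EventWritable>
  pairs.saveAsNewAPIHadoopFile(
      "/tmp/ignored",        // required by the API, unused by PhoenixOutputFormat
      NullWritable.class,
      EventWritable.class,
      PhoenixOutputFormat.class,
      job.getConfiguration());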

However, 4.4.0 + saveAsNewAPIHadoopFile raises the same 
ClassNotFoundException, just with a slightly different stack trace:

  java.lang.RuntimeException: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
    at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:58)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:995)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
  Caused by: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
    at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:386)
    at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:288)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:171)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1881)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1860)
    at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1860)
    at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162)
    at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:131)
    at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:187)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:92)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:80)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:68)
    at org.apache.phoenix.mapreduce.PhoenixRecordWriter.<init>(PhoenixRecordWriter.java:49)
    at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:55)
    ... 8 more
  Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:457)
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:350)
    at org.apache.phoenix.query.HConnectionFactory$HConnectionFactoryImpl.createConnection(HConnectionFactory.java:47)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:286)
    ... 23 more
  Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-09 Thread Josh Mahonin
This may or may not be helpful for your classpath issues, but I wanted to
verify that basic functionality worked, so I made a sample app here:

https://github.com/jmahonin/spark-streaming-phoenix

This consumes events off a Kafka topic using Spark Streaming, and writes
out event counts to Phoenix using the new phoenix-spark functionality:
http://phoenix.apache.org/phoenix_spark.html

It's definitely overkill, and it would probably be more efficient to use the
JDBC driver directly, but it serves as a proof of concept.

I've only tested this in local mode. To convert it to a full job JAR, I
suspect that keeping all of the Spark and Phoenix dependencies marked as
'provided', and including the Phoenix client JAR on the Spark classpath,
would work as well.
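
For example, hypothetical pom.xml entries along these lines (versions taken 
from this thread):

  <dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-spark</artifactId>
    <version>4.4.0-HBase-0.98</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.3.1</version>
    <scope>provided</scope>
  </dependency>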

Good luck,

Josh

On Tue, Jun 9, 2015 at 4:40 AM, Jeroen Vlek  wrote:

> [...]

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-10 Thread Jeroen Vlek
Hi Josh,

Thank you for your effort. Looking at your code, I feel that mine is 
semantically the same, just written in Java. The dependencies in the pom.xml 
all have scope 'provided'. The job is submitted as follows:

$ rm spark.log && MASTER=spark://maprdemo:7077 \
  /opt/mapr/spark/spark-1.3.1/bin/spark-submit \
  --jars /home/mapr/projects/customer/lib/spark-streaming-kafka_2.10-1.3.1.jar,\
/home/mapr/projects/customer/lib/kafka_2.10-0.8.1.1.jar,\
/home/mapr/projects/customer/lib/zkclient-0.3.jar,\
/home/mapr/projects/customer/lib/metrics-core-3.1.0.jar,\
/home/mapr/projects/customer/lib/metrics-core-2.2.0.jar,\
lib/spark-sql_2.10-1.3.1.jar,\
/opt/mapr/phoenix/phoenix-4.4.0-HBase-0.98-bin/phoenix-4.4.0-HBase-0.98-client.jar \
  --class nl.work.kafkastreamconsumer.phoenix.KafkaPhoenixConnector \
  KafkaStreamConsumer.jar maprdemo:5181 0 topic jdbc:phoenix:maprdemo:5181 true

The spark-defaults.conf has been reverted to its defaults (i.e. no 
userClassPathFirst). In the catch block of the Phoenix connection buildup, 
the class path is printed by recursively iterating over the class loaders. 
The first one already lists the phoenix-client JAR [1]. It's also very 
unlikely to be a bug in Spark or Phoenix, given that your proof-of-concept 
just works.
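
That walk is roughly the following (a sketch; it assumes each loader in the 
chain is a java.net.URLClassLoader, which held true here):

  // Walk up the classloader chain and dump each URLClassLoader's classpath.
  ClassLoader cl = Thread.currentThread().getContextClassLoader();
  while (cl != null) {
      if (cl instanceof java.net.URLClassLoader) {
          System.err.println(cl + " -> " + java.util.Arrays.toString(
              ((java.net.URLClassLoader) cl).getURLs()));
      }
      cl = cl.getParent();
  }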

So if the JAR that contains the offending class is known to the class 
loader, that might indicate that there's a second JAR providing the same 
class but with a different version, right? Yet the only Phoenix JAR on the 
whole class-path hierarchy is the aforementioned phoenix-client JAR. 
Furthermore, I googled the class in question, ClientRpcControllerFactory, 
and it really only exists in the Phoenix project. We're not talking about 
some low-level AOP Alliance stuff here ;)

Maybe I'm missing some fundamental class-loading knowledge; in that case I'd 
be very happy to be enlightened. This all seems very strange.

Cheers,
Jeroen

[1] [file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./spark-streaming-kafka_2.10-1.3.1.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./kafka_2.10-0.8.1.1.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./zkclient-0.3.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./phoenix-4.4.0-HBase-0.98-client.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./spark-sql_2.10-1.3.1.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./metrics-core-3.1.0.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./KafkaStreamConsumer.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./metrics-core-2.2.0.jar]


On Tuesday, June 09, 2015 11:18:08 AM Josh Mahonin wrote:
> [...]

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-10 Thread Josh Mahonin
Hi Jeroen,

Rather than bundle the Phoenix client JAR with your app, are you able to
include it in a static location, either on the SPARK_CLASSPATH or via the
conf values below? (I use SPARK_CLASSPATH myself, though it's deprecated.)

  spark.driver.extraClassPath
  spark.executor.extraClassPath
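
For example, in spark-defaults.conf (using the client JAR path from your
submit command):

  spark.driver.extraClassPath   /opt/mapr/phoenix/phoenix-4.4.0-HBase-0.98-bin/phoenix-4.4.0-HBase-0.98-client.jar
  spark.executor.extraClassPath /opt/mapr/phoenix/phoenix-4.4.0-HBase-0.98-bin/phoenix-4.4.0-HBase-0.98-client.jar

or, with the deprecated variable, in spark-env.sh:

  export SPARK_CLASSPATH=/opt/mapr/phoenix/phoenix-4.4.0-HBase-0.98-bin/phoenix-4.4.0-HBase-0.98-client.jar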

Josh

On Wed, Jun 10, 2015 at 4:11 AM, Jeroen Vlek  wrote:

> [...]

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-10 Thread Jeroen Vlek
Hi Josh,

That worked! Thank you so much! (I can't believe it was something so obvious 
;) )

If you care about such a thing, you could answer my question here for the 
bounty: 
http://stackoverflow.com/questions/30639659/apache-phoenix-4-3-1-and-4-4-0-hbase-0-98-on-spark-1-3-1-classnotfoundexceptio

Have a great day!

Cheers,
Jeroen

On Wednesday 10 June 2015 08:58:02 Josh Mahonin wrote:
> [...]

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-11 Thread Josh Mahonin
Hi Jeroen,

No problem. I think there's some magic involved in how the Spark
classloader(s) work, especially with regard to the HBase dependencies. I
know there's probably a more lightweight solution that doesn't require
customizing the Spark setup, but that's the most straightforward way I've
found that works.

Looking again at the docs, I thought I had a PR that mentioned the
SPARK_CLASSPATH, but either I'm dreaming it or it got dropped on the floor.
I'll search around for it today.

Thanks for the StackOverflow heads-up, but feel free to update your post
with the resolution, maybe with a GMane link to the thread?

Good luck,

Josh

On Thu, Jun 11, 2015 at 2:38 AM, Jeroen Vlek  wrote:

> [...]