Re: Spark with HBase

2014-12-15 Thread Aniket Bhatnagar
In case you are still looking for help, there have been multiple discussions
on this mailing list that you can try searching for. Or you can simply use
https://github.com/unicredit/hbase-rdd :-)
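
If it helps as a starting point, the pattern those discussions (and hbase-rdd
under the hood) build on is a table scan through TableInputFormat. A minimal
Scala sketch, assuming a spark-shell sc, a table named "test", and the HBase
client jars already on the classpath (all of these are assumptions, not from
this thread):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "test")  // placeholder table name

// Full table scan as an RDD of (row key, Result) pairs.
val hbaseRdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
println("rows: " + hbaseRdd.count())

hbase-rdd then just wraps this same scan in a friendlier API.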

Thanks,
Aniket

On Wed Dec 03 2014 at 16:11:47 Ted Yu  wrote:

> Which HBase release are you running?
> If it is 0.98, take a look at:
>
> https://issues.apache.org/jira/browse/SPARK-1297
>
> Thanks
>
> On Dec 2, 2014, at 10:21 PM, Jai  wrote:
>
> I am trying to use Apache Spark with a pseudo-distributed Hadoop HBase
> cluster and I am looking for some links regarding the same. Can someone
> please guide me through the steps to accomplish this? Thanks a lot for
> helping.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-with-HBase-tp20226.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


RE: Spark with HBase

2014-06-29 Thread N. Venkata Naga Ravi
+user@spark.apache.org

From: nvn_r...@hotmail.com
To: u...@spark.incubator.apache.org
Subject: Spark with HBase
Date: Sun, 29 Jun 2014 15:28:43 +0530




I am using the following versions:

spark-1.0.0-bin-hadoop2
hbase-0.96.1.1-hadoop2


When executing the HBaseTest example, I am facing the following exception. It 
looks like some version incompatibility; can you please help with it?

NERAVI-M-70HY:spark-1.0.0-bin-hadoop2 neravi$ ./bin/run-example 
org.apache.spark.examples.HBaseTest local localhost:4040 test



14/06/29 15:14:14 INFO RecoverableZooKeeper: The identifier of this process is 
69...@neravi-m-70hy.cisco.com
14/06/29 15:14:14 INFO ClientCnxn: Opening socket connection to server 
localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL 
(unknown error)
14/06/29 15:14:14 INFO ClientCnxn: Socket connection established to 
localhost/0:0:0:0:0:0:0:1:2181, initiating session
14/06/29 15:14:14 INFO ClientCnxn: Session establishment complete on server 
localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x146e6fa10750009, negotiated 
timeout = 4
Exception in thread "main" java.lang.IllegalArgumentException: Not a host:port 
pair: PBUF


192.168.1.6�(
at org.apache.hadoop.hbase.util.Addressing.parseHostname(Addressing.java:60)
at org.apache.hadoop.hbase.ServerName.<init>(ServerName.java:101)
at 
org.apache.hadoop.hbase.ServerName.parseVersionedServerName(ServerName.java:283)
at 
org.apache.hadoop.hbase.MasterAddressTracker.bytesToServerName(MasterAddressTracker.java:77)
at 
org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:61)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:703)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:126)
at org.apache.spark.examples.HBaseTest$.main(HBaseTest.scala:37)
at org.apache.spark.examples.HBaseTest.main(HBaseTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


Thanks,
Ravi

  

RE: Spark with HBase

2014-07-04 Thread N. Venkata Naga Ravi
Hi,

Any update on the solution? We are still facing this issue...
We are able to connect to HBase with standalone code, but we are getting this 
issue with the Spark integration.

Thx,
Ravi


Re: Spark with HBase

2014-07-04 Thread 田毅
Hi, I met this issue before.

The reason is that the HBase client used in Spark is 0.94.6, while your server
is 0.96.1.1.

To fix this issue, you could choose one of the following:

a) deploy an HBase cluster with version 0.94.6
b) rebuild the Spark code:
step 1:  modify the hbase version in pom.xml to 0.96.1.1
step 2:  modify the hbase artifactId in examples/pom.xml to hbase-it
step 3:  use Maven to build Spark again
c) try to add the HBase jars to SPARK_CLASSPATH (I did not try this way before;
see the sketch below)
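
For option (c), here is a minimal Scala sketch of the same idea done
programmatically instead of via the SPARK_CLASSPATH environment variable. The
jar paths are assumptions; note that setJars only ships the jars to the
executors, so the driver still needs the matching 0.96 client on its own
classpath (which is what SPARK_CLASSPATH covers):

import org.apache.spark.{SparkConf, SparkContext}

// Ship HBase 0.96 client jars with the job so executors do not fall back to
// the 0.94 client bundled with the Spark assembly. Paths are placeholders.
val conf = new SparkConf()
  .setAppName("HBaseVersionFix")
  .setMaster("local")
  .setJars(Seq(
    "/path/to/hbase-client-0.96.1.1-hadoop2.jar",
    "/path/to/hbase-common-0.96.1.1-hadoop2.jar",
    "/path/to/hbase-protocol-0.96.1.1-hadoop2.jar"))
val sc = new SparkContext(conf)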


2014-07-04 1:19 GMT-07:00 N.Venkata Naga Ravi :

> Hi,
>
> Any update on the solution? We are still facing this issue...
> We are able to connect to HBase with standalone code, but we are getting this
> issue with the Spark integration.
>
> Thx,
> Ravi
>


Re: Spark with HBase

2014-08-07 Thread Akhil Das
You can download and compile Spark against your existing Hadoop version.

Here's a quick start
https://spark.apache.org/docs/latest/cluster-overview.html#cluster-manager-types

You can also read a bit here
http://docs.sigmoidanalytics.com/index.php/Installing_Spark_andSetting_Up_Your_Cluster
(the version is quite old)

Attached is a piece of code (Spark Java API) to connect to HBase.



Thanks
Best Regards


On Thu, Aug 7, 2014 at 1:48 PM, Deepa Jayaveer  wrote:

> Hi
> I read your white paper about " ". We wanted to do a proof of concept on
> Spark with HBase. Documents on setting up a Spark cluster in a Hadoop 2
> environment are not widely available. If you have any, can you please give
> us some reference URLs? Also, could you share a sample program to connect to
> HBase using the Spark Java API?
>
> Thanks
> Deepa
>
> =-=-=
> Notice: The information contained in this e-mail
> message and/or attachments to it may contain
> confidential or privileged information. If you are
> not the intended recipient, any dissemination, use,
> review, distribution, printing or copying of the
> information contained in this e-mail message
> and/or attachments to it are strictly prohibited. If
> you have received this communication in error,
> please notify us by reply e-mail or telephone and
> immediately and permanently delete the message
> and any attachments. Thank you
>
>
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.rdd.NewHadoopRDD;

import com.google.common.collect.Lists;

import scala.Tuple2;

public class SparkHBaseMain {

    @SuppressWarnings("deprecation")
    public static void main(String[] arg) {

        try {

            // Jars shipped to the executors; paths are specific to this setup.
            List<String> jars = Lists.newArrayList(
                "/home/akhld/Desktop/tools/spark-9/jars/spark-assembly-0.9.0-incubating-hadoop2.3.0-mr1-cdh5.0.0.jar",
                "/home/akhld/Downloads/sparkhbasecode/hbase-server-0.96.0-hadoop2.jar",
                "/home/akhld/Downloads/sparkhbasecode/hbase-protocol-0.96.0-hadoop2.jar",
                "/home/akhld/Downloads/sparkhbasecode/hbase-hadoop2-compat-0.96.0-hadoop2.jar",
                "/home/akhld/Downloads/sparkhbasecode/hbase-common-0.96.0-hadoop2.jar",
                "/home/akhld/Downloads/sparkhbasecode/hbase-client-0.96.0-hadoop2.jar",
                "/home/akhld/Downloads/sparkhbasecode/htrace-core-2.02.jar");

            SparkConf spconf = new SparkConf();
            spconf.setMaster("local");
            spconf.setAppName("SparkHBase");
            spconf.setSparkHome("/home/akhld/Desktop/tools/spark-9");
            spconf.setJars(jars.toArray(new String[jars.size()]));
            spconf.set("spark.executor.memory", "1g");

            final JavaSparkContext sc = new JavaSparkContext(spconf);

            // Standard HBase client configuration plus the table to scan.
            Configuration conf = HBaseConfiguration.create();
            conf.addResource("/home/akhld/Downloads/sparkhbasecode/hbase-site.xml");
            conf.set(TableInputFormat.INPUT_TABLE, "blogposts");

            // Full table scan: each element is a (row key, Result) pair.
            NewHadoopRDD<ImmutableBytesWritable, Result> rdd =
                new NewHadoopRDD<ImmutableBytesWritable, Result>(
                    JavaSparkContext.toSparkContext(sc),
                    TableInputFormat.class,
                    ImmutableBytesWritable.class,
                    Result.class,
                    conf);

            JavaRDD<Tuple2<ImmutableBytesWritable, Result>> jrdd = rdd.toJavaRDD();

            ForEachFunction f = new ForEachFunction();
            JavaRDD<Iterator<String>> retrdd = jrdd.map(f);
            System.out.println("Count => " + retrdd.count());

        } catch (Exception e) {

            e.printStackTrace();
            System.out.println("Crashed: " + e);

        }

    }

    @SuppressWarnings("serial")
    private static class ForEachFunction
            extends Function<Tuple2<ImmutableBytesWritable, Result>, Iterator<String>> {

        // Used only for its printing side effect; always returns null.
        public Iterator<String> call(Tuple2<ImmutableBytesWritable, Result> test) {
            Result tmp = test._2;
            // Print the "title" column of the "post" family for each row.
            List<KeyValue> kvl = tmp.getColumn("post".getBytes(), "title".getBytes());
            for (KeyValue kl : kvl) {
                String sb = new String(kl.getValue());
                System.out.println("Value: " + sb);
            }
            return null;
        }

    }

}

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: Spark with HBase

2014-08-07 Thread chutium
These two posts should be good for setting up a Spark + HBase environment and
using the results of an HBase table scan as an RDD.

settings
http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html

some samples:
http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html
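
For reference, the Result/KeyValue handling the second post walks through boils
down to converting cell bytes with Bytes. A minimal Scala sketch, assuming a
spark-shell sc and placeholder table/family/qualifier names:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "blogposts")  // placeholder table name

val hbaseRdd = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

// Pull one column's value out of each row; family/qualifier are placeholders.
val values = hbaseRdd.map { case (_, result) =>
  Bytes.toString(result.getValue(Bytes.toBytes("post"), Bytes.toBytes("title")))
}
values.take(5).foreach(println)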



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-with-HBase-tp11629p11647.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark with HBase

2014-12-03 Thread Akhil Das
You could go through these to start with

http://www.vidyasource.com/blog/Programming/Scala/Java/Data/Hadoop/Analytics/2014/01/25/lighting-a-spark-with-hbase

http://stackoverflow.com/questions/25189527/how-to-process-a-range-of-hbase-rows-using-spark
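
The row-range question in the second link comes down to constraining the scan
through TableInputFormat's configuration before creating the RDD. A rough
Scala sketch, assuming a spark-shell sc; the table name and row keys are
placeholders:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "mytable")      // placeholder table name
conf.set(TableInputFormat.SCAN_ROW_START, "row-0100")  // inclusive start key
conf.set(TableInputFormat.SCAN_ROW_STOP, "row-0200")   // exclusive stop key

val slice = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
println("rows in range: " + slice.count())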

Thanks
Best Regards

On Wed, Dec 3, 2014 at 11:51 AM, Jai  wrote:

> I am trying to use Apache Spark with a pseudo-distributed Hadoop HBase
> cluster and I am looking for some links regarding the same. Can someone
> please guide me through the steps to accomplish this? Thanks a lot for
> helping.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-with-HBase-tp20226.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Spark with HBase

2014-12-03 Thread Ted Yu
Which HBase release are you running?
If it is 0.98, take a look at:

https://issues.apache.org/jira/browse/SPARK-1297

Thanks

On Dec 2, 2014, at 10:21 PM, Jai  wrote:

> I am trying to use Apache Spark with a pseudo-distributed Hadoop HBase
> cluster and I am looking for some links regarding the same. Can someone
> please guide me through the steps to accomplish this? Thanks a lot for
> helping.
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-with-HBase-tp20226.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 


RE: Spark with HBase Error - Py4JJavaError

2016-07-07 Thread Puneet Tripathi
Guys, please can anyone help with the issue below?

Puneet

From: Puneet Tripathi [mailto:puneet.tripa...@dunnhumby.com]
Sent: Thursday, July 07, 2016 12:42 PM
To: user@spark.apache.org
Subject: Spark with HBase Error - Py4JJavaError

Hi,

We are running HBase in fully distributed mode. I tried to connect to HBase via 
pyspark and then write to HBase using saveAsNewAPIHadoopDataset, but it failed; 
the error says:

Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
: java.lang.ClassNotFoundException: 
org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
I have been able to create pythonconverters.jar, and then did the following:


1.  I think we have to copy this to a location on HDFS; /sparkjars/ seems as 
good a directory to create as any. I think the file has to be world-readable.

2.  Set the spark_jar_hdfs_path property in Cloudera Manager, e.g. 
hdfs:///sparkjars

It still doesn't seem to work. Can someone please help me with this?

Regards,
Puneet
dunnhumby limited is a limited company registered in England and Wales with 
registered number 02388853 and VAT registered number 927 5871 83. Our 
registered office is at Aurora House, 71-75 Uxbridge Road, London W5 5SL. The 
contents of this message and any attachments to it are confidential and may be 
legally privileged. If you have received this message in error you should 
delete it from your system immediately and advise the sender. dunnhumby may 
monitor and record all emails. The views expressed in this email are those of 
the sender and not those of dunnhumby.


Re: Spark with HBase Error - Py4JJavaError

2016-07-07 Thread ram kumar
Hi Puneet,

Have you tried appending
 --jars $SPARK_HOME/lib/spark-examples-*.jar
to the execution command?

Ram

On Thu, Jul 7, 2016 at 5:19 PM, Puneet Tripathi <puneet.tripa...@dunnhumby.com> wrote:

> Guys, please can anyone help with the issue below?
>
> Puneet


RE: Spark with HBase Error - Py4JJavaError

2016-07-08 Thread Puneet Tripathi
Hi Ram, thanks very much, it worked.

Puneet

From: ram kumar [mailto:ramkumarro...@gmail.com]
Sent: Thursday, July 07, 2016 6:51 PM
To: Puneet Tripathi
Cc: user@spark.apache.org
Subject: Re: Spark with HBase Error - Py4JJavaError

Hi Puneet,
Have you tried appending
 --jars $SPARK_HOME/lib/spark-examples-*.jar
to the execution command?
Ram

On Thu, Jul 7, 2016 at 5:19 PM, Puneet Tripathi <puneet.tripa...@dunnhumby.com> wrote:

Guys, please can anyone help with the issue below?

Puneet