Re: Problem using Spark with Hbase

2014-05-30 Thread Vibhor Banga
Thanks Mayur for the reply.

Actually, the issue was that I was running the Spark application on hadoop-2.2.0,
where the HBase version was 0.95.2.

But Spark by default gets built against an older HBase version. So I had to
rebuild Spark with the HBase version set to 0.95.2 in the Spark build file,
and that fixed it. (A quick way to verify which client version a job picks up
is sketched below.)
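
In case it helps anyone hitting the same mismatch: HBase ships a VersionInfo
utility, so one way to confirm which HBase client version a job actually loads
is something like the sketch below (the class name VersionCheck is just for
illustration):

import org.apache.hadoop.hbase.util.VersionInfo;

public class VersionCheck {
    public static void main(String[] args) {
        // Prints the version of the HBase client jar on the classpath; after
        // the rebuild it should match the cluster's version (0.95.2 here).
        System.out.println("HBase client version: " + VersionInfo.getVersion());
    }
}

If you run this with the same classpath as the Spark job and it still prints
the old version, the rebuilt assembly is not the one reaching the executors.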

Thanks,
-Vibhor


On Wed, May 28, 2014 at 11:34 PM, Mayur Rustagi 
wrote:

> Try this..
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi 
>
>
>
> On Wed, May 28, 2014 at 7:40 PM, Vibhor Banga 
> wrote:
>
>> Anyone who has used Spark this way or has faced a similar issue, please
>> help.
>>
>> Thanks,
>> -Vibhor
>>
>>
>> On Wed, May 28, 2014 at 6:03 PM, Vibhor Banga 
>> wrote:
>>
>>> Hi all,
>>>
>>> I am facing issues while using Spark with HBase. I am getting a
>>> NullPointerException at org.apache.hadoop.hbase.TableName.valueOf
>>> (TableName.java:288).
>>>
>>> Can someone please help me resolve this issue? What am I missing?
>>>
>>>
>>> I am using the following snippet of code -
>>>
>>> Configuration config = HBaseConfiguration.create();
>>>
>>> config.set("hbase.zookeeper.znode.parent", "hostname1");
>>> config.set("hbase.zookeeper.quorum", "hostname1");
>>> config.set("hbase.zookeeper.property.clientPort", "2181");
>>> config.set("hbase.master", "hostname1:60000"); // port assumed (default HBase master port)
>>> config.set("fs.defaultFS", "hdfs://hostname1/");
>>> config.set("dfs.namenode.rpc-address", "hostname1:8020");
>>>
>>> config.set(TableInputFormat.INPUT_TABLE, "tableName");
>>>
>>> JavaSparkContext ctx = new JavaSparkContext(args[0], "Simple",
>>>         System.getenv("SPARK_HOME"),
>>>         JavaSparkContext.jarOfClass(Simple.class));
>>>
>>> JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
>>>         ctx.newAPIHadoopRDD(config, TableInputFormat.class,
>>>                 ImmutableBytesWritable.class, Result.class);
>>>
>>> Map<ImmutableBytesWritable, Result> rddMap = hBaseRDD.collectAsMap();
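>>>
>>> A note on that last line: collectAsMap() pulls the entire table into the
>>> driver, so it is only safe for small tables. A rough sketch of keeping the
>>> work distributed instead (assumes imports of java.util.List, scala.Tuple2,
>>> org.apache.spark.api.java.function.Function and
>>> org.apache.hadoop.hbase.util.Bytes):
>>>
>>> // Count rows without materialising them on the driver.
>>> long rowCount = hBaseRDD.count();
>>>
>>> // Or extract just the row keys as strings.
>>> List<String> rowKeys = hBaseRDD.map(
>>>         new Function<Tuple2<ImmutableBytesWritable, Result>, String>() {
>>>             public String call(Tuple2<ImmutableBytesWritable, Result> t) {
>>>                 return Bytes.toString(t._1().get());
>>>             }
>>>         }).collect();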
>>>
>>>
>>> But when I go to the spark cluster and check the logs, I see following
>>> error -
>>>
>>> INFO NewHadoopRDD: Input split: w3-target1.nm.flipkart.com:,
>>> 14/05/28 16:48:51 ERROR TableInputFormat: java.lang.NullPointerException
>>> at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:288)
>>> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:154)
>>> at 
>>> org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:99)
>>> at 
>>> org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:92)
>>> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:84)
>>> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:48)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:53)
>>> at 
>>> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
>>> at 
>>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>>> at 
>>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at 
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>> at 
>>> org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>> Thanks,
>>>
>>> -Vibhor
>>>
>>>
>>
>>
>>
>


-- 
Vibhor Banga
Software Development Engineer
Flipkart Internet Pvt. Ltd., Bangalore


Re: Problem using Spark with Hbase

2014-05-28 Thread Mayur Rustagi
Try this..

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 



On Wed, May 28, 2014 at 7:40 PM, Vibhor Banga  wrote:

> Anyone who has used Spark this way or has faced a similar issue, please
> help.
>
> Thanks,
> -Vibhor
>
>
> On Wed, May 28, 2014 at 6:03 PM, Vibhor Banga wrote:
>
>> Hi all,
>>
>> I am facing issues while using Spark with HBase. I am getting a
>> NullPointerException at org.apache.hadoop.hbase.TableName.valueOf
>> (TableName.java:288).
>>
>> Can someone please help me resolve this issue? What am I missing?
>>
>>
>> I am using the following snippet of code -
>>
>> Configuration config = HBaseConfiguration.create();
>>
>> config.set("hbase.zookeeper.znode.parent", "hostname1");
>> config.set("hbase.zookeeper.quorum", "hostname1");
>> config.set("hbase.zookeeper.property.clientPort", "2181");
>> config.set("hbase.master", "hostname1:60000"); // port assumed (default HBase master port)
>> config.set("fs.defaultFS", "hdfs://hostname1/");
>> config.set("dfs.namenode.rpc-address", "hostname1:8020");
>>
>> config.set(TableInputFormat.INPUT_TABLE, "tableName");
>>
>> JavaSparkContext ctx = new JavaSparkContext(args[0], "Simple",
>>         System.getenv("SPARK_HOME"),
>>         JavaSparkContext.jarOfClass(Simple.class));
>>
>> JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
>>         ctx.newAPIHadoopRDD(config, TableInputFormat.class,
>>                 ImmutableBytesWritable.class, Result.class);
>>
>> Map<ImmutableBytesWritable, Result> rddMap = hBaseRDD.collectAsMap();
>>
>>
>> But when I go to the spark cluster and check the logs, I see following
>> error -
>>
>> INFO NewHadoopRDD: Input split: w3-target1.nm.flipkart.com:,
>> 14/05/28 16:48:51 ERROR TableInputFormat: java.lang.NullPointerException
>>  at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:288)
>>  at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:154)
>>  at 
>> org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:99)
>>  at 
>> org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:92)
>>  at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:84)
>>  at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:48)
>>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
>>  at org.apache.spark.scheduler.Task.run(Task.scala:53)
>>  at 
>> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
>>  at 
>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>>  at 
>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>>  at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>  at 
>> org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  at java.lang.Thread.run(Thread.java:745)
>>
>> Thanks,
>>
>> -Vibhor
>>
>>
>
>
>


SparkHBaseMain.java
Description: Binary data


Re: Problem using Spark with Hbase

2014-05-28 Thread Vibhor Banga
Anyone who has used Spark this way or has faced a similar issue, please help.

Thanks,
-Vibhor

On Wed, May 28, 2014 at 6:03 PM, Vibhor Banga  wrote:

> Hi all,
>
> I am facing issues while using Spark with HBase. I am getting a
> NullPointerException at org.apache.hadoop.hbase.TableName.valueOf
> (TableName.java:288).
>
> Can someone please help me resolve this issue? What am I missing?
>
>
> I am using the following snippet of code -
>
> Configuration config = HBaseConfiguration.create();
>
> config.set("hbase.zookeeper.znode.parent", "hostname1");
> config.set("hbase.zookeeper.quorum", "hostname1");
> config.set("hbase.zookeeper.property.clientPort", "2181");
> config.set("hbase.master", "hostname1:60000"); // port assumed (default HBase master port)
> config.set("fs.defaultFS", "hdfs://hostname1/");
> config.set("dfs.namenode.rpc-address", "hostname1:8020");
>
> config.set(TableInputFormat.INPUT_TABLE, "tableName");
>
> JavaSparkContext ctx = new JavaSparkContext(args[0], "Simple",
>         System.getenv("SPARK_HOME"),
>         JavaSparkContext.jarOfClass(Simple.class));
>
> JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
>         ctx.newAPIHadoopRDD(config, TableInputFormat.class,
>                 ImmutableBytesWritable.class, Result.class);
>
> Map<ImmutableBytesWritable, Result> rddMap = hBaseRDD.collectAsMap();
>
>
> But when I go to the spark cluster and check the logs, I see following
> error -
>
> INFO NewHadoopRDD: Input split: w3-target1.nm.flipkart.com:,
> 14/05/28 16:48:51 ERROR TableInputFormat: java.lang.NullPointerException
>   at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:288)
>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:154)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:99)
>   at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:92)
>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:84)
>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:48)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
>   at org.apache.spark.scheduler.Task.run(Task.scala:53)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>
> Thanks,
>
> -Vibhor
>
>