Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher,

I found a profile for Scala 2.11 and removed it; now the build brings in 2.10. I ran 
some code and got further, but now I get the error below when I do a “df.show”.

java.lang.AbstractMethodError
  at org.apache.spark.Logging$class.log(Logging.scala:50)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseFilter$.log(HBaseFilter.scala:122)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseFilter$.buildFilters(HBaseFilter.scala:125)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.getPartitions(HBaseTableScan.scala:59)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
  at scala.Option.getOrElse(Option.scala:120)

Thanks for all your help.

Cheers,
Ben
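
An AbstractMethodError in org.apache.spark.Logging usually points to the connector 
having been compiled against a different Spark (and Scala) version than the one on 
the cluster, so it is worth confirming exactly which Spark and Scala artifacts the 
build resolves. A minimal check along these lines (the groupId filters are the 
standard Spark and Scala coordinates; adjust them if your pom differs):

# show only the Spark and Scala artifacts in the resolved dependency tree
mvn dependency:tree -Dverbose=true -Dincludes=org.apache.spark,org.scala-lang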


> On Feb 3, 2017, at 8:16 AM, Asher Krim  wrote:
> 
> Did you check the actual maven dep tree? Something might be pulling in a 
> different version. Also, if you're seeing this locally, you might want to 
> check which version of the scala sdk your IDE is using
> 
> Asher Krim
> Senior Software Engineer
> 
> 
> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  > wrote:
> Hi Asher,
> 
> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java 
> (1.8) version as our installation. The Scala (2.10.5) version is already the 
> same as ours. But I’m still getting the same error. Can you think of anything 
> else?
> 
> Cheers,
> Ben
> 
> 
>> On Feb 2, 2017, at 11:06 AM, Asher Krim > > wrote:
>> 
>> Ben,
>> 
>> That looks like a scala version mismatch. Have you checked your dep tree?
>> 
>> Asher Krim
>> Senior Software Engineer
>> 
>> 
>> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim > > wrote:
>> Elek,
>> 
>> Can you give me some sample code? I can’t get mine to work.
>> 
>> import org.apache.spark.sql.{SQLContext, _}
>> import org.apache.spark.sql.execution.datasources.hbase._
>> import org.apache.spark.{SparkConf, SparkContext}
>> 
>> def cat = s"""{
>> |"table":{"namespace":"ben", "name":"dmp_test", 
>> "tableCoder":"PrimitiveType"},
>> |"rowkey":"key",
>> |"columns":{
>> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
>> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
>> |}
>> |}""".stripMargin
>> 
>> import sqlContext.implicits._
>> 
>> def withCatalog(cat: String): DataFrame = {
>> sqlContext
>> .read
>> .options(Map(HBaseTableCatalog.tableCatalog->cat))
>> .format("org.apache.spark.sql.execution.datasources.hbase")
>> .load()
>> }
>> 
>> val df = withCatalog(cat)
>> df.show
>> 
>> It gives me this error.
>> 
>> java.lang.NoSuchMethodError: 
>> scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:232)
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.(HBaseRelation.scala:77)
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
>>  at 
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
>>  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>> 
>> If you can please help, I would be grateful.
>> 
>> Cheers,
>> Ben
>> 
>> 
>>> On Jan 31, 2017, at 1:02 PM, Marton, Elek >> > wrote:
>>> 
>>> 
>>> I tested this one with hbase 1.2.4:
>>> 
>>> https://github.com/hortonworks-spark/shc 
>>> 
>>> 
>>> Marton
>>> 
>>> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
 Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I 
 tried to build it from source, but I cannot get it to work.
 
 Thanks,
 Ben
 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
 
 
>>> 
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
>>> 
>>> 
>> 
>> 
> 
> 



Re: HBase Spark

2017-02-03 Thread Asher Krim
You can see in the tree what's pulling in 2.11. Your options then are to either
shade those artifacts and add an explicit dependency on Scala 2.10.5 in your pom,
or to explore upgrading your project to 2.11 (which will require using a 2.11
build of Spark).
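
As an illustration only, the exclusion plus an explicit Scala dependency would look
roughly like this in the pom (the offending coordinates are placeholders; take the
real groupId/artifactId from the dependency:tree output):

<dependency>
  <!-- placeholder: whichever dependency the tree shows dragging in Scala 2.11 -->
  <groupId>com.example</groupId>
  <artifactId>some-library</artifactId>
  <version>1.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- pin the Scala runtime explicitly to 2.10.5 -->
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.10.5</version>
</dependency>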


On Fri, Feb 3, 2017 at 2:03 PM, Benjamin Kim  wrote:

> Asher,
>
> You’re right. I don’t see anything but 2.11 being pulled in. Do you know
> where I can change this?
>
> Cheers,
> Ben
>
>
> On Feb 3, 2017, at 10:50 AM, Asher Krim  wrote:
>
> Sorry for my persistence, but did you actually run "mvn dependency:tree
> -Dverbose=true"? And did you see only scala 2.10.5 being pulled in?
>
> On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim  wrote:
>
>> Asher,
>>
>> It’s still the same. Do you have any other ideas?
>>
>> Cheers,
>> Ben
>>
>>
>> On Feb 3, 2017, at 8:16 AM, Asher Krim  wrote:
>>
>> Did you check the actual maven dep tree? Something might be pulling in a
>> different version. Also, if you're seeing this locally, you might want to
>> check which version of the scala sdk your IDE is using
>>
>> Asher Krim
>> Senior Software Engineer
>>
>> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  wrote:
>>
>>> Hi Asher,
>>>
>>> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java
>>> (1.8) version as our installation. The Scala (2.10.5) version is already
>>> the same as ours. But I’m still getting the same error. Can you think of
>>> anything else?
>>>
>>> Cheers,
>>> Ben
>>>
>>>
>>> On Feb 2, 2017, at 11:06 AM, Asher Krim  wrote:
>>>
>>> Ben,
>>>
>>> That looks like a scala version mismatch. Have you checked your dep tree?
>>>
>>> Asher Krim
>>> Senior Software Engineer
>>>
>>> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim  wrote:
>>>
 Elek,

 Can you give me some sample code? I can’t get mine to work.

 import org.apache.spark.sql.{SQLContext, _}
 import org.apache.spark.sql.execution.datasources.hbase._
 import org.apache.spark.{SparkConf, SparkContext}

 def cat = s"""{
 |"table":{"namespace":"ben", "name":"dmp_test",
 "tableCoder":"PrimitiveType"},
 |"rowkey":"key",
 |"columns":{
 |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
 |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
 |}
 |}""".stripMargin

 import sqlContext.implicits._

 def withCatalog(cat: String): DataFrame = {
 sqlContext
 .read
 .options(Map(HBaseTableCatalog.tableCatalog->cat))
 .format("org.apache.spark.sql.execution.datasources.hbase")
 .load()
 }

 val df = withCatalog(cat)
 df.show


 It gives me this error.

 java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create
 (Ljava/lang/Object;)Lscala/runtime/ObjectRef;
 at org.apache.spark.sql.execution.datasources.hbase.HBaseTableC
 atalog$.apply(HBaseTableCatalog.scala:232)
 at org.apache.spark.sql.execution.datasources.hbase.HBaseRelati
 on.(HBaseRelation.scala:77)
 at org.apache.spark.sql.execution.datasources.hbase.DefaultSour
 ce.createRelation(HBaseRelation.scala:51)
 at org.apache.spark.sql.execution.datasources.ResolvedDataSourc
 e$.apply(ResolvedDataSource.scala:158)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)


 If you can please help, I would be grateful.

 Cheers,
 Ben


 On Jan 31, 2017, at 1:02 PM, Marton, Elek  wrote:


 I tested this one with hbase 1.2.4:

 https://github.com/hortonworks-spark/shc

 Marton

 On 01/31/2017 09:17 PM, Benjamin Kim wrote:

 Does anyone know how to backport the HBase Spark module to HBase 1.2.0?
 I tried to build it from source, but I cannot get it to work.

 Thanks,
 Ben
 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org


 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org



>>>
>>>
>>
>>
>
>


Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher,

You’re right. I don’t see anything but 2.11 being pulled in. Do you know where 
I can change this?

Cheers,
Ben


> On Feb 3, 2017, at 10:50 AM, Asher Krim  wrote:
> 
> Sorry for my persistence, but did you actually run "mvn dependency:tree 
> -Dverbose=true"? And did you see only scala 2.10.5 being pulled in?
> 
> On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim  > wrote:
> Asher,
> 
> It’s still the same. Do you have any other ideas?
> 
> Cheers,
> Ben
> 
> 
>> On Feb 3, 2017, at 8:16 AM, Asher Krim > > wrote:
>> 
>> Did you check the actual maven dep tree? Something might be pulling in a 
>> different version. Also, if you're seeing this locally, you might want to 
>> check which version of the scala sdk your IDE is using
>> 
>> Asher Krim
>> Senior Software Engineer
>> 
>> 
>> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim > > wrote:
>> Hi Asher,
>> 
>> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java 
>> (1.8) version as our installation. The Scala (2.10.5) version is already the 
>> same as ours. But I’m still getting the same error. Can you think of 
>> anything else?
>> 
>> Cheers,
>> Ben
>> 
>> 
>>> On Feb 2, 2017, at 11:06 AM, Asher Krim >> > wrote:
>>> 
>>> Ben,
>>> 
>>> That looks like a scala version mismatch. Have you checked your dep tree?
>>> 
>>> Asher Krim
>>> Senior Software Engineer
>>> 
>>> 
>>> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim >> > wrote:
>>> Elek,
>>> 
>>> Can you give me some sample code? I can’t get mine to work.
>>> 
>>> import org.apache.spark.sql.{SQLContext, _}
>>> import org.apache.spark.sql.execution.datasources.hbase._
>>> import org.apache.spark.{SparkConf, SparkContext}
>>> 
>>> def cat = s"""{
>>> |"table":{"namespace":"ben", "name":"dmp_test", 
>>> "tableCoder":"PrimitiveType"},
>>> |"rowkey":"key",
>>> |"columns":{
>>> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
>>> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
>>> |}
>>> |}""".stripMargin
>>> 
>>> import sqlContext.implicits._
>>> 
>>> def withCatalog(cat: String): DataFrame = {
>>> sqlContext
>>> .read
>>> .options(Map(HBaseTableCatalog.tableCatalog->cat))
>>> .format("org.apache.spark.sql.execution.datasources.hbase")
>>> .load()
>>> }
>>> 
>>> val df = withCatalog(cat)
>>> df.show
>>> 
>>> It gives me this error.
>>> 
>>> java.lang.NoSuchMethodError: 
>>> scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
>>> at 
>>> org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:232)
>>> at 
>>> org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.(HBaseRelation.scala:77)
>>> at 
>>> org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
>>> at 
>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>> 
>>> If you can please help, I would be grateful.
>>> 
>>> Cheers,
>>> Ben
>>> 
>>> 
 On Jan 31, 2017, at 1:02 PM, Marton, Elek > wrote:
 
 
 I tested this one with hbase 1.2.4:
 
 https://github.com/hortonworks-spark/shc 
 
 
 Marton
 
 On 01/31/2017 09:17 PM, Benjamin Kim wrote:
> Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I 
> tried to build it from source, but I cannot get it to work.
> 
> Thanks,
> Ben
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
> 
> 
 
 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
 
 
>>> 
>>> 
>> 
>> 
> 
> 



Re: HBase Spark

2017-02-03 Thread Asher Krim
Sorry for my persistence, but did you actually run "mvn dependency:tree
-Dverbose=true"? And did you see only scala 2.10.5 being pulled in?

On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim  wrote:

> Asher,
>
> It’s still the same. Do you have any other ideas?
>
> Cheers,
> Ben
>
>
> On Feb 3, 2017, at 8:16 AM, Asher Krim  wrote:
>
> Did you check the actual maven dep tree? Something might be pulling in a
> different version. Also, if you're seeing this locally, you might want to
> check which version of the scala sdk your IDE is using
>
> Asher Krim
> Senior Software Engineer
>
> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  wrote:
>
>> Hi Asher,
>>
>> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java
>> (1.8) version as our installation. The Scala (2.10.5) version is already
>> the same as ours. But I’m still getting the same error. Can you think of
>> anything else?
>>
>> Cheers,
>> Ben
>>
>>
>> On Feb 2, 2017, at 11:06 AM, Asher Krim  wrote:
>>
>> Ben,
>>
>> That looks like a scala version mismatch. Have you checked your dep tree?
>>
>> Asher Krim
>> Senior Software Engineer
>>
>> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim  wrote:
>>
>>> Elek,
>>>
>>> Can you give me some sample code? I can’t get mine to work.
>>>
>>> import org.apache.spark.sql.{SQLContext, _}
>>> import org.apache.spark.sql.execution.datasources.hbase._
>>> import org.apache.spark.{SparkConf, SparkContext}
>>>
>>> def cat = s"""{
>>> |"table":{"namespace":"ben", "name":"dmp_test",
>>> "tableCoder":"PrimitiveType"},
>>> |"rowkey":"key",
>>> |"columns":{
>>> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
>>> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
>>> |}
>>> |}""".stripMargin
>>>
>>> import sqlContext.implicits._
>>>
>>> def withCatalog(cat: String): DataFrame = {
>>> sqlContext
>>> .read
>>> .options(Map(HBaseTableCatalog.tableCatalog->cat))
>>> .format("org.apache.spark.sql.execution.datasources.hbase")
>>> .load()
>>> }
>>>
>>> val df = withCatalog(cat)
>>> df.show
>>>
>>>
>>> It gives me this error.
>>>
>>> java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create
>>> (Ljava/lang/Object;)Lscala/runtime/ObjectRef;
>>> at org.apache.spark.sql.execution.datasources.hbase.HBaseTableC
>>> atalog$.apply(HBaseTableCatalog.scala:232)
>>> at org.apache.spark.sql.execution.datasources.hbase.HBaseRelati
>>> on.(HBaseRelation.scala:77)
>>> at org.apache.spark.sql.execution.datasources.hbase.DefaultSour
>>> ce.createRelation(HBaseRelation.scala:51)
>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSourc
>>> e$.apply(ResolvedDataSource.scala:158)
>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>
>>>
>>> If you can please help, I would be grateful.
>>>
>>> Cheers,
>>> Ben
>>>
>>>
>>> On Jan 31, 2017, at 1:02 PM, Marton, Elek  wrote:
>>>
>>>
>>> I tested this one with hbase 1.2.4:
>>>
>>> https://github.com/hortonworks-spark/shc
>>>
>>> Marton
>>>
>>> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
>>>
>>> Does anyone know how to backport the HBase Spark module to HBase 1.2.0?
>>> I tried to build it from source, but I cannot get it to work.
>>>
>>> Thanks,
>>> Ben
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>
>>>
>>
>>
>
>


Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher,

It’s still the same. Do you have any other ideas?

Cheers,
Ben


> On Feb 3, 2017, at 8:16 AM, Asher Krim  wrote:
> 
> Did you check the actual maven dep tree? Something might be pulling in a 
> different version. Also, if you're seeing this locally, you might want to 
> check which version of the scala sdk your IDE is using
> 
> Asher Krim
> Senior Software Engineer
> 
> 
> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  > wrote:
> Hi Asher,
> 
> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java 
> (1.8) version as our installation. The Scala (2.10.5) version is already the 
> same as ours. But I’m still getting the same error. Can you think of anything 
> else?
> 
> Cheers,
> Ben
> 
> 
>> On Feb 2, 2017, at 11:06 AM, Asher Krim > > wrote:
>> 
>> Ben,
>> 
>> That looks like a scala version mismatch. Have you checked your dep tree?
>> 
>> Asher Krim
>> Senior Software Engineer
>> 
>> 
>> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim > > wrote:
>> Elek,
>> 
>> Can you give me some sample code? I can’t get mine to work.
>> 
>> import org.apache.spark.sql.{SQLContext, _}
>> import org.apache.spark.sql.execution.datasources.hbase._
>> import org.apache.spark.{SparkConf, SparkContext}
>> 
>> def cat = s"""{
>> |"table":{"namespace":"ben", "name":"dmp_test", 
>> "tableCoder":"PrimitiveType"},
>> |"rowkey":"key",
>> |"columns":{
>> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
>> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
>> |}
>> |}""".stripMargin
>> 
>> import sqlContext.implicits._
>> 
>> def withCatalog(cat: String): DataFrame = {
>> sqlContext
>> .read
>> .options(Map(HBaseTableCatalog.tableCatalog->cat))
>> .format("org.apache.spark.sql.execution.datasources.hbase")
>> .load()
>> }
>> 
>> val df = withCatalog(cat)
>> df.show
>> 
>> It gives me this error.
>> 
>> java.lang.NoSuchMethodError: 
>> scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:232)
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.(HBaseRelation.scala:77)
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
>>  at 
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
>>  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>> 
>> If you can please help, I would be grateful.
>> 
>> Cheers,
>> Ben
>> 
>> 
>>> On Jan 31, 2017, at 1:02 PM, Marton, Elek >> > wrote:
>>> 
>>> 
>>> I tested this one with hbase 1.2.4:
>>> 
>>> https://github.com/hortonworks-spark/shc 
>>> 
>>> 
>>> Marton
>>> 
>>> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
 Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I 
 tried to build it from source, but I cannot get it to work.
 
 Thanks,
 Ben
 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
 
 
>>> 
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
>>> 
>>> 
>> 
>> 
> 
> 



Re: HBase Spark

2017-02-03 Thread Benjamin Kim
I'll clean up any .m2 or .ivy directories and try again.

I ran this on our lab cluster for testing.

Cheers,
Ben
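
A rough sketch of that cleanup, assuming the default Maven and Ivy cache locations:

# remove cached Scala runtime artifacts so the next build re-resolves them
rm -rf ~/.m2/repository/org/scala-lang
rm -rf ~/.ivy2/cache/org.scala-lang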


On Fri, Feb 3, 2017 at 8:16 AM Asher Krim  wrote:

> Did you check the actual maven dep tree? Something might be pulling in a
> different version. Also, if you're seeing this locally, you might want to
> check which version of the scala sdk your IDE is using
>
> Asher Krim
> Senior Software Engineer
>
> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  wrote:
>
> Hi Asher,
>
> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java
> (1.8) version as our installation. The Scala (2.10.5) version is already
> the same as ours. But I’m still getting the same error. Can you think of
> anything else?
>
> Cheers,
> Ben
>
>
> On Feb 2, 2017, at 11:06 AM, Asher Krim  wrote:
>
> Ben,
>
> That looks like a scala version mismatch. Have you checked your dep tree?
>
> Asher Krim
> Senior Software Engineer
>
> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim  wrote:
>
> Elek,
>
> Can you give me some sample code? I can’t get mine to work.
>
> import org.apache.spark.sql.{SQLContext, _}
> import org.apache.spark.sql.execution.datasources.hbase._
> import org.apache.spark.{SparkConf, SparkContext}
>
> def cat = s"""{
> |"table":{"namespace":"ben", "name":"dmp_test",
> "tableCoder":"PrimitiveType"},
> |"rowkey":"key",
> |"columns":{
> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
> |}
> |}""".stripMargin
>
> import sqlContext.implicits._
>
> def withCatalog(cat: String): DataFrame = {
> sqlContext
> .read
> .options(Map(HBaseTableCatalog.tableCatalog->cat))
> .format("org.apache.spark.sql.execution.datasources.hbase")
> .load()
> }
>
> val df = withCatalog(cat)
> df.show
>
>
> It gives me this error.
>
> java.lang.NoSuchMethodError:
> scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
> at
> org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:232)
> at
> org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.(HBaseRelation.scala:77)
> at
> org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>
>
> If you can please help, I would be grateful.
>
> Cheers,
> Ben
>
>
> On Jan 31, 2017, at 1:02 PM, Marton, Elek  wrote:
>
>
> I tested this one with hbase 1.2.4:
>
> https://github.com/hortonworks-spark/shc
>
> Marton
>
> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
>
> Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I
> tried to build it from source, but I cannot get it to work.
>
> Thanks,
> Ben
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
>
>
>
>


Re: HBase Spark

2017-02-03 Thread Asher Krim
Did you check the actual maven dep tree? Something might be pulling in a
different version. Also, if you're seeing this locally, you might want to
check which version of the scala sdk your IDE is using

Asher Krim
Senior Software Engineer

On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  wrote:

> Hi Asher,
>
> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java
> (1.8) version as our installation. The Scala (2.10.5) version is already
> the same as ours. But I’m still getting the same error. Can you think of
> anything else?
>
> Cheers,
> Ben
>
>
> On Feb 2, 2017, at 11:06 AM, Asher Krim  wrote:
>
> Ben,
>
> That looks like a scala version mismatch. Have you checked your dep tree?
>
> Asher Krim
> Senior Software Engineer
>
> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim  wrote:
>
>> Elek,
>>
>> Can you give me some sample code? I can’t get mine to work.
>>
>> import org.apache.spark.sql.{SQLContext, _}
>> import org.apache.spark.sql.execution.datasources.hbase._
>> import org.apache.spark.{SparkConf, SparkContext}
>>
>> def cat = s"""{
>> |"table":{"namespace":"ben", "name":"dmp_test",
>> "tableCoder":"PrimitiveType"},
>> |"rowkey":"key",
>> |"columns":{
>> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
>> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
>> |}
>> |}""".stripMargin
>>
>> import sqlContext.implicits._
>>
>> def withCatalog(cat: String): DataFrame = {
>> sqlContext
>> .read
>> .options(Map(HBaseTableCatalog.tableCatalog->cat))
>> .format("org.apache.spark.sql.execution.datasources.hbase")
>> .load()
>> }
>>
>> val df = withCatalog(cat)
>> df.show
>>
>>
>> It gives me this error.
>>
>> java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create
>> (Ljava/lang/Object;)Lscala/runtime/ObjectRef;
>> at org.apache.spark.sql.execution.datasources.hbase.HBaseTableC
>> atalog$.apply(HBaseTableCatalog.scala:232)
>> at org.apache.spark.sql.execution.datasources.hbase.HBaseRelati
>> on.(HBaseRelation.scala:77)
>> at org.apache.spark.sql.execution.datasources.hbase.DefaultSour
>> ce.createRelation(HBaseRelation.scala:51)
>> at org.apache.spark.sql.execution.datasources.ResolvedDataSourc
>> e$.apply(ResolvedDataSource.scala:158)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>
>>
>> If you can please help, I would be grateful.
>>
>> Cheers,
>> Ben
>>
>>
>> On Jan 31, 2017, at 1:02 PM, Marton, Elek  wrote:
>>
>>
>> I tested this one with hbase 1.2.4:
>>
>> https://github.com/hortonworks-spark/shc
>>
>> Marton
>>
>> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
>>
>> Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I
>> tried to build it from source, but I cannot get it to work.
>>
>> Thanks,
>> Ben
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>>
>
>


Re: HBase Spark

2017-02-02 Thread Benjamin Kim
Hi Asher,

I modified the pom to use the same Spark (1.6.0), HBase (1.2.0), and Java (1.8) 
versions as our installation. The Scala version (2.10.5) is already the same as 
ours, but I’m still getting the same error. Can you think of anything else?

Cheers,
Ben
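
For reference, a sketch of the corresponding pom properties (the property names are
assumptions; match them to whatever the connector's pom actually defines):

<properties>
  <spark.version>1.6.0</spark.version>
  <hbase.version>1.2.0</hbase.version>
  <scala.version>2.10.5</scala.version>
  <scala.binary.version>2.10</scala.binary.version>
  <java.version>1.8</java.version>
</properties>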


> On Feb 2, 2017, at 11:06 AM, Asher Krim  wrote:
> 
> Ben,
> 
> That looks like a scala version mismatch. Have you checked your dep tree?
> 
> Asher Krim
> Senior Software Engineer
> 
> 
> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim  > wrote:
> Elek,
> 
> Can you give me some sample code? I can’t get mine to work.
> 
> import org.apache.spark.sql.{SQLContext, _}
> import org.apache.spark.sql.execution.datasources.hbase._
> import org.apache.spark.{SparkConf, SparkContext}
> 
> def cat = s"""{
> |"table":{"namespace":"ben", "name":"dmp_test", 
> "tableCoder":"PrimitiveType"},
> |"rowkey":"key",
> |"columns":{
> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
> |}
> |}""".stripMargin
> 
> import sqlContext.implicits._
> 
> def withCatalog(cat: String): DataFrame = {
> sqlContext
> .read
> .options(Map(HBaseTableCatalog.tableCatalog->cat))
> .format("org.apache.spark.sql.execution.datasources.hbase")
> .load()
> }
> 
> val df = withCatalog(cat)
> df.show
> 
> It gives me this error.
> 
> java.lang.NoSuchMethodError: 
> scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
>   at 
> org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:232)
>   at 
> org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.(HBaseRelation.scala:77)
>   at 
> org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
>   at 
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
> 
> If you can please help, I would be grateful.
> 
> Cheers,
> Ben
> 
> 
>> On Jan 31, 2017, at 1:02 PM, Marton, Elek > > wrote:
>> 
>> 
>> I tested this one with hbase 1.2.4:
>> 
>> https://github.com/hortonworks-spark/shc 
>> 
>> 
>> Marton
>> 
>> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
>>> Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I 
>>> tried to build it from source, but I cannot get it to work.
>>> 
>>> Thanks,
>>> Ben
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
>>> 
>>> 
>> 
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
>> 
>> 
> 
> 



Re: HBase Spark

2017-02-02 Thread Asher Krim
Ben,

That looks like a scala version mismatch. Have you checked your dep tree?

Asher Krim
Senior Software Engineer

On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim  wrote:

> Elek,
>
> Can you give me some sample code? I can’t get mine to work.
>
> import org.apache.spark.sql.{SQLContext, _}
> import org.apache.spark.sql.execution.datasources.hbase._
> import org.apache.spark.{SparkConf, SparkContext}
>
> def cat = s"""{
> |"table":{"namespace":"ben", "name":"dmp_test",
> "tableCoder":"PrimitiveType"},
> |"rowkey":"key",
> |"columns":{
> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
> |}
> |}""".stripMargin
>
> import sqlContext.implicits._
>
> def withCatalog(cat: String): DataFrame = {
> sqlContext
> .read
> .options(Map(HBaseTableCatalog.tableCatalog->cat))
> .format("org.apache.spark.sql.execution.datasources.hbase")
> .load()
> }
>
> val df = withCatalog(cat)
> df.show
>
>
> It gives me this error.
>
> java.lang.NoSuchMethodError: scala.runtime.ObjectRef.
> create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
> at org.apache.spark.sql.execution.datasources.hbase.
> HBaseTableCatalog$.apply(HBaseTableCatalog.scala:232)
> at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.(
> HBaseRelation.scala:77)
> at org.apache.spark.sql.execution.datasources.hbase.
> DefaultSource.createRelation(HBaseRelation.scala:51)
> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(
> ResolvedDataSource.scala:158)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>
>
> If you can please help, I would be grateful.
>
> Cheers,
> Ben
>
>
> On Jan 31, 2017, at 1:02 PM, Marton, Elek  wrote:
>
>
> I tested this one with hbase 1.2.4:
>
> https://github.com/hortonworks-spark/shc
>
> Marton
>
> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
>
> Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I
> tried to build it from source, but I cannot get it to work.
>
> Thanks,
> Ben
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
>


Re: HBase Spark

2017-02-02 Thread Benjamin Kim
Elek,

Can you give me some sample code? I can’t get mine to work.

import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.{SparkConf, SparkContext}

def cat = s"""{
             |"table":{"namespace":"ben", "name":"dmp_test", "tableCoder":"PrimitiveType"},
             |"rowkey":"key",
             |"columns":{
             |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
             |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
             |}
             |}""".stripMargin

import sqlContext.implicits._

def withCatalog(cat: String): DataFrame = {
  sqlContext
    .read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
}

val df = withCatalog(cat)
df.show

It gives me this error.

java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:232)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:77)
  at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)

If you can please help, I would be grateful.

Cheers,
Ben


> On Jan 31, 2017, at 1:02 PM, Marton, Elek  wrote:
> 
> 
> I tested this one with hbase 1.2.4:
> 
> https://github.com/hortonworks-spark/shc
> 
> Marton
> 
> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
>> Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I 
>> tried to build it from source, but I cannot get it to work.
>> 
>> Thanks,
>> Ben
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>> 
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 



Re: HBase Spark

2017-01-31 Thread Benjamin Kim
Elek,

If I cannot use the HBase Spark module, then I’ll give it a try.

Thanks,
Ben


> On Jan 31, 2017, at 1:02 PM, Marton, Elek  wrote:
> 
> 
> I tested this one with hbase 1.2.4:
> 
> https://github.com/hortonworks-spark/shc
> 
> Marton
> 
> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
>> Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I 
>> tried to build it from source, but I cannot get it to work.
>> 
>> Thanks,
>> Ben
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>> 
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: HBase Spark

2017-01-31 Thread Marton, Elek


I tested this one with hbase 1.2.4:

https://github.com/hortonworks-spark/shc

Marton

On 01/31/2017 09:17 PM, Benjamin Kim wrote:

Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I tried 
to build it from source, but I cannot get it to work.

Thanks,
Ben
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



RE: HBase-Spark Module

2016-07-29 Thread David Newberger
Hi Ben,

This seems more like a question for community.cloudera.com. However, it would 
be in HBase, not Spark, I believe.

https://repository.cloudera.com/artifactory/webapp/#/artifacts/browse/tree/General/cloudera-release-repo/org/apache/hbase/hbase-spark

David Newberger
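
A sketch of depending on that artifact from the Cloudera repository (the version
shown is an assumption; pick the one matching your CDH release from the listing
above):

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-spark</artifactId>
  <!-- assumed version for CDH 5.8.0; check the repository listing -->
  <version>1.2.0-cdh5.8.0</version>
</dependency>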


-Original Message-
From: Benjamin Kim [mailto:bbuil...@gmail.com] 
Sent: Friday, July 29, 2016 12:57 PM
To: user@spark.apache.org
Subject: HBase-Spark Module

Has anyone tried using the hbase-spark module? I tried to follow the examples in 
conjunction with CDH 5.8.0, but I cannot find the HBaseTableCatalog class in the 
module or in any of the Spark jars. Can someone help?

Thanks,
Ben
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: HBase / Spark Kerberos problem

2016-05-19 Thread Arun Natva
Some of the Hadoop services cannot make use of the ticket obtained by 
loginUserFromKeytab.

I was able to get past it using a GSS/JAAS configuration, where you can pass 
either a keytab file or a ticket cache to the Spark executors that access HBase.
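
A rough sketch of that approach (file names, paths and the principal are
placeholders; the "Client" section is the one the ZooKeeper client inside the
HBase client looks up by default):

// jaas.conf
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="user.keytab"
  principal="user@EXAMPLE.COM"
  storeKey=true
  useTicketCache=false;
};

Shipped and wired up along these lines (relative paths assume YARN copies the
--files into each container's working directory):

spark-submit \
  --files jaas.conf,user.keytab \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
  ...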

Sent from my iPhone

> On May 19, 2016, at 4:51 AM, Ellis, Tom (Financial Markets IT) 
> <tom.el...@lloydsbanking.com.INVALID> wrote:
> 
> Yeah we ran into this issue. Key part is to have the hbase jars and 
> hbase-site.xml config on the classpath of the spark submitter.
>  
> We did it slightly differently from Y Bodnar, where we set the required jars 
> and config on the env var SPARK_DIST_CLASSPATH in our spark env file (rather 
> than SPARK_CLASSPATH which is deprecated).
>  
> With this and –principal/--keytab, if you turn DEBUG logging for 
> org.apache.spark.deploy.yarn you should see “Added HBase security token to 
> credentials.”
>  
> Otherwise you should at least hopefully see the error where it fails to add 
> the HBase tokens.
>  
> Check out the source of Client [1] and YarnSparkHadoopUtil  [2] – you’ll see 
> how obtainTokenForHBase is being done.
>  
> It’s a bit confusing as to why it says you haven’t kinited even when you do 
> loginUserFromKeytab – I haven’t quite worked through the reason for that yet.
>  
> Cheers,
>  
> Tom Ellis
> telli...@gmail.com
>  
> [1] 
> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
> [2] 
> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala
>  
>  
> From: John Trengrove [mailto:john.trengr...@servian.com.au] 
> Sent: 19 May 2016 08:09
> To: philipp.meyerhoe...@thomsonreuters.com
> Cc: user
> Subject: Re: HBase / Spark Kerberos problem
>  
> -- This email has reached the Bank via an external source -- 
>  
> Have you had a look at this issue?
>  
> https://issues.apache.org/jira/browse/SPARK-12279 
>  
> There is a comment by Y Bodnar on how they successfully got Kerberos and 
> HBase working.
>  
> 2016-05-18 18:13 GMT+10:00 <philipp.meyerhoe...@thomsonreuters.com>:
> Hi all,
> 
> I have been puzzling over a Kerberos problem for a while now and wondered if 
> anyone can help.
> 
> For spark-submit, I specify --keytab x --principal y, which creates my 
> SparkContext fine.
> Connections to Zookeeper Quorum to find the HBase master work well too.
> But when it comes to a .count() action on the RDD, I am always presented with 
> the stack trace at the end of this mail.
> 
> We are using CDH5.5.2 (spark 1.5.0), and 
> com.cloudera.spark.hbase.HBaseContext is a wrapper around 
> TableInputFormat/hadoopRDD (see 
> https://github.com/cloudera-labs/SparkOnHBase), as you can see in the stack 
> trace.
> 
> Am I doing something obvious wrong here?
> A similar flow, inside test code, works well, only going via spark-submit 
> exposes this issue.
> 
> Code snippet (I have tried using the commented-out lines in various 
> combinations, without success):
> 
>val conf = new SparkConf().
>   set("spark.shuffle.consolidateFiles", "true").
>   set("spark.kryo.registrationRequired", "false").
>   set("spark.serializer", "org.apache.spark.serializer.KryoSerializer").
>   set("spark.kryoserializer.buffer", "30m")
> val sc = new SparkContext(conf)
> val cfg = sc.hadoopConfiguration
> //cfg.addResource(new 
> org.apache.hadoop.fs.Path("/etc/hbase/conf/hbase-site.xml"))
> //
> UserGroupInformation.getCurrentUser.setAuthenticationMethod(UserGroupInformation.AuthenticationMethod.KERBEROS)
> //cfg.set("hbase.security.authentication", "kerberos")
> val hc = new HBaseContext(sc, cfg)
> val scan = new Scan
> scan.setTimeRange(startMillis, endMillis)
> val matchesInRange = hc.hbaseRDD(MY_TABLE, scan, resultToMatch)
> val cnt = matchesInRange.count()
> log.info(s"matches in range $cnt")
> 
> Stack trace / log:
> 
> 16/05/17 17:04:47 INFO SparkContext: Starting job: count at Analysis.scala:93
> 16/05/17 17:04:47 INFO DAGScheduler: Got job 0 (count at Analysis.scala:93) 
> with 1 output partitions
> 16/05/17 17:04:47 INFO DAGScheduler: Final stage: ResultStage 0(count at 
> Analysis.scala:93)
> 16/05/17 17:04:47 INFO DAGScheduler: Parents of final stage: List()
> 16/05/17 17:04:47 INFO DAGScheduler: Missing parents: List()
> 16/05/17 17:04:47 INFO DAGScheduler: Submitting ResultStage 0 
> (MapPartitionsRDD[1] at map at HBaseContext.scala:580), which has no missing 
> parents
> 16/05/17 17:04:47 INFO MemoryStore: ensureFr

RE: HBase / Spark Kerberos problem

2016-05-19 Thread philipp.meyerhoefer
Thanks Tom & John!

Modifying spark-env.sh did the trick; my last line in the file is now:

export SPARK_DIST_CLASSPATH=$(paste -sd: "$SELF/classpath.txt"):`hbase 
classpath`:/etc/hbase/conf:/etc/hbase/conf/hbase-site.xml

Now o.a.s.d.y.Client logs “Added HBase security token to credentials” and the 
.count() on my HBase RDD works fine.

From: Ellis, Tom (Financial Markets IT) [mailto:tom.el...@lloydsbanking.com] 
Sent: 19 May 2016 09:51
To: 'John Trengrove'; Meyerhoefer, Philipp (TR Technology & Ops)
Cc: user
Subject: RE: HBase / Spark Kerberos problem

Yeah we ran into this issue. Key part is to have the hbase jars and 
hbase-site.xml config on the classpath of the spark submitter.

We did it slightly differently from Y Bodnar, where we set the required jars 
and config on the env var SPARK_DIST_CLASSPATH in our spark env file (rather 
than SPARK_CLASSPATH which is deprecated).

With this and –principal/--keytab, if you turn DEBUG logging for 
org.apache.spark.deploy.yarn you should see “Added HBase security token to 
credentials.”

Otherwise you should at least hopefully see the error where it fails to add the 
HBase tokens.

Check out the source of Client [1] and YarnSparkHadoopUtil  [2] – you’ll see 
how obtainTokenForHBase is being done.

It’s a bit confusing as to why it says you haven’t kinited even when you do 
loginUserFromKeytab – I haven’t quite worked through the reason for that yet.

Cheers,

Tom Ellis
telli...@gmail.com

[1] 
https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
[2] 
https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala


From: John Trengrove [mailto:john.trengr...@servian.com.au] 
Sent: 19 May 2016 08:09
To: philipp.meyerhoe...@thomsonreuters.com
Cc: user
Subject: Re: HBase / Spark Kerberos problem

-- This email has reached the Bank via an external source -- 
  
Have you had a look at this issue?

https://issues.apache.org/jira/browse/SPARK-12279 

There is a comment by Y Bodnar on how they successfully got Kerberos and HBase 
working.

2016-05-18 18:13 GMT+10:00 <philipp.meyerhoe...@thomsonreuters.com>:
Hi all,

I have been puzzling over a Kerberos problem for a while now and wondered if 
anyone can help.

For spark-submit, I specify --keytab x --principal y, which creates my 
SparkContext fine.
Connections to Zookeeper Quorum to find the HBase master work well too.
But when it comes to a .count() action on the RDD, I am always presented with 
the stack trace at the end of this mail.

We are using CDH5.5.2 (spark 1.5.0), and com.cloudera.spark.hbase.HBaseContext 
is a wrapper around TableInputFormat/hadoopRDD (see 
https://github.com/cloudera-labs/SparkOnHBase), as you can see in the stack 
trace.

Am I doing something obvious wrong here?
A similar flow, inside test code, works well, only going via spark-submit 
exposes this issue.

Code snippet (I have tried using the commented-out lines in various 
combinations, without success):

   val conf = new SparkConf().
      set("spark.shuffle.consolidateFiles", "true").
      set("spark.kryo.registrationRequired", "false").
      set("spark.serializer", "org.apache.spark.serializer.KryoSerializer").
      set("spark.kryoserializer.buffer", "30m")
    val sc = new SparkContext(conf)
    val cfg = sc.hadoopConfiguration
//    cfg.addResource(new 
org.apache.hadoop.fs.Path("/etc/hbase/conf/hbase-site.xml"))
//    
UserGroupInformation.getCurrentUser.setAuthenticationMethod(UserGroupInformation.AuthenticationMethod.KERBEROS)
//    cfg.set("hbase.security.authentication", "kerberos")
    val hc = new HBaseContext(sc, cfg)
    val scan = new Scan
    scan.setTimeRange(startMillis, endMillis)
    val matchesInRange = hc.hbaseRDD(MY_TABLE, scan, resultToMatch)
    val cnt = matchesInRange.count()
    log.info(s"matches in range $cnt")

Stack trace / log:

16/05/17 17:04:47 INFO SparkContext: Starting job: count at Analysis.scala:93
16/05/17 17:04:47 INFO DAGScheduler: Got job 0 (count at Analysis.scala:93) 
with 1 output partitions
16/05/17 17:04:47 INFO DAGScheduler: Final stage: ResultStage 0(count at 
Analysis.scala:93)
16/05/17 17:04:47 INFO DAGScheduler: Parents of final stage: List()
16/05/17 17:04:47 INFO DAGScheduler: Missing parents: List()
16/05/17 17:04:47 INFO DAGScheduler: Submitting ResultStage 0 
(MapPartitionsRDD[1] at map at HBaseContext.scala:580), which has no missing 
parents
16/05/17 17:04:47 INFO MemoryStore: ensureFreeSpace(3248) called with 
curMem=428022, maxMem=244187136
16/05/17 17:04:47 INFO MemoryStore: Block broadcast_3 stored as values in 
memory (estimated size 3.2 KB, free 232.5 MB)
16/05/17 17:04:47 INFO MemoryStore: ensureFreeSpace(2022) called with 
curMem=431270, maxMem=244187136
16/05/17 17:04:47 INFO MemoryStore: Block broadcast_3_piece0

RE: HBase / Spark Kerberos problem

2016-05-19 Thread Ellis, Tom (Financial Markets IT)
Yeah we ran into this issue. Key part is to have the hbase jars and 
hbase-site.xml config on the classpath of the spark submitter.

We did it slightly differently from Y Bodnar, where we set the required jars 
and config on the env var SPARK_DIST_CLASSPATH in our spark env file (rather 
than SPARK_CLASSPATH which is deprecated).

With this and --principal/--keytab, if you turn on DEBUG logging for 
org.apache.spark.deploy.yarn you should see “Added HBase security token to 
credentials.”
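
With the stock Spark 1.x log4j setup, that is a one-line addition to the
log4j.properties used by the submitting process (a sketch, assuming log4j 1.x):

log4j.logger.org.apache.spark.deploy.yarn=DEBUG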

Otherwise you should at least hopefully see the error where it fails to add the 
HBase tokens.

Check out the source of Client [1] and YarnSparkHadoopUtil  [2] – you’ll see 
how obtainTokenForHBase is being done.

It’s a bit confusing as to why it says you haven’t kinited even when you do 
loginUserFromKeytab – I haven’t quite worked through the reason for that yet.

Cheers,

Tom Ellis
telli...@gmail.com<mailto:telli...@gmail.com>

[1] 
https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
[2] 
https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala


From: John Trengrove [mailto:john.trengr...@servian.com.au]
Sent: 19 May 2016 08:09
To: philipp.meyerhoe...@thomsonreuters.com
Cc: user
Subject: Re: HBase / Spark Kerberos problem

-- This email has reached the Bank via an external source --

Have you had a look at this issue?

https://issues.apache.org/jira/browse/SPARK-12279

There is a comment by Y Bodnar on how they successfully got Kerberos and HBase 
working.

2016-05-18 18:13 GMT+10:00 
<philipp.meyerhoe...@thomsonreuters.com<mailto:philipp.meyerhoe...@thomsonreuters.com>>:
Hi all,

I have been puzzling over a Kerberos problem for a while now and wondered if 
anyone can help.

For spark-submit, I specify --keytab x --principal y, which creates my 
SparkContext fine.
Connections to Zookeeper Quorum to find the HBase master work well too.
But when it comes to a .count() action on the RDD, I am always presented with 
the stack trace at the end of this mail.

We are using CDH5.5.2 (spark 1.5.0), and com.cloudera.spark.hbase.HBaseContext 
is a wrapper around TableInputFormat/hadoopRDD (see 
https://github.com/cloudera-labs/SparkOnHBase), as you can see in the stack 
trace.

Am I doing something obvious wrong here?
A similar flow, inside test code, works well, only going via spark-submit 
exposes this issue.

Code snippet (I have tried using the commented-out lines in various 
combinations, without success):

   val conf = new SparkConf().
  set("spark.shuffle.consolidateFiles", "true").
  set("spark.kryo.registrationRequired", "false").
  set("spark.serializer", "org.apache.spark.serializer.KryoSerializer").
  set("spark.kryoserializer.buffer", "30m")
val sc = new SparkContext(conf)
val cfg = sc.hadoopConfiguration
//cfg.addResource(new 
org.apache.hadoop.fs.Path("/etc/hbase/conf/hbase-site.xml"))
//
UserGroupInformation.getCurrentUser.setAuthenticationMethod(UserGroupInformation.AuthenticationMethod.KERBEROS)
//cfg.set("hbase.security.authentication", "kerberos")
val hc = new HBaseContext(sc, cfg)
val scan = new Scan
scan.setTimeRange(startMillis, endMillis)
val matchesInRange = hc.hbaseRDD(MY_TABLE, scan, resultToMatch)
val cnt = matchesInRange.count()
log.info<http://log.info>(s"matches in range $cnt")

Stack trace / log:

16/05/17 17:04:47 INFO SparkContext: Starting job: count at Analysis.scala:93
16/05/17 17:04:47 INFO DAGScheduler: Got job 0 (count at Analysis.scala:93) 
with 1 output partitions
16/05/17 17:04:47 INFO DAGScheduler: Final stage: ResultStage 0(count at 
Analysis.scala:93)
16/05/17 17:04:47 INFO DAGScheduler: Parents of final stage: List()
16/05/17 17:04:47 INFO DAGScheduler: Missing parents: List()
16/05/17 17:04:47 INFO DAGScheduler: Submitting ResultStage 0 
(MapPartitionsRDD[1] at map at HBaseContext.scala:580), which has no missing 
parents
16/05/17 17:04:47 INFO MemoryStore: ensureFreeSpace(3248) called with 
curMem=428022, maxMem=244187136
16/05/17 17:04:47 INFO MemoryStore: Block broadcast_3 stored as values in 
memory (estimated size 3.2 KB, free 232.5 MB)
16/05/17 17:04:47 INFO MemoryStore: ensureFreeSpace(2022) called with 
curMem=431270, maxMem=244187136
16/05/17 17:04:47 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in 
memory (estimated size 2022.0 B, free 232.5 MB)
16/05/17 17:04:47 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 
10.6.164.40:33563<http://10.6.164.40:33563> (size: 2022.0 B, free: 232.8 MB)
16/05/17 17:04:47 INFO SparkContext: Created broadcast 3 from broadcast at 
DAGScheduler.scala:861
16/05/17 17:04:47 INFO DAGScheduler: Submitting 1 missing tasks from 
ResultStage 0 (MapPartitionsRDD[1] at map at HBaseContext.scala:580)
16/

Re: HBase / Spark Kerberos problem

2016-05-19 Thread John Trengrove
Have you had a look at this issue?

https://issues.apache.org/jira/browse/SPARK-12279

There is a comment by Y Bodnar on how they successfully got Kerberos and
HBase working.

2016-05-18 18:13 GMT+10:00 :

> Hi all,
>
> I have been puzzling over a Kerberos problem for a while now and wondered
> if anyone can help.
>
> For spark-submit, I specify --keytab x --principal y, which creates my
> SparkContext fine.
> Connections to Zookeeper Quorum to find the HBase master work well too.
> But when it comes to a .count() action on the RDD, I am always presented
> with the stack trace at the end of this mail.
>
> We are using CDH5.5.2 (spark 1.5.0), and
> com.cloudera.spark.hbase.HBaseContext is a wrapper around
> TableInputFormat/hadoopRDD (see
> https://github.com/cloudera-labs/SparkOnHBase), as you can see in the
> stack trace.
>
> Am I doing something obvious wrong here?
> A similar flow, inside test code, works well, only going via spark-submit
> exposes this issue.
>
> Code snippet (I have tried using the commented-out lines in various
> combinations, without success):
>
>val conf = new SparkConf().
>   set("spark.shuffle.consolidateFiles", "true").
>   set("spark.kryo.registrationRequired", "false").
>   set("spark.serializer",
> "org.apache.spark.serializer.KryoSerializer").
>   set("spark.kryoserializer.buffer", "30m")
> val sc = new SparkContext(conf)
> val cfg = sc.hadoopConfiguration
> //cfg.addResource(new
> org.apache.hadoop.fs.Path("/etc/hbase/conf/hbase-site.xml"))
> //
> UserGroupInformation.getCurrentUser.setAuthenticationMethod(UserGroupInformation.AuthenticationMethod.KERBEROS)
> //cfg.set("hbase.security.authentication", "kerberos")
> val hc = new HBaseContext(sc, cfg)
> val scan = new Scan
> scan.setTimeRange(startMillis, endMillis)
> val matchesInRange = hc.hbaseRDD(MY_TABLE, scan, resultToMatch)
> val cnt = matchesInRange.count()
> log.info(s"matches in range $cnt")
>
> Stack trace / log:
>
> 16/05/17 17:04:47 INFO SparkContext: Starting job: count at
> Analysis.scala:93
> 16/05/17 17:04:47 INFO DAGScheduler: Got job 0 (count at
> Analysis.scala:93) with 1 output partitions
> 16/05/17 17:04:47 INFO DAGScheduler: Final stage: ResultStage 0(count at
> Analysis.scala:93)
> 16/05/17 17:04:47 INFO DAGScheduler: Parents of final stage: List()
> 16/05/17 17:04:47 INFO DAGScheduler: Missing parents: List()
> 16/05/17 17:04:47 INFO DAGScheduler: Submitting ResultStage 0
> (MapPartitionsRDD[1] at map at HBaseContext.scala:580), which has no
> missing parents
> 16/05/17 17:04:47 INFO MemoryStore: ensureFreeSpace(3248) called with
> curMem=428022, maxMem=244187136
> 16/05/17 17:04:47 INFO MemoryStore: Block broadcast_3 stored as values in
> memory (estimated size 3.2 KB, free 232.5 MB)
> 16/05/17 17:04:47 INFO MemoryStore: ensureFreeSpace(2022) called with
> curMem=431270, maxMem=244187136
> 16/05/17 17:04:47 INFO MemoryStore: Block broadcast_3_piece0 stored as
> bytes in memory (estimated size 2022.0 B, free 232.5 MB)
> 16/05/17 17:04:47 INFO BlockManagerInfo: Added broadcast_3_piece0 in
> memory on 10.6.164.40:33563 (size: 2022.0 B, free: 232.8 MB)
> 16/05/17 17:04:47 INFO SparkContext: Created broadcast 3 from broadcast at
> DAGScheduler.scala:861
> 16/05/17 17:04:47 INFO DAGScheduler: Submitting 1 missing tasks from
> ResultStage 0 (MapPartitionsRDD[1] at map at HBaseContext.scala:580)
> 16/05/17 17:04:47 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
> 16/05/17 17:04:47 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
> 0, hpg-dev-vm, partition 0,PROCESS_LOCAL, 2208 bytes)
> 16/05/17 17:04:47 INFO BlockManagerInfo: Added broadcast_3_piece0 in
> memory on hpg-dev-vm:52698 (size: 2022.0 B, free: 388.4 MB)
> 16/05/17 17:04:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in
> memory on hpg-dev-vm:52698 (size: 26.0 KB, free: 388.4 MB)
> 16/05/17 17:04:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
> hpg-dev-vm): org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Can't get the location
> at
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:308)
> at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:155)
> at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:63)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> at
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:314)
> at
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:289)
> at
> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:161)
> at
> org.apache.hadoop.hbase.client.ClientScanner.(ClientScanner.java:156)
> at
> 

Re: HBase Spark Streaming giving error after restore

2015-10-17 Thread Amit Hora
Hi,

Regrets for the delayed response. Please find the full stack trace below.

java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to org.apache.hadoop.hbase.client.Mutation
at
org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:85)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
15/10/16 18:50:03 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID
1, localhost, ANY, 1185 bytes)
15/10/16 18:50:03 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/10/16 18:50:03 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
localhost): java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be
cast to org.apache.hadoop.hbase.client.Mutation
at
org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:85)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

15/10/16 18:50:03 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times;
aborting job
15/10/16 18:50:03 INFO TaskSchedulerImpl: Cancelling stage 0
15/10/16 18:50:03 INFO Executor: Executor is trying to kill task 1.0 in
stage 0.0 (TID 1)
15/10/16 18:50:03 INFO TaskSchedulerImpl: Stage 0 was cancelled
15/10/16 18:50:03 INFO DAGScheduler: Job 0 failed: foreachRDD at
TwitterStream.scala:150, took 5.956054 s
15/10/16 18:50:03 INFO JobScheduler: Starting job streaming job
144500141 ms.0 from job set of time 144500141 ms
15/10/16 18:50:03 ERROR JobScheduler: Error running job streaming job
144500140 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage
0.0 (TID 0, localhost): java.lang.ClassCastException:
scala.runtime.BoxedUnit cannot be cast to
org.apache.hadoop.hbase.client.Mutation
at
org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:85)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 0 in stage 0.0 failed 1 times, most recent 

Re: HBase Spark Streaming giving error after restore

2015-10-17 Thread Aniket Bhatnagar
Can you try changing classOf[OutputFormat[String, BoxedUnit]] to
classOf[OutputFormat[String, Put]] while configuring hconf?
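
A minimal sketch of that wiring (the table name, column family/qualifier and the
tweets DStream are placeholders rather than the original job's values; assumes an
HBase 1.x client API):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

// tweets is assumed to be a DStream[(String, String)] of (rowKey, value) pairs
tweets.foreachRDD { rdd =>
  // build the HBase output config on the driver, inside foreachRDD, so nothing
  // non-serializable is captured in the checkpointed DStream graph
  val hconf = HBaseConfiguration.create()
  hconf.set(TableOutputFormat.OUTPUT_TABLE, "tweets")   // placeholder table name
  val job = Job.getInstance(hconf)
  job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
  job.setOutputKeyClass(classOf[ImmutableBytesWritable])
  job.setOutputValueClass(classOf[Put])                 // value class must be a Mutation (Put), not Unit

  rdd.map { case (rowKey, value) =>
    val put = new Put(Bytes.toBytes(rowKey))
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("text"), Bytes.toBytes(value))
    (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put)
  }.saveAsNewAPIHadoopDataset(job.getConfiguration)
}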

On Sat, Oct 17, 2015, 11:44 AM Amit Hora  wrote:

> Hi,
>
> Regrets for the delayed response. Please find the full stack trace below.
>
> java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to
> org.apache.hadoop.hbase.client.Mutation
> at
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:85)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 15/10/16 18:50:03 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID
> 1, localhost, ANY, 1185 bytes)
> 15/10/16 18:50:03 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
> 15/10/16 18:50:03 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
> localhost): java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be
> cast to org.apache.hadoop.hbase.client.Mutation
> at
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:85)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> 15/10/16 18:50:03 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1
> times; aborting job
> 15/10/16 18:50:03 INFO TaskSchedulerImpl: Cancelling stage 0
> 15/10/16 18:50:03 INFO Executor: Executor is trying to kill task 1.0 in
> stage 0.0 (TID 1)
> 15/10/16 18:50:03 INFO TaskSchedulerImpl: Stage 0 was cancelled
> 15/10/16 18:50:03 INFO DAGScheduler: Job 0 failed: foreachRDD at
> TwitterStream.scala:150, took 5.956054 s
> 15/10/16 18:50:03 INFO JobScheduler: Starting job streaming job
> 144500141 ms.0 from job set of time 144500141 ms
> 15/10/16 18:50:03 ERROR JobScheduler: Error running job streaming job
> 144500140 ms.0
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage
> 0.0 (TID 0, localhost): java.lang.ClassCastException:
> scala.runtime.BoxedUnit cannot be cast to
> org.apache.hadoop.hbase.client.Mutation
> at
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:85)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
> at scala.Option.foreach(Option.scala:236)
> at
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
> at
> 

Re: HBase Spark Streaming giving error after restore

2015-10-16 Thread Ted Yu
Can you show the complete stack trace ?

Subclass of Mutation is expected. Put is a subclass.

Have you tried replacing BoxedUnit with Put in your code ?
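
As a quick illustration of that type relationship (a sketch; nothing beyond the HBase client classes is assumed):

// Put extends Mutation, BoxedUnit does not, which is why TableOutputFormat's
// internal cast to Mutation fails when the registered value class is BoxedUnit.
import org.apache.hadoop.hbase.client.{Mutation, Put}
import scala.runtime.BoxedUnit

println(classOf[Mutation].isAssignableFrom(classOf[Put]))       // true
println(classOf[Mutation].isAssignableFrom(classOf[BoxedUnit])) // false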

Cheers

On Fri, Oct 16, 2015 at 6:02 AM, Amit Singh Hora 
wrote:

> Hi All,
>
> I am using the code below to stream data from Kafka to HBase. Everything
> works fine until I restart the job so that it can restore its state from
> the checkpoint directory, but while trying to restore the state it gives
> me the
> error below:
>
> ge 0.0 (TID 0, localhost): java.lang.ClassCastException:
> scala.runtime.BoxedUnit cannot be cast to
> org.apache.hadoop.hbase.client.Mutation
>
> Please find the code below:
>
> tweetsRDD.foreachRDD(rdd=>{
>   val hconf = HBaseConfiguration.create();
> hconf.set(TableOutputFormat.OUTPUT_TABLE, hbasetablename)
> hconf.set("zookeeper.session.timeout",
> conf.getString("hbase.zookeepertimeout"));
> hconf.set("hbase.client.retries.number", Integer.toString(1));
> hconf.set("zookeeper.recovery.retry", Integer.toString(1));
> hconf.set("hbase.master", conf.getString("hbase.hbase_master"));
>
> hconf.set("hbase.zookeeper.quorum",conf.getString("hbase.hbase_zkquorum"));
> hconf.set("zookeeper.znode.parent", "/hbase-unsecure");
> hconf.set("hbase.zookeeper.property.clientPort",
> conf.getString("hbase.hbase_zk_port"));
>  hconf.setClass("mapreduce.outputformat.class",
> classOf[TableOutputFormat[String]], classOf[OutputFormat[String,
> BoxedUnit]])
>
>  rdd.map ( record =>(new ImmutableBytesWritable,{
>
>
> var maprecord = new HashMap[String, String];
>   val mapper = new ObjectMapper();
>
>   //convert JSON string to Map
>
>   maprecord = mapper.readValue(record.toString(),
> new TypeReference[HashMap[String, String]]() {});
>
>
>   var ts:Long= maprecord.get("ts").toLong
>   var tweetID:Long= maprecord.get("id").toLong
>   val key=ts+"_"+tweetID;
>   val   put=new Put(Bytes.toBytes(key))
>maprecord.foreach(kv => {
> //  println(kv._1+" - "+kv._2)
>
>
> put.add(Bytes.toBytes(colfamily.value),Bytes.toBytes(kv._1),Bytes.toBytes(kv._2))
>
>
>   }
>)
>put
>
> }
>  )
>  ).saveAsNewAPIHadoopDataset(hconf)
>
>   })
>
>
>
> Please help me out in solving this, as it is urgent for me.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/HBase-Spark-Streaming-giving-error-after-restore-tp25089.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Hbase Spark streaming issue.

2015-09-24 Thread Shixiong Zhu
Looks like you have an incompatible hbase-default.xml somewhere on the
classpath. You can use the following code to find the location of
"hbase-default.xml":

println(Thread.currentThread().getContextClassLoader().getResource("hbase-default.xml"))
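
If more than one copy is on the classpath, a slightly longer sketch (same idea, just listing every match rather than only the first) can show which jar each copy comes from:

// Lists every hbase-default.xml visible to the current classloader, so the
// jar carrying the incompatible copy can be identified.
import scala.collection.JavaConverters._

Thread.currentThread().getContextClassLoader()
  .getResources("hbase-default.xml")
  .asScala
  .foreach(url => println(url))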

Best Regards,
Shixiong Zhu

2015-09-21 15:46 GMT+08:00 Siva :

> Hi,
>
> I'm seeing a strange error while inserting data from Spark Streaming into
> HBase.
>
> I am able to write the data from Spark (without streaming) to HBase
> successfully, but when I use the same code to write a DStream I see the
> error below.
>
> I tried setting the parameters below, but it still didn't help. Did anyone
> face a similar issue?
>
> conf.set("hbase.defaults.for.version.skip", "true")
> conf.set("hbase.defaults.for.version", "0.98.4.2.2.4.2-2-hadoop2")
>
> 15/09/20 22:39:10 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID
> 16)
> java.lang.RuntimeException: hbase-default.xml file seems to be for and old
> version of HBase (null), this version is 0.98.4.2.2.4.2-2-hadoop2
> at
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)
> at
> org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:105)
> at
> org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:116)
> at
> org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:125)
> at
> $line51.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$HBaseConn$.hbaseConnection(<console>:49)
> at
> $line52.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$TestHbaseSpark$$anonfun$run$1$$anonfun$apply$1.apply(<console>:73)
> at
> $line52.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$TestHbaseSpark$$anonfun$run$1$$anonfun$apply$1.apply(<console>:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
> at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:782)
> at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:782)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
> at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 15/09/20 22:39:10 WARN TaskSetManager: Lost task 0.0 in stage 14.0 (TID
> 16, localhost): java.lang.RuntimeException: hbase-default.xml file seems to
> be for and old version of HBase (null), this version is
> 0.98.4.2.2.4.2-2-hadoop2
>
>
> Thanks,
> Siva.
>