Re: HBaseContext with Spark

2017-01-27 Thread Chetan Khatri
Storage handler bulk load:

SET hive.hbase.bulk=true;
INSERT OVERWRITE TABLE users SELECT … ;

But for now, you have to do some work and issue multiple Hive commands:

1. Sample the source data for range partitioning.
2. Save the sampling results to a file.
3. Run a CLUSTER BY query using HiveHFileOutputFormat and TotalOrderPartitioner
   (this sorts the data, producing a large number of region files).
4. Import the HFiles into HBase.

HBase can merge files afterwards if necessary.
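
For the CLUSTER BY step, a rough sketch of the Hive session might look like the following. The table, column, and path names here are hypothetical, and the property names are as I recall them from the Hive wiki's HBaseBulkLoad page, so please verify them against your Hive and Hadoop versions before relying on this.

```sql
-- Assumption: /tmp/hb_range_key_list holds the split keys saved by the
-- sampling step. The number of reducers must be (number of split keys + 1).
SET mapred.reduce.tasks=12;
SET hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
SET total.order.partitioner.path=/tmp/hb_range_key_list;

-- Hypothetical staging table whose "storage" is HFiles for one column family.
-- The first column is used as the HBase row key.
CREATE TABLE users_hfiles (user_id STRING, user_name STRING)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat'
TBLPROPERTIES ('hfile.family.path' = '/tmp/users_hfiles/cf');

-- CLUSTER BY sorts rows by key within each range partition, so every
-- reducer writes one sorted HFile for its key range.
INSERT OVERWRITE TABLE users_hfiles
SELECT user_id, user_name
FROM users_src
CLUSTER BY user_id;
```

The resulting files under the `hfile.family.path` directory are what the import step then hands to HBase.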

On Sat, Jan 28, 2017 at 11:32 AM, Chetan Khatri  wrote:

> @Ted, I don't think so.


Re: HBaseContext with Spark

2017-01-27 Thread Chetan Khatri
@Ted, I don't think so.

On Thu, Jan 26, 2017 at 6:35 AM, Ted Yu  wrote:

> Does the storage handler provide bulk load capability?
>
> Cheers


Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
Does the storage handler provide bulk load capability?

Cheers

> On Jan 25, 2017, at 3:39 AM, Amrit Jangid  wrote:
> 
> Hi Chetan,
> 
> If you just need HBase data in Hive, you can use a Hive EXTERNAL TABLE with
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'.
> 
> Try this and see if it solves your problem:
> 
> https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
> 
> Regards
> Amrit


Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
The references are vendor-specific.

I suggest contacting the vendor's mailing list about your PR.

I had initially taken "HBase repository" to mean the Apache one.

Cheers

On Wed, Jan 25, 2017 at 7:38 AM, Chetan Khatri 
wrote:

> @Ted Yu, correct, but the HBase-Spark module available in the HBase
> repository seems too old and its code is not optimized yet; I have
> already submitted a PR for that. I don't know whether it is clearly
> stated that the module is now part of HBase itself, since people are
> still committing to the older repo, where the original code is still
> old. [1]
>
> Other sources have updated info. [2]
>
> Ref.
> [1] http://blog.cloudera.com/blog/2015/08/apache-spark-comes-to-apache-hbase-with-hbase-spark-module/
> [2] https://github.com/cloudera-labs/SparkOnHBase ,
> https://github.com/esamson/SparkOnHBase


Re: HBaseContext with Spark

2017-01-25 Thread Chetan Khatri
@Ted Yu, correct, but the HBase-Spark module available in the HBase
repository seems too old and its code is not optimized yet; I have
already submitted a PR for that. I don't know whether it is clearly
stated that the module is now part of HBase itself, since people are
still committing to the older repo, where the original code is still
old. [1]

Other sources have updated info. [2]

Ref.
[1]
http://blog.cloudera.com/blog/2015/08/apache-spark-comes-to-apache-hbase-with-hbase-spark-module/
[2] https://github.com/cloudera-labs/SparkOnHBase ,
https://github.com/esamson/SparkOnHBase

On Wed, Jan 25, 2017 at 8:13 PM, Ted Yu  wrote:

> Though no HBase release includes the hbase-spark module, you can find the
> backport patch on HBASE-14160 (for Spark 1.6).
>
> You can build the hbase-spark module yourself.
>
> Cheers


Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
Though no HBase release includes the hbase-spark module, you can find the
backport patch on HBASE-14160 (for Spark 1.6).

You can build the hbase-spark module yourself.

Cheers

On Wed, Jan 25, 2017 at 3:32 AM, Chetan Khatri 
wrote:

> Hello Spark Community Folks,
>
> I am currently using HBase 1.2.4 and Hive 1.2.1, and I am looking for a
> way to bulk load from HBase to Hive.
>
> I have seen a couple of good examples in the HBase GitHub repo:
> https://github.com/apache/hbase/tree/master/hbase-spark
>
> If I would like to use HBaseContext with HBase 1.2.4, how can it be done?
> Or which version of HBase is more stable with HBaseContext?
>
> Thanks.
>


Re: HBaseContext with Spark

2017-01-25 Thread Amrit Jangid
Hi Chetan,

If you just need HBase data in Hive, you can use a Hive EXTERNAL TABLE with

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'.


Try this and see if it solves your problem:


https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
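
As a minimal sketch of what that could look like (the Hive table names, the column family `cf`, the qualifier `name`, and the HBase table `users` are made-up placeholders, so adjust them to your own schema):

```sql
-- Map an existing HBase table into Hive. ":key" binds the first Hive
-- column to the HBase row key; "cf:name" binds the second column to
-- qualifier "name" in column family "cf" (both names are assumptions).
CREATE EXTERNAL TABLE hbase_users (rowkey STRING, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "users");

-- Optionally copy the data into a native Hive table, so that later
-- queries no longer scan HBase.
CREATE TABLE users_hive STORED AS ORC AS
SELECT rowkey, name FROM hbase_users;
```

Because the table is EXTERNAL, dropping it in Hive leaves the underlying HBase table untouched.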


Regards

Amrit



On Wed, Jan 25, 2017 at 5:02 PM, Chetan Khatri 
wrote:

> Hello Spark Community Folks,
>
> I am currently using HBase 1.2.4 and Hive 1.2.1, and I am looking for a
> way to bulk load from HBase to Hive.
>
> I have seen a couple of good examples in the HBase GitHub repo:
> https://github.com/apache/hbase/tree/master/hbase-spark
>
> If I would like to use HBaseContext with HBase 1.2.4, how can it be done?
> Or which version of HBase is more stable with HBaseContext?
>
> Thanks.
>


HBaseContext with Spark

2017-01-25 Thread Chetan Khatri
Hello Spark Community Folks,

I am currently using HBase 1.2.4 and Hive 1.2.1, and I am looking for a way
to bulk load from HBase to Hive.

I have seen a couple of good examples in the HBase GitHub repo:
https://github.com/apache/hbase/tree/master/hbase-spark

If I would like to use HBaseContext with HBase 1.2.4, how can it be done?
Or which version of HBase is more stable with HBaseContext?

Thanks.