Re: Generalised Spark-HBase integration

Michal Haris Tue, 28 Jul 2015 09:18:13 -0700

Cool, will revisit. Is your latest code publicly visible somewhere?

On 28 July 2015 at 17:14, Ted Malaska <[email protected]> wrote:


> Yup, you should be able to do that with the APIs that are going into HBase.
>
> Let me know if you need to chat about the problem and how to implement it
> with the HBase APIs.
>
> We have tried to cover every possible way to use HBase with Spark. Let us
> know if we missed anything; if we did, we will add it.
>
> On Tue, Jul 28, 2015 at 12:12 PM, Michal Haris <[email protected]> wrote:
>
>> Hi Ted, yes, the Cloudera blog and your code were my starting point, but I
>> needed something more Spark-centric rather than HBase-centric. Basically, I
>> was doing a lot of ad-hoc transformations with RDDs that were based on HBase
>> tables and then mutating them after a series of iterative (BSP-like) steps.
>>
>> On 28 July 2015 at 17:06, Ted Malaska <[email protected]> wrote:
>>
>>> Thanks Michal,
>>>
>>> Just to share what I'm working on in a related area: a long time ago I
>>> built SparkOnHBase and put it into Cloudera Labs, as described here:
>>> http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
>>>
>>> More recently I have been working on getting this into HBase core. It will
>>> hopefully be in HBase core within the next couple of weeks.
>>>
>>> https://issues.apache.org/jira/browse/HBASE-13992
>>>
>>> Then I'm planning on adding DataFrame and bulk load support through
>>>
>>> https://issues.apache.org/jira/browse/HBASE-14149
>>> https://issues.apache.org/jira/browse/HBASE-14150
>>>
>>> Also, if you are interested, this is running today at half a dozen
>>> companies or more with Spark Streaming. Here is one blog post about a
>>> successful implementation:
>>>
>>>
>>> http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
>>>
>>> Here is an additional example blog post that I put together:
>>>
>>>
>>> http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
>>>
>>> Let me know if you have any questions, and let me know if you want to
>>> connect and join efforts.
>>>
>>> Ted Malaska
>>>
>>> On Tue, Jul 28, 2015 at 11:59 AM, Michal Haris <[email protected]> wrote:
>>>
>>>> Hi all, for the last couple of months I've been working on large graph
>>>> analytics and along the way have written an HBase-Spark integration from
>>>> scratch, as none of the ones out there worked, either in terms of scale or
>>>> in the way they integrated with the RDD interface. This week I generalised
>>>> it into an (almost) Spark module, which works with the latest Spark and the
>>>> new HBase API, so... sharing!
>>>> https://github.com/michal-harish/spark-on-hbase
>>>>
>>>>
>>>> --
>>>> Michal Haris
>>>> Technical Architect
>>>> direct line: +44 (0) 207 749 0229
>>>> www.visualdna.com | t: +44 (0) 207 734 7033
>>>> 31 Old Nichol Street
>>>> London
>>>> E2 7HR
>>>>
>>>
>>>
>>
>>
>>
>
>


