Re: Generalised Spark-HBase integration

Ted Malaska Tue, 28 Jul 2015 09:08:08 -0700

Thanks Michal,

Just to share what I'm working on in a related topic.  A while ago I
built SparkOnHBase and put it into Cloudera Labs; see this link:
http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/


More recently I have been working on getting this into HBase core.  It will
hopefully be there within the next couple of weeks.

https://issues.apache.org/jira/browse/HBASE-13992

Then I'm planning on adding DataFrame and bulk load support through

https://issues.apache.org/jira/browse/HBASE-14149
https://issues.apache.org/jira/browse/HBASE-14150

Also, if you are interested, this is running today at at least half a dozen
companies with Spark Streaming.  Here is one blog post about a successful
implementation:

http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/

Here is an additional example blog post I put together:

http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/

Let me know if you have any questions, and let me know if you want to
connect and join efforts.

Ted Malaska

On Tue, Jul 28, 2015 at 11:59 AM, Michal Haris <[email protected]>
wrote:

> Hi all, for the last couple of months I've been working on large graph analytics
> and along the way have written from scratch an HBase-Spark integration as
> none of the ones out there worked either in terms of scale or in the way
> they integrated with the RDD interface. This week I have generalised it
> into an (almost) spark module, which works with the latest spark and the
> new hbase api, so... sharing! :
> https://github.com/michal-harish/spark-on-hbase
>
>
> --
> Michal Haris
> Technical Architect
> direct line: +44 (0) 207 749 0229
> www.visualdna.com | t: +44 (0) 207 734 7033
> 31 Old Nichol Street
> London
> E2 7HR
>
