Thanks, Michal. Just to share what I'm working on in a related area: some time ago I built SparkOnHBase and released it through Cloudera Labs:
http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
I have also recently been working on getting this into HBase core. It will hopefully be in HBase core within the next couple of weeks:
https://issues.apache.org/jira/browse/HBASE-13992

After that, I'm planning on adding DataFrame and bulk load support through:
https://issues.apache.org/jira/browse/HBASE-14149
https://issues.apache.org/jira/browse/HBASE-14150

Also, if you are interested, at least half a dozen companies are running this today with Spark Streaming. Here is a blog post about one successful implementation:
http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/

And here is an additional example blog post I put together:
http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/

Let me know if you have any questions, and also let me know if you want to connect and join efforts.

Ted Malaska

On Tue, Jul 28, 2015 at 11:59 AM, Michal Haris <michal.ha...@visualdna.com> wrote:

> Hi all, last couple of months I've been working on a large graph analytics
> and along the way have written from scratch a HBase-Spark integration as
> none of the ones out there worked either in terms of scale or in the way
> they integrated with the RDD interface. This week I have generalised it
> into an (almost) spark module, which works with the latest spark and the
> new hbase api, so... sharing! :
> https://github.com/michal-harish/spark-on-hbase
>
>
> --
> Michal Haris
> Technical Architect
> direct line: +44 (0) 207 749 0229
> www.visualdna.com | t: +44 (0) 207 734 7033
> 31 Old Nichol Street
> London
> E2 7HR
>