Cool, will revisit. Is your latest code visible publicly somewhere?

On 28 July 2015 at 17:14, Ted Malaska <ted.mala...@cloudera.com> wrote:
> Yup, you should be able to do that with the APIs that are going into HBase.
>
> Let me know if you need to chat about the problem and how to implement it
> with the HBase APIs.
>
> We have tried to cover every possible way to use HBase with Spark. Let us
> know if we missed anything; if we did, we will add it.
>
> On Tue, Jul 28, 2015 at 12:12 PM, Michal Haris <michal.ha...@visualdna.com> wrote:
>
>> Hi Ted, yes, the Cloudera blog and your code were my starting point, but I
>> needed something more Spark-centric rather than HBase-centric. Basically, I was doing a
>> lot of ad-hoc transformations with RDDs that were based on HBase tables and
>> then mutating them after a series of iterative (BSP-like) steps.
>>
>> On 28 July 2015 at 17:06, Ted Malaska <ted.mala...@cloudera.com> wrote:
>>
>>> Thanks Michal,
>>>
>>> Just to share what I'm working on in a related area: a long time
>>> ago I built SparkOnHBase and put it into Cloudera Labs, described at this link:
>>> http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
>>>
>>> Also, recently I have been working on getting this into HBase core. It will
>>> hopefully be in HBase core within the next couple of weeks.
>>>
>>> https://issues.apache.org/jira/browse/HBASE-13992
>>>
>>> Then I'm planning on adding DataFrame and bulk load support through
>>>
>>> https://issues.apache.org/jira/browse/HBASE-14149
>>> https://issues.apache.org/jira/browse/HBASE-14150
>>>
>>> Also, if you are interested, this is running today at half a dozen or
>>> more companies with Spark Streaming.
>>> Here is one blog post about a successful implementation:
>>>
>>> http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
>>>
>>> Here is also an additional example blog post I put together:
>>>
>>> http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
>>>
>>> Let me know if you have any questions, and also let me know if you want to
>>> connect and join efforts.
>>>
>>> Ted Malaska
>>>
>>> On Tue, Jul 28, 2015 at 11:59 AM, Michal Haris <michal.ha...@visualdna.com> wrote:
>>>
>>>> Hi all, over the last couple of months I've been working on large graph
>>>> analytics, and along the way I have written an HBase-Spark integration
>>>> from scratch, as none of the ones out there worked, either in terms of scale
>>>> or in the way they integrated with the RDD interface. This week I
>>>> generalised it into an (almost) Spark module, which works with the latest
>>>> Spark and the new HBase API, so... sharing!
>>>> https://github.com/michal-harish/spark-on-hbase
>>>>
>>>> --
>>>> Michal Haris
>>>> Technical Architect
>>>> direct line: +44 (0) 207 749 0229
>>>> www.visualdna.com | t: +44 (0) 207 734 7033
>>>> 31 Old Nichol Street
>>>> London
>>>> E2 7HR