Clarification: in my previous email, I was not talking about the spark-streaming-flume artifact or the spark-streaming-kafka artifact.
I was talking about examples for these projects, such as
examples/src/main/python/streaming/flume_wordcount.py

On Tue, Apr 19, 2016 at 11:10 AM, Marcelo Vanzin <van...@cloudera.com> wrote:

> On Tue, Apr 19, 2016 at 11:07 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> The same question can be asked w.r.t. examples for other projects, such
>> as flume and kafka.
>
> The main difference being that flume and kafka integration are part of
> Spark itself. HBase integration is not.
>
>> On Tue, Apr 19, 2016 at 11:01 AM, Marcin Tustin <mtus...@handybook.com>
>> wrote:
>>
>>> Let's posit that the spark example is much better than what is available
>>> in HBase. Why is that a reason to keep it within Spark?
>>>
>>> On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> bq. HBase's current support, even if there are bugs or things that
>>>> still need to be done, is much better than the Spark example
>>>>
>>>> In my opinion, a simple example that works is better than a buggy
>>>> package.
>>>>
>>>> I hope before long the hbase-spark module in HBase can arrive at a
>>>> state which we can advertise as mature, but we're not there yet.
>>>>
>>>> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <van...@cloudera.com>
>>>> wrote:
>>>>
>>>>> You're completely missing my point. I'm saying that HBase's current
>>>>> support, even if there are bugs or things that still need to be done,
>>>>> is much better than the Spark example, which is basically a call to
>>>>> "SparkContext.hadoopRDD".
>>>>>
>>>>> Spark's example is not helpful in learning how to build an HBase
>>>>> application on Spark, and clashes head-on with how the HBase
>>>>> developers think it should be done. That, and because it brings too
>>>>> many dependencies for something that is not really useful, is why I'm
>>>>> suggesting removing it.
>>>>>
>>>>> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> > There is an open JIRA for fixing the documentation: HBASE-15473
>>>>> >
>>>>> > I would say the refguide link you provided should not be considered
>>>>> > complete.
>>>>> >
>>>>> > Note it is marked as Blocker by Sean B.
>>>>> >
>>>>> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <van...@cloudera.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> You're entitled to your own opinions.
>>>>> >>
>>>>> >> While you're at it, here's some much better documentation, from the
>>>>> >> HBase project themselves, than what the Spark example provides:
>>>>> >> http://hbase.apache.org/book.html#spark
>>>>> >>
>>>>> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> >> > bq. it's actually in use right now in spite of not being in any
>>>>> >> > upstream HBase release
>>>>> >> >
>>>>> >> > If it is not in upstream, then it is not relevant for discussion
>>>>> >> > on an Apache mailing list.
>>>>> >> >
>>>>> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <van...@cloudera.com>
>>>>> >> > wrote:
>>>>> >> >>
>>>>> >> >> Alright, if you prefer, I'll say "it's actually in use right now in
>>>>> >> >> spite of not being in any upstream HBase release", and it's more
>>>>> >> >> useful than a single example file in the Spark repo for those who
>>>>> >> >> really want to integrate with HBase.
>>>>> >> >>
>>>>> >> >> Spark's example is really very trivial (it just uses one of HBase's
>>>>> >> >> input formats), which makes it not very useful as a blueprint for
>>>>> >> >> developing HBase apps with Spark.
>>>>> >> >>
>>>>> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> >> >> > bq. I wouldn't call it "incomplete".
>>>>> >> >> >
>>>>> >> >> > I would call it incomplete.
>>>>> >> >> >
>>>>> >> >> > Please see HBASE-15333 'Enhance the filter to handle short,
>>>>> >> >> > integer, long, float and double', which is a bug fix.
>>>>> >> >> >
>>>>> >> >> > Please exclude the presence of the related module in vendor
>>>>> >> >> > distros from this discussion.
>>>>> >> >> >
>>>>> >> >> > Thanks
>>>>> >> >> >
>>>>> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>>>>> >> >> > <van...@cloudera.com> wrote:
>>>>> >> >> >>
>>>>> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yuzhih...@gmail.com>
>>>>> >> >> >> wrote:
>>>>> >> >> >> > I want to note that the hbase-spark module in HBase is
>>>>> >> >> >> > incomplete. Zhan has several patches pending review.
>>>>> >> >> >>
>>>>> >> >> >> I wouldn't call it "incomplete". Lots of functionality is there,
>>>>> >> >> >> which doesn't mean new features, or more efficient implementations
>>>>> >> >> >> of existing ones, can't be added.
>>>>> >> >> >>
>>>>> >> >> >> > The hbase-spark module is currently only in the master branch,
>>>>> >> >> >> > which would be released as 2.0.
>>>>> >> >> >>
>>>>> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it matters
>>>>> >> >> >> much for upstream HBase.
>>>>> >> >> >>
>>>>> >> >> >> --
>>>>> >> >> >> Marcelo