I think someone who uses third-party benchmarks to assess a system like HBase or Accumulo (or Cassandra...) is taking a foolish shortcut, so perhaps we must agree to disagree.

On Wed, Aug 19, 2015 at 2:34 PM, Jeremy Kepner <kep...@ll.mit.edu> wrote:

> I agree that performance on real apps is the most important for
> any particular organization, but as technologists how do we measure
> ourselves?
> Hence imperfect benchmarking remains our only recourse.
>
> On Wed, Aug 19, 2015 at 12:34:44PM -0700, Andrew Purtell wrote:
> > I can't speak for anyone other than myself in the HBase community,
> > but I'm much more interested and focused on performance analysis and
> > developing/deploying for the use cases of my employer than
> > participating in generic bench-marketing to make weapons for happy
> > OSS warriors. Perhaps this does a disservice to the HBase project
> > overall and if so then I apologize to others on the project for that.
> >
> > That said, from long and bitter experience let me state the only
> > benchmarks that ever really matter are the comparative benchmarks
> > you make for your own use cases in your own environments, preferably
> > exercising those candidates with real data and operating conditions.
> > See: https://pbs.twimg.com/media/CMnTyKVUEAA1tOm.jpg (smile)
> >
> > On Wed, Aug 19, 2015 at 12:27 PM, Josh Elser <josh.el...@gmail.com> wrote:
> >
> > > Alright, I have to ask... are you referring to the paper that cites
> > > Accumulo performance without write-ahead logs enabled? I have some
> > > serious reservations about the relevance of that paper to this
> > > conversation and just want to make sure people aren't led astray by
> > > what the actual takeaway should be.
> > >
> > > Jeremy Kepner wrote:
> > >
> > >> A big difference between Accumulo and HBase is the published
> > >> performance numbers.
> > >> The Accumulo community has done a good job of continuing to publish
> > >> up-to-date performance numbers in peer-reviewed venues which allow
> > >> Accumulo to claim best in the world performance.
> > >>
> > >> The HBase community hasn't been doing that so much. It would be
> > >> great if they did because the HBase points on the graphs are old
> > >> and it would be good to get new ones.
> > >>
> > >> On Wed, Aug 19, 2015 at 02:30:58PM -0400, Josh Elser wrote:
> > >>
> > >>> Like I've said many times now, it's relative to your actual problem.
> > >>> If you don't have that much data (or intend to grow into that much
> > >>> data), it's not an issue. Obviously, this is the case for you.
> > >>>
> > >>> However, it is an architectural difference between the two projects
> > >>> with known limitations for a single metadata region. It's a
> > >>> difference as what was asked for by Jerry.
> > >>>
> > >>> Ted Malaska wrote:
> > >>>
> > >>>> I've been doing HBase for a long time and never had an issue with
> > >>>> region count limits and I have clusters with 10s of billions of
> > >>>> records. Maybe there would be issues around a couple trillion
> > >>>> records, but never got that high yet.
> > >>>>
> > >>>> Ted Malaska
> > >>>>
> > >>>> On Wed, Aug 19, 2015 at 2:24 PM, Josh Elser <josh.el...@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Oh, one other thing that I should mention (was prompted off-list).
> > >>>>>
> > >>>>> (definition time since cross-list now: HBase regions == Accumulo
> > >>>>> tablets)
> > >>>>>
> > >>>>> Accumulo will handle many more regions than HBase does now due to a
> > >>>>> splittable metadata table. While I was told this was a very long
> > >>>>> and arduous journey to implement correctly (WRT splitting, merges
> > >>>>> and bulk loading), users with "too many regions" problems are
> > >>>>> extremely few and far between for Accumulo.
> > >>>>>
> > >>>>> I was very happy to see effort/design being put into this in HBase.
> > >>>>> And, just to be fair in criticism/praises, HBase does appear to me
> > >>>>> to do assignments of regions much faster than Accumulo does on a
> > >>>>> small cluster (~5-10 nodes). Accumulo may take a few seconds to
> > >>>>> notice and reassign tablets. I have yet to notice this with HBase
> > >>>>> (which also could be due to lack of personal testing).
> > >>>>>
> > >>>>> Jerry He wrote:
> > >>>>>
> > >>>>>> Hi, folks
> > >>>>>>
> > >>>>>> We have people that are evaluating HBase vs Accumulo.
> > >>>>>> Security is an important factor.
> > >>>>>>
> > >>>>>> But I think after the Cell security was added in HBase, there is
> > >>>>>> no more real gap compared to Accumulo.
> > >>>>>>
> > >>>>>> I know we have both HBase and Accumulo experts on this list.
> > >>>>>> Could someone shed more light?
> > >>>>>> I am looking for real gaps comparing HBase to Accumulo, if there
> > >>>>>> are any, so that I can be prepared to address them. This is not
> > >>>>>> limited to the security area.
> > >>>>>>
> > >>>>>> There are differences in some features and implementations. But
> > >>>>>> they don't seem like real 'gaps'.
> > >>>>>>
> > >>>>>> Any comments and feedback are welcome.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>> Jerry
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)