I have a slab allocated cache coded up, testing in YCSB right now :).
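
For anyone following along, here is a rough sketch of the general slab idea. It is not the code under test: the class name SlabAllocator and its methods are hypothetical, and it assumes fixed-size slots carved out of a single direct byte buffer so that block data lives outside the Java heap.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical sketch: one large direct buffer carved into fixed-size slots. */
public class SlabAllocator {
    private final ByteBuffer slab;            // off-heap backing memory
    private final int slotSize;
    private final ArrayDeque<Integer> freeSlots = new ArrayDeque<>();

    public SlabAllocator(int slotSize, int slotCount) {
        this.slotSize = slotSize;
        this.slab = ByteBuffer.allocateDirect(slotSize * slotCount);
        for (int i = 0; i < slotCount; i++) {
            freeSlots.push(i);                // every slot starts out free
        }
    }

    /** Returns a free slot index, or -1 when the slab is full. */
    public synchronized int allocate() {
        Integer slot = freeSlots.poll();
        return slot == null ? -1 : slot;
    }

    /** Hands a slot back; the caller must not use its views afterwards. */
    public synchronized void free(int slot) {
        freeSlots.push(slot);
    }

    /** Returns an independent view over one slot of the slab. */
    public ByteBuffer view(int slot) {
        ByteBuffer dup = slab.duplicate();    // own position/limit, shared bytes
        dup.position(slot * slotSize);
        dup.limit(slot * slotSize + slotSize);
        return dup.slice();
    }

    public synchronized int freeCount() {
        return freeSlots.size();
    }
}
```

Because the slots are preallocated and reused, the garbage collector only ever sees the small wrapper objects, never the cached block bytes themselves. Eviction policy and variable block sizes are left out of this sketch.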

On Fri, Jul 8, 2011 at 7:52 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:

> > Especially when a perf solution is already here. Use Mapr or
> > hdfs-347/local reads.
>
> Right.  It goes back to avoiding GC and performing memory deallocation
> manually (like C).  I think this makes sense given the number of
> issues people have with HBase and GC (more so than Lucene for
> example).  MapR doesn't help with the GC issues.  If MapR had a JNI
> interface into an external block cache then that'd be a different
> story.  :)  And I'm sure it's quite doable.
>
> > But even beyond that the performance improvements are insane. We are
> > talking like 8-9x perf on my tests. Not to mention substantially
> > reduced latency.
>
> Was the comparison against HDFS-347?
>
> On Fri, Jul 8, 2011 at 7:31 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> > On Jul 8, 2011 7:19 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
> > wrote:
> >>
> >> > When running on top of Mapr, hbase has fast cached access to locally
> >> > stored files, the Mapr client ensures that. Likewise, hdfs should also
> >> > ensure that local reads are fast and come out of cache as necessary.
> >> > Eg: the kernel block cache.
> >>
> >> Agreed!  However I don't see how that's possible today.  Eg, it'd
> >> require more of a byte buffer type of API to HDFS, random reads not
> >> using streams.  It's easy to add.
> >
> > I don't think it's as easy as you say. And even using the stream API Mapr
> > delivers a lot more performance. And this is from my own tests, not a
> > white paper.
> >
> >>
> >> I think the biggest win for HBase with MapR is the lack of the
> >> NameNode issues and snapshotting.  In particular, snapshots are pretty
> >> much a standard RDBMS feature.
> >
> > That is good too - if you are using hbase in real time prod you need to
> > look at Mapr.
> >
> > But even beyond that the performance improvements are insane. We are
> > talking like 8-9x perf on my tests. Not to mention substantially
> > reduced latency.
> >
> > I'll repeat again, local accelerated access is going to be a required
> > feature. It already is.
> >
> > I investigated using dbb once upon a time, I concluded that managing the
> > ref counts would be a nightmare, and the better solution was to copy
> > keyvalues out of the dbb during scans.
> >
> > Injecting refcount code seems like a worse remedy than the problem. Hbase
> > doesn't have as many bugs but explicit ref counting everywhere seems
> > dangerous. Especially when a perf solution is already here. Use Mapr or
> > hdfs-347/local reads.
> >>
> >> > Managing the block cache off heap might work but you also might get
> >> > there and find the dbb accounting overhead kills.
> >>
> >> Lucene uses/abuses ref counting so I'm familiar with the downsides.
> >> When it works, it's great, when it doesn't it's a nightmare to debug.
> >> It is possible to make it work though.  I don't think there would be
> >> overhead from it, ie, any pool of objects implements ref counting.
> >>
> >> It'd be nice to not have a block cache however it's necessary for
> >> caching compressed [on disk] blocks.
> >>
> >> On Fri, Jul 8, 2011 at 7:05 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >> > Hey,
> >> >
> >> > When running on top of Mapr, hbase has fast cached access to locally
> >> > stored files, the Mapr client ensures that. Likewise, hdfs should also
> >> > ensure that local reads are fast and come out of cache as necessary.
> >> > Eg: the kernel block cache.
> >> >
> >> > I wouldn't support mmap, it would require 2 different read path
> >> > implementations. You will never know when a read is not local.
> >> >
> >> > Hdfs needs to provide faster local reads imo. Managing the block cache
> >> > off heap might work but you also might get there and find the dbb
> >> > accounting overhead kills.
> >> > On Jul 8, 2011 6:47 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com> wrote:
> >> >> There are a couple of things here, one is direct byte buffers to put
> >> >> the blocks outside of heap, the other is MMap'ing the blocks directly
> >> >> from the underlying HDFS file.
> >> >>
> >> >> I think they both make sense. And I'm not sure MapR's solution will
> >> >> be that much better if the latter is implemented in HBase.
> >> >>
> >> >> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >> >>> The overhead in a byte buffer is the extra integers to keep track of
> >> >>> the mark, position, limit.
> >> >>>
> >> >>> I am not sure that putting the block cache in to heap is the way to
> >> >>> go. Getting faster local dfs reads is important, and if you run hbase
> >> >>> on top of Mapr, these things are taken care of for you.
> >> >>> On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com> wrote:
> >> >>>> Also, it's for a good cause, moving the blocks out of main heap
> >> >>>> using direct byte buffers or some other more native-like facility
> >> >>>> (if DBB's don't work).
> >> >>>>
> >> >>>> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >> >>>>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the
> >> >>>>> API is...annoying.
> >> >>>>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com> wrote:
> >> >>>>>> Is there an open issue for this? How hard will this be? :)
> >> >>>>>
> >> >>>
> >> >
> >
>
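
For reference, the trade-off debated in the quoted thread (explicit ref counting on off-heap blocks versus copying KeyValues out during scans) can be sketched roughly as below. This is only an illustration under stated assumptions: RefCountedBlock, retain, release and copyOut are hypothetical names, not HBase APIs, and the release callback stands in for whatever would return the memory to a pool.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a ref-counted off-heap block with a copy-out path. */
public class RefCountedBlock {
    private final ByteBuffer data;
    // The cache itself holds the initial reference.
    private final AtomicInteger refs = new AtomicInteger(1);
    private final Runnable onRelease;         // e.g. return the memory to a pool

    public RefCountedBlock(ByteBuffer data, Runnable onRelease) {
        this.data = data;
        this.onRelease = onRelease;
    }

    /** Takes a reference; returns false if the block was already released. */
    public boolean retain() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return false;                 // too late, memory may be reused
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    /** Drops a reference; the last release triggers the callback. */
    public void release() {
        int n = refs.decrementAndGet();
        if (n == 0) {
            onRelease.run();
        } else if (n < 0) {
            throw new IllegalStateException("released more often than retained");
        }
    }

    /** Copy-out alternative: the caller gets heap bytes and holds no reference. */
    public byte[] copyOut(int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer dup = data.duplicate();    // leave the shared position untouched
        dup.position(offset);
        dup.get(out, 0, length);
        return out;
    }

    public int refCount() {
        return refs.get();
    }
}
```

With retain/release, every reader must pair its calls correctly or the block either leaks or is reused while still in use, which is the debugging risk raised above. With copyOut, a scanner pays for one copy per value but never has to track references.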
