Unsafe et al

Gian Merlino Thu, 06 Feb 2020 11:33:19 -0800

We could make an interface that is a subset of Memory and use that. But I
think once the new JEP 370 stuff becomes mainstream it would be best to use
it directly, since it offers a richer API and will probably become
considered idiomatic Java over time. So we may still want to drop the
abstraction later, once we drop support for Java pre-14.


Separately, I think if we do build an abstraction layer here, we need to
make sure the performance overhead is zero — it's important that the jvm be
able to inline the underlying calls.

> @Gian Merlino I think i am not 100% sure about the scope you are talking
about, thought you are talking about
https://github.com/apache/druid/issues/3892 and doing the work from bottom
up (storage to processing ...)

For now I'm only proposing using a non-BB API (whether it's Memory or our
own abstraction) for the VectorAggregator interface and the query engines
that use it. I don't see a strong motivation to tackle the storage layer at
this time. The reason is that when I profile Druid queries I usually see
the lions share of time being spend in query engines and aggregators, so I
think that's what we need to tackle first from a performance perspective.

Btw, a side note, this talk is great:
https://www.youtube.com/watch?v=iwSCtxMbBLI

On Wed, Feb 5, 2020 at 6:44 PM Slim Bouguerra <[email protected]>
wrote:

> @Jad Sounds good approach was thinking the same as well how can we expose
> those using higher level API.
> @Gian Merlino <[email protected]> I think i am not 100% sure about the
> scope you are talking about, thought you are talking about
> https://github.com/apache/druid/issues/3892 and doing the work from
> bottom up (storage to processing ...)
> Can you please highlight the scope and would love to help if you think
> adding such abstraction will add an extra work overhead that you do want to
> avoid.
>
>
> On Wed, Feb 5, 2020 at 5:52 PM Jad Naous <[email protected]> wrote:
>
>> But then you could never get rid of it... Ideally abstraction layers
>> should
>> be constrained to the actual uses within the system using them instead of
>> making something that may be too general and would be hard to replace.
>>
>> On Wed, Feb 5, 2020, 4:03 PM Gian Merlino <[email protected]> wrote:
>>
>> > I wonder if Memory itself could be that layer?
>> >
>> > On Wed, Feb 5, 2020 at 10:03 AM Jad Naous <[email protected]> wrote:
>> >
>> > > We could build an abstraction layer on top of the memory interface
>> > provided
>> > > by DataSketches. When the JDK gets the new stuff, we can just change
>> the
>> > > implementation of the abstraction.
>> > >
>> > > On Wed, Feb 5, 2020 at 9:43 AM Gian Merlino <[email protected]> wrote:
>> > >
>> > > > The thing that worries me about JEP 370 is that if historical Java
>> user
>> > > > migration patterns hold up, we will need to support Java 11 for a
>> while
>> > > > (probably another 2–3 years), and we would therefore need to wait
>> that
>> > > long
>> > > > to use JEP 370. It seems like a long time and until then we would be
>> > > stuck
>> > > > with a pretty inferior API.
>> > > >
>> > > > I also would prefer not having to rewrite code a bunch of times, but
>> > > that's
>> > > > why I suggested starting by using Memory for the VectorAggregator
>> > > interface
>> > > > and stuff that interacts with it. There isn't that much code there
>> yet
>> > > > (only a few aggregators implement VectorAggregator). So we will
>> need to
>> > > > write most of it for the first time, and since it is fresh code, I
>> > think
>> > > > it'd be nice to use the best API currently available in Java 8 / 11.
>> > From
>> > > > what I see that is Memory.
>> > > >
>> > > > On Wed, Feb 5, 2020 at 9:21 AM Slim Bouguerra <[email protected]>
>> > wrote:
>> > > >
>> > > > > Hi Gian,
>> > > > > Thanks for bringing this up.
>> > > > > IMO for the long run and looking at how much code will have to
>> > change,
>> > > it
>> > > > > makes more sense to rely on JDK based API JEP 370 and have this
>> work
>> > > done
>> > > > > ONCE as oppose to multiple iteration. FYI i do not think it is far
>> > > away,
>> > > > > seems like there is a good momentum around it.
>> > > > > This does not exclude or means we should not use Memory API for
>> other
>> > > > stuff
>> > > > > like sketches et al, in fact i think even for project like
>> Sketches
>> > it
>> > > > > makes more sense to move to newer API offered by the JDK rather
>> that
>> > do
>> > > > it
>> > > > > your self.
>> > > > >
>> > > > >
>> > > > > On Tue, Feb 4, 2020 at 10:12 PM Gian Merlino <[email protected]>
>> > wrote:
>> > > > >
>> > > > > > Hey Druids,
>> > > > > >
>> > > > > > There has generally been a lot of talk about moving away from
>> > > > ByteBuffer
>> > > > > > and towards the DataSketches Memory package (
>> > > > > > https://datasketches.apache.org/docs/Memory/MemoryPackage.html)
>> or
>> > > > even
>> > > > > > using Unsafe directly. Much of that discussion happened on
>> > > > > > https://github.com/apache/druid/issues/3892.
>> > > > > >
>> > > > > > Recently a patch was merged that added datasketches-memory as a
>> > > > > dependency
>> > > > > > of druid-processing: https://github.com/apache/druid/pull/9308.
>> > The
>> > > > > reason
>> > > > > > was partially due to better performance and partially due to
>> nicer
>> > > API
>> > > > > > (both reasons mentioned in #3892 as well).
>> > > > > >
>> > > > > > JEP 370 is a potential long term solution but it seems a while
>> away
>> > > > from
>> > > > > > being ready: https://openjdk.java.net/jeps/370
>> > > > > >
>> > > > > > I wanted to bring the larger discussion back up and see what
>> people
>> > > > think
>> > > > > > is a good path forward.
>> > > > > >
>> > > > > > My suggestion is that we migrate the VectorAggregator interface
>> to
>> > > use
>> > > > > > Memory, but keep BufferAggregator the way it is. That way, as we
>> > > build
>> > > > > out
>> > > > > > support for vectorization (right now, only timeseries/groupby
>> > support
>> > > > it,
>> > > > > > and only a few aggregators, but we should be building this out)
>> > we'll
>> > > > be
>> > > > > > doing it with a nicer and potentially faster API. But we won't
>> need
>> > > to
>> > > > go
>> > > > > > back and redo a bunch of old code, since we'll keep the
>> > > non-vectorized
>> > > > > code
>> > > > > > paths the way they are. (And hopefully, one day, delete them all
>> > > > > outright.)
>> > > > > >
>> > > > > > Gian
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
> --
>
> B-Slim
> _______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______
>

Re: ByteBuffer / Memory / Unsafe et al

Reply via email to