We could make an interface that is a subset of Memory and use that. But I think once the new JEP 370 stuff becomes mainstream it would be best to use it directly, since it offers a richer API and will probably become considered idiomatic Java over time. So we may still want to drop the abstraction later, once we drop support for Java pre-14.
Separately, I think if we do build an abstraction layer here, we need to make sure the performance overhead is zero — it's important that the jvm be able to inline the underlying calls. > @Gian Merlino I think i am not 100% sure about the scope you are talking about, thought you are talking about https://github.com/apache/druid/issues/3892 and doing the work from bottom up (storage to processing ...) For now I'm only proposing using a non-BB API (whether it's Memory or our own abstraction) for the VectorAggregator interface and the query engines that use it. I don't see a strong motivation to tackle the storage layer at this time. The reason is that when I profile Druid queries I usually see the lions share of time being spend in query engines and aggregators, so I think that's what we need to tackle first from a performance perspective. Btw, a side note, this talk is great: https://www.youtube.com/watch?v=iwSCtxMbBLI On Wed, Feb 5, 2020 at 6:44 PM Slim Bouguerra <slim.bougue...@gmail.com> wrote: > @Jad Sounds good approach was thinking the same as well how can we expose > those using higher level API. > @Gian Merlino <g...@apache.org> I think i am not 100% sure about the > scope you are talking about, thought you are talking about > https://github.com/apache/druid/issues/3892 and doing the work from > bottom up (storage to processing ...) > Can you please highlight the scope and would love to help if you think > adding such abstraction will add an extra work overhead that you do want to > avoid. > > > On Wed, Feb 5, 2020 at 5:52 PM Jad Naous <jad.na...@imply.io> wrote: > >> But then you could never get rid of it... Ideally abstraction layers >> should >> be constrained to the actual uses within the system using them instead of >> making something that may be too general and would be hard to replace. >> >> On Wed, Feb 5, 2020, 4:03 PM Gian Merlino <g...@apache.org> wrote: >> >> > I wonder if Memory itself could be that layer? >> > >> > On Wed, Feb 5, 2020 at 10:03 AM Jad Naous <jad.na...@imply.io> wrote: >> > >> > > We could build an abstraction layer on top of the memory interface >> > provided >> > > by DataSketches. When the JDK gets the new stuff, we can just change >> the >> > > implementation of the abstraction. >> > > >> > > On Wed, Feb 5, 2020 at 9:43 AM Gian Merlino <g...@apache.org> wrote: >> > > >> > > > The thing that worries me about JEP 370 is that if historical Java >> user >> > > > migration patterns hold up, we will need to support Java 11 for a >> while >> > > > (probably another 2–3 years), and we would therefore need to wait >> that >> > > long >> > > > to use JEP 370. It seems like a long time and until then we would be >> > > stuck >> > > > with a pretty inferior API. >> > > > >> > > > I also would prefer not having to rewrite code a bunch of times, but >> > > that's >> > > > why I suggested starting by using Memory for the VectorAggregator >> > > interface >> > > > and stuff that interacts with it. There isn't that much code there >> yet >> > > > (only a few aggregators implement VectorAggregator). So we will >> need to >> > > > write most of it for the first time, and since it is fresh code, I >> > think >> > > > it'd be nice to use the best API currently available in Java 8 / 11. >> > From >> > > > what I see that is Memory. >> > > > >> > > > On Wed, Feb 5, 2020 at 9:21 AM Slim Bouguerra <bs...@apache.org> >> > wrote: >> > > > >> > > > > Hi Gian, >> > > > > Thanks for bringing this up. >> > > > > IMO for the long run and looking at how much code will have to >> > change, >> > > it >> > > > > makes more sense to rely on JDK based API JEP 370 and have this >> work >> > > done >> > > > > ONCE as oppose to multiple iteration. FYI i do not think it is far >> > > away, >> > > > > seems like there is a good momentum around it. >> > > > > This does not exclude or means we should not use Memory API for >> other >> > > > stuff >> > > > > like sketches et al, in fact i think even for project like >> Sketches >> > it >> > > > > makes more sense to move to newer API offered by the JDK rather >> that >> > do >> > > > it >> > > > > your self. >> > > > > >> > > > > >> > > > > On Tue, Feb 4, 2020 at 10:12 PM Gian Merlino <g...@apache.org> >> > wrote: >> > > > > >> > > > > > Hey Druids, >> > > > > > >> > > > > > There has generally been a lot of talk about moving away from >> > > > ByteBuffer >> > > > > > and towards the DataSketches Memory package ( >> > > > > > https://datasketches.apache.org/docs/Memory/MemoryPackage.html) >> or >> > > > even >> > > > > > using Unsafe directly. Much of that discussion happened on >> > > > > > https://github.com/apache/druid/issues/3892. >> > > > > > >> > > > > > Recently a patch was merged that added datasketches-memory as a >> > > > > dependency >> > > > > > of druid-processing: https://github.com/apache/druid/pull/9308. >> > The >> > > > > reason >> > > > > > was partially due to better performance and partially due to >> nicer >> > > API >> > > > > > (both reasons mentioned in #3892 as well). >> > > > > > >> > > > > > JEP 370 is a potential long term solution but it seems a while >> away >> > > > from >> > > > > > being ready: https://openjdk.java.net/jeps/370 >> > > > > > >> > > > > > I wanted to bring the larger discussion back up and see what >> people >> > > > think >> > > > > > is a good path forward. >> > > > > > >> > > > > > My suggestion is that we migrate the VectorAggregator interface >> to >> > > use >> > > > > > Memory, but keep BufferAggregator the way it is. That way, as we >> > > build >> > > > > out >> > > > > > support for vectorization (right now, only timeseries/groupby >> > support >> > > > it, >> > > > > > and only a few aggregators, but we should be building this out) >> > we'll >> > > > be >> > > > > > doing it with a nicer and potentially faster API. But we won't >> need >> > > to >> > > > go >> > > > > > back and redo a bunch of old code, since we'll keep the >> > > non-vectorized >> > > > > code >> > > > > > paths the way they are. (And hopefully, one day, delete them all >> > > > > outright.) >> > > > > > >> > > > > > Gian >> > > > > > >> > > > > >> > > > >> > > >> > >> > > > -- > > B-Slim > _______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______ >