Re: A Proposal Apache Incubator Mnemonic as an alternative infra. for Apache Arrow

Jacques Nadeau Wed, 30 Mar 2016 19:35:20 -0700

Yup. Will do.

The discussion today was limited to "let's meet".




On Wed, Mar 30, 2016 at 7:13 PM, P. Taylor Goetz <[email protected]> wrote:

> +1
>
> Discussions should be summarized and brought back to the mailing list(s).
> Recommendations are fine, but any decisions should be made on-list.
>
> -Taylor
>
> > On Mar 30, 2016, at 8:31 PM, Patrick Hunt <[email protected]> wrote:
> >
> > Remember that no decisions should be made at the meeting. It's fine to
> > have discussions, but those need to be brought back to the community
> > before decisions are made. Summaries on the dev@ mailing list, JIRAs,
> > etc. are good ways to socialize the issues.
> >
> > Patrick
> >
> >> On Wed, Mar 30, 2016 at 5:17 PM, Henry Saputra <[email protected]> wrote:
> >> The communities of both podlings are bigger than the group that shows
> >> up at Strata =)
> >>
> >> Would love to have a summary of the discussions on the dev@ list if
> >> discussions are indeed happening at Strata.
> >>
> >> - Henry
> >>
> >> On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping <[email protected]>
> >> wrote:
> >>
> >>> Hi, All
> >>>
> >>> I met with Jacques today at Strata. We think it would be great if the
> >>> Arrow and Mnemonic communities could have a F2F meeting to talk about
> >>> our integration.
> >>> I have the following two days available: 4/11 Monday afternoon, or
> >>> 4/15 Friday.
> >>> We can meet at the Intel SC campus.
> >>>
> >>> Would you let me know if you are able to join us and which day you'd
> >>> prefer?
> >>>
> >>> Thanks
> >>> Yanping
> >>>
> >>>
> >>> On Mar 29, 2016, at 4:38 PM, Gary <[email protected]> wrote:
> >>>
> >>> Yes, I agree with you, and it would be great if we could brainstorm
> >>> here to collect more ideas about enabling non-volatile memory usage
> >>> for Apache Arrow through Mnemonic.
> >>>
> >>> For the questions, my ideas are:
> >>>
> >>>
> >>> - Right now you are using unpooled persistent memory. Does that make
> >>> sense or does chunking make more sense?
> >>>
> >>> Gary: I think it could make some sense if developers know that their
> >>> datasets are very big and they want Apache Arrow to keep most of the
> >>> data in memory for intensive computing, e.g. sorting.
> >>> Developers can certainly spill their Mnemonic-managed datasets to
> >>> disk, but that seems a bit inefficient in some scenarios, depending on
> >>> the concrete application logic.
> >>>
> >>>
> >>> - What do you think is the right way to transition back and forth
> >>> between persistent and ephemeral memory? What do you think will be the
> >>> first pattern to be adopted? For example, do you think we should try
> >>> to use it as a tiered storage for sort spilling (before hitting the
> >>> disk), or should we use it for caching?
> >>> Gary: My 2 cents: the netty library does not yet seem to provide an
> >>> elegant switching mechanism for Arrow to use. We could probably change
> >>> the logic around "initialCapacity > directArena.chunkSize" to control
> >>> which buffers are put off-heap and which are managed by Mnemonic.
> >>> Another approach is to let the memory clustering mechanism of Mnemonic
> >>> manage hybrid memory-like spaces in place of part of the logic in
> >>> class PooledByteBufAllocatorL.
> >>> Regarding sorting, I think it is a typical case of random access to
> >>> the data, so we should avoid spilling as much as possible.
> >>> My 2 cents, the performance ranking could be:
> >>> all in off-heap if possible > mnemonic used as cache > all in mnemonic
> >>> using NVMe/disk > off-heap + spilling
> >>> The code simplicity ranking would be:
> >>> all in off-heap if possible > all in mnemonic using NVMe/disk >
> >>> mnemonic used as cache > off-heap + spilling
> >>>
> >>> The reason why the mode "mnemonic used as cache + spilling" is probably
> >>> unnecessary is that mnemonic could provide capacity nearly equivalent
> >>> to disk.
> >>>
> >>> Thanks.
> >>> Gary.
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Jacques Nadeau [mailto:[email protected]]
> >>> Sent: Tuesday, March 29, 2016 8:05 AM
> >>> To: [email protected]
> >>> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative
> >>> infra. for Apache Arrow
> >>>
> >>>
> >>>
> >>> This is super cool. A couple of questions:
> >>>
> >>>
> >>>
> >>> - Right now you are using unpooled persistent memory. Does that make
> >>> sense or does chunking make more sense?
> >>>
> >>> - What do you think is the right way to transition back and forth
> >>> between persistent and ephemeral memory? What do you think will be the
> >>> first pattern to be adopted? For example, do you think we should try
> >>> to use it as a tiered storage for sort spilling (before hitting the
> >>> disk), or should we use it for caching?
> >>>
> >>>
> >>>
> >>> I think it will be much easier to think about this in the context of a
> >>> primary or first use case. Do you have something in mind or should we
> >>> brainstorm here?
> >>>
> >>>
> >>>
> >>> On Wed, Mar 23, 2016 at 7:16 PM, Gary <[email protected]> wrote:
> >>>
> >>>
> >>>
> >>>> Hello,
> >>>
> >>>
> >>>>   We have created a patch for Apache Arrow that uses Apache Mnemonic
> >>>> (incubating) as an alternative infrastructure for underlying memory
> >>>> resource allocation. You can find it in the forked repo below.
> >>>
> >>>
> >>>> https://github.com/NonVolatileComputing/arrow
> >>>
> >>>
> >>>>    This way, Apache Arrow could gain several structural benefits from
> >>>> the Mnemonic project:
> >>>>
> >>>>    - Arrow is able to leverage the larger capacity of high-performance
> >>>> hybrid storage devices, e.g. high-end SSD, NVMe
> >>>>    - Mnemonic provides a potential opportunity for Arrow to
> >>>> optimize/tune its allocation algorithms as a native Arrow-oriented
> >>>> allocation service
> >>>>    - The non-volatile features of Mnemonic make it possible for Arrow
> >>>> to share its columnar in-memory data between different applications
> >>>> or across the life cycle of a single application
> >>>>    - Arrow could take advantage of upcoming Mnemonic features such as
> >>>> memory clustering/DOG (distributed object graph) and massive native
> >>>> computing
> >>>>    - Mnemonic helps to reduce the pressure on main memory utilization
> >>>> and its related system-wide overheads.
> >>>
> >>>
> >>>>   This patch is designed to minimize the changes users need to make
> >>>> in order to use Arrow. Please check out the test cases provided by
> >>>> this patch for your reference.
> >>>
> >>>
> >>>>   Note that the allocator services need to be placed at a specified
> >>>> location (indicated by pom.xml) for the Mnemonic-backed Arrow test
> >>>> cases to run, because those services are required for external
> >>>> memory-like device management.
> >>>
> >>>
> >>>>   Please share your comments and review feedback so that Apache Arrow
> >>>> and Mnemonic can collaborate better. Thanks.
> >>>
> >>>
> >>>> Best Regards.
> >>>
> >>>> Gary.
> >>>
> >>>
> >>>
> >>>
> >>>
>
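
Gary's suggestion above, switching on "initialCapacity > directArena.chunkSize" to decide whether a buffer stays off-heap or is managed by Mnemonic, can be sketched roughly as follows. This is only an illustration under stated assumptions: it does not use the netty, Arrow, or Mnemonic APIs. The class TieredAllocatorSketch, the Tier enum, and the chunk size value are hypothetical, and a memory-mapped temp file merely stands in for a Mnemonic-managed non-volatile device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: route a buffer request to a tier based on its size. */
final class TieredAllocatorSketch {

    /** Where a request ends up. Names are illustrative only. */
    enum Tier { OFF_HEAP, PERSISTENT }

    /** A buffer together with the tier that served it. */
    static final class Allocation {
        final Tier tier;
        final ByteBuffer buffer;

        Allocation(Tier tier, ByteBuffer buffer) {
            this.tier = tier;
            this.buffer = buffer;
        }
    }

    private final int chunkSize;   // assumption: plays the role of directArena.chunkSize
    private final Path backingDir; // assumption: stand-in for a Mnemonic-managed device

    TieredAllocatorSketch(int chunkSize, Path backingDir) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.backingDir = backingDir;
    }

    /** Pure routing decision: requests larger than one chunk go to the persistent tier. */
    Tier tierFor(int initialCapacity) {
        return initialCapacity > chunkSize ? Tier.PERSISTENT : Tier.OFF_HEAP;
    }

    Allocation allocate(int initialCapacity) throws IOException {
        Tier tier = tierFor(initialCapacity);
        if (tier == Tier.OFF_HEAP) {
            return new Allocation(tier, ByteBuffer.allocateDirect(initialCapacity));
        }
        // Placeholder for non-volatile memory: a memory-mapped temp file.
        Path file = Files.createTempFile(backingDir, "tier-", ".bin");
        file.toFile().deleteOnExit();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Grow the file to the requested size before mapping it.
            ch.write(ByteBuffer.wrap(new byte[] {0}), initialCapacity - 1L);
            // The mapping remains valid after the channel is closed.
            return new Allocation(tier,
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, initialCapacity));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tiered-sketch");
        dir.toFile().deleteOnExit();
        // 16 MiB is just an illustrative chunk size.
        TieredAllocatorSketch alloc = new TieredAllocatorSketch(16 * 1024 * 1024, dir);
        Allocation small = alloc.allocate(4096);
        Allocation large = alloc.allocate(32 * 1024 * 1024);
        System.out.println(small.tier + " capacity=" + small.buffer.capacity());
        System.out.println(large.tier + " capacity=" + large.buffer.capacity());
    }
}
```

Keeping the routing decision in tierFor, separate from the allocation itself, would make it easy to swap the size threshold for another policy, such as the Mnemonic memory clustering approach mentioned in the thread.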
