Re: A Proposal Apache Incubator Mnemonic as an alternative infra. for Apache Arrow

Patrick Hunt Thu, 31 Mar 2016 07:55:37 -0700

fwiw I've seen some projects use hangouts/webex pretty effectively.

Patrick


On Wed, Mar 30, 2016 at 11:15 PM, Wang, Yanping <[email protected]> wrote:
> Yeah, I was so busy and in a hurry to catch other sessions. We only talked 
> for about 2 minutes :-)
> After Jacques and Wes's Arrow presentation, someone in the audience asked 
> whether Arrow is going to use RDMA. I answered that RDMA is going to be used 
> in the Mnemonic project to support data transfer among nodes and clusters.
> It makes perfect sense to position Mnemonic under Arrow to support its use of 
> persistent storage media.
>
> Thanks Patrick, Henry, Taylor G for the guidelines. We can brainstorm ideas 
> on both dev lists and post those ideas in JIRA so developers can see where 
> our projects are heading.
> Gary and I are located in Portland, Oregon; we usually plan our SC visits 2 
> weeks ahead.
>
> Thanks,
> Yanping
>
>
> -----Original Message-----
> From: Jacques Nadeau [mailto:[email protected]]
> Sent: Wednesday, March 30, 2016 7:34 PM
> To: [email protected]
> Cc: [email protected]; [email protected]
> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra. 
> for Apache Arrow
>
> Yup. Will do.
>
> The discussion today was limited to "let's meet".
>
>
>
> On Wed, Mar 30, 2016 at 7:13 PM, P. Taylor Goetz <[email protected]> wrote:
>
>> +1
>>
>> Discussions should be summarized and brought back to the mailing list(s).
>> Recommendations are fine, but any decisions should be made on-list.
>>
>> -Taylor
>>
>> > On Mar 30, 2016, at 8:31 PM, Patrick Hunt <[email protected]> wrote:
>> >
>> > Remember that no decisions should be made at the meeting. It's fine to
>> > have discussions, but those need to be brought back to the community
>> > before decisions are made. Summarizing for the dev@ mailing list, also
>> > jiras, etc... are good ways to socialize the issues.
>> >
>> > Patrick
>> >
>> >> On Wed, Mar 30, 2016 at 5:17 PM, Henry Saputra <[email protected]> wrote:
>> >> The communities of both podlings are bigger than the people who show up
>> >> at Strata =)
>> >>
>> >> Would love to have a summary of the discussions on the dev@ list if
>> >> indeed some discussions are happening at Strata.
>> >>
>> >> - Henry
>> >>
>> >> On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping <[email protected]>
>> >> wrote:
>> >>
>> >>> Hi, All
>> >>>
>> >>> I met with Jacques today at Strata. We think it would be great if the
>> >>> Arrow and Mnemonic communities could have an F2F meeting to talk about
>> >>> our integration.
>> >>> I have the following two days available: Monday 4/11 in the afternoon,
>> >>> or Friday 4/15.
>> >>> We can meet at the Intel SC campus.
>> >>>
>> >>> Would you let me know if you are able to join us and which day you'd
>> >>> prefer?
>> >>>
>> >>> Thanks
>> >>> Yanping
>> >>>
>> >>>
>> >>> On Mar 29, 2016, at 4:38 PM, Gary <[email protected]> wrote:
>> >>>
>> >>> Yes, I agree with you, and it would be great if we could brainstorm
>> >>> here to collect more ideas about enabling non-volatile memory usage for
>> >>> Apache Arrow through Mnemonic.
>> >>>
>> >>> For the questions, my ideas are:
>> >>>
>> >>>
>> >>> - Right now you are using unpooled persistent memory. Does that make
>> >>> sense or does chunking make more sense?
>> >>>
>> >>> Gary: I think it could make sense if developers know that their datasets
>> >>> are very big and they want Apache Arrow to keep most of them in memory
>> >>> for intensive computing, e.g. sorting. Developers can certainly spill
>> >>> their Mnemonic-managed datasets to disk, but this seems a bit
>> >>> inefficient in some scenarios, depending on the concrete application
>> >>> logic.
>> >>>
>> >>>
>> >>> - What do you think is the right way to transition back and forth
>> >>> between persistent and ephemeral memory? What do you think will be the
>> >>> first pattern to be adopted? For example, do you think we should try to
>> >>> use it as a tiered storage for sort spilling (before hitting the disk),
>> >>> or should we use it for caching?
>> >>> Gary: My 2 cents: the Netty library does not yet seem to provide an
>> >>> elegant switch mechanism for Arrow to use. We can probably change the
>> >>> logic around "initialCapacity > directArena.chunkSize" to control
>> >>> whether a buffer is placed off-heap or managed by Mnemonic. Another
>> >>> approach is to let the memory clustering mechanism of Mnemonic manage
>> >>> hybrid memory-like spaces in place of part of the logic in class
>> >>> PooledByteBufAllocatorL.
>> >>> Regarding sorting, I think it is a typical case of random access to the
>> >>> data, so we should avoid spilling as much as possible.
>> >>> My 2 cents: the performance ranking could be
>> >>> all in off-heap if possible > Mnemonic used as cache > all in Mnemonic
>> >>> using NVMe/disk > off-heap + spilling
>> >>> and the code simplicity ranking would be
>> >>> all in off-heap if possible > all in Mnemonic using NVMe/disk >
>> >>> Mnemonic used as cache > off-heap + spilling
>> >>>
>> >>> The mode "Mnemonic used as cache + spilling" is probably unnecessary
>> >>> because Mnemonic could provide capacity nearly equivalent to that of
>> >>> disk.
>> >>>
>> >>> Thanks.
>> >>> Gary.
>> >>>
>> >>>
>> >>> -----Original Message-----
>> >>>
>> >>> From: Jacques Nadeau [mailto:[email protected]]
>> >>> Sent: Tuesday, March 29, 2016 8:05 AM
>> >>> To: [email protected]
>> >>> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra.
>> >>> for Apache Arrow
>> >>>
>> >>>
>> >>>
>> >>> This is super cool. A couple of questions:
>> >>>
>> >>>
>> >>>
>> >>> - Right now you are using unpooled persistent memory. Does that make
>> >>> sense or does chunking make more sense?
>> >>>
>> >>> - What do you think is the right way to transition back and forth
>> >>> between persistent and ephemeral memory? What do you think will be the
>> >>> first pattern to be adopted? For example, do you think we should try to
>> >>> use it as a tiered storage for sort spilling (before hitting the disk),
>> >>> or should we use it for caching?
>> >>>
>> >>>
>> >>>
>> >>> I think it will be much easier to think about this in the context of a
>> >>> primary or first use case. Do you have something in mind or should we
>> >>> brainstorm here?
>> >>>
>> >>>
>> >>>
>> >>> On Wed, Mar 23, 2016 at 7:16 PM, Gary <[email protected]> wrote:
>> >>>
>> >>>
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>>   We have created a patch for Apache Arrow that leverages Apache
>> >>>> Incubator Mnemonic as an alternative infrastructure for underlying
>> >>>> memory resource allocation. You can find it in the forked repo below:
>> >>>>
>> >>>> https://github.com/NonVolatileComputing/arrow
>> >>>>
>> >>>>   In this way, Apache Arrow could gain some structural benefits from
>> >>>> the Mnemonic project:
>> >>>>
>> >>>>    - Arrow is able to leverage the larger capacity of high-performance
>> >>>> hybrid storage devices, e.g. high-end SSD, NVMe.
>> >>>>
>> >>>>    - Mnemonic provides a potential opportunity for Arrow to
>> >>>> optimize/tune its allocation algorithms as a native Arrow-oriented
>> >>>> allocation service.
>> >>>>
>> >>>>    - The non-volatile features of Mnemonic make it possible for Arrow
>> >>>> to share its columnar in-memory data between different applications
>> >>>> or across the life cycle of a single application.
>> >>>>
>> >>>>    - Arrow could take advantage of upcoming Mnemonic features: memory
>> >>>> clustering/DOG (distributed object graph) and massive native
>> >>>> computing.
>> >>>>
>> >>>>    - Mnemonic helps reduce the pressure on main memory utilization
>> >>>> and its related system-wide overheads.
>> >>>>
>> >>>>   This patch is designed to minimize the changes users need to make to
>> >>>> use Arrow; please check out the test cases provided by this patch for
>> >>>> your reference.
>> >>>>
>> >>>>   Note that we need to put the allocator services in a specified
>> >>>> location (indicated by pom.xml) for the Mnemonic-backed Arrow test
>> >>>> cases to run, because those services are required for external
>> >>>> memory-like device management.
>> >>>>
>> >>>>   Please give your comments and review feedback for better
>> >>>> collaboration between Apache Arrow and Mnemonic. Thanks.
>> >>>>
>> >>>> Best Regards.
>> >>>> Gary.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>>
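
For readers following Gary's suggestion in the quoted thread about switching on "initialCapacity > directArena.chunkSize", here is a rough sketch of what a size-based switch between two memory tiers might look like. It is only an illustration under stated assumptions, not code from the patch: the class TieredAllocatorSketch, the BufferSource interface, and the Tier enum are all hypothetical names, no Netty, Arrow, or Mnemonic API is used, and the choice to send the larger requests to the persistent tier is an assumption (the thread does not say which side of the threshold goes where). A plain heap ByteBuffer stands in for Mnemonic-managed memory so the example runs on the JDK alone.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: choose a memory tier from the requested capacity. */
final class TieredAllocatorSketch {

    /** Illustrative tier names; not taken from any library. */
    enum Tier { OFF_HEAP, PERSISTENT }

    /** Minimal allocator abstraction (hypothetical, not a library type). */
    interface BufferSource {
        ByteBuffer allocate(int capacity);
    }

    private final int chunkSize;           // plays the role of directArena.chunkSize
    private final BufferSource offHeap;    // ordinary direct (off-heap) memory
    private final BufferSource persistent; // placeholder for a Mnemonic-managed pool

    TieredAllocatorSketch(int chunkSize, BufferSource offHeap, BufferSource persistent) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.offHeap = offHeap;
        this.persistent = persistent;
    }

    /** Builds an instance with JDK-only stand-ins so the sketch is runnable. */
    static TieredAllocatorSketch withStandIns(int chunkSize) {
        // Assumption: a heap buffer merely marks "the other tier" here.
        return new TieredAllocatorSketch(
                chunkSize, ByteBuffer::allocateDirect, ByteBuffer::allocate);
    }

    /** Assumed policy: requests larger than the chunk size go to the persistent tier. */
    Tier tierFor(int initialCapacity) {
        return initialCapacity > chunkSize ? Tier.PERSISTENT : Tier.OFF_HEAP;
    }

    /** Allocates from whichever tier the policy selects. */
    ByteBuffer allocate(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("initialCapacity must be non-negative");
        }
        BufferSource source =
                tierFor(initialCapacity) == Tier.PERSISTENT ? persistent : offHeap;
        return source.allocate(initialCapacity);
    }
}
```

With TieredAllocatorSketch.withStandIns(1024), a 16-byte request would come from the direct stand-in and a 2048-byte request from the placeholder tier. Keeping the policy in a single method (tierFor) is just one way to make the switch easy to replace, for example with the memory clustering approach mentioned in the thread.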
