Yup. Will do. The discussion today was limited to "let's meet".
On Wed, Mar 30, 2016 at 7:13 PM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
> +1
>
> Discussions should be summarized and brought back to the mailing list(s).
> Recommendations are fine, but any decisions should be made on-list.
>
> -Taylor
>
> > On Mar 30, 2016, at 8:31 PM, Patrick Hunt <ph...@apache.org> wrote:
> >
> > Remember that no decisions should be made at the meeting. It's fine to
> > have discussions, but those need to be brought back to the community
> > before decisions are made. Summarizing for the dev@ mailing list, also
> > jiras, etc... are good ways to socialize the issues.
> >
> > Patrick
> >
> >> On Wed, Mar 30, 2016 at 5:17 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> >> The community for both podlings are bigger than the ones show up at Strata =)
> >>
> >> Would love to have the summary of the discussions in the dev@ list if
> >> indeed some discussions happening at Strata.
> >>
> >> - Henry
> >>
> >> On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping <yanping.w...@intel.com> wrote:
> >>
> >>> Hi, All
> >>>
> >>> I met with Jacques today at Strata, we think it would be great that Arrow
> >>> and Mnemonic communities can have a F2F meeting together to talk about our
> >>> integration.
> >>> I have following two days, 4/11 Monday afternoon, or 4/15 Friday.
> >>> We can meet at intel SC campus.
> >>>
> >>> Would you let me know if you are able to join us and which day you'd
> >>> prefer?
> >>>
> >>> Thanks
> >>> Yanping
> >>>
> >>>
> >>> On Mar 29, 2016, at 4:38 PM, Gary <ga...@apache.org> wrote:
> >>>
> >>> Yes, I agree with you and that's great if we could brainstorm here to
> >>> collect more ideas about enabling non-volatile memory usage for Apache
> >>> Arrow through Mnemonic.
> >>>
> >>> for the questions, my ideas are:
> >>>
> >>>
> >>> - Right now you are using unpooled persistent memory. Does that make sense
> >>> or does chunking make more sense?
> >>>
> >>> Gary: I think it could make sense if developers know that their
> >>> datasets are very big and they want Apache Arrow to keep most of them in
> >>> memory for intensive computing, e.g. sort.
> >>> Developers can certainly spill their Mnemonic-managed datasets to disk,
> >>> but that seems inefficient in some scenarios, depending on the concrete
> >>> application logic.
> >>>
> >>>
> >>> - What do you think is the right way to transition back and forth between
> >>> persistent and ephemeral memory? What do you think will be the first
> >>> pattern to be adopted? For example, do you think we should try to use it as
> >>> a tiered storage for sort spilling (before hitting the disk), or should we
> >>> use it for caching?
> >>> Gary: My 2 cents: the netty library does not yet seem to provide an elegant
> >>> switch mechanism for Arrow to use. We could probably change the logic around
> >>> "initialCapacity > directArena.chunkSize" to control which buffers are put
> >>> off-heap and which are managed by Mnemonic. Another approach is to let
> >>> Mnemonic's memory clustering mechanism manage the hybrid memory-like spaces
> >>> in place of part of the logic in class PooledByteBufAllocatorL.
> >>> Regarding sorting, I think it is a typical case of random access to
> >>> the data, so we should avoid spilling as much as possible.
> >>> My 2 cents, the performance ordering could be:
> >>> all in off-heap if possible > mnemonic used as cache > all in mnemonic
> >>> using NVMe/disk > off-heap + spilling
> >>> The code simplicity ordering would be:
> >>> all in off-heap if possible > all in mnemonic using NVMe/disk > mnemonic
> >>> used as cache > off-heap + spilling
> >>>
> >>> The mode "mnemonic used as cache + spilling" is probably unnecessary
> >>> because Mnemonic could provide capacity nearly equivalent to disk.
> >>>
> >>> Thanks.
> >>> Gary.
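To make the "initialCapacity > directArena.chunkSize" idea above a bit more concrete, here is a minimal sketch of a size-threshold switch. Everything in it is an assumption for illustration: the class `TieredAllocatorSketch`, its `Tier` enum, and the methods `tierFor` and `allocate` are hypothetical names and are not Arrow, Netty, or Mnemonic API. The direction of the switch (large buffers go to the persistent tier) is also assumed, and a memory-mapped temp file from the plain JDK stands in for a Mnemonic-managed region.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical size-threshold router; not Arrow, Netty, or Mnemonic API. */
final class TieredAllocatorSketch {
    enum Tier { OFF_HEAP, PERSISTENT }

    private final int chunkSize;   // stand-in for directArena.chunkSize
    private final Path backingDir; // stand-in for a memory-like device mount

    TieredAllocatorSketch(int chunkSize, Path backingDir) {
        this.chunkSize = chunkSize;
        this.backingDir = backingDir;
    }

    /** Assumed policy: requests larger than one chunk go to the persistent tier. */
    Tier tierFor(int initialCapacity) {
        return initialCapacity > chunkSize ? Tier.PERSISTENT : Tier.OFF_HEAP;
    }

    ByteBuffer allocate(int initialCapacity) {
        if (tierFor(initialCapacity) == Tier.OFF_HEAP) {
            // Ordinary direct (off-heap) memory.
            return ByteBuffer.allocateDirect(initialCapacity);
        }
        try {
            // A memory-mapped file simulates the persistent tier in this sketch.
            Path file = Files.createTempFile(backingDir, "buf-", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // The mapping stays valid after the channel is closed.
                return ch.map(FileChannel.MapMode.READ_WRITE, 0, initialCapacity);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        TieredAllocatorSketch alloc = new TieredAllocatorSketch(16 * 1024, tmp);
        System.out.println(alloc.tierFor(4 * 1024));
        System.out.println(alloc.tierFor(64 * 1024));
    }
}
```

Callers get a `ByteBuffer` either way, so the switch stays invisible to them. A real integration would presumably hook into the allocator's arena logic rather than wrap it like this.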
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Jacques Nadeau [mailto:jacq...@apache.org]
> >>> Sent: Tuesday, March 29, 2016 8:05 AM
> >>> To: dev@arrow.apache.org
> >>> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra.
> >>> for Apache Arrow
> >>>
> >>> This is super cool. A couple of questions:
> >>>
> >>> - Right now you are using unpooled persistent memory. Does that make sense
> >>> or does chunking make more sense?
> >>>
> >>> - What do you think is the right way to transition back and forth between
> >>> persistent and ephemeral memory? What do you think will be the first
> >>> pattern to be adopted? For example, do you think we should try to use it as
> >>> a tiered storage for sort spilling (before hitting the disk), or should we
> >>> use it for caching?
> >>>
> >>> I think it will be much easier to think about this in the context of a
> >>> primary or first use case. Do you have something in mind or should we
> >>> brainstorm here?
> >>>
> >>> On Wed, Mar 23, 2016 at 7:16 PM, Gary <ga...@apache.org> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> We have created a patch for Apache Arrow that leverages Apache
> >>>> incubator Mnemonic as an alternative infra. for underlying memory
> >>>> resource allocation. You can find it in the forked repo below.
> >>>>
> >>>> https://github.com/NonVolatileComputing/arrow
> >>>>
> >>>> In this way, Apache Arrow could gain some structural benefits from
> >>>> the Mnemonic project:
> >>>>
> >>>> - Arrow is able to leverage the larger capacity of high-performance
> >>>> hybrid storage devices, e.g. high-end SSD, NVMe
> >>>>
> >>>> - Mnemonic provides a potential opportunity for Arrow to
> >>>> optimize/tune its allocation algorithms as a native Arrow-oriented
> >>>> allocation service
> >>>>
> >>>> - The non-volatile features of Mnemonic make it possible for
> >>>> Arrow to share its columnar in-memory data between different
> >>>> applications or across the life-cycle of a single application
> >>>>
> >>>> - Arrow could take advantage of upcoming Mnemonic features such as
> >>>> memory clustering/DOG (distributed object graph) and massive native
> >>>> computing
> >>>>
> >>>> - Mnemonic helps to reduce the pressure on main memory utilization
> >>>> and its related system-wide overheads.
> >>>>
> >>>> This patch is designed to minimize the changes users need to make to use
> >>>> Arrow. Please check out the test cases provided by this patch for your
> >>>> reference.
> >>>>
> >>>> Note that we need to put the allocator services in a specified
> >>>> position (indicated by pom.xml) for the Mnemonic-backed Arrow test
> >>>> cases to run, because those services are required for external
> >>>> memory-like device management.
> >>>>
> >>>> Please give your comments and review feedback for better
> >>>> collaboration between Apache Arrow and Mnemonic. Thanks.
> >>>>
> >>>> Best Regards.
> >>>> Gary.