I'm on vacation the weeks of 4/11 and 4/18, and I'm very interested in the implications and the work that can be done on the C++ side as well, so I look forward to the mailing list discussion after you meet to talk through some of the mutual efforts.
Thanks
Wes

On Thu, Mar 31, 2016 at 7:47 AM, Patrick Hunt <ph...@apache.org> wrote:
> fwiw I've seen some projects use hangouts/webex pretty effectively.
>
> Patrick
>
> On Wed, Mar 30, 2016 at 11:15 PM, Wang, Yanping <yanping.w...@intel.com>
> wrote:
>> Yeah, I was so busy and in a hurry to catch other sessions. We only talked
>> about 2 minutes :-)
>> After Jacques and Wes's Arrow presentation, someone in the audience asked if
>> Arrow is going to use RDMA. I answered: RDMA is going to be used in the
>> Mnemonic project to support data transfer among nodes and clusters.
>> It makes perfect sense that we position Mnemonic under Arrow to support its
>> use of persistent storage media.
>>
>> Thanks Patrick, Henry, Taylor G for the guideline. We can brainstorm ideas
>> in both dev lists, and post those ideas in jira so developers can see where
>> our projects are heading.
>> Gary and I are located in Portland, Oregon; we usually plan our SC visits 2
>> weeks ahead.
>>
>> Thanks,
>> Yanping
>>
>>
>> -----Original Message-----
>> From: Jacques Nadeau [mailto:jacq...@apache.org]
>> Sent: Wednesday, March 30, 2016 7:34 PM
>> To: dev@arrow.apache.org
>> Cc: d...@mnemonic.incubator.apache.org; d...@mnemonic.apache.org
>> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra.
>> for Apache Arrow
>>
>> Yup. Will do.
>>
>> The discussion today was limited to "let's meet".
>>
>>
>> On Wed, Mar 30, 2016 at 7:13 PM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
>>
>>> +1
>>>
>>> Discussions should be summarized and brought back to the mailing list(s).
>>> Recommendations are fine, but any decisions should be made on-list.
>>>
>>> -Taylor
>>>
>>> > On Mar 30, 2016, at 8:31 PM, Patrick Hunt <ph...@apache.org> wrote:
>>> >
>>> > Remember that no decisions should be made at the meeting. It's fine to
>>> > have discussions, but those need to be brought back to the community
>>> > before decisions are made. Summarizing for the dev@ mailing list, also
>>> > jiras, etc...
>>> > are good ways to socialize the issues.
>>> >
>>> > Patrick
>>> >
>>> >> On Wed, Mar 30, 2016 at 5:17 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>>> >> The communities of both podlings are bigger than the ones who show up at
>>> >> Strata =)
>>> >>
>>> >> Would love to have the summary of the discussions in the dev@ list if
>>> >> indeed some discussions are happening at Strata.
>>> >>
>>> >> - Henry
>>> >>
>>> >> On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping <yanping.w...@intel.com>
>>> >> wrote:
>>> >>
>>> >>> Hi, All
>>> >>>
>>> >>> I met with Jacques today at Strata. We think it would be great if the
>>> >>> Arrow and Mnemonic communities could have a F2F meeting together to talk
>>> >>> about our integration.
>>> >>> I have the following two days available: 4/11 Monday afternoon, or 4/15
>>> >>> Friday.
>>> >>> We can meet at the Intel SC campus.
>>> >>>
>>> >>> Would you let me know if you are able to join us and which day you'd
>>> >>> prefer?
>>> >>>
>>> >>> Thanks
>>> >>> Yanping
>>> >>>
>>> >>>
>>> >>> On Mar 29, 2016, at 4:38 PM, Gary <ga...@apache.org> wrote:
>>> >>>
>>> >>> Yes, I agree with you, and it would be great if we could brainstorm here
>>> >>> to collect more ideas about enabling non-volatile memory usage for Apache
>>> >>> Arrow through Mnemonic.
>>> >>>
>>> >>> For the questions, my ideas are:
>>> >>>
>>> >>> - Right now you are using unpooled persistent memory. Does that make sense
>>> >>> or does chunking make more sense?
>>> >>>
>>> >>> Gary: I think it could make some sense if developers know that their
>>> >>> datasets are very big and they want Apache Arrow to keep most of them in
>>> >>> memory for intensive computing, e.g. sort.
>>> >>> The developer can certainly spill their Mnemonic-managed datasets to
>>> >>> disk, but that seems a bit inefficient in some scenarios, depending on
>>> >>> the concrete application logic.
>>> >>>
>>> >>> - What do you think is the right way to transition back and forth between
>>> >>> persistent and ephemeral memory? What do you think will be the first
>>> >>> pattern to be adopted? For example, do you think we should try to use it
>>> >>> as a tiered storage for sort spilling (before hitting the disk), or
>>> >>> should we use it for caching?
>>> >>>
>>> >>> Gary: my 2 cents, the netty library does not yet seem to provide an
>>> >>> elegant switching mechanism for Arrow to use. We could probably change
>>> >>> the logic around "initialCapacity > directArena.chunkSize" to control
>>> >>> which buffers are placed off-heap and which are managed by Mnemonic.
>>> >>> Another approach is to let Mnemonic's memory clustering mechanism manage
>>> >>> hybrid memory-like spaces in place of part of the logic in class
>>> >>> PooledByteBufAllocatorL.
>>> >>> Regarding sorting, I think it is a typical case of random access to the
>>> >>> data, so we should avoid spilling as much as possible.
>>> >>> My 2 cents, the performance ordering could be:
>>> >>> all in off-heap if possible > mnemonic used as cache > all in mnemonic
>>> >>> using NVMe/disk > off-heap + spilling
>>> >>> and the code simplicity ordering would be:
>>> >>> all in off-heap if possible > all in mnemonic using NVMe/disk > mnemonic
>>> >>> used as cache > off-heap + spilling
>>> >>>
>>> >>> The mode "mnemonic used as cache + spilling" is probably unnecessary
>>> >>> because Mnemonic could provide capacity nearly equivalent to disk.
>>> >>>
>>> >>> Thanks.
>>> >>> Gary.
>>> >>>
>>> >>>
>>> >>> -----Original Message-----
>>> >>> From: Jacques Nadeau [mailto:jacq...@apache.org]
>>> >>> Sent: Tuesday, March 29, 2016 8:05 AM
>>> >>> To: dev@arrow.apache.org
>>> >>> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra.
>>> >>> for Apache Arrow
>>> >>>
>>> >>> This is super cool.
>>> >>> A couple of questions:
>>> >>>
>>> >>> - Right now you are using unpooled persistent memory. Does that make sense
>>> >>> or does chunking make more sense?
>>> >>>
>>> >>> - What do you think is the right way to transition back and forth between
>>> >>> persistent and ephemeral memory? What do you think will be the first
>>> >>> pattern to be adopted? For example, do you think we should try to use it
>>> >>> as a tiered storage for sort spilling (before hitting the disk), or
>>> >>> should we use it for caching?
>>> >>>
>>> >>> I think it will be much easier to think about this in the context of a
>>> >>> primary or first use case. Do you have something in mind or should we
>>> >>> brainstorm here?
>>> >>>
>>> >>> On Wed, Mar 23, 2016 at 7:16 PM, Gary <ga...@apache.org> wrote:
>>> >>>
>>> >>>> Hello,
>>> >>>>
>>> >>>> We have created a patch for Apache Arrow to leverage Apache
>>> >>>> incubator Mnemonic as an alternative infra. for underlying memory
>>> >>>> resource allocation. You can find it in the forked repo below.
>>> >>>>
>>> >>>> https://github.com/NonVolatileComputing/arrow
>>> >>>>
>>> >>>> In this way, Apache Arrow could gain some structural benefits from
>>> >>>> the Mnemonic project:
>>> >>>>
>>> >>>> - Arrow is able to leverage the larger capacity of high performance
>>> >>>> hybrid storage devices, e.g.
>>> >>>> high-end SSD, NVMe
>>> >>>>
>>> >>>> - Mnemonic provides a potential opportunity for Arrow to
>>> >>>> optimize/tune its allocation algorithms as native Arrow-oriented
>>> >>>> allocation services
>>> >>>>
>>> >>>> - The non-volatile features of Mnemonic make it possible for
>>> >>>> Arrow to share its columnar in-memory data between different
>>> >>>> applications or across the life-cycle of a single application
>>> >>>>
>>> >>>> - Arrow could take advantage of upcoming Mnemonic features such as
>>> >>>> memory clustering/DOG (distributed object graph) and massive native
>>> >>>> computing
>>> >>>>
>>> >>>> - Mnemonic helps to reduce the pressure on main memory utilization
>>> >>>> and its related system-wide overheads.
>>> >>>>
>>> >>>> Our patch is designed to minimize the changes users need to make to use
>>> >>>> Arrow. Please check out the test cases provided by this patch for your
>>> >>>> reference.
>>> >>>>
>>> >>>> Note that we need to put the allocator services in a specified
>>> >>>> position (indicated by pom.xml) for the Mnemonic-backed Arrow test
>>> >>>> cases to run, because those services are required for external
>>> >>>> memory-like device management.
>>> >>>>
>>> >>>> Please give your comments and review feedback for better
>>> >>>> collaboration between Apache Arrow and Mnemonic. Thanks.
>>> >>>>
>>> >>>> Best Regards.
>>> >>>> Gary.