I've been under the impression that exposing memory to be shared directly,
rather than copied, WAS in fact the responsibility of Arrow. I read this
in [1], and it's what turned me on to Arrow in the first place.


[1]
http://www.datanami.com/2016/02/17/arrow-aims-to-defrag-big-in-memory-analytics/

On Wed, Mar 16, 2016 at 1:00 PM, Julian Hyde <jh...@apache.org> wrote:

> This is all very interesting stuff, but just so we’re clear: it is not
> Arrow’s responsibility to provide an RPC/IPC/LPC mechanism, nor
> facilities for resource management. If we DID decide to make this
> Arrow’s responsibility, it would overlap with other components that
> specialize in such things.
>
>
>
> > On Mar 16, 2016, at 9:49 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > @Todd: agree entirely on prototyping design. My goal is to throw out
> > some ideas and some POC code and then we can explore from there.
> >
> > My main thoughts have initially been around lifecycle management. I've
> > done some work previously where a consistently sized shared buffer
> > using mmap has improved performance. This is more complicated given the
> > requirements for providing collaborative allocation and cross-process
> > reference counts.
> >
> > With regards to whether this is more generally applicable: I think it
> > could ultimately be more general, but I suggest we initially focus on
> > the particular application of moving long-lived Arrow record batches
> > between a producer and a consumer. Constraining the problem means we
> > will get to something workable sooner. We can abstract to a more
> > general solution as other clear requirements emerge.
> >
> > With regards to Cap'n Proto, I believe that when they talk about
> > zero-copy shared memory they are simply saying that the structure
> > supports it (the same as any memory-layout based design). I don't
> > believe they have actually implemented a protocol and multi-language
> > implementation for zero-copy cross-process communication.
> >
> > One other note to make here is that my goal is not just performance
> > but also memory footprint: a shared memory protocol would allow
> > multiple tools to interact with the same hot dataset.
> >
> > RE: ACLs, for the initial focus I suggest that we consider the two
> > sharing processes "trusted" and expect the initial Arrow API reference
> > implementations to manage memory access.
> >
> > Regarding other questions that Todd threw out:
> >
> > - if you are using an mmapped file in /dev/shm/, how do you make sure it
> > gets cleaned up if the process crashes?
> >
> >>> Agreed that it needs to get resolved. If I recall, destruction can be
> > applied once the associated processes are attached to the memory, which
> > allows the kernel to recover it once all attaching processes have
> > exited. If this isn't enough, then we may very well need a simple
> > external coordinator.
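> >
> > To make the cleanup idea concrete, here is a rough Java sketch of the
> > unlink-after-attach pattern (untested; the path and size are
> > placeholders I made up). Once the backing file is deleted, the kernel
> > reclaims the pages when the last mapping disappears, even if a process
> > crashes without cleaning up:
> >
> > import java.io.RandomAccessFile;
> > import java.nio.MappedByteBuffer;
> > import java.nio.channels.FileChannel;
> > import java.nio.file.Files;
> > import java.nio.file.Paths;
> >
> > public class CrashSafeShm {
> >     // Create and map a tmpfs-backed file that peers can also map by
> >     // name.
> >     public static MappedByteBuffer createAndMap(String name, int size)
> >             throws Exception {
> >         RandomAccessFile f =
> >             new RandomAccessFile("/dev/shm/" + name, "rw");
> >         f.setLength(size);
> >         MappedByteBuffer buf =
> >             f.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
> >         f.close(); // the mapping stays valid after the descriptor closes
> >         return buf;
> >     }
> >
> >     // Call once every participant has confirmed (say, over the RPC
> >     // channel) that it has mapped the file. The name disappears, but
> >     // the pages live until the last mapping goes away, so a crash in
> >     // any process cannot leak the memory.
> >     public static void unlinkAfterAttach(String name) throws Exception {
> >         Files.delete(Paths.get("/dev/shm/" + name));
> >     }
> > }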
> >
> > - how do you allocate memory to it? there's nothing ensuring that
> > /dev/shm doesn't swap out if you try to put too much in there, and then
> > your in-memory super-fast access will basically collapse under swap
> > thrashing
> >
> >>> The simplest model initially is probably one where we assume a master
> > and a slave (ideally negotiated on initial connection). The master is
> > responsible for allocating memory and giving it to the slave, and is
> > then responsible for managing reasonable memory allocation limits just
> > like any other allocator. Slaves that need to allocate memory must ask
> > the master (at whatever chunk size makes sense) and will get rejected
> > if they are too aggressive. (This probably means that at any point an
> > IPC can fall back to RPC?)
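> >
> > As a sketch of what that negotiation could look like (all names here
> > are hypothetical; nothing like this exists in Arrow yet):
> >
> > // Hypothetical master-side API: the master grants chunks out of the
> > // shared region and may refuse a slave that asks for too much.
> > public interface SharedAllocator {
> >     // Returns the offset of a granted chunk within the shared mapping,
> >     // or -1 if rejected, in which case the caller falls back to RPC.
> >     long requestChunk(long slaveId, int chunkSize);
> >
> >     // The slave hands a chunk back so the master can reuse it.
> >     void releaseChunk(long slaveId, long offset);
> > }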
> >
> > - how do you do lifecycle management across the two processes? If, say,
> > Kudu wants to pass a block of data to some Python program, how does it
> > know when the Python program is done reading it and it should be
> > deleted? What if the Python program crashed in the middle - when can
> > Kudu release it?
> >
> >>> My thinking, as mentioned earlier, is a shared reference count model
> > for complex situations, and possibly a "request/response" ownership
> > model for simpler cases.
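> >
> > A minimal sketch of the shared count (assuming Java 8 era
> > sun.misc.Unsafe, a counter in the first 4 bytes of the mapping, and the
> > JDK-internal Buffer.address field; this is a hack, not a proposal):
> >
> > import java.lang.reflect.Field;
> > import java.nio.MappedByteBuffer;
> > import sun.misc.Unsafe;
> >
> > public class SharedRefCount {
> >     private static final Unsafe UNSAFE = loadUnsafe();
> >     private final long addr; // absolute address of the 4-byte counter
> >
> >     public SharedRefCount(MappedByteBuffer buf) throws Exception {
> >         Field f = java.nio.Buffer.class.getDeclaredField("address");
> >         f.setAccessible(true);
> >         this.addr = f.getLong(buf); // counter at offset 0 of the mapping
> >     }
> >
> >     public void retain() {
> >         int v;
> >         do {
> >             v = UNSAFE.getIntVolatile(null, addr);
> >         } while (!UNSAFE.compareAndSwapInt(null, addr, v, v + 1));
> >     }
> >
> >     // Returns true when this release drops the count to zero, i.e. the
> >     // producer may now reclaim the region.
> >     public boolean release() {
> >         int v;
> >         do {
> >             v = UNSAFE.getIntVolatile(null, addr);
> >         } while (!UNSAFE.compareAndSwapInt(null, addr, v, v - 1));
> >         return v == 1;
> >     }
> >
> >     private static Unsafe loadUnsafe() {
> >         try {
> >             Field f = Unsafe.class.getDeclaredField("theUnsafe");
> >             f.setAccessible(true);
> >             return (Unsafe) f.get(null);
> >         } catch (ReflectiveOperationException e) {
> >             throw new AssertionError(e);
> >         }
> >     }
> > }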
> >
> > - how do you do security? If both sides of the connection don't trust
> > each other, and use length prefixes and offsets, you have to be
> > constantly validating and re-validating everything you read.
> >
> > I'm suggesting that we start with trust so we don't get too wrapped up
> > in all the extra complexities of security. My experience with these
> > things is that a lot of users will pick performance or footprint over
> > security for quite some time. For example, if I recall correctly, with
> > the shared file descriptor model that was initially implemented in the
> > HDFS client, people used short-circuit reads for years before security
> > was correctly implemented. (Am I remembering this right?)
> >
> > Lastly, as I mentioned above, I don't think there should be any
> > requirement that Arrow communication be limited to only 'IPC'. As Todd
> > points out, in many cases unix domain sockets will be just fine.
> >
> > We need to implement both models because we all know that locality will
> > never be guaranteed. The IPC design/implementation needs to be good for
> > anything to make it into Arrow.
> >
> > thanks
> > Jacques
> >
> >
> >
> > On Wed, Mar 16, 2016 at 8:54 AM, Zhe Zhang <z...@apache.org> wrote:
> >
> >> I have similar concerns as Todd stated below. With an mmap-based
> >> approach, we are treating shared memory objects like files. This
> >> brings in all filesystem-related considerations like ACLs and
> >> lifecycle management.
> >>
> >> Stepping back a little, the shared-memory work isn't really specific to
> >> Arrow. A few questions related to this:
> >> 1) Has the topic been discussed in the context of protobuf (or other
> >> IPC protocols) before? It seems Cap'n Proto (https://capnproto.org/)
> >> has zero-copy shared memory. I haven't read the implementation details
> >> though.
> >> 2) If the shared-memory work benefits a wide range of protocols,
> >> should it be a generalized and standalone library?
> >>
> >> Thanks,
> >> Zhe
> >>
> >> On Tue, Mar 15, 2016 at 8:30 PM Todd Lipcon <t...@cloudera.com> wrote:
> >>
> >>> Having thought about this quite a bit in the past, I think the
> >>> mechanics of how to share memory are by far the easiest part. The
> >>> much harder part is the resource management and ownership. Questions
> >>> like:
> >>>
> >>> - if you are using an mmapped file in /dev/shm/, how do you make sure
> >>> it gets cleaned up if the process crashes?
> >>> - how do you allocate memory to it? there's nothing ensuring that
> >>> /dev/shm doesn't swap out if you try to put too much in there, and
> >>> then your in-memory super-fast access will basically collapse under
> >>> swap thrashing
> >>> - how do you do lifecycle management across the two processes? If,
> >>> say, Kudu wants to pass a block of data to some Python program, how
> >>> does it know when the Python program is done reading it and it should
> >>> be deleted? What if the Python program crashed in the middle - when
> >>> can Kudu release it?
> >>> - how do you do security? If both sides of the connection don't trust
> >>> each other, and use length prefixes and offsets, you have to be
> >>> constantly validating and re-validating everything you read.
> >>>
> >>> Another big factor is that shared memory is not, in my experience,
> >>> immediately faster than just copying data over a unix domain socket.
> >>> In particular, the first time you read an mmapped file, you'll end up
> >>> paying minor page fault overhead on every page. This can be improved
> >>> with HugePages, but huge page mmaps are not supported yet in current
> >>> Linux (work is going on currently to address this). So you're left
> >>> with hugetlbfs, which involves static allocations and much more pain.
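> >>>
> >>> (From Java, one partial mitigation is MappedByteBuffer.load(), which
> >>> touches every page up front so the minor faults are paid once at
> >>> startup instead of on the first scan. A sketch, with error handling
> >>> omitted:)
> >>>
> >>> import java.io.RandomAccessFile;
> >>> import java.nio.MappedByteBuffer;
> >>> import java.nio.channels.FileChannel;
> >>>
> >>> public class PreFault {
> >>>     public static MappedByteBuffer mapAndLoad(String path)
> >>>             throws Exception {
> >>>         try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
> >>>             FileChannel ch = f.getChannel();
> >>>             MappedByteBuffer buf =
> >>>                 ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
> >>>             buf.load(); // best effort: faults in every page now
> >>>             return buf;
> >>>         }
> >>>     }
> >>> }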
> >>>
> >>> All the above is a long way to say: let's make sure we do the right
> >>> prototyping and up-front design before jumping into code.
> >>>
> >>> -Todd
> >>>
> >>>
> >>>
> >>> On Tue, Mar 15, 2016 at 5:54 PM, Jacques Nadeau <jacq...@apache.org>
> >>> wrote:
> >>>
> >>>> @Corey
> >>>> The POC Steven and Wes are working on is based on MappedByteBuffer,
> >>>> but I'm looking at using Netty's fork of tcnative to use shared
> >>>> memory directly.
> >>>>
> >>>> @Yiannis
> >>>> We need to have both RPC and a shared memory mechanism (what I'm
> >>>> inclined to call IPC, though it is a specific kind of IPC). The idea
> >>>> is we negotiate via RPC and then, if we determine shared locality,
> >>>> we work over shared memory (preferably for both data and control).
> >>>> So the system interacting with HBase in your example would be the
> >>>> one responsible for placing collocated execution to take advantage
> >>>> of IPC.
> >>>>
> >>>> How do others feel about my redefinition of IPC to mean
> >>>> same-memory-space communication (either via shared memory or RDMA),
> >>>> versus RPC as socket-based communication?
> >>>>
> >>>>
> >>>> On Tue, Mar 15, 2016 at 5:38 PM, Corey Nolet <cjno...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> I was seeing Netty's Unsafe classes being used here, not mapped
> >>>>> byte buffers. I'm not sure if that statement is completely correct,
> >>>>> but I'll have to dig through the code again to figure that out.
> >>>>>
> >>>>> The more I look at Unsafe, the more it makes sense why it would be
> >>>>> used. Apparently it's also supposed to be included in Java 9 as a
> >>>>> first-class API.
> >>>>> On Mar 15, 2016 7:03 PM, "Wes McKinney" <w...@cloudera.com> wrote:
> >>>>>
> >>>>>> My understanding is that you can use java.nio.MappedByteBuffer to
> >>>>>> work with memory-mapped files as one way to share memory pages
> >>>>>> between Java (and non-Java) processes without copying.
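> >>>>>>
> >>>>>> As a minimal sketch of the producer side (assuming both processes
> >>>>>> agree on a file path out of band; the path here is made up):
> >>>>>>
> >>>>>> import java.io.RandomAccessFile;
> >>>>>> import java.nio.MappedByteBuffer;
> >>>>>> import java.nio.channels.FileChannel;
> >>>>>>
> >>>>>> public class ShareDemo {
> >>>>>>     public static void main(String[] args) throws Exception {
> >>>>>>         try (RandomAccessFile f =
> >>>>>>                  new RandomAccessFile("/dev/shm/arrow-poc", "rw")) {
> >>>>>>             f.setLength(4096);
> >>>>>>             MappedByteBuffer buf = f.getChannel()
> >>>>>>                 .map(FileChannel.MapMode.READ_WRITE, 0, 4096);
> >>>>>>             // Any process that maps the same file sees these
> >>>>>>             // bytes in place, with no copy.
> >>>>>>             buf.putInt(0, 42);
> >>>>>>         }
> >>>>>>     }
> >>>>>> }
> >>>>>>
> >>>>>> A consumer in C++ can mmap the same path, and Python can use the
> >>>>>> standard mmap module, which is what makes the cross-language cases
> >>>>>> plausible.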
> >>>>>>
> >>>>>> I am hoping that we can reach a POC of zero-copy Arrow memory
> >>>>>> sharing Java-to-Java and Java-to-C++ in the near future. Indeed
> >>>>>> this will have huge implications once we get it working end to end
> >>>>>> (for example, receiving memory from a Java process in Python
> >>>>>> without a heavy ser-de step -- it's what we've always dreamed of)
> >>>>>> and with the metadata and shared memory control flow standardized.
> >>>>>>
> >>>>>> - Wes
> >>>>>>
> >>>>>> On Wed, Mar 9, 2016 at 9:25 PM, Corey J Nolet <cjno...@gmail.com>
> >>>>>> wrote:
> >>>>>>> If I understand correctly, Arrow is using Netty underneath, which
> >>>>>>> is using Sun's Unsafe API in order to allocate direct byte
> >>>>>>> buffers off heap. It is using Netty to communicate, between
> >>>>>>> "client" and "server", information about memory addresses for
> >>>>>>> data that is being requested.
> >>>>>>>
> >>>>>>> I've never attempted to use the Unsafe API to access off-heap
> >>>>>>> memory that has been allocated in one JVM from another JVM, but
> >>>>>>> I'm assuming this must be the case in order to claim that the
> >>>>>>> memory is being accessed directly without being copied, correct?
> >>>>>>>
> >>>>>>> The implication here is huge. If the memory is being directly
> >>>>>>> shared across processes, by them being allowed to reach directly
> >>>>>>> into the direct byte buffers, that's true shared memory.
> >>>>>>> Otherwise, if there are copies going on, it's less appealing.
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>> Sent from my iPad
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Todd Lipcon
> >>> Software Engineer, Cloudera
> >>>
> >>
>
>
