[jira] [Commented] (ARROW-263) Design an initial IPC mechanism for Arrow Vectors

Philipp Moritz (JIRA) Sun, 21 Aug 2016 18:44:53 -0700

    [ 
https://issues.apache.org/jira/browse/ARROW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429948#comment-15429948
 ]


Philipp Moritz commented on ARROW-263:
--------------------------------------

Hey Micah,

thanks for your answer!

I got the trick of unlinking the domain socket from here: 
https://troydhanson.github.io/network/Unix_domain_sockets.html ("Unlink before 
bind"). On Linux and Mac OS it seems to work and prevents the socket file from 
leaking. Note that at some point we need to introduce a named object that is 
visible to all processes to bootstrap the communication between them, and this 
is the least problematic way of doing that I have seen.
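
A minimal sketch of the "unlink before bind" pattern, written in Python for 
brevity rather than in C as on the linked page. The function name 
`bind_unix_socket` and the socket path are hypothetical and only illustrate the 
idea; this is not the plasma store's actual code.

```python
import os
import socket
import tempfile


def bind_unix_socket(path):
    """Bind a listening Unix domain socket at `path`.

    Any leftover socket file from an earlier run is unlinked first;
    without this, bind() fails because the path already exists.
    """
    try:
        os.unlink(path)  # "unlink before bind"
    except FileNotFoundError:
        pass  # nothing stale to clean up
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)  # creates the named object other processes connect to
    server.listen(1)
    return server


if __name__ == "__main__":
    # Hypothetical path, used only for this demonstration.
    path = os.path.join(tempfile.mkdtemp(), "store.sock")
    first = bind_unix_socket(path)
    first.close()  # closing the socket does not remove the file
    second = bind_unix_socket(path)  # succeeds because the stale file is unlinked
    second.close()
    os.unlink(path)
```

The path is the only named object the processes need to agree on; everything 
else can then be exchanged over the connected socket.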

At the moment I'm also working on a distributed version of the object store 
(with a separate process that can be used to ship objects between object stores 
on different nodes in a network) and investigating libuv to do it in a platform 
independent way. Libuv is a small dependency and my experience so far is pretty 
enjoyable. It also includes limited functionality to exchange file descriptors, 
but this might not work on Windows (see also 
https://groups.google.com/forum/#!msg/libuv/0xxXBIGlzLc/H1HbL-igb84J; I haven't 
tried it yet).
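
For reference, here is a rough sketch of what exchanging a file descriptor 
looks like on POSIX systems. It does not use libuv: it relies on the 
underlying Unix domain socket mechanism (SCM_RIGHTS) through Python's 
`socket.send_fds` and `socket.recv_fds`, which need Python 3.9 or later. The 
helper names, the 4096-byte buffer and the temporary backing file are all 
assumptions made for illustration, and both ends run in one process here only 
to keep the example self-contained.

```python
import mmap
import os
import socket
import tempfile


def send_buffer_fd(sock, fd, size):
    """Send a file descriptor plus the buffer size over a Unix socket."""
    socket.send_fds(sock, [str(size).encode()], [fd])


def recv_buffer(sock):
    """Receive a descriptor and map the memory it refers to."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 64, 1)
    size = int(data.decode())
    buf = mmap.mmap(fds[0], size)  # shared mapping of the same file
    os.close(fds[0])  # the mapping stays valid after the descriptor is closed
    return buf


def share_buffer_demo():
    """Write on the 'store' side and read the same bytes on the 'client' side."""
    store_side, client_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as backing:  # stand-in for the store's buffer
        backing.truncate(4096)
        store_map = mmap.mmap(backing.fileno(), 4096)
        store_map[:5] = b"arrow"
        send_buffer_fd(store_side, backing.fileno(), 4096)
        client_map = recv_buffer(client_side)
        result = bytes(client_map[:5])  # no copy through the socket was needed
        client_map.close()
        store_map.close()
    store_side.close()
    client_side.close()
    return result


if __name__ == "__main__":
    print(share_buffer_demo())
```

Only the descriptor and the size travel over the socket; the payload itself 
is read through the shared mapping.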

Concerning your last comment: The plasma store is a long running process that 
keeps its file descriptor and the data alive. Are page faults still a problem 
if data does not need to be reloaded from hard disk?

If somebody else has a platform independent way of achieving some of these 
goals, I'd be happy to learn about their ideas.


> Design an initial IPC mechanism for Arrow Vectors
> -------------------------------------------------
>
>                 Key: ARROW-263
>                 URL: https://issues.apache.org/jira/browse/ARROW-263
>             Project: Apache Arrow
>          Issue Type: New Feature
>            Reporter: Micah Kornfield
>            Assignee: Micah Kornfield
>
> Prior discussion on this topic [1].
> Use-cases:
> 1.  User defined function (UDF) execution:  One process wants to execute a 
> user defined function written in another language (e.g. Java executing a 
> function defined in Python; this involves creating Arrow Arrays in Java, 
> sending them to Python and receiving a new set of Arrow Arrays produced in 
> Python back in the Java process).
> 2.  If a storage system and a query engine are running on the same host we 
> might want to use IPC instead of RPC (e.g. Apache Drill querying Apache Kudu)
> Assumptions:
> 1.  IPC mechanism should be usable from the core set of supported languages 
> (Java, Python, C) on POSIX and ideally Windows systems.  Ideally, we would 
> not need to add dependencies on additional libraries beyond what each 
> language already provides.
> We want to leverage shared memory for Arrays to avoid doubling RAM 
> requirements by duplicating the same Array in different memory locations.  
> 2. Under some circumstances shared memory might be more efficient than FIFOs 
> or sockets (in other scenarios it won’t be; see thread below).
> 3. Security is not a concern for V1, we assume all processes running are 
> “trusted”.
> Requirements:
> 1.  Resource management: 
>     a.  Both processes need a way of allocating memory for Arrow Arrays so 
> that data can be passed from one process to another.
>     b. There must be a mechanism to clean up unused Arrow Arrays to limit 
> resource usage but avoid race conditions when processing arrays.
> 2.  Schema negotiation - before sending data, both processes need to agree on 
> the schema each one will produce.
> Out of scope requirements:
> 1.  IPC channel metadata discovery is out of scope of this document.  
> Discovery can be provided by passing appropriate command line arguments, 
> configuration files or other mechanisms like RPC (in which case RPC channel 
> discovery is still an issue).
> [1] 
> http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c8d5f7e3237b3ed47b84cf187bb17b666148e7...@shsmsx103.ccr.corp.intel.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
