[jira] [Comment Edited] (ARROW-1163) [Plasma] Java client for Plasma
[ https://issues.apache.org/jira/browse/ARROW-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243420#comment-16243420 ]

Philipp Moritz edited comment on ARROW-1163 at 11/8/17 5:59 AM:

That makes sense for now, and I agree it's a little sad. For the future, maybe you can get some insights from https://github.com/deeplearning4j/deeplearning4j on how to write the Tensor class in the "right" way. Unfortunately, Java doesn't have a long tradition of scientific computing like Python has, so there is no good standard Tensor class like numpy's.

Edit: This is also an opportunity for Arrow: if we had a good Java tensor class, it could be widely used because of the increasing importance of deep learning. Another project to look at is https://github.com/intel-analytics/BigDL. We also wrote our own in the past, https://github.com/amplab/SparkNet/blob/master/src/main/scala/libs/NDArray.scala and https://github.com/amplab/SparkNet/blob/master/src/main/java/libs/JavaNDArray.java, to interop with Caffe and TensorFlow, but they might not be too useful for shared memory.

was (Author: pcmoritz):
That makes sense for now and I agree it's a little sad; for the future maybe you can get some insights from https://github.com/deeplearning4j/deeplearning4j on how to write the Tensor class in the "right" way; unfortunately Java doesn't really have a long tradition of scientific computing like Python has so there is no good standard Tensor classes like numpy.

> [Plasma] Java client for Plasma
> ---
>
> Key: ARROW-1163
> URL: https://issues.apache.org/jira/browse/ARROW-1163
> Project: Apache Arrow
> Issue Type: New Feature
> Reporter: Philipp Moritz
>
> We should start thinking about what a Java client for Plasma would look like.
> Given Arrow's focus on supporting Python, C++ and Java really well, Java is
> the next important target after Python and C++.
>
> My preliminary thoughts are the following: we can either go with JNI and
> wrap the C++ client or (preferable, in my opinion) write a pure Java client.
> It would communicate with the Plasma store via Java flatbuffers over sockets.
>
> It seems that the only thing blocking a pure Java client at the moment is the
> way we ship file descriptors for the memory mapped files between store and
> client (see the file fling.cc in the Plasma repo). We would need to get rid
> of that because there is no pure Java API that allows transferring file
> descriptors across a process boundary. So the way to share memory mapped
> files across process boundaries is probably to use the file system: keep the
> memory mapped files in the file system instead of unlinking them immediately
> (as we do at the moment), so that the client process can open them via their
> path.
>
> The challenge in this case is how to clean the files up and make sure they
> are not left lying around if the Plasma store crashes. One option is to store
> the Plasma store PID with the file (e.g. as part of the file name) and let
> the Plasma store clean them up the next time it is started; maybe there is
> OS-level support for temporary files that we can reuse.
>
> I probably won't get to this for a while, so if anybody needs this or has
> free cycles, they should feel free to chime in. Opinions on the design are
> also appreciated!
>
> -- Philipp.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
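The path-based sharing and PID-based cleanup proposed in the issue description could look roughly like the Java sketch below. It is only an illustration under stated assumptions, not Plasma's actual design: the class name `PlasmaFileSketch`, the file naming scheme `plasma-<pid>-<segment id>.mmap`, and all method names are hypothetical. It relies on `ProcessHandle`, so it assumes Java 9 or later, and it ignores the possibility that the operating system has reused a crashed store's PID for an unrelated process.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlasmaFileSketch {
    // Hypothetical naming scheme: plasma-<store pid>-<segment id>.mmap
    private static final Pattern NAME = Pattern.compile("plasma-(\\d+)-(\\d+)\\.mmap");

    /** Builds the path under which a store with the given PID would keep a segment. */
    static Path segmentPath(Path dir, long storePid, long segmentId) {
        return dir.resolve("plasma-" + storePid + "-" + segmentId + ".mmap");
    }

    /** Client side: map an existing segment by path instead of receiving a file descriptor. */
    static MappedByteBuffer mapSegment(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
        }
    }

    /** Store start-up: delete segments whose owning PID is no longer alive; returns the count. */
    static int removeStaleSegments(Path dir) throws IOException {
        int removed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "plasma-*.mmap")) {
            for (Path f : files) {
                Matcher m = NAME.matcher(f.getFileName().toString());
                if (!m.matches()) {
                    continue; // not one of our files
                }
                long pid = Long.parseLong(m.group(1));
                boolean alive = ProcessHandle.of(pid)
                        .map(ProcessHandle::isAlive)
                        .orElse(false);
                if (!alive) {
                    Files.delete(f);
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

A store would call `removeStaleSegments` once at start-up, before creating its own segments; a client would receive a path over the socket (rather than a descriptor) and pass it to `mapSegment`.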
[jira] [Comment Edited] (ARROW-1163) [Plasma] Java client for Plasma
[ https://issues.apache.org/jira/browse/ARROW-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243388#comment-16243388 ]

Lu Qi edited comment on ARROW-1163 at 11/8/17 5:10 AM:

Hi Philipp,

Thanks for providing this material. I see that numpy uses "PyArray_NewFromDescr" to wrap existing memory without copying data. So, on the Java side, we will mimic this method and provide a wrapper class for viewing or modifying the underlying "mmap" shared memory. For now, though, in my case I already have a Tensor defined on top of a float array, so I have to copy the data into it, which is pretty sad. Maybe one day I can drop my Tensor.

was (Author: luchy0120):
Hi, Philipp, Thanks for providing me these material. I see that numpy uses "PyArray_NewFromDescr" to wrap a memory without copying data. So, on Java side, we will mimic this method and provide a wrapper class for viewing or modify the underlying "mmap" share memory. But , for now , as in my case, I have an already defined Tensor using float array . I have to copy data into it , which is pretty sad. Maybe one day we can drop our Tensor
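The wrapper class Lu Qi describes, one that views or modifies the mmap'd shared memory without copying, might be sketched in Java as follows. This is a minimal illustration and not code from Arrow or Plasma: the class `MappedFloatTensor`, its methods, the fixed 2-D row-major layout, and the use of native byte order are all assumptions made for the example. The `toArray` method shows the copying path that a Tensor backed by a float array currently forces.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical 2-D float tensor that views mapped memory instead of owning a float[]. */
final class MappedFloatTensor {
    private final FloatBuffer data; // view over the mapping; no copy is made
    private final int rows;
    private final int cols;

    private MappedFloatTensor(FloatBuffer data, int rows, int cols) {
        this.data = data;
        this.rows = rows;
        this.cols = cols;
    }

    /** Maps rows * cols floats from the start of the file (row-major, native byte order). */
    static MappedFloatTensor map(Path file, int rows, int cols) throws IOException {
        long bytes = (long) rows * cols * Float.BYTES;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            buf.order(ByteOrder.nativeOrder());
            return new MappedFloatTensor(buf.asFloatBuffer(), rows, cols);
        }
    }

    /** Reads directly from the shared mapping. */
    float get(int r, int c) {
        return data.get(r * cols + c);
    }

    /** Writes directly into the shared mapping. */
    void set(int r, int c, float v) {
        data.put(r * cols + c, v);
    }

    /** The copying path: materialise the contents into a heap float[]. */
    float[] toArray() {
        float[] out = new float[rows * cols];
        data.duplicate().get(out); // duplicate() leaves this view's position untouched
        return out;
    }
}
```

Two `MappedFloatTensor` instances mapped from the same file see each other's writes through `get` and `set`, whereas the array returned by `toArray` is a snapshot that no longer tracks the shared memory.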