[ https://issues.apache.org/jira/browse/ARROW-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573282#comment-16573282 ]
Adam Gibson commented on ARROW-1163:
------------------------------------
Hey folks, Adam from deeplearning4j here. ND4J is likely the closest thing to
a "NumPy" on the JVM you are going to get.
On top of that, it can read NumPy and TensorFlow arrays directly in memory
with zero copy, and it works with MKL/CUDA while offering a fairly friendly
story for managed buffers:
[https://deeplearning4j.org/workspaces]
Apache Tika and Apache Solr have not been afraid to work with us. I'd encourage
folks to reach out to us in the future rather than skimming and making
assumptions.
We'd be more than glad to engage with the Arrow community. We already have our
own support for reading and writing Apache Arrow tensors:
[https://github.com/deeplearning4j/deeplearning4j/tree/master/nd4j/nd4j-serde/nd4j-arrow]
Apache Mahout also uses our underlying JNI stack, JavaCPP:
[https://github.com/apache/mahout/blob/master/viennacl-omp/pom.xml]
We've also built our ETL software for preprocessing data on top of Arrow:
[https://github.com/deeplearning4j/deeplearning4j/tree/master/datavec/datavec-local/src/main/java/org/datavec/local/transforms]
We've also done quite a few tricks with the JavaCPP TensorFlow bindings to
coax TensorFlow graphs into the ND4J environment for graph execution:
[https://github.com/deeplearning4j/deeplearning4j/tree/master/nd4j/nd4j-tensorflow/src/main/java/org/nd4j/tensorflow/conversion]
There is some neat work we could do together here if folks are interested.
There doesn't seem to be much interest in making the Java bindings work well
with tensors (mainly because of the focus on Python), but if anyone is
interested in making that happen, we'd be more than glad to support such
efforts.
> [Plasma][Java] Java client for Plasma
> -------------------------------------
>
> Key: ARROW-1163
> URL: https://issues.apache.org/jira/browse/ARROW-1163
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Java, Plasma (C++)
> Reporter: Philipp Moritz
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> Time Spent: 11h 50m
> Remaining Estimate: 0h
>
> We should start thinking about what a Java client for Plasma would look like.
> Given Arrow's focus on supporting Python, C++ and Java really well, it is the
> next important target after Python and C++.
> My preliminary thoughts are as follows: we can either go with JNI and wrap
> the C++ client or (in my opinion preferable) write a pure Java client. It
> would communicate with the Plasma store via Java flatbuffers over sockets.
> It seems that the only thing blocking a pure Java client at the moment is the
> way we ship file descriptors for the memory-mapped files between store and
> client (see the file fling.cc in the Plasma repo). We would need to get rid
> of that because there is no pure Java API that allows transferring file
> descriptors across a process boundary. The way to share memory-mapped files
> across process boundaries would then probably be to use the file system:
> keep the memory-mapped files in the file system instead of unlinking them
> immediately (as we do at the moment), so the client process can open them
> via their path.
> The challenge in this case is how to clean the files up and make sure they
> are not left lying around if the Plasma store crashes. One option is to store
> the Plasma store PID with the file (i.e. as part of the file name) and let
> the Plasma store clean them up the next time it is started; maybe there is
> OS-level support for temporary files we can reuse.
> I probably won't get to this for a while, so if anybody needs this or has
> free cycles, they should feel free to chime in. Also opinions on the design
> are appreciated!
> -- Philipp.
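To make the path-based alternative in the issue description concrete, here is a minimal Java sketch of it. Everything specific in it is an assumption for illustration, not Plasma's actual design or API: the `plasma-<storePid>-<objectName>` naming scheme, the single shared directory, and the hypothetical class and method names (`PlasmaFileSketch`, `createObject`, `openObject`, `cleanUpStaleFiles`). It shows the three pieces discussed above: the store keeps a memory-mapped file on disk instead of unlinking it, a client maps that file by path using only `java.nio` (no file-descriptor passing), and a restarted store deletes files whose embedded PID no longer belongs to a live process. It needs Java 9+ for `ProcessHandle`, and it leaves out the socket/flatbuffers protocol entirely.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical sketch of path-based object files; not Plasma's real layout or API. */
public final class PlasmaFileSketch {
    // Assumed naming scheme: plasma-<storePid>-<objectName>
    static final String PREFIX = "plasma-";

    /** Store side: create the backing file, fill it via a mapping, and do NOT unlink it. */
    public static Path createObject(Path dir, long storePid, String name, byte[] payload)
            throws IOException {
        Path file = dir.resolve(PREFIX + storePid + "-" + name);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE_NEW,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE past the end of the file grows it to the requested size.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);
            buf.put(payload);
            buf.force(); // flush the mapped region to the file
        }
        return file; // kept on disk so clients can open it by path
    }

    /** Client side: map an existing object file read-only, knowing only its path. */
    public static MappedByteBuffer openObject(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Store startup: delete object files whose owning store process is gone. */
    public static int cleanUpStaleFiles(Path dir) throws IOException {
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, PREFIX + "*")) {
            for (Path file : files) {
                OptionalLong pid = ownerPid(file);
                // Files that do not match the naming scheme are left alone.
                if (pid.isPresent() && !isAlive(pid.getAsLong())) {
                    stale.add(file);
                }
            }
        }
        int deleted = 0;
        for (Path file : stale) {
            if (Files.deleteIfExists(file)) deleted++;
        }
        return deleted;
    }

    /** Parses the store PID out of a file name such as plasma-1234-obj1. */
    private static OptionalLong ownerPid(Path file) {
        String rest = file.getFileName().toString().substring(PREFIX.length());
        int dash = rest.indexOf('-');
        if (dash <= 0) return OptionalLong.empty();
        try {
            return OptionalLong.of(Long.parseLong(rest.substring(0, dash)));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    // Heuristic only: an OS may reuse a dead store's PID for an unrelated process.
    private static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("plasma-sketch");
        long myPid = ProcessHandle.current().pid();

        Path obj = createObject(dir, myPid, "obj1", "hello".getBytes(StandardCharsets.UTF_8));
        MappedByteBuffer view = openObject(obj);
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints "hello"

        // Simulate a file left behind by a crashed store whose PID is not running.
        Files.write(dir.resolve(PREFIX + Integer.MAX_VALUE + "-orphan"), new byte[8]);
        System.out.println(cleanUpStaleFiles(dir)); // prints 1: only the orphan is removed
    }
}
```

Because PIDs can be reused, the liveness check above can keep a stale file alive by mistake; a real store would likely want a stronger ownership marker than the PID alone.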
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)