There was a little bit of effort previously in Arrow to start building this
out (see the algorithms package), but we tabled it due to the large scope
and the limited availability of maintainers for it.
On Tue, Mar 16, 2021 at 4:36 PM Wes McKinney wrote:
> This has been asked several times in the past but
There is a JVM based dataframe library:
https://github.com/techascent/tech.ml.dataset
There are dplyr-like bindings for it: https://github.com/scicloj/tablecloth
It supports mmap/in-place loading of Arrow files (which the Java SDK does
not): https://techascent.com/blog/memory-mapping-arrow.html
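For anyone unfamiliar with what mmap/in-place loading means on the JVM, here is a minimal sketch using only java.nio from the standard library. It is not how tech.ml.dataset or the Arrow Java SDK does it; the class name MmapSketch, the helper names, and the flat file of little-endian 64-bit integers are all made up for illustration. A real Arrow file has IPC framing and metadata on top of the raw buffers.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only: not part of Arrow or tech.ml.dataset.
public class MmapSketch {

    /** Maps the whole file read-only; the OS pages bytes in on demand instead of copying them up front. */
    public static MappedByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    /** Sums little-endian 64-bit integers read directly from the mapped region. */
    public static long sumLongs(Path path) throws IOException {
        ByteBuffer buf = mapReadOnly(path).order(ByteOrder.LITTLE_ENDIAN);
        long total = 0;
        while (buf.remaining() >= Long.BYTES) {
            total += buf.getLong();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Write three longs to a temporary file, then read them back through the mapping.
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        ByteBuffer out = ByteBuffer.allocate(3 * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        out.putLong(1).putLong(2).putLong(3);
        Files.write(tmp, out.array());
        System.out.println(sumLongs(tmp));
    }
}
```

The point is that the values are read straight out of the mapped region without first copying the file into a heap array, which is the property the blog post above is describing for Arrow data.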
This isn't directly related to the question, but I was reading about the
newly released JDK 16 today and there is initial support for explicit
vectorized operations, which might be interesting to explore for anyone
considering building a Java DataFrame implementation.
I can't speak to how complete it is, but I looked earlier for
something similar and ran across
https://github.com/deeplearning4j/nd4j . It's probably not an exact
fit, but it does appear to be able to consume Arrow buffers and expose
them to Java.
Cheers
Andrew
On Tue, Mar 16, 2021 at 6:36 PM
Hi,
I've been using Arrow for some time now, mostly in the context of Arrow
Flight between Java and Python. While it's quite easy to convert Arrow
data in Python to a pandas dataframe and manipulate it, I'm struggling to
find an obvious analogue on the Java side. VectorSchemaRoot is useful for