If you're going to do any work in this area, I would highly encourage you 
to do it as part of the core.matrix library. That is what Incanter is or 
will be using for its dataset implementation. It's also nice for those 
abstractions and implementations to be separate from Incanter itself, since 
Incanter is a rather large dependency.

Core.matrix is certainly (in my eyes) becoming the de facto matrix 
computation library in the Clojure ecosystem. In terms of the level of 
interop between different implementations and the extent of its use by the 
Clojure community, I think we rival the Python offerings. However, while 
core.matrix has some dataset protocols, API functions and basic 
implementations, there's still some work to do to get the full 
expressiveness of the data.frame pattern as seen in R and pandas. 
Specifically, there is no support for setting rownames (or for arbitrary 
"name" assignments beyond a single dimension, i.e. columns). This is 
something I started working on a while back but wasn't able to finish. I 
could potentially push what I came up with to a fork, but unfortunately I 
don't have any more time to work on the problem at the moment.

Mike Anderson is a great project maintainer, and will probably be happy to 
help guide you in stitching together a solution.

Best

Chris





On Wednesday, March 9, 2016 at 12:57:31 PM UTC-8, arthur.ma...@gmail.com 
wrote:
>
> Is there any desire or need for a Clojure DataFrame?
>
>
> By DataFrame, I mean a structure similar to R's data.frame, and Python's 
> pandas.DataFrame.
>
> Incanter's DataSet may already be fulfilling this purpose, and if so, I'd 
> like to know if and how people are using it.
>
> From quickly researching, I see that some prior work has been done in this 
> space, such as:
>
> * https://github.com/cardillo/joinery
> * https://github.com/mattrepl/data-frame
> * http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes
>
> Rather than going off and creating a competing implementation (
> https://xkcd.com/927/), I'd like to know if anyone here is actively 
> working on, or would like to work on a DataFrame and related utilities for 
> Clojure (and by extension Java)? Is it something that's sorely needed, or 
> is everybody happy with using Incanter or some other library that I'm not 
> aware of? If there's already a de facto standard out there, would anyone 
> care to please point it out?
>
> As background information:
>
> My specific use-case is in NLP and ML, where I often explore and prototype 
> in Python, but I'm then left to deal with a smattering of libraries on the 
> JVM (Mallet, Weka, Mahout, ND4J, DeepLearning4j, CoreNLP, etc.), each with 
> their own ad-hoc implementations of algorithms, matrices, and utilities for 
> reading data. It would be great to have a unified way to explore my data in 
> the Clojure REPL, and then serve the same code and models in production.
>
> I would love for Clojure to have a broadly compatible ecosystem similar to 
> Python's NumPy/pandas/scikit-*/SciPy/matplotlib/Gensim, etc. core.matrix and 
> Incanter appear to fulfill a large chunk of those roles, but I am not aware 
> if they've yet become the de facto standards in the community.
>
> Any feedback is greatly appreciated.
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en