On Fri, Feb 28, 2014 at 1:37 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Here are some goals that I think would be good in the area of numerics,
> classifiers and clustering:
>
> - simple programming model

+1

> - programmable via Java or R

+1

> - runs clustered or not

I think both.

> What does everybody think?

Good thread. Some of the comments on specific topics are a bit over my head, but here are my 2 cents.

I come from the perspective of a Java developer who wants to add text clustering, classification and recommendation algorithms to an existing application and its data, whether that is smallish data from a SQL database or larger amounts of data that require distributed computing. So ideally I would like to see:

1. A JavaBeans API for every algorithm.
2. A unified way to vectorize data, no matter where it comes from (SQL database, NoSQL store, filesystem, Lucene index, etc.).
3. The option to use Hadoop or some other distributed computing framework to scale out.

I have some ideas on these topics, but maybe that's better for another thread.
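To make points 1 and 2 a little more concrete, here is a rough sketch of what a bean-style algorithm with a pluggable vectorizer might look like. Every name in it (`Vectorizer`, `TermCountVectorizer`, `NearestCentroidClassifier`) is hypothetical and made up for illustration, not taken from any existing library. The nearest-centroid classifier is only a simple stand-in algorithm, and everything runs in-process, so point 3 (scaling out) is not covered.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract: one way to turn an item into a vector,
// regardless of whether it came from SQL, a file or an index.
interface Vectorizer<T> {
    double[] vectorize(T item);
}

// Hypothetical bag-of-words vectorizer over a fixed vocabulary.
class TermCountVectorizer implements Vectorizer<String> {
    private final Map<String, Integer> index = new LinkedHashMap<>();

    TermCountVectorizer(List<String> vocabulary) {
        // Assign each distinct term the next free dimension.
        for (String term : vocabulary) {
            index.putIfAbsent(term, index.size());
        }
    }

    @Override
    public double[] vectorize(String text) {
        double[] v = new double[index.size()];
        // Lowercase, split on non-word characters, count known terms.
        for (String token : text.toLowerCase().split("\\W+")) {
            Integer i = index.get(token);
            if (i != null) {
                v[i] += 1.0;
            }
        }
        return v;
    }
}

// Hypothetical bean-style algorithm: no-arg constructor plus
// getter/setter configuration, so it can be wired like any other bean.
class NearestCentroidClassifier {
    private Vectorizer<String> vectorizer;
    private final Map<String, double[]> sums = new LinkedHashMap<>();
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    public NearestCentroidClassifier() {}

    public Vectorizer<String> getVectorizer() { return vectorizer; }
    public void setVectorizer(Vectorizer<String> vectorizer) { this.vectorizer = vectorizer; }

    // Accumulate the per-label vector sum and example count.
    public void train(String label, String text) {
        double[] v = vectorizer.vectorize(text);
        double[] sum = sums.computeIfAbsent(label, k -> new double[v.length]);
        for (int i = 0; i < v.length; i++) {
            sum[i] += v[i];
        }
        counts.merge(label, 1, Integer::sum);
    }

    // Return the label whose centroid (sum / count) is closest in
    // squared Euclidean distance, or null if nothing was trained.
    public String classify(String text) {
        double[] v = vectorizer.vectorize(text);
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            int n = counts.get(e.getKey());
            double d = 0.0;
            for (int i = 0; i < v.length; i++) {
                double diff = v[i] - e.getValue()[i] / n;
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

The idea is that an application would create the classifier with its no-arg constructor, call `setVectorizer(...)` with whichever vectorizer matches its data source, and then call `train` and `classify`. Swapping in a different data source would then mean swapping the vectorizer, not the algorithm.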