2012/7/9 Thomas Jungblut <[email protected]>

> For the matrix/vector I would propose my library interface (quite like
> Mahout's math, but without boundary checks):
>
> https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleVector.java
>
> https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleMatrix.java
>
> Full Writable for Vector and basic Writable for Matrix:
>
> https://github.com/thomasjungblut/thomasjungblut-common/tree/master/src/de/jungblut/writable
>
> It is enough to implement all machine learning algorithms I've seen until now,
> and the builder pattern allows really nice chaining of commands to easily
> code equations or translate code from MATLAB/Octave.
> See for example the logistic regression cost function:
>
> https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/regression/LogisticRegressionCostFunction.java
>
very nice, +1!

>
> For the interfaces of the algorithms:
> I guess we need to get some more experience. I cannot tell what the
> interfaces for them should look like, mainly because I don't know how the
> BSP version of them will call the algorithm logic.
>

you're right, it's more reasonable to just proceed bottom-up with this, as
we're going to have a clearer idea while developing the different algorithms.
So for now I'd introduce your library Writables and then proceed one step at a
time with the more common API.
Thanks,
Tommaso

>
> But having stable math interfaces is the key point.
>
> 2012/7/9 Tommaso Teofili <[email protected]>
>
> > Ok, so let's sketch up here what these interfaces should look like.
> > Any proposal is more than welcome.
> > Regards,
> > Tommaso
> >
> > 2012/7/7 Thomas Jungblut <[email protected]>
> >
> > > Looks fine to me.
> > > The key is the interfaces for learning and predicting, so we should
> > > define some vectors and matrices.
> > > It would be enough to define the algorithms via the interfaces and a
> > > generic BSP should just run them based on the given input.
> > >
> > > 2012/7/7 Tommaso Teofili <[email protected]>
> > >
> > > > Hi all,
> > > >
> > > > in my spare time I started writing some basic BSP based machine learning
> > > > algorithms for our ml module. Now I'm wondering, from a design point of
> > > > view, where it'd make sense to put the training data / model. I'd assume
> > > > the obvious answer would be HDFS, so this makes me think we should come up
> > > > with (at least) two BSP jobs for each algorithm: one for learning and one
> > > > for "predicting", each to be run separately.
> > > > This would allow reading the training data from HDFS and consequently
> > > > creating a model (also on HDFS); the created model could then be read
> > > > (again from HDFS) in order to predict an output for a new input.
> > > > Does that make sense?
> > > > I'm just wondering what a general purpose design for Hama based ML stuff
> > > > would look like, so this is just to start the discussion; any opinion is
> > > > welcome.
> > > >
> > > > Cheers,
> > > > Tommaso
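
To make the ideas in this thread a bit more concrete, here is a rough Java sketch of how chainable vector operations and a separate learn/predict pair of interfaces could fit together for logistic regression. Everything in it is hypothetical: the Vec class, the Learner and Predictor interfaces, and the gradient descent settings (1000 iterations, step size 0.5) are illustrations only and are not taken from tjungblut-math, Mahout, or Hama. The model is kept in memory; writing it to and reading it from HDFS, and running the logic inside a BSP job, are left out.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch only; none of these names come from tjungblut-math or Hama. */
public class MlInterfaceSketch {

    /** Minimal immutable vector; every operation returns a new Vec so calls can be chained. */
    static final class Vec {
        final double[] v;

        Vec(double... values) { this.v = values.clone(); }

        /** Applies f to every element. */
        Vec apply(DoubleUnaryOperator f) {
            double[] r = new double[v.length];
            for (int i = 0; i < v.length; i++) r[i] = f.applyAsDouble(v[i]);
            return new Vec(r);
        }

        /** Element-wise product (no bounds checks, lengths are assumed equal). */
        Vec multiply(Vec o) {
            double[] r = new double[v.length];
            for (int i = 0; i < v.length; i++) r[i] = v[i] * o.v[i];
            return new Vec(r);
        }

        /** Element-wise sum (no bounds checks, lengths are assumed equal). */
        Vec add(Vec o) {
            double[] r = new double[v.length];
            for (int i = 0; i < v.length; i++) r[i] = v[i] + o.v[i];
            return new Vec(r);
        }

        double sum() { double s = 0; for (double d : v) s += d; return s; }

        double dot(Vec o) { return multiply(o).sum(); }
    }

    /** Learning side: turns training data into a model (here just a weight vector). */
    interface Learner { Vec learn(Vec[] features, Vec labels); }

    /** Predicting side: applies a previously learned model to one new input. */
    interface Predictor { double predict(Vec model, Vec features); }

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    /** Logistic regression cost J = -(1/m) * sum(y*log(h) + (1-y)*log(1-h)). */
    static double cost(Vec theta, Vec[] features, Vec labels) {
        double[] h = new double[features.length];
        for (int i = 0; i < h.length; i++) h[i] = sigmoid(features[i].dot(theta));
        Vec hyp = new Vec(h);
        Vec positive = labels.multiply(hyp.apply(Math::log));
        Vec negative = labels.apply(y -> 1 - y).multiply(hyp.apply(p -> Math.log(1 - p)));
        return -positive.add(negative).sum() / features.length;
    }

    /** Plain batch gradient descent; iteration count and step size are arbitrary choices. */
    static final Learner GRADIENT_DESCENT = (features, labels) -> {
        Vec theta = new Vec(new double[features[0].v.length]);
        for (int iter = 0; iter < 1000; iter++) {
            double[] grad = new double[theta.v.length];
            for (int i = 0; i < features.length; i++) {
                double err = sigmoid(features[i].dot(theta)) - labels.v[i];
                for (int j = 0; j < grad.length; j++) {
                    grad[j] += err * features[i].v[j] / features.length;
                }
            }
            theta = theta.add(new Vec(grad).apply(g -> -0.5 * g));
        }
        return theta;
    };

    static final Predictor LOGISTIC = (model, x) -> sigmoid(x.dot(model));

    public static void main(String[] args) {
        // The first component of each input is a bias term fixed to 1.
        Vec[] x = { new Vec(1, 0), new Vec(1, 1), new Vec(1, 2), new Vec(1, 3) };
        Vec y = new Vec(0, 0, 1, 1);
        System.out.println("cost at theta = 0: " + cost(new Vec(0, 0), x, y));
        Vec model = GRADIENT_DESCENT.learn(x, y);   // "learning" step
        System.out.println("P(y=1 | x=3): " + LOGISTIC.predict(model, new Vec(1, 3)));
    }
}
```

With all weights at zero every hypothesis is 0.5, so the first printed cost is ln 2 (about 0.693). In a real Hama setup the Vec returned by learn would presumably be serialized as a Writable between the two jobs, but that part is not shown here.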
