Thomas, while inspecting the code I realized I may need to import most or all of the classes in your math library for the writables to compile. Is that OK with you, or would you rather I didn't?
Regards,
Tommaso

2012/7/10 Thomas Jungblut <[email protected]>

> Great, thank you for taking care of it ;)
>
> 2012/7/10 Tommaso Teofili <[email protected]>
>
> > Ok, sure, I'll just add the writables along with DoubleMatrix/Vector
> > with the AL2 headers on top.
> > Thanks Thomas for the contribution and feedback.
> > Tommaso
> >
> > 2012/7/10 Thomas Jungblut <[email protected]>
> >
> > > Feel free to commit this, but take care to add the Apache license
> > > headers.
> > > Also I wanted to add a few test cases over the next few weekends.
> > >
> > > 2012/7/10 Tommaso Teofili <[email protected]>
> > >
> > > > Nice idea. Thinking about it quickly, it looks to me that (C)GD
> > > > is a good fit for BSP.
> > > > Also I was trying to implement some easy meta-learning algorithm
> > > > like the weighted majority algorithm, where each peer has its own
> > > > learning algorithm and gets penalized for each mistaken
> > > > prediction.
> > > > Regarding your math library, do you plan to commit it yourself?
> > > > Otherwise I can do it.
> > > > Regards,
> > > > Tommaso
> > > >
> > > > 2012/7/10 Thomas Jungblut <[email protected]>
> > > >
> > > > > Maybe a good first step towards algorithms would be to try to
> > > > > evaluate how we can implement some non-linear optimizers in BSP.
> > > > > (BFGS or conjugate gradient method)
> > > > >
> > > > > 2012/7/9 Tommaso Teofili <[email protected]>
> > > > >
> > > > > > 2012/7/9 Thomas Jungblut <[email protected]>
> > > > > >
> > > > > > > For the matrix/vector I would propose my library interface
> > > > > > > (quite like Mahout's math, but without boundary checks):
> > > > > > >
> > > > > > > https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleVector.java
> > > > > > > https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleMatrix.java
> > > > > > >
> > > > > > > Full Writable for Vector and basic Writable for Matrix:
> > > > > > >
> > > > > > > https://github.com/thomasjungblut/thomasjungblut-common/tree/master/src/de/jungblut/writable
> > > > > > >
> > > > > > > It is enough to implement all the machine learning
> > > > > > > algorithms I've seen until now, and the builder pattern
> > > > > > > allows really nice chaining of commands to easily code
> > > > > > > equations or translate code from MATLAB/Octave.
> > > > > > > See for example the logistic regression cost function:
> > > > > > >
> > > > > > > https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/regression/LogisticRegressionCostFunction.java
> > > > > >
> > > > > > very nice, +1!
> > > > > >
> > > > > > > For the interfaces of the algorithms:
> > > > > > > I guess we need to get some more experience. I cannot tell
> > > > > > > what the interfaces for them should look like, mainly
> > > > > > > because I don't know how the BSP version of them will call
> > > > > > > the algorithm logic.
> > > > > >
> > > > > > You're right, it's more reasonable to just proceed bottom-up
> > > > > > with this, as we're going to have a clearer idea while
> > > > > > developing the different algorithms.
> > > > > > So for now I'd introduce your library Writables and then
> > > > > > proceed one step at a time with the more common API.
> > > > > > Thanks,
> > > > > > Tommaso
> > > > > >
> > > > > > > But having stable math interfaces is the key point.
> > > > > > >
> > > > > > > 2012/7/9 Tommaso Teofili <[email protected]>
> > > > > > >
> > > > > > > > Ok, so let's sketch out here what these interfaces should
> > > > > > > > look like.
> > > > > > > > Any proposal is more than welcome.
> > > > > > > > Regards,
> > > > > > > > Tommaso
> > > > > > > >
> > > > > > > > 2012/7/7 Thomas Jungblut <[email protected]>
> > > > > > > >
> > > > > > > > > Looks fine to me.
> > > > > > > > > The key is the interfaces for learning and predicting,
> > > > > > > > > so we should define some vectors and matrices.
> > > > > > > > > It would be enough to define the algorithms via the
> > > > > > > > > interfaces, and a generic BSP should just run them
> > > > > > > > > based on the given input.
> > > > > > > > >
> > > > > > > > > 2012/7/7 Tommaso Teofili <[email protected]>
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > in my spare time I started writing some basic
> > > > > > > > > > BSP-based machine learning algorithms for our ml
> > > > > > > > > > module. Now I'm wondering, from a design point of
> > > > > > > > > > view, where it'd make sense to put the training
> > > > > > > > > > data / model.
> > > > > > > > > > I'd assume the obvious answer would be HDFS, so this
> > > > > > > > > > makes me think we should come up with (at least) two
> > > > > > > > > > BSP jobs for each algorithm: one for learning and one
> > > > > > > > > > for "predicting", each to be run separately.
> > > > > > > > > > This would allow reading the training data from HDFS
> > > > > > > > > > and consequently creating a model (also on HDFS); the
> > > > > > > > > > created model could then be read (again from HDFS) in
> > > > > > > > > > order to predict an output for a new input.
> > > > > > > > > > Does that make sense?
> > > > > > > > > > I'm just wondering what a general-purpose design for
> > > > > > > > > > Hama-based ML stuff would look like, so this is just
> > > > > > > > > > to start the discussion; any opinion is welcome.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Tommaso
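
The learn/predict split proposed in the first message of this thread can be sketched in plain Java. This is only an illustration under stated assumptions, not Hama code and not the API the thread settled on: `LinearModel` and `LearnPredictSketch` are hypothetical names, a byte array stands in for the model file on HDFS, the `write`/`readFields` methods merely mimic the Hadoop Writable style without depending on Hadoop, and logistic regression trained by batch gradient descent is used as a stand-in algorithm. No BSP peers or supersteps are involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical model: a weight vector that serializes itself Writable-style. */
final class LinearModel {
  double[] weights = new double[0];

  /** Writes the length followed by each weight. */
  void write(DataOutput out) throws IOException {
    out.writeInt(weights.length);
    for (double w : weights) {
      out.writeDouble(w);
    }
  }

  /** Reads back exactly what write() produced. */
  void readFields(DataInput in) throws IOException {
    weights = new double[in.readInt()];
    for (int i = 0; i < weights.length; i++) {
      weights[i] = in.readDouble();
    }
  }

  /** Sigmoid of the dot product between the weights and the input. */
  double predict(double[] x) {
    double z = 0.0;
    for (int i = 0; i < weights.length; i++) {
      z += weights[i] * x[i];
    }
    return 1.0 / (1.0 + Math.exp(-z));
  }
}

final class LearnPredictSketch {
  /** "Learning job": batch gradient descent on the logistic loss. */
  static LinearModel learn(double[][] xs, double[] ys, int iterations, double rate) {
    LinearModel model = new LinearModel();
    model.weights = new double[xs[0].length];
    for (int it = 0; it < iterations; it++) {
      double[] gradient = new double[model.weights.length];
      for (int n = 0; n < xs.length; n++) {
        double error = model.predict(xs[n]) - ys[n];
        for (int i = 0; i < gradient.length; i++) {
          gradient[i] += error * xs[n][i];
        }
      }
      for (int i = 0; i < gradient.length; i++) {
        model.weights[i] -= rate * gradient[i] / xs.length;
      }
    }
    return model;
  }

  /** Stand-in for writing the model to HDFS. */
  static byte[] save(LinearModel model) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      model.write(new DataOutputStream(bytes));
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Stand-in for the "predicting job" reading the model back from HDFS. */
  static LinearModel load(byte[] stored) {
    try {
      LinearModel model = new LinearModel();
      model.readFields(new DataInputStream(new ByteArrayInputStream(stored)));
      return model;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Toy data: the first column is a constant bias feature.
    double[][] xs = {{1, 0}, {1, 1}, {1, 3}, {1, 4}};
    double[] ys = {0, 0, 1, 1};
    byte[] stored = save(learn(xs, ys, 2000, 0.5));   // learning step
    LinearModel model = load(stored);                 // separate prediction step
    System.out.println(model.predict(new double[] {1, 0.5}));
    System.out.println(model.predict(new double[] {1, 3.5}));
  }
}
```

The point of the sketch is only the separation: the learning step produces a serialized model, and the prediction step needs nothing but that serialized form plus a new input.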
