very nice, thank you very much

2012/7/10 Tommaso Teofili <tommaso.teof...@gmail.com>
> I've done the first import, we can start from that now, thanks Thomas.
> Tommaso
>
> 2012/7/10 Tommaso Teofili <tommaso.teof...@gmail.com>
>
> > ok, I'll try that, thanks :)
> > Tommaso
> >
> > 2012/7/10 Thomas Jungblut <thomas.jungb...@gmail.com>
> >
> >> I don't know if we need sparse/named vectors for the first scratch.
> >> You can just use the interface and the dense implementations and remove
> >> all the uncompilable code in the writables.
> >>
> >> 2012/7/10 Tommaso Teofili <tommaso.teof...@gmail.com>
> >>
> >> > Thomas, while inspecting the code I realized I may need to import most/all
> >> > of the classes inside your math library for the writables to compile.
> >> > Is that ok with you, or would you rather not?
> >> > Regards,
> >> > Tommaso
> >> >
> >> > 2012/7/10 Thomas Jungblut <thomas.jungb...@gmail.com>
> >> >
> >> > > great, thank you for taking care of it ;)
> >> > >
> >> > > 2012/7/10 Tommaso Teofili <tommaso.teof...@gmail.com>
> >> > >
> >> > > > Ok, sure, I'll just add the writables along with DoubleMatrix/Vector
> >> > > > with the AL2 headers on top.
> >> > > > Thanks Thomas for the contribution and feedback.
> >> > > > Tommaso
> >> > > >
> >> > > > 2012/7/10 Thomas Jungblut <thomas.jungb...@gmail.com>
> >> > > >
> >> > > > > Feel free to commit this, but take care to add the Apache license
> >> > > > > headers.
> >> > > > > Also I wanted to add a few test cases over the next few weekends.
> >> > > > >
> >> > > > > 2012/7/10 Tommaso Teofili <tommaso.teof...@gmail.com>
> >> > > > >
> >> > > > > > nice idea, thinking about it quickly, it looks to me that (C)GD is a
> >> > > > > > good fit for BSP.
> >> > > > > > Also I was trying to implement some easy meta learning algorithm like
> >> > > > > > the weighted majority algorithm, where each peer has its own learning
> >> > > > > > algorithm and gets penalized for each mistaken prediction.
> >> > > > > > Regarding your math library, do you plan to commit it yourself?
> >> > > > > > Otherwise I can do it.
> >> > > > > > Regards,
> >> > > > > > Tommaso
> >> > > > > >
> >> > > > > > 2012/7/10 Thomas Jungblut <thomas.jungb...@gmail.com>
> >> > > > > >
> >> > > > > > > Maybe a first good step towards algorithms would be to try to
> >> > > > > > > evaluate how we can implement some non-linear optimizers in BSP.
> >> > > > > > > (BFGS or conjugate gradient method)
> >> > > > > > >
> >> > > > > > > 2012/7/9 Tommaso Teofili <tommaso.teof...@gmail.com>
> >> > > > > > >
> >> > > > > > > > 2012/7/9 Thomas Jungblut <thomas.jungb...@gmail.com>
> >> > > > > > > >
> >> > > > > > > > > For the matrix/vector I would propose my library interface:
> >> > > > > > > > > (quite like Mahout's math, but without boundary checks)
> >> > > > > > > > >
> >> > > > > > > > > https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleVector.java
> >> > > > > > > > >
> >> > > > > > > > > https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleMatrix.java
> >> > > > > > > > >
> >> > > > > > > > > Full Writable for Vector and basic Writable for Matrix:
> >> > > > > > > > >
> >> > > > > > > > > https://github.com/thomasjungblut/thomasjungblut-common/tree/master/src/de/jungblut/writable
> >> > > > > > > > >
> >> > > > > > > > > It is enough to implement all the machine learning algorithms I've
> >> > > > > > > > > seen until now, and the builder pattern allows really nice chaining of
> >> > > > > > > > > commands to easily code equations or translate code from
> >> > > > > > > > > matlab/octave.
> >> > > > > > > > > See for example logistic regression cost function
> >> > > > > > > > >
> >> > > > > > > > > https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/regression/LogisticRegressionCostFunction.java
> >> > > > > > > >
> >> > > > > > > > very nice, +1!
> >> > > > > > > >
> >> > > > > > > > > For the interfaces of the algorithms:
> >> > > > > > > > > I guess we need to get some more experience, I cannot tell what
> >> > > > > > > > > the interfaces for them should look like, mainly because I don't
> >> > > > > > > > > know how the BSP version of them will call the algorithm logic.
> >> > > > > > > >
> >> > > > > > > > you're right, it's more reasonable to just proceed bottom-up with
> >> > > > > > > > this as we're going to have a clearer idea while developing the
> >> > > > > > > > different algorithms.
> >> > > > > > > > So for now I'd introduce your library Writables and then proceed
> >> > > > > > > > 1 step at a time with the more common API.
> >> > > > > > > > Thanks,
> >> > > > > > > > Tommaso
> >> > > > > > > >
> >> > > > > > > > > But having stable math interfaces is the key point.
> >> > > > > > > > >
> >> > > > > > > > > 2012/7/9 Tommaso Teofili <tommaso.teof...@gmail.com>
> >> > > > > > > > >
> >> > > > > > > > > > Ok, so let's sketch up here what these interfaces should look like.
> >> > > > > > > > > > Any proposal is more than welcome.
> >> > > > > > > > > > Regards,
> >> > > > > > > > > > Tommaso
> >> > > > > > > > > >
> >> > > > > > > > > > 2012/7/7 Thomas Jungblut <thomas.jungb...@gmail.com>
> >> > > > > > > > > >
> >> > > > > > > > > > > Looks fine to me.
> >> > > > > > > > > > > The key is the interfaces for learning and predicting, so we
> >> > > > > > > > > > > should define some vectors and matrices.
> >> > > > > > > > > > > It would be enough to define the algorithms via the interfaces
> >> > > > > > > > > > > and a generic BSP should just run them based on the given input.
> >> > > > > > > > > > >
> >> > > > > > > > > > > 2012/7/7 Tommaso Teofili <tommaso.teof...@gmail.com>
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Hi all,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > in my spare time I started writing some basic BSP-based machine
> >> > > > > > > > > > > > learning algorithms for our ml module, now I'm wondering, from a
> >> > > > > > > > > > > > design point of view, where it'd make sense to put the training
> >> > > > > > > > > > > > data / model. I'd assume the obvious answer would be HDFS, so this
> >> > > > > > > > > > > > makes me think we should come up with (at least) two BSP jobs for
> >> > > > > > > > > > > > each algorithm: one for learning and one for "predicting", each to
> >> > > > > > > > > > > > be run separately.
> >> > > > > > > > > > > > This would allow reading the training data from HDFS and
> >> > > > > > > > > > > > consequently creating a model (also on HDFS); the created model
> >> > > > > > > > > > > > could then be read (again from HDFS) in order to predict an output
> >> > > > > > > > > > > > for a new input.
> >> > > > > > > > > > > > Does that make sense?
> >> > > > > > > > > > > > I'm just wondering what a general purpose design for Hama-based ML
> >> > > > > > > > > > > > stuff would look like, so this is just to start the discussion, any
> >> > > > > > > > > > > > opinion is welcome.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Cheers,
> >> > > > > > > > > > > > Tommaso
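
For anyone reading the thread later: the learn/predict split proposed in the first message, combined with the idea of defining algorithms through interfaces, might look roughly like the sketch below. It is only an illustration under several assumptions. The names `Learner`, `LinearModel`, `PerceptronLearner` and `LearnPredictSketch` are hypothetical and are not part of Hama or of Thomas's math library. Plain `double[]` arrays stand in for DoubleVector, a perceptron stands in for whatever algorithm would really be used, and reading from and writing to HDFS, as well as any BSP distribution, is left out entirely.

```java
import java.util.Arrays;

/** Hypothetical driver: runs the "learning" step, then the "predicting" step. */
final class LearnPredictSketch {
    public static void main(String[] args) {
        // Toy training data (logical OR); a real job would read this from HDFS.
        double[][] features = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] labels = {0, 1, 1, 1};

        // Step 1, "learning": training data in, model out.
        Learner learner = new PerceptronLearner(20);
        LinearModel model = learner.learn(features, labels);

        // Step 2, "predicting": the stored model is applied to new input.
        for (double[] row : features) {
            System.out.println(Arrays.toString(row) + " -> " + model.predict(row));
        }
    }
}

/** Hypothetical learner interface: the only thing a generic runner would need to call. */
interface Learner {
    LinearModel learn(double[][] features, int[] labels);
}

/** Hypothetical model: the state a learning job would persist for the predicting job. */
final class LinearModel {
    final double[] weights;
    final double bias;

    LinearModel(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    /** Returns 1 if the weighted sum plus bias is positive, otherwise 0. */
    int predict(double[] x) {
        double score = bias;
        for (int i = 0; i < weights.length; i++) {
            score += weights[i] * x[i];
        }
        return score > 0 ? 1 : 0;
    }
}

/** Single-machine perceptron, used here only as a simple stand-in algorithm. */
final class PerceptronLearner implements Learner {
    private final int epochs;

    PerceptronLearner(int epochs) {
        this.epochs = epochs;
    }

    @Override
    public LinearModel learn(double[][] features, int[] labels) {
        double[] w = new double[features[0].length];
        double b = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int n = 0; n < features.length; n++) {
                int predicted = new LinearModel(w, b).predict(features[n]);
                int error = labels[n] - predicted; // -1, 0 or +1
                for (int i = 0; i < w.length; i++) {
                    w[i] += error * features[n][i];
                }
                b += error;
            }
        }
        return new LinearModel(w, b);
    }
}
```

The point of the sketch is only the shape: the model is the sole thing passed from the learning step to the predicting step, which is what would make it possible to run them as two separate jobs with the model stored in between.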