2012/7/7 Tommaso Teofili <[email protected]>

Hi all,

in my spare time I started writing some basic BSP-based machine learning algorithms for our ml module. Now I'm wondering, from a design point of view, where it would make sense to put the training data / model. I'd assume the obvious answer would be HDFS, which makes me think we should come up with (at least) two BSP jobs for each algorithm: one for learning and one for "predicting", each to be run separately. This would allow reading the training data from HDFS and consequently creating a model (also on HDFS); the created model could then be read (again from HDFS) in order to predict an output for a new input. Does that make sense?
I'm just wondering what a general-purpose design for Hama-based ML stuff would look like, so this is just to start the discussion; any opinion is welcome.

Cheers,
Tommaso

2012/7/7 Thomas Jungblut <[email protected]>

Looks fine to me.
The key is the interfaces for learning and predicting, so we should define some vectors and matrices. It would be enough to define the algorithms via the interfaces, and a generic BSP should just run them based on the given input.

2012/7/9 Tommaso Teofili <[email protected]>

Ok, so let's sketch up here what these interfaces should look like. Any proposal is more than welcome.
Regards,
Tommaso

2012/7/9 Thomas Jungblut <[email protected]>

For the matrix/vector I would propose my library's interfaces (quite like Mahout's math, but without boundary checks):

https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleVector.java
https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleMatrix.java

Full Writable for Vector and basic Writable for Matrix:

https://github.com/thomasjungblut/thomasjungblut-common/tree/master/src/de/jungblut/writable

It is enough to implement all the machine learning algorithms I've seen until now, and the builder pattern allows really nice chaining of commands to easily code equations or translate code from Matlab/Octave. See for example the logistic regression cost function:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/regression/LogisticRegressionCostFunction.java

For the interfaces of the algorithms: I guess we need to get some more experience. I can't tell how the interfaces for them should look, mainly because I don't know how the BSP versions of them will call the algorithm logic. But having stable math interfaces is the key point.

2012/7/9 Tommaso Teofili <[email protected]>

Very nice, +1!
You're right, it's more reasonable to just proceed bottom-up with this, as we're going to have a clearer idea while developing the different algorithms. So for now I'd introduce your library's Writables and then proceed one step at a time with the more common API.
Thanks,
Tommaso

2012/7/10 Thomas Jungblut <[email protected]>

Maybe a good first step towards algorithms would be to evaluate how we can implement some non-linear optimizers in BSP (BFGS or the conjugate gradient method).

2012/7/10 Tommaso Teofili <[email protected]>

Nice idea; thinking about it quickly, it looks to me that (C)GD is a good fit for BSP.
Also, I was trying to implement some easy meta-learning algorithm like the weighted majority algorithm, where each peer has its own learning algorithm and gets penalized for each mistaken prediction.
Regarding your math library: do you plan to commit it yourself? Otherwise I can do it.
Regards,
Tommaso

2012/7/10 Thomas Jungblut <[email protected]>

Feel free to commit this, but take care to add the Apache license headers. Also, I wanted to add a few testcases over the next few weekends.

2012/7/10 Tommaso Teofili <[email protected]>

Ok, sure, I'll just add the writables along with DoubleMatrix/Vector with the AL2 headers on top.
Thanks Thomas for the contribution and feedback.
Tommaso

2012/7/10 Thomas Jungblut <[email protected]>

Great, thank you for taking care of it ;)

2012/7/10 Tommaso Teofili <[email protected]>

Thomas, while inspecting the code I realized I may need to import most/all of the classes inside your math library for the writables to compile. Is that ok for you, or would you rather not?
Regards,
Tommaso

2012/7/10 Thomas Jungblut <[email protected]>

I don't know if we need sparse/named vectors for the first scratch. You can just use the interface and the dense implementations and remove all the uncompilable code in the writables.

2012/7/10 Tommaso Teofili <[email protected]>

Ok, I'll try that, thanks :)
Tommaso

2012/7/10 Tommaso Teofili <[email protected]>

I've done the first import; we can start from that now. Thanks Thomas.
Tommaso
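The chained-builder style Thomas describes for coding equations Octave-style can be illustrated with a minimal sketch of the logistic regression cost J = -(1/m) * sum(y .* log(h) + (1 - y) .* log(1 - h)). The `DenseVector` class and its method names below are hypothetical stand-ins, not the actual de.jungblut.math API.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical, minimal stand-in for a chainable vector interface;
// class and method names are assumptions, not the de.jungblut.math API.
final class DenseVector {
    private final double[] values;

    DenseVector(double... values) { this.values = values.clone(); }

    int length() { return values.length; }

    // Element-wise map, e.g. apply(Math::log).
    DenseVector apply(DoubleUnaryOperator op) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) out[i] = op.applyAsDouble(values[i]);
        return new DenseVector(out);
    }

    // Element-wise product.
    DenseVector multiply(DenseVector o) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) out[i] = values[i] * o.values[i];
        return new DenseVector(out);
    }

    // scalar - v, element-wise (e.g. 1 - y).
    DenseVector subtractFrom(double scalar) {
        return apply(v -> scalar - v);
    }

    // Element-wise sum of two vectors.
    DenseVector add(DenseVector o) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) out[i] = values[i] + o.values[i];
        return new DenseVector(out);
    }

    double sum() {
        double s = 0;
        for (double v : values) s += v;
        return s;
    }
}

public class LogisticCostSketch {
    // J = -(1/m) * sum(y .* log(h) + (1 - y) .* log(1 - h)),
    // written as one chain of vector operations, Octave-style.
    static double cost(DenseVector h, DenseVector y) {
        double total = y.multiply(h.apply(Math::log))
            .add(y.subtractFrom(1.0).multiply(h.subtractFrom(1.0).apply(Math::log)))
            .sum();
        return -total / h.length();
    }

    public static void main(String[] args) {
        DenseVector h = new DenseVector(0.9, 0.1); // predicted probabilities
        DenseVector y = new DenseVector(1.0, 0.0); // labels
        System.out.println(cost(h, y)); // ~0.10536 (= -ln(0.9))
    }
}
```

The chain mirrors the Octave expression almost token for token, which is the appeal of the builder style the thread refers to.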

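Tommaso's two-job split (a learning job that persists a model, and a separate predicting job that reads it back) can be sketched with plain Java. Everything here is illustrative: the `Learner`/`Predictor` interfaces, the single-weight least-squares model, and the local temp file standing in for HDFS are assumptions, not Hama APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the two-job design from the thread: the learning
// step writes its model to shared storage (a temp file here, HDFS in Hama),
// and an independent predicting step loads it back.
interface Learner {
    void train(double[] xs, double[] ys, Path modelPath) throws IOException;
}

interface Predictor {
    double predict(double x, Path modelPath) throws IOException;
}

public class TwoJobSketch {
    // "Learning job": fits y = w * x by least squares and persists w.
    static final Learner LEARN = (xs, ys, model) -> {
        double num = 0, den = 0;
        for (int i = 0; i < xs.length; i++) {
            num += xs[i] * ys[i];
            den += xs[i] * xs[i];
        }
        Files.writeString(model, Double.toString(num / den));
    };

    // "Predicting job": reloads the persisted model and applies it.
    static final Predictor PREDICT = (x, model) -> {
        double w = Double.parseDouble(Files.readString(model).trim());
        return w * x;
    };

    public static void main(String[] args) throws IOException {
        Path model = Files.createTempFile("model", ".txt");
        LEARN.train(new double[] {1, 2, 3}, new double[] {2, 4, 6}, model);
        System.out.println(PREDICT.predict(4, model)); // prints 8.0
    }
}
```

Because the two steps share nothing but the persisted model, they can run as two separate BSP jobs exactly as proposed in the thread.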