would like to move the core module so that others can reuse it.

On Tue, Jul 10, 2012 at 7:13 PM, Tommaso Teofili <[email protected]> wrote:
> I've done the first import, we can start from that now, thanks Thomas.
> Tommaso
>
> 2012/7/10 Tommaso Teofili <[email protected]>
>
>> ok, I'll try that, thanks :)
>> Tommaso
>>
>> 2012/7/10 Thomas Jungblut <[email protected]>
>>
>>> I don't know if we need sparse/named vectors for the first scratch.
>>> You can just use the interface and the dense implementations and remove all
>>> the uncompilable code in the writables.
>>>
>>> 2012/7/10 Tommaso Teofili <[email protected]>
>>>
>>> > Thomas, while inspecting the code I realized I may need to import most/all
>>> > of the classes inside your math library for the writables to compile. Is that
>>> > OK with you, or would you rather not?
>>> > Regards,
>>> > Tommaso
>>> >
>>> > 2012/7/10 Thomas Jungblut <[email protected]>
>>> >
>>> > > great, thank you for taking care of it ;)
>>> > >
>>> > > 2012/7/10 Tommaso Teofili <[email protected]>
>>> > >
>>> > > > Ok, sure, I'll just add the writables along with DoubleMatrix/Vector with
>>> > > > the AL2 headers on top.
>>> > > > Thanks Thomas for the contribution and feedback.
>>> > > > Tommaso
>>> > > >
>>> > > > 2012/7/10 Thomas Jungblut <[email protected]>
>>> > > >
>>> > > > > Feel free to commit this, but take care to add the Apache license headers.
>>> > > > > Also I wanted to add a few testcases over the next few weekends.
>>> > > > >
>>> > > > > 2012/7/10 Tommaso Teofili <[email protected]>
>>> > > > >
>>> > > > > > nice idea, quickly thinking about it, it looks to me that (C)GD is a good
>>> > > > > > fit for BSP.
>>> > > > > > Also I was trying to implement some easy meta learning algorithm like the
>>> > > > > > weighted majority algorithm, where each peer has its own learning algorithm
>>> > > > > > and gets penalized for each mistaken prediction.
>>> > > > > > Regarding your math library, do you plan to commit it yourself? Otherwise I
>>> > > > > > can do it.
>>> > > > > > Regards,
>>> > > > > > Tommaso
>>> > > > > >
>>> > > > > > 2012/7/10 Thomas Jungblut <[email protected]>
>>> > > > > >
>>> > > > > > > Maybe a first good step towards algorithms would be to try to evaluate how
>>> > > > > > > we can implement some non-linear optimizers in BSP. (BFGS or conjugate
>>> > > > > > > gradient method)
>>> > > > > > >
>>> > > > > > > 2012/7/9 Tommaso Teofili <[email protected]>
>>> > > > > > >
>>> > > > > > > > 2012/7/9 Thomas Jungblut <[email protected]>
>>> > > > > > > >
>>> > > > > > > > > For the matrix/vector I would propose my library interface: (quite like
>>> > > > > > > > > Mahout's math, but without boundary checks)
>>> > > > > > > > >
>>> > > > > > > > > https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleVector.java
>>> > > > > > > > >
>>> > > > > > > > > https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleMatrix.java
>>> > > > > > > > >
>>> > > > > > > > > Full Writable for Vector and basic Writable for Matrix:
>>> > > > > > > > >
>>> > > > > > > > > https://github.com/thomasjungblut/thomasjungblut-common/tree/master/src/de/jungblut/writable
>>> > > > > > > > >
>>> > > > > > > > > It is enough to implement all machine learning algorithms I've seen until now
>>> > > > > > > > > and the builder pattern allows really nice chaining of commands to easily
>>> > > > > > > > > code equations or translate code
>>> > > > > > > > > from matlab/octave.
>>> > > > > > > > > See for example the logistic regression cost function:
>>> > > > > > > > >
>>> > > > > > > > > https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/regression/LogisticRegressionCostFunction.java
>>> > > > > > > >
>>> > > > > > > > very nice, +1!
>>> > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > For the interfaces of the algorithms:
>>> > > > > > > > > I guess we need to get some more experience. I cannot tell what the
>>> > > > > > > > > interfaces for them should look like, mainly because I don't know how the
>>> > > > > > > > > BSP version of them will call the algorithm logic.
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > > > you're right, it's more reasonable to just proceed bottom-up with this as
>>> > > > > > > > we're going to have a clearer idea while developing the different
>>> > > > > > > > algorithms.
>>> > > > > > > > So for now I'd introduce your library Writables and then proceed 1 step at
>>> > > > > > > > a time with the more common API.
>>> > > > > > > > Thanks,
>>> > > > > > > > Tommaso
>>> > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > But having stable math interfaces is the key point.
>>> > > > > > > > >
>>> > > > > > > > > 2012/7/9 Tommaso Teofili <[email protected]>
>>> > > > > > > > >
>>> > > > > > > > > > Ok, so let's sketch up here what these interfaces should look like.
>>> > > > > > > > > > Any proposal is more than welcome.
>>> > > > > > > > > > Regards,
>>> > > > > > > > > > Tommaso
>>> > > > > > > > > >
>>> > > > > > > > > > 2012/7/7 Thomas Jungblut <[email protected]>
>>> > > > > > > > > >
>>> > > > > > > > > > > Looks fine to me.
>>> > > > > > > > > > > The key is the interfaces for learning and predicting, so we should define
>>> > > > > > > > > > > some vectors and matrices.
>>> > > > > > > > > > > It would be enough to define the algorithms via the interfaces and a
>>> > > > > > > > > > > generic BSP should just run them based on the given input.
>>> > > > > > > > > > >
>>> > > > > > > > > > > 2012/7/7 Tommaso Teofili <[email protected]>
>>> > > > > > > > > > >
>>> > > > > > > > > > > > Hi all,
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > in my spare time I started writing some basic BSP based machine learning
>>> > > > > > > > > > > > algorithms for our ml module. Now I'm wondering, from a design point of
>>> > > > > > > > > > > > view, where it'd make sense to put the training data / model. I'd assume
>>> > > > > > > > > > > > the obvious answer would be HDFS, so this makes me think we should come up
>>> > > > > > > > > > > > with (at least) two BSP jobs for each algorithm: one for learning and one
>>> > > > > > > > > > > > for "predicting", each to be run separately.
>>> > > > > > > > > > > > This would allow reading the training data from HDFS and consequently
>>> > > > > > > > > > > > creating a model (also on HDFS); the created model could then be read
>>> > > > > > > > > > > > (again from HDFS) in order to predict an output for a new input.
>>> > > > > > > > > > > > Does that make sense?
>>> > > > > > > > > > > > I'm just wondering what a general purpose design for Hama based ML stuff
>>> > > > > > > > > > > > would look like, so this is just to start the discussion; any opinion is
>>> > > > > > > > > > > > welcome.
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > Cheers,
>>> > > > > > > > > > > > Tommaso

--
Best Regards, Edward J. Yoon
@eddieyoon
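
For readers following the weighted majority idea mentioned in the quoted thread (each peer runs its own learner and is penalized for each mistaken prediction), here is a minimal single-process sketch of that update rule. It is only an illustration under stated assumptions: the class name WeightedMajority, its method names, and the penalty factor beta are hypothetical, and nothing here is Hama, BSP, or tjungblut-math code. In a BSP setting the votes would presumably arrive as messages from peers, which this sketch does not model.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration of the weighted majority update for binary
 * predictions. Each "expert" corresponds to one peer's learner in the
 * thread's wording. Not part of Hama or any library discussed above.
 */
class WeightedMajority {
    private final double[] weights; // one weight per expert, all start at 1.0
    private final double beta;      // assumed penalty factor, strictly between 0 and 1

    WeightedMajority(int numExperts, double beta) {
        if (numExperts <= 0 || beta <= 0.0 || beta >= 1.0) {
            throw new IllegalArgumentException("need numExperts > 0 and 0 < beta < 1");
        }
        this.weights = new double[numExperts];
        Arrays.fill(this.weights, 1.0);
        this.beta = beta;
    }

    /** Combined prediction: the side with the larger total weight wins (ties go to true). */
    boolean predict(boolean[] votes) {
        double yes = 0.0;
        double no = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (votes[i]) {
                yes += weights[i];
            } else {
                no += weights[i];
            }
        }
        return yes >= no;
    }

    /** Penalize every expert whose vote disagreed with the true label. */
    void update(boolean[] votes, boolean truth) {
        for (int i = 0; i < weights.length; i++) {
            if (votes[i] != truth) {
                weights[i] *= beta;
            }
        }
    }

    /** Defensive copy of the current weights. */
    double[] getWeights() {
        return weights.clone();
    }
}
```

A caller would invoke predict(votes) for each new input, and once the true label is known, update(votes, truth); experts that keep making mistakes lose influence geometrically.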
