Hi all,

In my spare time I started writing some basic BSP-based machine learning algorithms for our ml module. Now I'm wondering, from a design point of view, where it would make sense to put the training data and the model. I'd assume the obvious answer is HDFS, which makes me think we should come up with (at least) two BSP jobs for each algorithm: one for learning and one for "predicting", each to be run separately. This would allow us to read the training data from HDFS and create a model (also stored on HDFS); the created model could then be read back (again from HDFS) in order to predict an output for a new input.

Does that make sense? I'm just wondering what a general-purpose design for Hama-based ML would look like. This is just to start the discussion, so any opinion is welcome.
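
To make this a bit more concrete, here's a rough sketch of what such a two-job driver could look like. This is only a sketch under assumptions: TrainingBSP, PredictionBSP, and all the HDFS paths are hypothetical placeholders, and the exact BSPJob / input-output format calls may need adjusting depending on the Hama version.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hama.HamaConfiguration;
import org.apache.hama.bsp.BSP;
import org.apache.hama.bsp.BSPJob;
import org.apache.hama.bsp.BSPPeer;
import org.apache.hama.bsp.FileInputFormat;
import org.apache.hama.bsp.FileOutputFormat;
import org.apache.hama.bsp.TextInputFormat;
import org.apache.hama.bsp.TextOutputFormat;
import org.apache.hama.bsp.sync.SyncException;

public class TwoPhaseMLDriver {

  /** Learning phase: reads training data, writes the model to HDFS. (Hypothetical stub.) */
  public static class TrainingBSP
      extends BSP<LongWritable, Text, Text, Text, NullWritable> {
    @Override
    public void bsp(BSPPeer<LongWritable, Text, Text, Text, NullWritable> peer)
        throws IOException, SyncException, InterruptedException {
      // Read training records with peer.readNext(...), exchange partial results
      // between peers around peer.sync(), then persist the learned model
      // parameters with peer.write(...).
    }
  }

  /** Prediction phase: loads the model from HDFS, scores new inputs. (Hypothetical stub.) */
  public static class PredictionBSP
      extends BSP<LongWritable, Text, Text, Text, NullWritable> {
    @Override
    public void bsp(BSPPeer<LongWritable, Text, Text, Text, NullWritable> peer)
        throws IOException, SyncException, InterruptedException {
      // Load the previously written model (e.g. via FileSystem from the job
      // configuration), then emit one prediction per new input record.
    }
  }

  public static void main(String[] args) throws Exception {
    HamaConfiguration conf = new HamaConfiguration();

    // Job 1: learn a model from the training data and persist it.
    BSPJob train = new BSPJob(conf, TwoPhaseMLDriver.class);
    train.setJobName("ml-train");
    train.setBspClass(TrainingBSP.class);
    train.setInputFormat(TextInputFormat.class);
    train.setOutputFormat(TextOutputFormat.class);
    train.setOutputKeyClass(Text.class);
    train.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(train, new Path("/ml/training-data")); // placeholder path
    FileOutputFormat.setOutputPath(train, new Path("/ml/model"));        // model lives here
    if (!train.waitForCompletion(true)) {
      System.exit(1);
    }

    // Job 2: read the persisted model and predict outputs for new inputs.
    BSPJob predict = new BSPJob(conf, TwoPhaseMLDriver.class);
    predict.setJobName("ml-predict");
    predict.setBspClass(PredictionBSP.class);
    predict.setInputFormat(TextInputFormat.class);
    predict.setOutputFormat(TextOutputFormat.class);
    predict.setOutputKeyClass(Text.class);
    predict.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(predict, new Path("/ml/new-inputs"));
    FileOutputFormat.setOutputPath(predict, new Path("/ml/predictions"));
    System.exit(predict.waitForCompletion(true) ? 0 : 1);
  }
}

One nice side effect of decoupling the two jobs this way is that the model becomes a plain HDFS artifact: it can be retrained independently, versioned by path, and the prediction job can be re-run against new inputs without touching the training side.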
Cheers, Tommaso
