Hi all,

in my spare time I started writing some basic BSP-based machine learning
algorithms for our ml module, and now I'm wondering, from a design point
of view, where it'd make sense to put the training data / model. I'd
assume the obvious answer is HDFS, which makes me think we should come up
with (at least) two BSP jobs for each algorithm: one for learning and one
for "predicting", each to be run separately, e.g. chained by a driver
like the sketch below.
This would allow us to read the training data from HDFS and consequently
create a model (also on HDFS); the created model could then be read back
(again from HDFS) in order to predict an output for a new input, e.g. in
the prediction job's setup(), as in the sketch below.
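
For instance, a prediction task could load the model in setup() via the
plain Hadoop FileSystem API (again just a sketch, assuming the learning
job wrote the model as a single binary file of doubles):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hama.bsp.BSP;
import org.apache.hama.bsp.BSPPeer;
import org.apache.hama.bsp.sync.SyncException;

public class PredictionBSP extends
    BSP<LongWritable, Text, Text, Text, NullWritable> {

  private double[] weights; // e.g. the weight vector of a linear model

  @Override
  public void setup(BSPPeer<LongWritable, Text, Text, Text, NullWritable> peer)
      throws IOException {
    // read the model previously written to HDFS by the learning job
    Path modelPath = new Path(peer.getConfiguration().get("ml.model.path"));
    FileSystem fs = FileSystem.get(peer.getConfiguration());
    FSDataInputStream in = fs.open(modelPath);
    try {
      int n = in.readInt();
      weights = new double[n];
      for (int i = 0; i < n; i++) {
        weights[i] = in.readDouble();
      }
    } finally {
      in.close();
    }
  }

  @Override
  public void bsp(BSPPeer<LongWritable, Text, Text, Text, NullWritable> peer)
      throws IOException, SyncException, InterruptedException {
    // each task would score the records of its own input split
    // against the loaded model (actual prediction logic omitted)
  }
}

A dead-simple binary model format like this would also keep the model
readable from outside Hama, though a Writable-based one would work too.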
Does that make sense?
I'm just wondering what a general-purpose design for Hama-based ML stuff
would look like, so this is just to start the discussion; any opinion is
welcome.

Cheers,
Tommaso
