Hi all,

In my spare time I started writing some basic BSP-based machine learning algorithms for our ml module. Now I'm wondering, from a design point of view, where it would make sense to put the training data and the model. I'd assume the obvious answer is HDFS, which makes me think we should come up with (at least) two BSP jobs for each algorithm: one for learning and one for "predicting", each to be run separately. This would allow us to read the training data from HDFS and create a model (also stored on HDFS); the created model could then be read back (again from HDFS) in order to predict an output for a new input.

Does that make sense? I'm just wondering what a general-purpose design for Hama-based ML would look like. This is just to start the discussion, so any opinion is welcome.
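
To make this a bit more concrete, here's a rough sketch of what such a two-job driver could look like. This is only a sketch under assumptions: TrainingBSP, PredictionBSP, and all the HDFS paths are hypothetical placeholders, and the exact BSPJob / input-output format calls may need adjusting depending on the Hama version.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hama.HamaConfiguration;
import org.apache.hama.bsp.BSP;
import org.apache.hama.bsp.BSPJob;
import org.apache.hama.bsp.BSPPeer;
import org.apache.hama.bsp.FileInputFormat;
import org.apache.hama.bsp.FileOutputFormat;
import org.apache.hama.bsp.TextInputFormat;
import org.apache.hama.bsp.TextOutputFormat;
import org.apache.hama.bsp.sync.SyncException;

public class TwoPhaseMLDriver {

  /** Learning phase: reads training data, writes the model to HDFS. (Hypothetical stub.) */
  public static class TrainingBSP
      extends BSP<LongWritable, Text, Text, Text, NullWritable> {
    @Override
    public void bsp(BSPPeer<LongWritable, Text, Text, Text, NullWritable> peer)
        throws IOException, SyncException, InterruptedException {
      // Read training records with peer.readNext(...), exchange partial results
      // between peers around peer.sync(), then persist the learned model
      // parameters with peer.write(...).
    }
  }

  /** Prediction phase: loads the model from HDFS, scores new inputs. (Hypothetical stub.) */
  public static class PredictionBSP
      extends BSP<LongWritable, Text, Text, Text, NullWritable> {
    @Override
    public void bsp(BSPPeer<LongWritable, Text, Text, Text, NullWritable> peer)
        throws IOException, SyncException, InterruptedException {
      // Load the previously written model (e.g. via FileSystem from the job
      // configuration), then emit one prediction per new input record.
    }
  }

  public static void main(String[] args) throws Exception {
    HamaConfiguration conf = new HamaConfiguration();

    // Job 1: learn a model from the training data and persist it.
    BSPJob train = new BSPJob(conf, TwoPhaseMLDriver.class);
    train.setJobName("ml-train");
    train.setBspClass(TrainingBSP.class);
    train.setInputFormat(TextInputFormat.class);
    train.setOutputFormat(TextOutputFormat.class);
    train.setOutputKeyClass(Text.class);
    train.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(train, new Path("/ml/training-data")); // placeholder path
    FileOutputFormat.setOutputPath(train, new Path("/ml/model"));        // model lives here
    if (!train.waitForCompletion(true)) {
      System.exit(1);
    }

    // Job 2: read the persisted model and predict outputs for new inputs.
    BSPJob predict = new BSPJob(conf, TwoPhaseMLDriver.class);
    predict.setJobName("ml-predict");
    predict.setBspClass(PredictionBSP.class);
    predict.setInputFormat(TextInputFormat.class);
    predict.setOutputFormat(TextOutputFormat.class);
    predict.setOutputKeyClass(Text.class);
    predict.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(predict, new Path("/ml/new-inputs"));
    FileOutputFormat.setOutputPath(predict, new Path("/ml/predictions"));
    System.exit(predict.waitForCompletion(true) ? 0 : 1);
  }
}

One nice side effect of decoupling the two jobs this way is that the model becomes a plain HDFS artifact: it can be retrained independently, versioned by path, and the prediction job can be re-run against new inputs without touching the training side.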
Cheers, Tommaso
