Re: [ML] - data storage and basic design approach

Thomas Jungblut Sat, 07 Jul 2012 10:50:58 -0700

Looks fine to me.
The key is the interfaces for learning and predicting, so we should define
some vectors and matrices.
It would be enough to define the algorithms via those interfaces, and a
generic BSP should just run them based on the given input.
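
To make the idea concrete, here is a minimal sketch of what such interfaces
might look like. Every name in it (Example, Learner, Predictor, the centroid
classes, run) is hypothetical and not part of Hama, vectors are plain
double[] arrays, and the "generic runner" is an ordinary sequential method
standing in for a BSP job, so treat it as an illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class MlInterfacesSketch {

    /** Hypothetical training example: dense feature vector plus a 0/1 label. */
    static final class Example {
        final double[] features;
        final int label;

        Example(double[] features, int label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Hypothetical learning interface: training data in, model of type M out. */
    interface Learner<M> {
        M learn(List<Example> trainingData);
    }

    /** Hypothetical predicting interface: model plus input vector in, label out. */
    interface Predictor<M> {
        int predict(M model, double[] input);
    }

    /** Toy learner: the model is a 2 x d matrix holding one centroid per class. */
    static final class CentroidLearner implements Learner<double[][]> {
        @Override
        public double[][] learn(List<Example> trainingData) {
            int dim = trainingData.get(0).features.length;
            double[][] centroids = new double[2][dim];
            int[] counts = new int[2];
            for (Example e : trainingData) {
                counts[e.label]++;
                for (int i = 0; i < dim; i++) {
                    centroids[e.label][i] += e.features[i];
                }
            }
            for (int c = 0; c < 2; c++) {
                for (int i = 0; i < dim; i++) {
                    centroids[c][i] /= Math.max(1, counts[c]); // avoid dividing by zero
                }
            }
            return centroids;
        }
    }

    /** Toy predictor: returns the class whose centroid is nearest to the input. */
    static final class CentroidPredictor implements Predictor<double[][]> {
        @Override
        public int predict(double[][] model, double[] input) {
            return squaredDistance(model[0], input) <= squaredDistance(model[1], input) ? 0 : 1;
        }

        private static double squaredDistance(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                sum += (a[i] - b[i]) * (a[i] - b[i]);
            }
            return sum;
        }
    }

    /** Sequential stand-in for a generic runner; it knows only the two interfaces. */
    static <M> int[] run(Learner<M> learner, Predictor<M> predictor,
                         List<Example> trainingData, List<double[]> inputs) {
        M model = learner.learn(trainingData);
        int[] labels = new int[inputs.size()];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = predictor.predict(model, inputs.get(i));
        }
        return labels;
    }

    static int[] demo() {
        List<Example> train = Arrays.asList(
            new Example(new double[] {0, 0}, 0),
            new Example(new double[] {1, 0}, 0),
            new Example(new double[] {4, 4}, 1),
            new Example(new double[] {5, 4}, 1));
        List<double[]> inputs = Arrays.asList(new double[] {1, 1}, new double[] {4, 3});
        return run(new CentroidLearner(), new CentroidPredictor(), train, inputs);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [0, 1]
    }
}
```

The runner never looks inside the model type, so a different algorithm would
only need its own Learner/Predictor pair.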


2012/7/7 Tommaso Teofili <[email protected]>

> Hi all,
>
> in my spare time I started writing some basic BSP-based machine learning
> algorithms for our ml module. Now I'm wondering, from a design point of
> view, where it'd make sense to put the training data / model. I'd assume
> the obvious answer would be HDFS, so this makes me think we should come up
> with (at least) two BSP jobs for each algorithm: one for learning and one
> for "predicting", each to be run separately.
> This would allow us to read the training data from HDFS and create a
> model (also on HDFS); the created model could then be read (again from
> HDFS) in order to predict an output for a new input.
> Does that make sense?
> I'm just wondering what a general purpose design for Hama-based ML stuff
> would look like, so this is just to start the discussion; any opinion is
> welcome.
>
> Cheers,
> Tommaso
>
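
The two-job flow described in the quoted message (learn from stored training
data, persist a model, then read it back to predict) could be sketched
roughly as below. This is only an illustration under loud assumptions: the
local filesystem via java.nio stands in for HDFS, two plain methods stand in
for the two BSP jobs, the model is a one-variable least-squares line, and all
names and file layouts are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class TrainThenPredictSketch {

    /** Stand-in for the learning job: read "x,y" lines, fit y = slope*x + intercept, store the model. */
    static void learn(Path trainingData, Path modelOut) {
        try {
            double sx = 0, sy = 0, sxx = 0, sxy = 0;
            int n = 0;
            for (String line : Files.readAllLines(trainingData)) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] parts = line.split(",");
                double x = Double.parseDouble(parts[0].trim());
                double y = Double.parseDouble(parts[1].trim());
                sx += x;
                sy += y;
                sxx += x * x;
                sxy += x * y;
                n++;
            }
            // Closed-form ordinary least squares for a single feature.
            double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
            double intercept = (sy - slope * sx) / n;
            // Assumed model format: slope on line 1, intercept on line 2.
            Files.write(modelOut, Arrays.asList(Double.toString(slope), Double.toString(intercept)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in for the predicting job: load the stored model and apply it to a new input. */
    static double predict(Path model, double x) {
        try {
            List<String> lines = Files.readAllLines(model);
            double slope = Double.parseDouble(lines.get(0));
            double intercept = Double.parseDouble(lines.get(1));
            return slope * x + intercept;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs both steps separately; only the model file is shared between them. */
    static double demo() {
        try {
            Path dir = Files.createTempDirectory("ml-sketch");
            Path training = dir.resolve("training.csv");
            Path model = dir.resolve("model.txt");
            // Points on the line y = 2x + 1.
            Files.write(training, Arrays.asList("1,3", "2,5", "3,7"));
            learn(training, model);
            return predict(model, 10.0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 21.0
    }
}
```

Since the prediction step depends only on the stored model file, it can run
at a different time, or on a different machine, than the learning step.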
