On Jul 28, 2009, at 1:39 PM, Ted Dunning wrote:
On Tue, Jul 28, 2009 at 12:18 AM, Robin Anil wrote:
... We need modules to convert data in databases (flat files, XML dumps, MySQL, different formats on HDFS, HBase) into an intermediate form (say, vectors).
Yes. We do need that.
+1
Have you ever considered a workflow in which we select an InputformatReader job and an algorithm to run (classification, clustering, itemset mining)? The first process would convert the different sources into the vector format and then launch the algorithm.
That is an intriguing thought. How many algorithms have the same shape (as in, one input, one output, one algorithm, one input format)?
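A minimal sketch of the proposed two-stage workflow (a reader step that produces vectors, followed by a selected algorithm), assuming every algorithm has the one-input, one-output shape described above. The names `run_workflow`, `csv_reader`, `centroid`, `READERS`, and `ALGORITHMS` are hypothetical and not part of Mahout; vectors are plain Python lists, and everything runs in memory rather than as Hadoop jobs.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical intermediate form: a vector is a list of floats.
Vector = List[float]
Reader = Callable[[Iterable[str]], List[Vector]]
Algorithm = Callable[[List[Vector]], object]


def csv_reader(lines: Iterable[str]) -> List[Vector]:
    """Hypothetical reader: parse comma-separated numbers, skipping blank lines."""
    return [[float(x) for x in line.split(",")] for line in lines if line.strip()]


def centroid(vectors: List[Vector]) -> Vector:
    """Toy stand-in for an algorithm: the component-wise mean of the vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Hypothetical registries: pick a reader and an algorithm by name.
READERS: Dict[str, Reader] = {"csv": csv_reader}
ALGORITHMS: Dict[str, Algorithm] = {"centroid": centroid}


def run_workflow(input_format: str, algorithm: str, source: Iterable[str]) -> object:
    # Step 1: convert the raw source into the intermediate vector form.
    vectors = READERS[input_format](source)
    # Step 2: launch the selected algorithm on those vectors.
    return ALGORITHMS[algorithm](vectors)


print(run_workflow("csv", "centroid", ["1,2", "3,4"]))  # prints [2.0, 3.0]
```

The registry approach only works while every algorithm takes a single list of vectors and returns one result; an algorithm that needs many extra options or more than one input would not fit this signature, which is the concern raised in the reply.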
This might be a bit tricky due to the large number of options.
However, I do agree that we should try to standardize names, etc., and use CLI2 everywhere.
-Grant