Sorry about that short email. Here is the situation: we need modules
that convert data from various sources (flat files, XML dumps, MySQL,
different formats on HDFS, HBase) into an intermediate form (say,
vectors). Has anyone considered a workflow where we select an
InputformatReader job and an algorithm to run (classification,
clustering, itemset mining)? The first step would convert the
different sources into the vector format and then launch the
algorithm.
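
To make the first step concrete, here is a rough sketch of a reader
that turns flat-file lines into sparse vectors. It is only an
illustration in Python: Dictionary, read_flat_file and the
dict-based vector are names I made up for this example, not existing
Mahout classes, and a real version would be a Hadoop job.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical sketch: none of these names exist in Mahout.
# A sparse vector is modelled as {feature index: weight}.
SparseVector = Dict[int, float]


class Dictionary:
    """Assigns a stable integer index to each distinct token."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}

    def lookup(self, token: str) -> int:
        # New tokens get the next free index; known tokens keep theirs.
        return self.index.setdefault(token, len(self.index))


def read_flat_file(lines: Iterable[str], dictionary: Dictionary,
                   sep: str = ",") -> Iterator[SparseVector]:
    """Convert each non-empty line into a sparse count vector."""
    for line in lines:
        vector: SparseVector = {}
        for token in line.split(sep):
            token = token.strip()
            if not token:
                continue
            i = dictionary.lookup(token)
            vector[i] = vector.get(i, 0.0) + 1.0
        if vector:
            yield vector
```

A reader for XML dumps or MySQL would expose the same kind of
function, so the algorithms only ever see vectors.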

There has been discussion before about using VectorWritable as the
intermediate representation. What are your thoughts and ideas on
having a single launcher for all the algorithms, where the input data
source/format, the algorithm flags, and the output sink are all
specified?
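
As a rough sketch of what I mean by a single launcher, assuming a
registry of readers and a registry of algorithms: the flag names
(--input, --input-format, --algorithm, --output), the "text" reader
and the "support" stand-in algorithm below are all hypothetical and
only there to show the shape. It is written in Python for brevity,
not as a proposal for the actual Mahout code.

```python
import argparse
import json
from collections import Counter


def read_text_transactions(path):
    """Hypothetical reader: one transaction per line, whitespace-separated items."""
    vocab = {}
    vectors = []
    with open(path) as f:
        for line in f:
            vector = {}
            for item in line.split():
                vector[vocab.setdefault(item, len(vocab))] = 1.0
            if vector:
                vectors.append(vector)
    return vectors


def count_item_support(vectors):
    """Stand-in algorithm: number of vectors containing each feature index."""
    support = Counter()
    for vector in vectors:
        support.update(vector.keys())
    return dict(support)


# Registries: adding a new format or algorithm means adding one entry.
READERS = {"text": read_text_transactions}
ALGORITHMS = {"support": count_item_support}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Single launcher sketch")
    parser.add_argument("--input", required=True)
    parser.add_argument("--input-format", choices=sorted(READERS), default="text")
    parser.add_argument("--algorithm", choices=sorted(ALGORITHMS), required=True)
    parser.add_argument("--output", required=True)
    args = parser.parse_args(argv)

    vectors = READERS[args.input_format](args.input)   # source -> vectors
    result = ALGORITHMS[args.algorithm](vectors)       # vectors -> result
    with open(args.output, "w") as f:                  # result -> output sink
        json.dump(result, f, sort_keys=True)
    return result
```

Calling main(["--input", "tx.txt", "--algorithm", "support",
"--output", "out.json"]) would read tx.txt with the default "text"
reader and write the counts to out.json.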

Robin

On Tue, Jul 28, 2009 at 12:12 AM, Robin Anil<[email protected]> wrote:
> Hi, I am in the middle of implementing parallel FPGrowth. Currently I
> read in text dumps per line as transactions. I would like to move
> towards something more crisp where i need not worry about the input
> format. What do you guys suggest?
>
> Robin
>
