Sorry about that short email. Here is the situation. We need modules to convert data from different sources (flat files, XML dumps, MySQL, different formats on HDFS, HBase) into an intermediate form (say, vectors). Has anyone considered a workflow where we select an InputformatReader job and an algorithm to run (classification, clustering, itemset mining)? The first step would convert the different sources into the vector format and then launch the algorithm.
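A rough sketch of the workflow described above, to make the idea concrete. Everything here is hypothetical: the names `READERS`, `ALGORITHMS`, `read_flat_file`, `mean_vector` and `launch` are made up for illustration and are not Mahout or Hadoop APIs, plain Python lists stand in for the vector format, and a component-wise mean stands in for a real algorithm such as clustering.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Vector = List[float]  # stand-in for the intermediate vector format


def read_flat_file(lines: Iterable[str]) -> Iterator[Vector]:
    """Hypothetical reader: one comma-separated numeric record per line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield [float(field) for field in line.split(",")]


def mean_vector(vectors: Iterable[Vector]) -> Vector:
    """Placeholder algorithm: component-wise mean of all input vectors."""
    total: Vector = []
    count = 0
    for vec in vectors:
        total = list(vec) if count == 0 else [a + b for a, b in zip(total, vec)]
        count += 1
    return [t / count for t in total] if count else []


# Registries: adding a new source format or algorithm means adding one entry.
READERS: Dict[str, Callable[[Iterable[str]], Iterator[Vector]]] = {
    "flatfile": read_flat_file,
}
ALGORITHMS: Dict[str, Callable[[Iterable[Vector]], Vector]] = {
    "mean": mean_vector,
}


def launch(input_format: str, algorithm: str,
           source: Iterable[str], sink: Callable[[Vector], None]) -> Vector:
    """Single entry point: source -> reader -> vectors -> algorithm -> sink."""
    vectors = READERS[input_format](source)   # step 1: convert input to vectors
    result = ALGORITHMS[algorithm](vectors)   # step 2: run the chosen algorithm
    sink(result)                              # step 3: hand the result to the output sink
    return result


if __name__ == "__main__":
    launch("flatfile", "mean", ["1,2", "3,4"], print)
```

The point of the sketch is only that the algorithm never sees the source format: each reader is responsible for producing vectors, so supporting XML dumps or MySQL would mean registering another reader, not touching the algorithms.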
There have been discussions before about using VectorWritable as the intermediate representation. What are your thoughts and ideas on having a single launcher for all the algorithms, where the input data source/format, the algorithm flags, and the output sink are specified?

Robin

On Tue, Jul 28, 2009 at 12:12 AM, Robin Anil<[email protected]> wrote:
> Hi, I am in the middle of implementing parallel FPGrowth. Currently I
> read in text dumps per line as transactions. I would like to move
> towards something more crisp where i need not worry about the input
> format. What do you guys suggest?
>
> Robin
