This talk, combined with the previous talk about the preferred mode of composing tools (writing scripts in Java), is beginning to make me think that we need something like an HdfsMatrix and a LocalFileMatrix. These would simply be wrappers around file names, but they would allow extraction of elements (for debugging, diagnostics, and sequential implementations), and they could be passed to generic driver routines or received from generic conversion routines.
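To make the idea concrete, here is a rough sketch of what the local-file variant might look like. Everything beyond the name LocalFileMatrix is an assumption for illustration: the FileBackedMatrix interface, the method names, and the text layout (one row per line, whitespace-separated values) are hypothetical and not existing Mahout code. An HdfsMatrix would presumably expose the same interface and read through the Hadoop file system instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical interface: a matrix that is identified by where its data lives.
interface FileBackedMatrix {
    // The file name that would be handed to a generic driver routine.
    String location();

    // Element access, intended for debugging, diagnostics, and sequential code.
    double get(int row, int column);
}

// Hypothetical local-file variant. Assumes one matrix row per line with
// whitespace-separated values; the real storage format is undecided.
final class LocalFileMatrix implements FileBackedMatrix {
    private final Path path;

    LocalFileMatrix(Path path) {
        this.path = path;
    }

    @Override
    public String location() {
        return path.toString();
    }

    @Override
    public double get(int row, int column) {
        try {
            // Re-reads the file on every call: fine for diagnostics,
            // far too slow for anything performance-sensitive.
            List<String> lines = Files.readAllLines(path);
            String[] cells = lines.get(row).trim().split("\\s+");
            return Double.parseDouble(cells[column]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the design is that a driver only ever needs location(), while a test or a sequential implementation can call get() on the same object without knowing how the file is laid out.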
Should I open a JIRA?

On Fri, Nov 13, 2009 at 11:54 AM, Grant Ingersoll <[email protected]> wrote:

> Also, take a look at what the TfIdfDriver does for the classifier stuff.
> This is a M/R job for converting text for it's format. I think we can
> abstract that to be more general purpose and then move it under the Utils
> module. The only thing that likely needs to change is whether we output the
> Writable for the classifier or whether we output a Vector. That is my naive
> view at this point.
>

--
Ted Dunning, CTO
DeepDyve
