Could you be specific about which map/reduce job produced the error?

On Mon, Jan 18, 2010 at 7:28 PM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
> 2010/1/18 Robin Anil <robin.a...@gmail.com>:
> > It's this kind of thing that forced the move to sequence files instead of
> > TextKeyValueInput format and other text-based/CSV-based formats. Kind of
> > regretting the decision to go with the tab-separated format for
> > BayesClassifier, which I wrote 2 years ago. I will be modifying this to
> > use sparse vectors or the sequence files, whichever fits.
> >
> > My thought is that this kind of functionality should only be used by the
> > format converters that convert to and back from sequence files. And when
> > storing it to sequence files, just enforce the \n rule for line breaks.
>
> By the way, I tried to run the Bayesian classifier's feature
> extractor on the following Wikipedia chunk:
>
> s3://enwiki-pages-articles/enwiki-20090810-pages-articles/chunk-0001.xml
>
> And I got an EOFException in Hadoop-related classes (no Mahout classes
> in the stacktrace). I wonder if this is related, or maybe this is
> related to the Java serialization used in that step.
>
> The feature extractor works on all the other chunks I tried, though. All
> those chunks were extracted on a Linux machine.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://code.oliviergrisel.name
>