Re: Random thought: line separators

Robin Anil Mon, 18 Jan 2010 06:08:16 -0800

Could you be specific about which map/reduce job hit the error?

On Mon, Jan 18, 2010 at 7:28 PM, Olivier Grisel <olivier.gri...@ensta.org> wrote:


> 2010/1/18 Robin Anil <robin.a...@gmail.com>:
> > It's this kind of thing that forced the move to sequence files instead of
> > TextKeyValueInput format and other text-based/CSV-based formats. I kind of
> > regret the decision to go with a tab-separated format for BayesClassifier,
> > which I wrote 2 years ago. I will be modifying it to use sparse vectors
> > or sequence files, whichever fits.
> >
> > My thought is that this kind of functionality should only be used by the
> > format converters that convert to and from sequence files, and when
> > storing to sequence files they should just enforce the \n rule for line breaks.
>
> By the way, I tried to run the Bayesian classifier's feature
> extractor on the following Wikipedia chunk:
>
> s3://enwiki-pages-articles/enwiki-20090810-pages-articles/chunk-0001.xml
>
> And I got an EOFException in Hadoop-related classes (no Mahout classes
> in the stack trace). I wonder if this is related, or maybe it is
> related to the Java serialization used in that step.
>
> The feature extractor works on all the other chunks I tried, though. All
> those chunks were extracted on a Linux machine.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://code.oliviergrisel.name
>
