Will do, thanks

2012/5/22 Alan Gates <ga...@hortonworks.com>

> You might post this same question to mapred-user@hadoop.  I know Owen and
> Arun have done a lot of analysis of these kinds of things when optimizing
> the terasort.  Others may have valuable feedback there as well.
>
> Alan.
>
> On May 22, 2012, at 12:23 PM, Jonathan Coveney wrote:
>
> > I've been working with the intermediate serialization in Pig, and will
> > probably be working with it more in the future. When serializing, there is
> > generally a tradeoff between time to serialize and space on disk (an
> > extreme example being compression vs. no compression, a more nuanced one
> > being varint vs. full int, that sort of thing). With Hadoop, network I/O is
> > generally the bottleneck, but I'm not sure of the best way to evaluate
> > something like: method X takes 3x as long to serialize, but is potentially
> > 1/2 as large on disk.
> >
> > What are people doing in the wild?
> > Jon
>
>
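
One rough way to frame the "3x as long to serialize, 1/2 as large" question above is a back-of-the-envelope cost model. The sketch below is only an illustration, not anything from Pig or Hadoop: the function names are hypothetical, the numbers are made up, and it assumes serialization and transfer happen one after the other with no overlap, and that deserialization and disk costs are ignored. Real jobs overlap CPU and I/O, so treat it as a starting point for measurement rather than an answer.

```python
def total_seconds(serialize_seconds, output_bytes, bandwidth_bytes_per_sec):
    """Serial cost model (assumption): serialize first, then ship the bytes."""
    return serialize_seconds + output_bytes / bandwidth_bytes_per_sec


def compare(base_serialize_seconds, base_bytes, bandwidth_bytes_per_sec,
            time_factor=3.0, size_factor=0.5):
    """Return (baseline, candidate) total times.

    time_factor and size_factor describe the candidate relative to the
    baseline; the defaults are the 3x / one-half figures from the question.
    """
    baseline = total_seconds(base_serialize_seconds, base_bytes,
                             bandwidth_bytes_per_sec)
    candidate = total_seconds(base_serialize_seconds * time_factor,
                              base_bytes * size_factor,
                              bandwidth_bytes_per_sec)
    return baseline, candidate


def break_even_serialize_seconds(base_bytes, bandwidth_bytes_per_sec,
                                 time_factor=3.0, size_factor=0.5):
    """Baseline serialize time at which both methods cost the same.

    Below this value the candidate wins: the transfer time it saves is
    larger than the extra serialization time it spends.
    """
    transfer_seconds = base_bytes / bandwidth_bytes_per_sec
    return transfer_seconds * (1.0 - size_factor) / (time_factor - 1.0)


if __name__ == "__main__":
    # Made-up numbers: 1 GB of output over an effective 100 MB/s link.
    print(compare(1.0, 1e9, 100e6))
    print(compare(4.0, 1e9, 100e6))
    print(break_even_serialize_seconds(1e9, 100e6))
```

Under this model the 3x/half-size method only pays off when the baseline's transfer time is more than four times its serialization time, which is one way to turn "network I/O is the bottleneck" into something that can be checked against measured numbers.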
