True, to capture the network effect you'll need to run MR on a cluster.

On Tue, May 22, 2012 at 8:57 PM, Jonathan Coveney <jcove...@gmail.com> wrote:

> But you don't capture the nature of the speed benefit of less data going
> over the wire, right? I mean a lot of people use GZip, but in a Hadoop
> context, it is considered too CPU intensive, and the gain in speed from
> less data going over the wire isn't enough to counteract that... I'm not
> quite sure how to establish that with other methods. I can quantify the
> CPU/size tradeoff with a microbenchmark, but not how it plays out on the
> network.
>
> 2012/5/22 Bill Graham <billgra...@gmail.com>
>
> > You could also try using a microbenchmark framework to test out various
> > compression techniques in isolation.
> >
> > On Tuesday, May 22, 2012, Jonathan Coveney wrote:
> >
> > > Will do, thanks
> > >
> > > 2012/5/22 Alan Gates <ga...@hortonworks.com>
> > >
> > > > You might post this same question to mapred-user@hadoop. I know
> > > > Owen and Arun have done a lot of analysis of these kinds of things
> > > > when optimizing the terasort. Others may have valuable feedback
> > > > there as well.
> > > >
> > > > Alan.
> > > >
> > > > On May 22, 2012, at 12:23 PM, Jonathan Coveney wrote:
> > > >
> > > > > I've been dealing some with the intermediate serialization in
> > > > > Pig, and will probably be dealing with it more in the future.
> > > > > When serializing, there is generally the time to serialize vs.
> > > > > space on disk tradeoff (an extreme example being compression vs.
> > > > > no compression, a more nuanced one being varint vs full int,
> > > > > that sort of thing). With Hadoop, generally network I/O is the
> > > > > bottleneck, but I'm not sure of the best way to evaluate
> > > > > something like: method X takes 3x as long to serialize, but is
> > > > > potentially 1/2 as large on disk.
> > > > >
> > > > > What are people doing in the wild?
> > > > > Jon
> >
> > --
> > Sent from Gmail Mobile

--
*Note that I'm no longer using my Yahoo! email address.
Please email me at billgra...@gmail.com going forward.*
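
The CPU/size tradeoff raised in this thread ("method X takes 3x as long to serialize, but is potentially 1/2 as large") can be roughed out with a small microbenchmark. The sketch below is only an illustration and nobody in the thread proposed it. It uses Python's standard zlib module as a stand-in for whatever serializer or codec is being compared. The helper names (`measure`, `break_even_bandwidth`), the sample payload, and the cost model are all assumptions. The model treats total time as CPU time plus bytes divided by bandwidth, with no overlap between the two and no decode cost, which is exactly the part a microbenchmark cannot confirm and a real MR run on a cluster can.

```python
import time
import zlib


def measure(data, level):
    """Return (seconds spent compressing, compressed size in bytes)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return elapsed, len(compressed)


def break_even_bandwidth(raw_size, small_size, extra_cpu_seconds):
    """Bandwidth in bytes/s below which the smaller encoding wins.

    Toy model (an assumption, not a Hadoop measurement):
        total time = CPU time + bytes / bandwidth
    The smaller encoding wins when the transfer time it saves exceeds
    the extra CPU time it costs.
    """
    if small_size >= raw_size:
        return 0.0  # nothing saved on the wire, so it never wins
    if extra_cpu_seconds <= 0:
        return float("inf")  # smaller and no slower: wins at any bandwidth
    return (raw_size - small_size) / extra_cpu_seconds


if __name__ == "__main__":
    # Hypothetical payload; real Pig intermediate data will behave differently.
    payload = b"user_id\t12345\tclicks\t17\n" * 50_000
    for level in (1, 6, 9):
        secs, size = measure(payload, level)
        limit = break_even_bandwidth(len(payload), size, secs)
        print(f"zlib level {level}: {secs:.4f}s, {size} bytes, "
              f"wins below ~{limit / 1e6:.1f} MB/s")
```

As a worked case with made-up numbers: if the baseline writes 100 MB in 1 s and method X writes 50 MB in 3 s, then `break_even_bandwidth(100_000_000, 50_000_000, 2.0)` gives 25,000,000 bytes/s, so under this model X only pays off when the effective per-task bandwidth is below about 25 MB/s.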