Re: split operator

Daniel Dai Mon, 26 Jul 2010 11:10:20 -0700

Hi, Gang,

Which part of the paper are you talking about? We don't do an in-memory split. We dump the split result to a temporary file and start a new map-reduce job. Split does create a map-reduce boundary (though that is not entirely true: the multiquery optimizer may combine some of these jobs).


Daniel

Gang Luo wrote:

Hi all,
According to the VLDB 09 paper, the split operator and all its successive operators reside in memory without any blocking in between. However, the source code (version 0.7) shows that an MR job is actually ended when it meets the split operator, and multiple new MR jobs are created, each representing one branch. This write-once-read-multiple-times method is different from the in-memory method mentioned in that paper. Did Pig change the strategy for split, or is there still an in-memory version of split I didn't discover?
Thanks,
-Gang
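
For readers following the thread, the write-once-read-multiple-times behaviour discussed above can be illustrated with a small Python sketch. This is not Pig's code: the function names (split_to_temp_files, run_branch_job, run_example), the JSON-lines temp files, and the sample records are all assumptions made up for illustration. It only shows the general shape of the approach, where the split result is materialized once and each branch is then processed by a separate follow-up step.

```python
import json
import os
import tempfile


def split_to_temp_files(records, branches, tmp_dir):
    """Write each record to the temp file of every branch whose predicate matches.

    Hypothetical stand-in for the job that ends at the split boundary.
    """
    paths = {name: os.path.join(tmp_dir, name + ".jsonl") for name in branches}
    handles = {name: open(path, "w") for name, path in paths.items()}
    try:
        for record in records:
            for name, predicate in branches.items():
                if predicate(record):
                    handles[name].write(json.dumps(record) + "\n")
    finally:
        for handle in handles.values():
            handle.close()
    return paths


def run_branch_job(path, transform):
    """Hypothetical stand-in for a follow-up job that reads one branch file."""
    with open(path) as f:
        return [transform(json.loads(line)) for line in f]


def run_example():
    # Sample data invented for this sketch.
    records = [{"id": i, "value": i * 10} for i in range(6)]
    branches = {
        "small": lambda r: r["value"] < 30,
        "large": lambda r: r["value"] >= 30,
    }
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Step 1: materialize the split result once.
        paths = split_to_temp_files(records, branches, tmp_dir)
        # Step 2: each branch is processed independently from its own file.
        small = run_branch_job(paths["small"], lambda r: r["id"])
        large = run_branch_job(paths["large"], lambda r: r["value"])
    return small, large
```

An in-memory split would instead pass each record straight to the downstream operators of every matching branch without the intermediate files, which is the behaviour the question attributes to the paper.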
