Hi, Gang,
Which part of the paper are you talking about? We don't do an in-memory split. We dump the split result to a temporary file and start a new map-reduce job, so split does create a map-reduce boundary. (This is not entirely true: the multiquery optimizer may combine some of these jobs.)
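
As a rough illustration of the write-once, read-many strategy described here, the following toy Python sketch mimics it outside of Pig. It is not Pig source code: the function names (run_first_job, run_branch_job, split_via_temp_file) are hypothetical, the "jobs" are plain function calls, and applying each branch condition in the later job is an assumption made only to keep the example small.

```python
import os
import pickle
import tempfile


def run_first_job(records, transform, tmp_path):
    """First 'job': apply the upstream operators and dump the result to a temp file."""
    with open(tmp_path, "wb") as f:
        pickle.dump([transform(r) for r in records], f)


def run_branch_job(tmp_path, predicate):
    """One later 'job' per branch: re-read the temp file and keep matching rows."""
    with open(tmp_path, "rb") as f:
        rows = pickle.load(f)
    return [r for r in rows if predicate(r)]


def split_via_temp_file(records, transform, predicates):
    """Write the intermediate result once, then read it once per branch."""
    fd, tmp_path = tempfile.mkstemp(suffix=".split")
    os.close(fd)  # only the path is needed; the jobs reopen the file themselves
    try:
        run_first_job(records, transform, tmp_path)
        return [run_branch_job(tmp_path, p) for p in predicates]
    finally:
        os.remove(tmp_path)  # the temporary file is discarded after all branches ran
```

For example, split_via_temp_file(range(6), lambda x: x * 10, [lambda x: x < 30, lambda x: x >= 30]) writes the six transformed values once and reads them back twice, once per branch. The temporary file is the boundary between the first job and the branch jobs.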

Daniel

Gang Luo wrote:
Hi all,
According to the VLDB '09 paper, the split operator and all of its successor operators reside in memory without any blocking in between. However, the source code (version 0.7) shows that an MR job actually ends when it reaches the split operator, and multiple new MR jobs are created, one for each branch. This write-once, read-multiple-times method is different from the in-memory method described in the paper. Did Pig change its strategy for split, or is there still an in-memory version of split that I haven't discovered?

Thanks,
-Gang


