Hi, Gang,
Which part of the paper are you talking about? We don't do an in-memory
split. We dump the split result to a temporary file and start a new
map-reduce job, so split does create a map-reduce boundary (though that
is not entirely true: the multiquery optimizer may combine some of these
jobs).
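
To make the difference between the two strategies concrete, here is a toy
sketch in plain Python. It is not Pig code and does not reflect Pig's actual
classes; the function names, the JSON-lines temp file, and the sample
branches are all made up for illustration. The first function writes the
input once and lets each branch re-read it (roughly the temp-file approach
described above), the second pushes each record through every branch in a
single pass (roughly the in-memory approach asked about).

```python
import json
import os
import tempfile


def split_via_temp_file(records, branches):
    """Hypothetical model: write the input once, then each branch re-reads it."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    try:
        # "Job 1": materialize the split input to a temporary file.
        with os.fdopen(fd, "w") as f:
            for rec in records:
                f.write(json.dumps(rec) + "\n")
        # "Jobs 2..n": one independent pass over the file per branch.
        results = {}
        for name, (predicate, transform) in branches.items():
            with open(path) as f:
                rows = (json.loads(line) for line in f)
                results[name] = [transform(r) for r in rows if predicate(r)]
        return results
    finally:
        os.remove(path)


def split_in_memory(records, branches):
    """Hypothetical model: one pass, each record is offered to every branch."""
    results = {name: [] for name in branches}
    for rec in records:
        for name, (predicate, transform) in branches.items():
            if predicate(rec):
                results[name].append(transform(rec))
    return results


# Made-up sample data: two branches, each a (filter, per-record step) pair.
records = [{"x": 1}, {"x": 7}, {"x": 3}, {"x": 9}]
branches = {
    "small": (lambda r: r["x"] < 5, lambda r: r["x"] * 10),
    "large": (lambda r: r["x"] >= 5, lambda r: r["x"] + 1),
}

print(split_via_temp_file(records, branches))
print(split_in_memory(records, branches))
```

Both functions give the same answer; what differs is the I/O pattern. The
temp-file version pays for one write plus one read per branch, while the
in-memory version reads the input once but has to run all branches in the
same process.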
Daniel
Gang Luo wrote:
Hi all
According to the VLDB '09 paper, the split operator and all of its successor
operators reside in memory without any blocking in between. However, the source
code (version 0.7) shows that an MR job actually ends when it reaches the split
operator, and multiple new MR jobs are created, one for each branch.
This write-once-read-multiple-times method is different from the in-memory
method described in that paper. Did Pig change its strategy for split, or is
there still an in-memory version of split that I haven't discovered?
Thanks,
-Gang