Hi Gang, this is about multiquery optimization. In MRCompiler, we create a map-reduce boundary at each split. Later, in MultiQueryOptimizer, we merge several splits into one map-reduce job, and that job nests the several split plans.
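
To make the difference concrete, here is a toy sketch in Python. It is not Pig code and does not mirror MRCompiler or MultiQueryOptimizer internals; `run_unmerged`, `run_merged`, and the filter-style branches are hypothetical names invented for illustration. It only contrasts reading a materialized split result once per branch with handing each record to several nested branch plans in a single pass.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, int]
# Hypothetical stand-in for a split branch plan: a predicate that keeps or drops a record.
Branch = Callable[[Record], bool]


def run_unmerged(records: Iterable[Record],
                 branches: Dict[str, Branch]) -> Tuple[Dict[str, List[Record]], int]:
    """Write once, read many: materialize the split input, then scan it once per branch."""
    temp_file = list(records)  # stands in for the temporary file written at the boundary
    scans = 0
    outputs: Dict[str, List[Record]] = {}
    for name, keep in branches.items():
        scans += 1  # each branch "job" re-reads the materialized data
        outputs[name] = [r for r in temp_file if keep(r)]
    return outputs, scans


def run_merged(records: Iterable[Record],
               branches: Dict[str, Branch]) -> Tuple[Dict[str, List[Record]], int]:
    """Merged: a single scan in which every record is offered to each nested branch."""
    outputs: Dict[str, List[Record]] = {name: [] for name in branches}
    for r in records:
        for name, keep in branches.items():
            if keep(r):
                outputs[name].append(r)
    return outputs, 1  # one pass over the input


records = [{"x": i} for i in range(6)]
branches = {
    "small": lambda r: r["x"] < 3,
    "even": lambda r: r["x"] % 2 == 0,
}

unmerged, scans_unmerged = run_unmerged(records, branches)
merged, scans_merged = run_merged(records, branches)
```

Both functions produce the same per-branch output; in this toy model the only difference is how many times the input is scanned.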

Daniel

Gang Luo wrote:
> Hi Daniel,
> in 4.3.1, the example and figure 6 show this. The last paragraph of 5.1 says
> the split operator maintains a one-tuple buffer for each branch and talks
> about how to synchronize multiple branches. I do think that is the in-memory
> split.
>
> Here is the paper: http://www.vldb.org/pvldb/2/vldb09-1074.pdf
>
> -Gang
>
> ----- Original Message ----
> From: Daniel Dai <jiany...@yahoo-inc.com>
> To: "pig-dev@hadoop.apache.org" <pig-dev@hadoop.apache.org>
> Sent: 2010/7/26 (Mon) 2:09:25 PM
> Subject: Re: split operator
>
> Hi, Gang,
> Which part of the paper are you talking about? We don't do in-memory split.
> We dump the split result to a temporary file and start a new map-reduce job.
> Split does create a map-reduce boundary (though that is not entirely true;
> the multiquery optimizer may combine some of these jobs).
>
> Daniel
>
> Gang Luo wrote:
>
>> Hi all,
>> according to the VLDB 09 paper, the split operator and all its successive
>> operators reside in memory without any blocking in between. However, the
>> source code (version 0.7) shows that an MR job is actually ended when it
>> meets the split operator, and multiple new MR jobs are created, each
>> representing one branch. This write-once-read-multiple-times method is
>> different from the in-memory method mentioned in that paper. Did Pig change
>> the strategy for split, or is there still an in-memory version of split
>> that I didn't discover?
>>
>> Thanks,
>> -Gang