The logical plan for your script will look like:
Load -> Filter -> Store
Filter will have an expression plan that looks like Proj($0) > const(5)
So yes, all your data will go through the filter operator. But keep
in mind that there is a filter operator in each map task, so your
data will not all go through any one instance of the operator (unless
myfile is small enough to be handled by a single map task). Hope that helps.
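To make the per-task point concrete, here is a rough toy model in Python. It is only a sketch and not Pig's actual operator code: the names FilterOperator, split_input and run_job are made up for illustration, and real input splits are determined by Hadoop rather than by a fixed count.

```python
# Toy model only: these are NOT Pig's real classes or APIs.
class FilterOperator:
    """Stands in for the filter operator that lives inside one map task."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.seen = 0  # number of records this particular instance processed

    def apply(self, records):
        for record in records:
            self.seen += 1
            if self.predicate(record):
                yield record


def split_input(records, num_splits):
    """Cut the input into contiguous chunks, one per (hypothetical) map task."""
    size = max(1, -(-len(records) // num_splits))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_job(records, num_splits):
    """Run one filter instance per split and collect the surviving records."""
    output, operators = [], []
    for split in split_input(records, num_splits):
        # Equivalent of the expression plan Proj($0) > const(5)
        op = FilterOperator(lambda r: r[0] > 5)
        output.extend(op.apply(split))
        operators.append(op)
    return output, operators


rows = [(3,), (7,), (10,), (1,), (6,), (5,)]
filtered, ops = run_job(rows, num_splits=3)
```

Every row passes through some filter instance, but each instance only sees its own split. With num_splits=1 (the "myfile is small" case) a single instance would see everything.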
Unfortunately, there is no great architecture document on Pig.
Probably the best substitute is a paper we published in VLDB 2009,
which you can get here: http://infolab.stanford.edu/~olston/publications/vldb09.pdf
Since it is almost 2 years old now, some of the specific
information is out of date, but the basic structure is still correct.
Alan.
On Jan 24, 2011, at 12:48 PM, Baraa Mohamad wrote:
Hello all:
I'm a new user of Pig, and I'm very interested in the architecture of
Pig.
I have a question about the logical plan.
In the logical plan of this example (attached):
a = load 'myfile';
b = filter a by $0 > 5;
store b into 'myfilteredfile';
Will all the data in 'myfile' be sent in its entirety to the
Proj(0) operator and to the Filter operator?
More generally, what flows along the arrows in the logical plan?
What is the best documentation for understanding the architecture of Pig,
not only how to use it? I plan to use it in the medical
domain, but first I have to understand it
deeply.
Thank you very much for your help.
Baraa MOHAMAD
PhD student in computer science
ISIMA-LIMOS
Université Blaise Pascal
Clermont-Ferrand
France
Tél: +33 658900080