Basically it's a matter of clarity. I agree that it creates a lot of
boilerplate, but we thought it made it clearer exactly what was
being passed in and out of the macro, especially in cases where a
macro returns multiple outputs (that is, you can't just look at the
last line and see what it is returning). In the original proposal the
store would basically act as a return statement. But perhaps we're
optimizing for the less common case. If others agree that a terser
(but less clear) syntax is better, I'm open to that.
There is one change I would want to make. In your proposal, it isn't
obvious which aliases are inputs and which are outputs without
examining the macro. For macros longer than a few lines this will be
hard to use. This could be addressed by adding an 'out' keyword, so
that it becomes:
define bot_cleanser[X out, Y](user) {
X = filter Y by not is_a_bot($user);
}
Alan.
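To illustrate the point about the 'out' keyword: with it, a reader (or
a tool) can tell inputs from outputs from the define line alone,
without reading the macro body. A rough Python sketch, assuming the
bracket syntax shown above; split_aliases is a hypothetical helper,
not part of Pig or of the proposal:

```python
import re

def split_aliases(header):
    """Return (inputs, outputs) for a header such as
    'define bot_cleanser[X out, Y](user) {'."""
    # Text between the square brackets, e.g. "X out, Y".
    inside = re.search(r"\[(.*?)\]", header).group(1)
    inputs, outputs = [], []
    for item in inside.split(","):
        parts = item.split()
        # An alias followed by the word 'out' is an output (assumed syntax).
        if len(parts) == 2 and parts[1] == "out":
            outputs.append(parts[0])
        else:
            inputs.append(parts[0])
    return inputs, outputs

inputs, outputs = split_aliases("define bot_cleanser[X out, Y](user) {")
print(inputs, outputs)  # ['Y'] ['X']
```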
On Oct 15, 2010, at 5:58 PM, Scott Carey wrote:
I'm most interested in the macro expansion and importing other files
for shared common code. I could be missing something, but is the
TempStorage thing necessary?
bot_filter.pig:
--------------
define bot_cleanser(user) {
A = load 'bc_input' using TempStorage();
B = filter A by not is_a_bot($user);
store B into 'bc_output' using TempStorage();
}
----------------
main.pig:
-------------------
import bot_filter.pig;
A = load 'fact';
store A into 'bc_input' using TempStorage();
inline bot_cleanser('username');
B = load 'bc_output' using TempStorage();
C = group B by user;
...
store Z into 'processed';
-----------------------
Couldn't we pass aliases in instead and remove lots of boilerplate?
bot_filter.pig:
--------------
define bot_cleanser[X,Y](user) {
X = filter Y by not is_a_bot($user);
}
----------------
main.pig:
-------------------
import bot_filter.pig;
A = load 'fact';
inline bot_cleanser[B,A]('username');
C = group B by user;
...
store Z into 'processed';
-----------------------
The inline would then substitute B for X, A for Y, and 'username'
for user. Aliases are separated from other parameters because we
may actually be declaring new aliases when inlining, and it should be
easier to deal with the semantic differences that way. In
particular, the [B,A] above essentially declares that the
macro 'shares' these aliases, and that no other aliases overlap.
Any aliases not declared up front are renamed so as not to collide
when inlined. I look at the macro expansion and function examples
and see tons of alias-naming boilerplate that should IMO be implicit
somehow. Pig already has a lot of alias and field naming
boilerplate, and I would like to avoid introducing more. Otherwise,
I'm sure I'll use a preprocessor again to get rid of it :).
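The expansion described above (substitute the shared aliases and the
parameters, rename every other alias the macro defines) can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions, not how Pig does or would implement it:
inline_macro, the macroN_ renaming prefix, and the alias names raw,
clean and tmp are all hypothetical, and the macro body is assumed to
hold one statement per line.

```python
import re

# One pass over the macro text: quoted strings, $parameters, identifiers.
_TOKEN = re.compile(r"'[^']*'|\$\w+|\b[A-Za-z_]\w*")

def inline_macro(body, shared, params, call_id):
    """Expand a macro body for one call site.

    body:    macro text, one statement per line (assumption)
    shared:  macro alias -> caller alias (the bracketed list)
    params:  parameter name -> argument text, replaces $name
    call_id: number used to build a unique prefix for local aliases
    """
    # Aliases assigned inside the macro: the name left of '=' on each line.
    defined = re.findall(r"^\s*([A-Za-z_]\w*)\s*=", body, flags=re.MULTILINE)
    renames = dict(shared)
    for alias in defined:
        if alias not in shared:
            # Not declared up front, so rename it to avoid collisions.
            renames[alias] = f"macro{call_id}_{alias}"

    def swap(match):
        token = match.group(0)
        if token.startswith("'"):
            return token                          # leave string literals alone
        if token.startswith("$"):
            return params.get(token[1:], token)   # parameter substitution
        return renames.get(token, token)          # alias substitution

    return _TOKEN.sub(swap, body)

body = "tmp = filter raw by not is_a_bot($user);\nclean = distinct tmp;"
print(inline_macro(body, {"raw": "A", "clean": "B"}, {"user": "'username'"}, 1))
# macro1_tmp = filter A by not is_a_bot('username');
# B = distinct macro1_tmp;
```

The sketch is purely textual: a field that happens to share a name
with an alias would be renamed too, which a real implementation would
have to avoid by working on the parsed script.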
On Oct 15, 2010, at 4:39 PM, Alan Gates wrote:
After several months of mulling things over, Richard and I have put
together a proposed design for adding control flow to Pig. See
http://wiki.apache.org/pig/TuringCompletePig
for complete details. Please give us your feedback.
Alan.