Basically it's a matter of clarity. I agree that it creates a lot of boilerplate, but we thought it made it clearer exactly what was being passed into and out of the macro, especially in cases where a macro returns multiple outputs (that is, where you can't just look at the last line and see what it is returning). In the original proposal the store would basically act as a return statement. But perhaps we're optimizing for the less common case. If others agree that a terser (but less clear) syntax is better, I'm open to that.

There is one change I would want to make. In your proposal, it isn't obvious which aliases are inputs and which are outputs without examining the macro body. For macros longer than a few lines this will be hard to use. This could be addressed, though, by adding an 'out' keyword, so that it becomes:

define bot_cleanser[X out, Y](user) {
        X = filter Y by not is_a_bot($user);
}
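
To illustrate the point of the 'out' keyword, here is a minimal Python sketch that reads only the macro header and reports which aliases are inputs and which are outputs, without looking at the body. The header grammar, the describe_macro name, and the rule that an unmarked alias is an input are all assumptions made for illustration; none of this is part of the proposal or of Pig.

```python
import re

# Hypothetical header shape: define NAME[ALIAS [out], ...](PARAM, ...)
HEADER = re.compile(r"define\s+(\w+)\s*\[([^\]]*)\]\s*\(([^)]*)\)")

def describe_macro(header):
    """Split a macro header into its name, input/output aliases, and parameters."""
    m = HEADER.search(header)
    if m is None:
        raise ValueError(f"not a macro header: {header!r}")
    name, aliases, params = m.groups()
    inputs, outputs = [], []
    for decl in filter(None, (d.strip() for d in aliases.split(","))):
        parts = decl.split()
        # "X out" marks X as an output; a bare "Y" is assumed to be an input.
        (outputs if parts[1:] == ["out"] else inputs).append(parts[0])
    return {
        "name": name,
        "inputs": inputs,
        "outputs": outputs,
        "params": [p.strip() for p in params.split(",") if p.strip()],
    }

if __name__ == "__main__":
    print(describe_macro("define bot_cleanser[X out, Y](user) {"))
```

With the marker in the header, a caller (or a tool) can tell that X is produced and Y is consumed from the first line alone.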

Alan.

On Oct 15, 2010, at 5:58 PM, Scott Carey wrote:

I'm most interested in the macro expansion and in importing other files for shared common code. I could be missing something, but is the TempStorage thing necessary?

bot_filter.pig:
--------------
define bot_cleanser(user) {
   A = load 'bc_input' using TempStorage();
   B = filter A by not is_a_bot($user);
   store B into 'bc_output' using TempStorage();
}
----------------
main.pig:
-------------------
import bot_filter.pig;

A = load 'fact';
store A into 'bc_input' using TempStorage();
inline bot_cleanser('username');
B = load 'bc_output' using TempStorage();
C = group B by user;
...
store Z into 'processed';
-----------------------

Couldn't we pass aliases in instead and remove lots of boilerplate?

bot_filter.pig:
--------------
define bot_cleanser[X,Y](user) {
   X = filter Y by not is_a_bot($user);
}
----------------
main.pig:
-------------------
import bot_filter.pig;

A = load 'fact';
inline bot_cleanser[B,A]('username');
C = group B by user;
...
store Z into 'processed';
-----------------------

The inline then would substitute B for X, A for Y, and 'username' for user. Aliases are separated from other parameters because we may actually be declaring new aliases when inlining, and it should be easier to deal with the semantic differences that way. In particular, the [B, A] above essentially declares that the macro 'shares' these aliases with the caller, and that no other aliases overlap.

Any aliases not declared up front are renamed so as not to collide when inlined. I look at the macro expansion and function examples and see tons of alias naming boilerplate that should IMO be implicit somehow. Pig already has a lot of alias and field naming boilerplate, and I would like to avoid introducing more. Otherwise, I'm sure I'll use a preprocessor again to get rid of it :).
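
The substitution and renaming described above can be sketched as a small textual preprocessor. This is a rough Python illustration under stated assumptions, not how Pig does or would implement it: the inline_macro function, the macro_<id>_ prefix for private aliases, and the rule that an alias is any name assigned at the start of a statement are all made up for the example. It also ignores string literals and passes the parameter value in unquoted.

```python
import re

# An identifier, optionally prefixed with "$" (a macro parameter reference).
TOKEN = re.compile(r"(?<![\w$])\$?[A-Za-z_]\w*")
# A name assigned at the start of a line, e.g. "B = filter ...".
ASSIGN = re.compile(r"^\s*([A-Za-z_]\w*)\s*=", re.MULTILINE)

def inline_macro(body, alias_params, alias_args, value_params, value_args, call_id):
    """Expand one macro call by token substitution (illustrative only)."""
    shared = dict(zip(alias_params, alias_args))   # aliases declared in [...]
    values = dict(zip(value_params, value_args))   # ordinary parameters in (...)
    # Aliases assigned in the body but not declared as shared are private,
    # so give them a per-call prefix (assumed scheme) to avoid collisions.
    private = {
        name: f"macro_{call_id}_{name}"
        for name in ASSIGN.findall(body)
        if name not in shared
    }
    rename = {**shared, **private}

    def swap(match):
        token = match.group(0)
        if token.startswith("$"):
            return values.get(token[1:], token)    # $user -> caller's value
        return rename.get(token, token)            # alias -> caller's alias

    # A single pass, so substituted text is never substituted a second time.
    return TOKEN.sub(swap, body)

if __name__ == "__main__":
    body = "T = filter Y by not is_a_bot($user);\nX = distinct T;"
    # X (output) maps to the caller's B, Y (input) to the caller's A.
    print(inline_macro(body, ["X", "Y"], ["B", "A"], ["user"], ["username"], 1))
```

Here T is not declared in the alias list, so it is renamed per call, while X and Y are replaced by the caller's aliases.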




On Oct 15, 2010, at 4:39 PM, Alan Gates wrote:

After several months of mulling things over, Richard and I have put
together a proposed design for adding control flow to Pig.  See 
http://wiki.apache.org/pig/TuringCompletePig
for complete details.  Please give us your feedback.

Alan.

