Andy Schlaikjer created PIG-2991:
------------------------------------

             Summary: Clarify documentation of Algebraic contracts
                 Key: PIG-2991
                 URL: https://issues.apache.org/jira/browse/PIG-2991
             Project: Pig
          Issue Type: Improvement
          Components: documentation
    Affects Versions: 0.10.0
            Reporter: Andy Schlaikjer


Documentation of Algebraic contracts is somewhat confusing.

It took me a while to understand that the Initial implementation's exec method 
is passed a singleton bag of X and should return that single X value, so that 
the Intermed exec method receives a proper bag of X.
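
The contract described above can be sketched in plain Java for a SUM-like 
function. This is only an illustration under stated assumptions: it does not 
use Pig's real API, List<Long> stands in for a Pig bag, and the names 
AlgebraicSketch, initial, intermed and finalStage are hypothetical.

```java
import java.util.List;

// Plain-Java stand-in for the three Algebraic stages of a SUM-like function.
// Assumption: List<Long> plays the role of a Pig bag; this is not Pig's API.
class AlgebraicSketch {

    // Initial: receives a singleton bag of X and returns that single X,
    // so the values collected for the next stage form a proper bag of X.
    static long initial(List<Long> singletonBag) {
        if (singletonBag.size() != 1) {
            throw new IllegalArgumentException("Initial expects exactly one element");
        }
        return singletonBag.get(0);
    }

    // Intermed: receives a bag of values produced by Initial (or by earlier
    // Intermed calls) and returns a partial result of the same type.
    static long intermed(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Final: receives a bag of partial results and returns the final value.
    static long finalStage(List<Long> partials) {
        return intermed(partials);
    }

    public static void main(String[] args) {
        // Each input value reaches Initial wrapped in its own singleton bag.
        long a = initial(List.of(1L));
        long b = initial(List.of(2L));
        long c = initial(List.of(3L));

        // Partial results may be combined in groups before the final stage.
        long left = intermed(List.of(a, b));
        long right = intermed(List.of(c));

        System.out.println(finalStage(List.of(left, right)));
    }
}
```

The point of the sketch is that Initial unwraps rather than aggregates: if it 
returned the bag itself, Intermed would see a bag of bags instead of a bag of X.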

The builtins like SUM and COUNT are generally clearly written, but this 
specific point isn't easy to deduce from those implementations either.

It would be great if the discussion at the following URL could be improved to 
make all Algebraic contracts more explicit:

http://pig.apache.org/docs/r0.10.0/udf.html#algebraic-interface

Also, detailed answers to the following questions would be great to include in 
some form:

Q: Does Pig make use of the outputSchema methods of the Initial, Intermed, and 
Final classes? If so, how?

Q: If my Intermed or Final classes additionally implement the Accumulator 
interface, does Pig take advantage of this?

Q: Should the parent UDF's outputSchema method always expect to be passed the 
same input schema, regardless of the context (algebraic, accumulative, regular 
exec) in which it is used?


