Andy Schlaikjer created PIG-2991:
------------------------------------

             Summary: Clarify document of Algebraic contracts
                 Key: PIG-2991
                 URL: https://issues.apache.org/jira/browse/PIG-2991
             Project: Pig
          Issue Type: Improvement
          Components: documentation
    Affects Versions: 0.10.0
            Reporter: Andy Schlaikjer

Documentation of the Algebraic contracts is somewhat confusing. It took me a while to understand that the exec method of the Initial implementation is passed a singleton bag of X and should return the single X value, so that the Intermed exec method receives a proper bag of X. The builtins such as SUM and COUNT are generally clearly written, but this specific point isn't easy to deduce from those implementations either.

It would be great if the discussion at the following URL could be improved to make all Algebraic contracts more explicit:

http://pig.apache.org/docs/r0.10.0/udf.html#algebraic-interface

Also, detailed answers to the following questions would be great to include in some form:

Q: Does Pig make use of the outputSchema methods of the Initial, Intermed, and Final classes? If so, how?

Q: If my Intermed or Final classes additionally implement the Accumulator interface, does Pig take advantage of this?

Q: Should the parent UDF's outputSchema method always expect to be passed the same input schema, regardless of the context (algebraic, accumulative, regular exec) in which it is used?
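
To illustrate the Initial/Intermed point described in this issue, here is a minimal plain-Java model of the data flow for a SUM-like function. It is only a sketch under stated assumptions: it does not use Pig's own classes, bags are modeled as List<Long> rather than Pig bags of tuples, and the names AlgebraicSketch, initial, intermed, fin, and run are hypothetical and chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the algebraic data flow; NOT Pig's API.
// All names here are hypothetical and exist only for illustration.
public class AlgebraicSketch {

    // Initial: called once per input record with a singleton bag of X.
    // It returns the single X value, so the next stage sees a bag of X.
    static long initial(List<Long> singletonBag) {
        if (singletonBag.size() != 1) {
            throw new IllegalArgumentException("Initial expects a singleton bag");
        }
        return singletonBag.get(0);
    }

    // Intermed: receives a bag of partial values (outputs of Initial or of
    // an earlier Intermed call) and combines them into one partial value.
    static long intermed(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Final: receives a bag of partial values and produces the result.
    static long fin(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Simulates the flow: each inner list stands for the records seen by
    // one task; Initial runs per record, Intermed per task, Final once.
    static long run(List<List<Long>> splits) {
        List<Long> intermedOutputs = new ArrayList<>();
        for (List<Long> split : splits) {
            List<Long> initialOutputs = new ArrayList<>();
            for (Long value : split) {
                initialOutputs.add(initial(Arrays.asList(value)));
            }
            intermedOutputs.add(intermed(initialOutputs));
        }
        return fin(intermedOutputs);
    }

    public static void main(String[] args) {
        List<List<Long>> splits = Arrays.asList(
                Arrays.asList(1L, 2L),
                Arrays.asList(3L));
        System.out.println(run(splits)); // prints 6
    }
}
```

In this model, if initial returned the singleton bag itself instead of the value inside it, intermed would receive a bag of bags rather than a bag of X, which is the confusion this issue asks the documentation to address.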