[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694832#comment-13694832 ]
Julien Le Dem commented on PIG-3367:
------------------------------------

I was thinking we could make the syntax part of FOREACH.
{noformat}
B = FOREACH A GENERATE a, b, c ASSERT a >= 0, b IS NOT NULL;
{noformat}
That way it is easy to integrate asserts in the flow.

The advantages of having it be part of the language:
- the error message can be clear without extra user input.
- it's more natural than doing a filter that does not filter. Also, if the filter is not in the predecessors of a STORE, it won't be executed.

A UDF can stop the job by throwing an exception, although the task will retry before failing completely.
For reference, the UDF-based syntax:
{noformat}
FILTER members BY ASSERT( (member_id >= 0 ? 1 : 0), 'Doh! Some member ID is negative.' );
{noformat}

Yes, adding new keywords is inconvenient when the keyword was already used for relation or column names. When a field collides with a keyword, it is sometimes difficult to rename it.
I think we should:
- try to avoid new keywords if possible
- provide a mechanism to escape field names to facilitate fixing conflicts when they happen (using quotes or a similar mechanism)

> Add assert keyword (operator) in pig
> ------------------------------------
>
>                 Key: PIG-3367
>                 URL: https://issues.apache.org/jira/browse/PIG-3367
>             Project: Pig
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Aniket Mokashi
>            Assignee: Aniket Mokashi
>
> Assert operator can be used for data validation. With assert you can write
> script as following-
> {code}
> a = load 'something' as (a0:int, a1:int);
> assert a by a0 > 0, 'a cant be negative for reasons';
> {code}
> This script will fail if assert is violated.