[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694832#comment-13694832 ]

Julien Le Dem commented on PIG-3367:

I was thinking we could make the syntax part of FOREACH.
{noformat}
B = FOREACH A GENERATE a, b, c ASSERT a >= 0, b IS NOT NULL;
{noformat}
That way it is easy to integrate asserts in the flow.

The advantage of having it part of the language:
- the error message can be clear without extra user input.
- it's more natural than doing a filter that does not filter. Also, if the filter is not in the predecessors of a STORE, it won't be executed.

A UDF can stop the job by throwing an exception, although the task will retry before failing completely.

For reference, the UDF based syntax:
{noformat}
FILTER members BY ASSERT( (member_id >= 0 ? 1 : 0), 'Doh! Some member ID is negative.' );
{noformat}

Yes, adding new keywords is inconvenient when the keyword was used for relation or column names. When a field collides with a keyword it is sometimes difficult to rename it.
I think we should:
- try to avoid new keywords if possible
- provide a mechanism to escape field names to facilitate fixing conflicts when they happen (using quotes or a similar mechanism)

Add assert keyword (operator) in pig

Key: PIG-3367
URL: https://issues.apache.org/jira/browse/PIG-3367
Project: Pig
Issue Type: New Feature
Components: parser
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi

Assert operator can be used for data validation. With assert you can write a script as follows:
{code}
a = load 'something' as (a0:int, a1:int);
assert a by a0 > 0, 'a cant be negative for reasons';
{code}
This script will fail if the assert is violated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694950#comment-13694950 ]

Alex Levenson commented on PIG-3367:

Making it part of foreach looks more readable. I think there should still be an option for custom error messages.
[jira] [Created] (PIG-3368) doc pig flatten operator applied to empty vs null bag
Andy Schlaikjer created PIG-3368:

Summary: doc pig flatten operator applied to empty vs null bag
Key: PIG-3368
URL: https://issues.apache.org/jira/browse/PIG-3368
Project: Pig
Issue Type: Improvement
Components: documentation
Reporter: Andy Schlaikjer

[Pig docs|http://pig.apache.org/docs/r0.11.0/basic.html#flatten] state that FLATTEN(field_of_type_bag) may generate a cross-product in the case when an additional field is projected, e.g.:

y = FOREACH x GENERATE f1, FLATTEN(fbag) as f2;

Additionally, for records in x for which fbag is empty (not null), no output record is generated.

What is the expected behavior when fbag is null? Some users might expect similar behavior, but FLATTEN actually passes through the null, resulting in an output record (f1, f2) where f2 is null.

It would be useful to update the FLATTEN docs to mention this.
http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml?view=markup#l5051

I'm guessing these are the relevant bits which affect this behavior:
http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java?view=markup#l440
http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java?view=markup#l468
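The three cases described in this report (non-empty bag, empty bag, null bag) can be modeled in a few lines of Python. This is only a sketch of the behavior as the report describes it, not Pig code and not taken from POForEach.java; the function name foreach_flatten and the sample records are made up for illustration.

```python
from typing import Any, List, Optional, Tuple

Record = Tuple[Any, Optional[List[Any]]]


def foreach_flatten(records: List[Record]) -> List[Tuple[Any, Any]]:
    """Model of: y = FOREACH x GENERATE f1, FLATTEN(fbag) as f2;

    Hypothetical helper: it mirrors the behavior described in the report,
    it is not how Pig implements FLATTEN.
    """
    out = []
    for f1, fbag in records:
        if fbag is None:
            # Null bag: the null is passed through, so one record survives.
            out.append((f1, None))
        else:
            # Non-empty bag: one output record per element (cross-product
            # with f1). Empty bag: the loop body never runs, so the input
            # record produces no output at all.
            for item in fbag:
                out.append((f1, item))
    return out


x = [("a", [1, 2]), ("b", []), ("c", None)]
print(foreach_flatten(x))  # [('a', 1), ('a', 2), ('c', None)]
```

Record "b" disappears while record "c" is kept with a null f2, which is the asymmetry the report asks the docs to mention.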
[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695007#comment-13695007 ]

Arun Ahuja commented on PIG-3367:

Isn't this available with http://linkedin.github.io/datafu/docs/javadoc/datafu/pig/util/ASSERT.html, so is this mainly about bringing it into the language?
[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695047#comment-13695047 ]

Aniket Mokashi commented on PIG-3367:

@[~arahuja], correct. We feel that it should be part of the language itself. Very useful indeed.
[jira] [Assigned] (PIG-3368) doc pig flatten operator applied to empty vs null bag
[ https://issues.apache.org/jira/browse/PIG-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi reassigned PIG-3368:

Assignee: Aniket Mokashi
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (15 issues)
Subscriber: pigdaily

Key         Summary
PIG-3346    New property that controls the number of combined splits
            https://issues.apache.org/jira/browse/PIG-3346
PIG-        Fix remaining Windows core unit test failures
            https://issues.apache.org/jira/browse/PIG-
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3292    Logical plan invalid state: duplicate uid in schema during self-join to get cross product
            https://issues.apache.org/jira/browse/PIG-3292
PIG-3288    Kill jobs if the number of output files is over a configurable limit
            https://issues.apache.org/jira/browse/PIG-3288
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3247    Piggybank functions to mimic OVER clause in SQL
            https://issues.apache.org/jira/browse/PIG-3247
PIG-3210    Pig fails to start when it cannot write log to log files
            https://issues.apache.org/jira/browse/PIG-3210
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3166    Update eclipse .classpath according to ivy library.properties
            https://issues.apache.org/jira/browse/PIG-3166
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-2248    Pig parser does not detect when a macro name masks a UDF name
            https://issues.apache.org/jira/browse/PIG-2248
PIG-1914    Support load/store JSON data in Pig
            https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
fuzzy logic through pig programming
Hi,

I am looking for a way to do fuzzy string matching in Pig Latin, i.e. to compare two strings and report how similar they are.

Example 1: given the two words ‘Ramesh’ and ‘Rahim’, I want to know what percentage of the strings is equal.

Example 2: if the two words are ‘Ramesh’ and ‘Ramesh’, the result should be 100%.

Kindly provide a solution if one is available.

Thanks,
Harshit Bhargava
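One common way to turn "how much of the two strings is equal" into a percentage is to compute the Levenshtein edit distance and scale it by the length of the longer string. The Python sketch below assumes that definition of similarity, which the question does not specify, so other measures would give different numbers. The names levenshtein and similarity_percent are made up for illustration, and wiring the function into a Pig script as a UDF is left out.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def similarity_percent(a: str, b: str) -> float:
    """Assumed definition: 100 * (1 - distance / length of longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(similarity_percent("Ramesh", "Ramesh"))  # 100.0
print(round(similarity_percent("Ramesh", "Rahim"), 2))
```

Under this definition identical strings score 100%, and 'Ramesh' versus 'Rahim' needs four edits out of six characters, so it scores about 33%.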