[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig

2013-06-27 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694832#comment-13694832
 ] 

Julien Le Dem commented on PIG-3367:


I was thinking we could make the syntax part of FOREACH.
{noformat}
B = FOREACH A GENERATE a, b, c ASSERT a >= 0, b IS NOT NULL;
{noformat}
That way it is easy to integrate asserts in the flow.

The advantage of having it part of the language:
- the error message can be clear without extra user input.
- it's more natural than doing a filter that does not filter. Also if the 
filter is not in the predecessors of a STORE, it won't be executed.

A UDF can stop the job by throwing an exception, although the task will be 
retried before the job fails completely.

For reference, the UDF based syntax:
{noformat}
FILTER members BY ASSERT( (member_id >= 0 ? 1 : 0), 'Doh! Some member ID is negative.' );
{noformat}
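
To illustrate the "filter that does not filter" idea, here is a minimal plain-Python model of it. It is not Pig or DataFu code: the names assert_udf, validate and AssertionFailed are hypothetical, and it assumes the intended check is that member_id is non-negative, as the error message suggests.

```python
class AssertionFailed(Exception):
    """Raised when a record violates the asserted condition."""


def assert_udf(condition, message="Assertion violated."):
    # Same call shape as ASSERT(<int>, <message>): 0 means the check failed.
    if condition == 0:
        raise AssertionFailed(message)
    # Returning True keeps the record, so the filter never drops anything.
    return True


def validate(members):
    # Stand-in for a FILTER whose predicate is the assert call.
    return [
        m for m in members
        if assert_udf(1 if m["member_id"] >= 0 else 0,
                      "Doh! Some member ID is negative.")
    ]


good = [{"member_id": 0}, {"member_id": 7}]
bad = good + [{"member_id": -1}]

assert validate(good) == good  # every record passes through unchanged
try:
    validate(bad)
except AssertionFailed as err:
    print("job would fail:", err)
```

The model only covers the per-record behavior. It says nothing about task retries, or about whether the filter is executed at all when no STORE depends on it.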

Yes, adding new keywords is inconvenient when the keyword is already used as a 
relation or column name.
When a field collides with a keyword, it is sometimes difficult to rename it.
I think we should:
 - try to avoid new keywords if possible
 - provide a mechanism to escape field names to facilitate fixing conflicts 
when they happen (using quotes or a similar mechanism)

 Add assert keyword (operator) in pig
 

 Key: PIG-3367
 URL: https://issues.apache.org/jira/browse/PIG-3367
 Project: Pig
  Issue Type: New Feature
  Components: parser
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi

 The assert operator can be used for data validation. With assert you can write 
 a script as follows:
 {code}
 a = load 'something' as (a0:int, a1:int);
 assert a by a0 > 0, 'a cant be negative for reasons';
 {code}
 This script will fail if the assertion is violated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig

2013-06-27 Thread Alex Levenson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694950#comment-13694950
 ] 

Alex Levenson commented on PIG-3367:


Making it part of foreach looks more readable.
I think there should still be an option for custom error messages.




[jira] [Created] (PIG-3368) doc pig flatten operator applied to empty vs null bag

2013-06-27 Thread Andy Schlaikjer (JIRA)
Andy Schlaikjer created PIG-3368:


 Summary: doc pig flatten operator applied to empty vs null bag
 Key: PIG-3368
 URL: https://issues.apache.org/jira/browse/PIG-3368
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Andy Schlaikjer


[Pig docs|http://pig.apache.org/docs/r0.11.0/basic.html#flatten] state that 
FLATTEN(field_of_type_bag) may generate a cross-product in the case when an 
additional field is projected, e.g.:

y = FOREACH x GENERATE f1, FLATTEN(fbag) as f2;

Additionally, for records in x for which fbag is empty (not null), no output 
record is generated.

What is the expected behavior when fbag is null?

Some users might expect similar behavior, but FLATTEN actually passes through 
the null, resulting in an output record (f1, f2) where f2 is null.

It would be useful to update FLATTEN docs to mention this.
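
The three cases can be sketched as a small plain-Python model. This is only an illustration of the behavior described above, not Pig's implementation; flatten_foreach is a hypothetical name, a Python list stands in for a bag, and None stands in for a null bag.

```python
def flatten_foreach(records):
    """Model of: y = FOREACH x GENERATE f1, FLATTEN(fbag) as f2;"""
    out = []
    for f1, fbag in records:
        if fbag is None:
            # Null bag: the null is passed through, giving one record (f1, null).
            out.append((f1, None))
        else:
            # Non-empty bag: one output record per element, paired with f1.
            # Empty bag: the loop body never runs, so no record is produced.
            for item in fbag:
                out.append((f1, item))
    return out


x = [("a", [1, 2]), ("b", []), ("c", None)]
y = flatten_foreach(x)
print(y)
```

In this model the record with key "b" disappears from the output, while the record with key "c" survives with a null f2.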

http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml?view=markup#l5051

I'm guessing these are the relevant bits which affect this behavior:

http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java?view=markup#l440

http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java?view=markup#l468



[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig

2013-06-27 Thread Arun Ahuja (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695007#comment-13695007
 ] 

Arun Ahuja commented on PIG-3367:
-

Isn't this already available with 
http://linkedin.github.io/datafu/docs/javadoc/datafu/pig/util/ASSERT.html, so 
is this mainly about bringing it into the language?



[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig

2013-06-27 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695047#comment-13695047
 ] 

Aniket Mokashi commented on PIG-3367:
-

@[~arahuja], correct. We feel that it should be part of the language itself. 
Very useful indeed.



[jira] [Assigned] (PIG-3368) doc pig flatten operator applied to empty vs null bag

2013-06-27 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi reassigned PIG-3368:
---

Assignee: Aniket Mokashi



[jira] Subscription: PIG patch available

2013-06-27 Thread jira
Issue Subscription
Filter: PIG patch available (15 issues)

Subscriber: pigdaily

Key         Summary
PIG-3346    New property that controls the number of combined splits
            https://issues.apache.org/jira/browse/PIG-3346
PIG-        Fix remaining Windows core unit test failures
            https://issues.apache.org/jira/browse/PIG-
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3292    Logical plan invalid state: duplicate uid in schema during self-join to get cross product
            https://issues.apache.org/jira/browse/PIG-3292
PIG-3288    Kill jobs if the number of output files is over a configurable limit
            https://issues.apache.org/jira/browse/PIG-3288
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3247    Piggybank functions to mimic OVER clause in SQL
            https://issues.apache.org/jira/browse/PIG-3247
PIG-3210    Pig fails to start when it cannot write log to log files
            https://issues.apache.org/jira/browse/PIG-3210
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3166    Update eclipse .classpath according to ivy library.properties
            https://issues.apache.org/jira/browse/PIG-3166
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-2248    Pig parser does not detect when a macro name masks a UDF name
            https://issues.apache.org/jira/browse/PIG-2248
PIG-1914    Support load/store JSON data in Pig
            https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


fuzzy logic through pig programming

2013-06-27 Thread Harshit Bhargava
Hi,
I am looking for fuzzy matching in Pig Latin that compares two strings and
reports how similar they are.
Example 1:
Given the two words ‘Ramesh’ and ‘Rahim’, I want to check what percentage
of the strings is equal.
Example 2:
If the two words are ‘Ramesh’ and ‘Ramesh’, it should give 100%.
Kindly provide a solution if one is available.
Thanks,
Harshit Bhargava
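
One possible reading of "percentage equal" is a similarity score derived from the Levenshtein edit distance. The sketch below assumes that definition, which is a choice made here and not something stated in the question. It is plain Python, the function names are hypothetical, and the comparison is case-sensitive.

```python
def levenshtein(a, b):
    """Number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity_percent(a, b):
    """100 for identical strings, lower as more edits are needed."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1.0 - float(levenshtein(a, b)) / longest)


print(similarity_percent("Ramesh", "Ramesh"))
print(similarity_percent("Ramesh", "Rahim"))
```

Pig can register Python functions as UDFs through Jython, so a function like this could in principle be called from a FOREACH; the registration details are left out here.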