Johannes Schwenk created PIG-2722:
-------------------------------------
Summary: UDF FilterFunc in expression using OR right hand side
gets ignored
Key: PIG-2722
URL: https://issues.apache.org/jira/browse/PIG-2722
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.1
Environment: pig-0.8.1, hadoop-0.20.2 from Cloudera's distribution
cdh3u3 on Kubuntu 12.04 64-bit.
Reporter: Johannes Schwenk
The following pig script does not produce the expected output:
{noformat}
register adition.jar
a = LOAD 'TestCONTAINS-testFilteringCluster-input.txt' AS (id:int, grp:int,
additional:int, referer:chararray);
b = FILTER a BY com.adition.pig.filtering.string.CONTAINS(referer, 'obama') OR
com.adition.pig.filtering.string.CONTAINS(referer, 'praesident');
EXPLAIN b;
dump b;
{noformat}
TestCONTAINS-testFilteringCluster-input.txt contains
{noformat}
1 23 42 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers
2 123 42 http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers
3 223 142 http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca
4 323 242 http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama
5 423 342 http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama
6 523 442 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident
{noformat}
The {{adition.jar}} has been built against the Cloudera cdh3u3 distribution
and contains the filter function {{CONTAINS}} (see
http://pastebin.com/Uwje7v1V).
The output can be seen at http://pastebin.com/yXY17mXx . Essentially, the
right-hand side of the OR in the FILTER expression is being ignored, so the
script returns just two lines
{noformat}
(4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
(5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
{noformat}
instead of three lines
{noformat}
(4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
(5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
(6,523,442,http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident)
{noformat}
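To make the expected result concrete, the OR semantics can be sketched in plain Java without Pig. This is only an illustration under assumptions: the class {{OrFilterSketch}} and its methods are hypothetical, and the null-safe substring check is an assumed stand-in for what {{CONTAINS}} does, not the actual UDF from {{adition.jar}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the report's data and filter; not the real UDF.
class OrFilterSketch {

    // The six referer values from the input file, in id order (ids 1..6).
    static final String[] REFERERS = {
        "http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers",
        "http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers",
        "http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca",
        "http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama",
        "http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama",
        "http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident"
    };

    // Assumed behaviour of CONTAINS: a null-safe substring test.
    static boolean contains(String haystack, String needle) {
        return haystack != null && needle != null && haystack.contains(needle);
    }

    // Ids of rows satisfying CONTAINS(referer, 'obama') OR CONTAINS(referer, 'praesident').
    static List<Integer> matchingIds() {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < REFERERS.length; i++) {
            String referer = REFERERS[i];
            // Both operands must be considered; a row matches if either one is true.
            if (contains(referer, "obama") || contains(referer, "praesident")) {
                ids.add(i + 1);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(matchingIds()); // prints [4, 5, 6]
    }
}
```

Under these assumptions the row with id 6 matches through the right-hand operand only, which is exactly the row missing from the pig-0.8.1 output.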
Running the script with Pig 0.11.0 yields correct results:
http://pastebin.com/Cr5CkHui
See also the discussion on the pig-user mailing list:
http://www.mail-archive.com/user%40pig.apache.org/msg05278.html
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira