[ https://issues.apache.org/jira/browse/PIG-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758126#action_12758126 ]
Hadoop QA commented on PIG-738:
-------------------------------

+1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12420246/PIG-738.patch
  against trunk revision 817319.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/42/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/42/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/42/console

This message is automatically generated.

> Regexp passed from pigscript fails in UDF
> -------------------------------------------
>
>                 Key: PIG-738
>                 URL: https://issues.apache.org/jira/browse/PIG-738
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.3.0
>            Reporter: Viraj Bhat
>            Assignee: Pradeep Kamath
>             Fix For: 0.6.0
>
>         Attachments: myregexp.jar, PIG-738.patch, RegexGroupCount.java, regexp.pig, regexpinput.txt
>
>
> Consider a pig script which parses and counts regular expressions from a text file.
> The regular expression supplied in the Pig script needs to escape the "." (dot) character.
> {code}
> register myregexp.jar;
> -- pattern not picked up
> define minelogs ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports');
> A = load '/user/viraj/regexpinput.txt' using PigStorage() as (source : chararray);
> B = foreach A generate minelogs(source) as sportslogs;
> dump B;
> {code}
> Snippet of UDF RegexGroupCount.java
> {code}
> public class RegexGroupCount extends EvalFunc<Integer> {
>     private final Pattern pattern_;
>     public RegexGroupCount(String patternStr) {
>         System.out.println("My pattern supplied is "+patternStr);
>         System.out.println("Equality test "+patternStr.equals("www\\.yahoo\\.com/sports"));
>         pattern_ = Pattern.compile(patternStr, Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
>     }
>     public Integer exec(Tuple input) throws IOException {
>     }
> }
> {code}
> Running the above script on the following dataset :
> ====================================================================================================
> dshfdskfwww.yahoo.com/sportsjoadfjdslpdshfdskfwww.yahoo.com/sportsjoadfjdsl
> kas;dka;sd
> jsjsjwww.yahoo.com/sports
> jsdLSJDcom/sports
> wwwJyahooMcom/sports
> ====================================================================================================
> Results in the following:
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> Userfunc: (Name: UserFunc viraj-Sat Mar 28 02:06:31 PDT 2009-14 function: ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports') Operator Key: viraj-Sat Mar 28 02:06:31 PDT 2009-14)
> Userfunc fs: int
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> 2009-03-28 02:06:43,923 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2009-03-28 02:06:43,923 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
> (0)
> (0)
> (0)
> (0)
> (0)
> ====================================================================================================
> In essence there seems to be no way of passing this type of constructor argument through the Pig script. The only workaround seems to be hard coding the values in the UDF!!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
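The printed pattern and the "Equality test false" lines in the issue description are consistent with the UDF constructor receiving the string with two backslashes before each dot, whereas the Java literal in the equality test holds only one. The following is a minimal standalone sketch of that reading, not the actual UDF or the attached patch. The class name EscapeDemo and the helpers countMatches and undouble are hypothetical, and counting via Matcher.find() is an assumption, since the body of exec() is not shown in the issue.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Pig or of the attached RegexGroupCount.java.
final class EscapeDemo {

    // Counts non-overlapping matches using the same flags as the UDF snippet.
    // Assumption: the real exec() counts matches in a similar way.
    static int countMatches(String patternStr, String line) {
        Pattern p = Pattern.compile(patternStr, Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(line);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Illustrative only: collapses each doubled backslash into a single one.
    // This would also alter a pattern that intentionally matches a literal backslash.
    static String undouble(String s) {
        return s.replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        // Java literal: at runtime this string has ONE backslash before each dot.
        String intended = "www\\.yahoo\\.com/sports";
        // What the UDF printed: TWO backslashes before each dot.
        String received = "www\\\\.yahoo\\\\.com/sports";
        String line = "jsjsjwww.yahoo.com/sports";

        System.out.println(received.equals(intended));           // false
        System.out.println(countMatches(intended, line));        // 1
        System.out.println(countMatches(received, line));        // 0
        System.out.println(undouble(received).equals(intended)); // true
    }
}
```

With two backslashes, the regex engine reads each `\\.` as "a literal backslash followed by any character", so none of the sample lines match and every count is 0, as in the dump above. Collapsing the doubled backslashes inside the constructor, as undouble does, would be one possible stopgap under this assumption.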