[ https://issues.apache.org/jira/browse/PIG-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758272#action_12758272 ]
Olga Natkovich commented on PIG-738:
------------------------------------

+1, please, commit

> Regexp passed from pigscript fails in UDF
> -------------------------------------------
>
>                 Key: PIG-738
>                 URL: https://issues.apache.org/jira/browse/PIG-738
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.3.0
>            Reporter: Viraj Bhat
>            Assignee: Pradeep Kamath
>             Fix For: 0.6.0
>
>         Attachments: myregexp.jar, PIG-738.patch, RegexGroupCount.java, regexp.pig, regexpinput.txt
>
>
> Consider a pig script which parses and counts regular expressions from a text file.
> The regular expression supplied in the Pig script needs to escape the "." (dot) character.
> {code}
> register myregexp.jar;
> -- pattern not picked up
> define minelogs ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports');
> A = load '/user/viraj/regexpinput.txt' using PigStorage() as (source : chararray);
> B = foreach A generate minelogs(source) as sportslogs;
> dump B;
> {code}
> Snippet of UDF RegexGroupCount.java
> {code}
> public class RegexGroupCount extends EvalFunc<Integer> {
>     private final Pattern pattern_;
>     public RegexGroupCount(String patternStr) {
>         System.out.println("My pattern supplied is "+patternStr);
>         System.out.println("Equality test "+patternStr.equals("www\\.yahoo\\.com/sports"));
>         pattern_ = Pattern.compile(patternStr, Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
>     }
>     public Integer exec(Tuple input) throws IOException {
>     }
> }
> {code}
> Running the above script on the following dataset :
> ====================================================================================================
> dshfdskfwww.yahoo.com/sportsjoadfjdslpdshfdskfwww.yahoo.com/sportsjoadfjdsl
> kas;dka;sd
> jsjsjwww.yahoo.com/sports
> jsdLSJDcom/sports
> wwwJyahooMcom/sports
> ====================================================================================================
> Results in the following:
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> Userfunc: (Name: UserFunc viraj-Sat Mar 28 02:06:31 PDT 2009-14 function: ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports') Operator Key: viraj-Sat Mar 28 02:06:31 PDT 2009-14)
> Userfunc fs: int
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> 2009-03-28 02:06:43,923 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2009-03-28 02:06:43,923 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
> (0)
> (0)
> (0)
> (0)
> (0)
> ====================================================================================================
> In essence there seems to be no way of passing this type of constructor argument through the Pig script. The only workaround seems to be hard coding the values in the UDF!!
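
For readers following the thread, the logged output suggests that the constructor receives the pattern with both backslashes intact (www\\.yahoo\\.com/sports), whereas the Java literal in the equality test denotes a string with a single backslash before each dot. The standalone sketch below reproduces that difference outside of Pig. The class name RegexEscapeDemo and the helper collapseBackslashes are hypothetical and are not taken from PIG-738.patch or from the attached UDF; they only illustrate why every row yields (0).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexEscapeDemo {

    // Counts non-overlapping matches, using the same flags as the UDF snippet.
    static int countMatches(String patternStr, String line) {
        Pattern p = Pattern.compile(patternStr, Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(line);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Hypothetical UDF-side workaround: collapse each doubled backslash into one.
    static String collapseBackslashes(String s) {
        return s.replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        // Assumed to be what the UDF receives: two backslash characters before each dot.
        String fromPig = "www\\\\.yahoo\\\\.com/sports";
        // What the equality test compares against: one backslash before each dot.
        String intended = "www\\.yahoo\\.com/sports";
        String line = "jsjsjwww.yahoo.com/sports";

        System.out.println(fromPig.equals(intended));       // false
        // "\\" in a regex matches a literal backslash, so this pattern needs
        // "www", a backslash, then any character; the data has no backslashes.
        System.out.println(countMatches(fromPig, line));    // 0
        System.out.println(countMatches(intended, line));   // 1
        System.out.println(countMatches(collapseBackslashes(fromPig), line)); // 1
    }
}
```

Collapsing backslashes inside the UDF is only a stopgap for this one pattern; it would also rewrite a pattern that legitimately needs to match a literal backslash.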