[ https://issues.apache.org/jira/browse/PIG-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758272#action_12758272 ]

Olga Natkovich commented on PIG-738:
------------------------------------

+1, please commit.

> Regexp passed from pigscript fails in UDF  
> -------------------------------------------
>
>                 Key: PIG-738
>                 URL: https://issues.apache.org/jira/browse/PIG-738
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.3.0
>            Reporter: Viraj Bhat
>            Assignee: Pradeep Kamath
>             Fix For: 0.6.0
>
>         Attachments: myregexp.jar, PIG-738.patch, RegexGroupCount.java, regexp.pig, regexpinput.txt
>
>
> Consider a Pig script that counts matches of a regular expression in a text file.
> The regular expression supplied in the Pig script must escape each "." (dot)
> character so that it matches a literal dot rather than any character.
> {code}
> register myregexp.jar;
> -- pattern not picked up
> define minelogs ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports');
> A = load '/user/viraj/regexpinput.txt' using PigStorage() as (source : chararray);
> B = foreach A generate minelogs(source) as sportslogs;
> dump B;
> {code}
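> Snippet of the UDF RegexGroupCount.java:
> {code}
> package ci_pig_udfs;
>
> import java.io.IOException;
> import java.util.regex.Pattern;
> import org.apache.pig.EvalFunc;
> import org.apache.pig.data.Tuple;
>
> public class RegexGroupCount extends EvalFunc<Integer> {
>     private final Pattern pattern_;
>
>     public RegexGroupCount(String patternStr) {
>         System.out.println("My pattern supplied is " + patternStr);
>         // The Java literal "www\\.yahoo\\.com/sports" is the string www\.yahoo\.com/sports
>         System.out.println("Equality test " + patternStr.equals("www\\.yahoo\\.com/sports"));
>         pattern_ = Pattern.compile(patternStr, Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
>     }
>
>     public Integer exec(Tuple input) throws IOException {
>         // Matching logic omitted from this snippet (see the attached RegexGroupCount.java);
>         // placeholder return so the snippet compiles.
>         return null;
>     }
> }
> {code}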
> Running the script above on the following dataset:
> ====================================================================================================
> dshfdskfwww.yahoo.com/sportsjoadfjdslpdshfdskfwww.yahoo.com/sportsjoadfjdsl
> kas;dka;sd
> jsjsjwww.yahoo.com/sports
> jsdLSJDcom/sports
> wwwJyahooMcom/sports
> ====================================================================================================
> results in the following output:
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> Userfunc: (Name: UserFunc viraj-Sat Mar 28 02:06:31 PDT 2009-14 function: ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports') Operator Key: viraj-Sat Mar 28 02:06:31 PDT 2009-14)
> Userfunc fs: int
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> 2009-03-28 02:06:43,923 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2009-03-28 02:06:43,923 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
> (0)
> (0)
> (0)
> (0)
> (0)
> ====================================================================================================
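> In short, there appears to be no way to pass a constructor argument containing
> escaped characters from a Pig script: the pattern reaches the UDF with its
> backslashes still doubled (www\\.yahoo\\.com/sports), so the equality test fails
> and nothing matches. The only workaround seems to be hard-coding the pattern in the UDF!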
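
For readers checking the report, a minimal Java sketch reproduces the zero counts using plain `java.util.regex` (no Pig involved). The class `RegexEscapeDemo` and the method `countMatches` are hypothetical illustrations, not part of PIG-738.patch. It compiles three versions of the pattern with the same flags as `RegexGroupCount` and counts matches on the sample lines: the intended pattern `www\.yahoo\.com/sports`, the string the UDF actually printed (`www\\.yahoo\\.com/sports`), and an unescaped `www.yahoo.com/sports`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexEscapeDemo {
    // Sample lines copied from the dataset in the report (regexpinput.txt).
    public static final String[] LINES = {
        "dshfdskfwww.yahoo.com/sportsjoadfjdslpdshfdskfwww.yahoo.com/sportsjoadfjdsl",
        "kas;dka;sd",
        "jsjsjwww.yahoo.com/sports",
        "jsdLSJDcom/sports",
        "wwwJyahooMcom/sports"
    };

    // Counts non-overlapping matches, compiled with the same flags as RegexGroupCount.
    public static int countMatches(String patternStr, String line) {
        Pattern p = Pattern.compile(patternStr, Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(line);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Intended regex: each dot escaped once, i.e. www\.yahoo\.com/sports
        String intended = "www\\.yahoo\\.com/sports";
        // What the UDF printed: backslashes still doubled, i.e. www\\.yahoo\\.com/sports,
        // which as a regex requires a literal backslash followed by any character.
        String received = "www\\\\.yahoo\\\\.com/sports";
        // Unescaped: each dot is a wildcard that matches any character.
        String unescaped = "www.yahoo.com/sports";
        for (String line : LINES) {
            System.out.println(countMatches(intended, line) + " "
                + countMatches(received, line) + " "
                + countMatches(unescaped, line));
        }
    }
}
```

Running `main` should print `2 0 2`, `0 0 0`, `1 0 1`, `0 0 0` and `0 0 1`, one row per input line. The received pattern demands a literal backslash before each wildcard, which no input line contains, so every count is 0, as in the report's output. The last column shows why the dot must be escaped at all: left unescaped, `wwwJyahooMcom/sports` also matches.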

-- 
This message is automatically generated by JIRA.