[ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789668#action_12789668
 ] 

Thejas M Nair commented on PIG-965:
-----------------------------------

Review comments: 

* The regex will always be on the rhs, so we don't need the code/classes that 
try to determine which side holds the regular expression based on which side 
is a constant.

* In determineBestRegexMethod, "(?" needs to be added to the list of regex 
strings not supported by dk.brics (in javaRegexOnly). It has a special meaning 
in java regex that dk.brics does not honor.

* In determineBestRegexMethod, we handle cases like "\d" (choose java regex) 
and "\\d" (choose dk.brics), but not "\\\d" (which should choose java regex). 
I.e. we need to walk back until we find a non-'\' char and check whether the 
number of '\' chars seen is odd.

* In RegexInit.compile(..), the following messages are more appropriate at 
debug level than at info. At info level, they might confuse the user.
+                log.info("Got an IllegalArgumentException for Pattern: " + pattern );
+                log.info(e.getMessage());
+                log.info("Switching to java.util.regex" );

* The following comment in PORegex.java seems to be out of place:
 // This is a BinaryComparisonOperator hence there can only be two inputs
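
To illustrate the two determineBestRegexMethod comments above, here is a rough 
sketch of the check I have in mind. The class and method names are made up for 
this example and are not the ones in the patch. It also assumes that any 
escaped letter or digit should be routed to java.util.regex, which is more 
conservative than strictly necessary.

```java
// Hypothetical helper for illustration only; not the code in poregex2.patch.
final class RegexMethodChooser {

    /** True if the char at pos is preceded by an odd number of backslashes. */
    static boolean isEscaped(String pattern, int pos) {
        int backslashes = 0;
        // Walk back until we find a char that is not a backslash.
        for (int i = pos - 1; i >= 0 && pattern.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    /** True if the pattern should be handled by java.util.regex rather than dk.brics. */
    static boolean needsJavaRegex(String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            // Assumption: an escaped letter or digit (\d, \w, \1, ...) is a
            // java.util.regex construct, so we do not hand it to dk.brics.
            if (Character.isLetterOrDigit(c) && isEscaped(pattern, i)) {
                return true;
            }
            // An unescaped "(?" starts a special group in java.util.regex.
            if (c == '(' && !isEscaped(pattern, i)
                    && i + 1 < pattern.length() && pattern.charAt(i + 1) == '?') {
                return true;
            }
        }
        return false;
    }
}
```

With this, the regex \d and the regex \\\d both go to java.util.regex, while 
\\d (an escaped backslash followed by a plain 'd') can stay with dk.brics.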



> PERFORMANCE: optimize common case in matches (PORegex)
> ------------------------------------------------------
>
>                 Key: PIG-965
>                 URL: https://issues.apache.org/jira/browse/PIG-965
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ankit Modi
>         Attachments: automaton.jar, poregex2.patch
>
>
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties -
> 1. The rhs is a constant string, eg "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix etc are very common, 
> eg 'abc%', '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string 
> has not changed. 
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive like clause uses similar optimizations.
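
A minimal sketch of the two optimizations described above, for reference. The 
class name and structure are hypothetical and this is not the PORegex code. It 
assumes the wildcard is written as ".*" in java regex syntax (the description 
writes it as '%' in the like style), and it only takes the string-comparison 
path when the remaining text is made of letters and digits.

```java
import java.util.regex.Pattern;

// Hypothetical sketch for illustration only; not the PORegex code from the patch.
final class CachedMatcher {
    private enum Kind { PREFIX, SUFFIX, CONTAINS, REGEX }

    private String lastPattern;  // pattern string seen on the previous call
    private Kind kind;
    private String literal;      // plain text to compare when kind != REGEX
    private Pattern compiled;    // only used when kind == REGEX

    boolean matches(String input, String pattern) {
        if (!pattern.equals(lastPattern)) {
            prepare(pattern);    // redo the analysis only when the rhs changes
        }
        switch (kind) {
            case PREFIX:   return input.startsWith(literal) && singleLine(input);
            case SUFFIX:   return input.endsWith(literal) && singleLine(input);
            case CONTAINS: return input.contains(literal) && singleLine(input);
            default:       return compiled.matcher(input).matches();
        }
    }

    private void prepare(String pattern) {
        lastPattern = pattern;
        boolean open = pattern.startsWith(".*");
        boolean close = pattern.endsWith(".*");
        int from = open ? 2 : 0;
        int to = close ? pattern.length() - 2 : pattern.length();
        literal = from < to ? pattern.substring(from, to) : "";
        if (!isPlain(literal) || !(open || close)) {
            kind = Kind.REGEX;   // anything not obviously simple goes to the regex engine
            compiled = Pattern.compile(pattern);
        } else if (open && close) {
            kind = Kind.CONTAINS;
        } else {
            kind = close ? Kind.PREFIX : Kind.SUFFIX;
        }
    }

    // Non-empty and made of letters and digits only, so no regex metacharacters.
    private static boolean isPlain(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) return false;
        }
        return !s.isEmpty();
    }

    // '.' does not match line terminators by default, so ".*" cannot span them.
    private static boolean singleLine(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n' || c == '\r' || c == '\u0085' || c == '\u2028' || c == '\u2029') {
                return false;
            }
        }
        return true;
    }
}
```

The line terminator check is there so the string-comparison path gives the 
same answer as Matcher.matches() for inputs that contain newlines.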

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
