[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-965: -- Status: Resolved (was: Patch Available) Resolution: Fixed Committed to trunk. PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Fix For: 0.8.0 Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-965: -- Status: Patch Available (was: Reopened) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Fix For: 0.8.0 Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-965: -- Fix Version/s: 0.8.0 Ankit is right, the patch is not present in trunk. I will apply it to trunk. PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Fix For: 0.8.0 Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Open (was: Patch Available) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Assignee: Benjamin Francisoud (was: Ankit Modi) Status: Patch Available (was: Open) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Benjamin Francisoud Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Open (was: Patch Available) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Patch Available (was: Open) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-965: --- Resolution: Fixed Status: Resolved (was: Patch Available) patch committed. Thanks Ankit for contributing and Thejas for reviewing! PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: poregex2.patch PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Patch Available (was: Open) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Open (was: Patch Available) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: poregex2.patch PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Patch Available (was: Open) I have included changes suggested by Thejas. PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Open (was: Patch Available) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Patch Available (was: Open) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: automaton.jar poregex2.patch New patch with removed comments and added automaton.jar from http://www.brics.dk/~amoeller/automaton/automaton.jar. It fails findBugs due to missing symbols. I ran the findBugs after adding the jar to the build and it did not complain about any findBugs in the modified and added files. PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Open (was: Patch Available) One small change to JarManager.java is missing. Will add a new patch with it. PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: automaton.jar) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Patch Available (was: Open) PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: automaton.jar poregex2.patch PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: automaton.jar, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: poregex2.patch Attaching one more file of patch. This one has all the changes, except changes to build.xml. Still trying to find a maven repo for dk.brics.automaton. PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: poregex.patch, poregex2.patch, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: poregex2.patch poregex.patch These are patches for two implementations One (poregex.patch) is an implementation applying optimization mentioned above in the JIRA. Second (poregex2.patch) implementation applies optimization 1 and uses dk.brics.automaton for running simple regular expressions. Otherwise it reverts back to java.util.regex. In 1 the decision to use optimization two or use java.util.regex is decided by getSimpleString method In 2 the decision to use dk.brics.automaton is done by determineBestRegexMethod. ( changes to build.xml is this patch are temporary ) Both patches use RegexInit as an implementation which makes a decision ( calling the above mentioned decision functions ) and then sets the implementation to one decided by the decision function. In second patch, the decision function was created looking at the support of operators in dk.brics.automaton and its grammar. I tried out the classes supported and not supported in dk.brics.automaton and decided upon it. I could not find any specific page mentioning the difference between regex language java.util.regex and dk.brics.automaton. PERFORMANCE: optimize common case in matches (PORegex) -- Key: PIG-965 URL: https://issues.apache.org/jira/browse/PIG-965 Project: Pig Issue Type: Improvement Components: impl Reporter: Thejas M Nair Assignee: Ankit Modi Attachments: poregex.patch, poregex2.patch Some frequently seen use cases of 'matches' comparison operator have follow properties - 1. The rhs is a constant string . eg c1 matches 'abc%' 2. Regexes such that look for matching prefix , suffix etc are very common. eg - abc%', %abc, '%abc%' To optimize for these common cases , PORegex.java can be changed to - 1. Compile the pattern (rhs of matches) re-use it if the pattern string has not changed. 2. Use string comparisons for simple common regexes (in 2 above). The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.