[ 
https://issues.apache.org/jira/browse/PIG-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eyal Allweil updated PIG-4803:
------------------------------
    Attachment: PIG-4803.patch

This patch gives REPLACE a similar compiled pattern caching strategy to that of 
REGEX_EXTRACT and REGEX_EXTRACT_ALL.

Because the existing implementation swallows exceptions with a warning, I did 
the same in my implementation (although they are different, more informative 
warnings).

I changed TestStringUDFs.TestReplace slightly to cover the new code.

> Improve performance of regex-based builtin functions
> ----------------------------------------------------
>
>                 Key: PIG-4803
>                 URL: https://issues.apache.org/jira/browse/PIG-4803
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Eyal Allweil
>            Assignee: Eyal Allweil
>         Attachments: PIG-4803.patch
>
>
> There are three strategies used by Pig's regex-based built in functions.
> 1) REPLACE doesn't do any pattern caching.
> 2) REGEX_EXTRACT and REGEX_EXTRACT_ALL attempt to cache a single pattern as 
> an instance variable.
> 3) PluckTuple attempts to cache a single pattern statically. (doesn't this 
> cause problems if two clashing defines for different PluckTuples are used?)
> I have a little fix and a medium fix in mind. The little fix is to give 
> REPLACE a similar caching strategy, and to fix PluckTuple, if the static 
> nature of the pattern is indeed a problem.
> The medium fix is to make all four functions take an additional constructor 
> with a constant regex (and therefore one less argument in evaluation) and use 
> that if it exists. This would be backwards compatible, should barely (or not) 
> affect the performance of the existing code path, but I think that in cases 
> where there are two clashing usages of the functions in the same 
> foreach..generate it would allow the pattern caching to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to