[ 
https://issues.apache.org/jira/browse/PIG-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254408#comment-13254408
 ] 

Ashutosh Chauhan commented on PIG-2653:
---------------------------------------

One technique used in some databases is to maintain an (LRU) cache that maps 
the md5 hash of a script to its serialized execution plan. When a script is 
submitted, compute its md5 hash and look it up in the cache; if it is found, 
skip the compilation phase. Obviously, there needs to be some smartness to 
strip out load/store locations and possibly other quoted strings before 
computing the hash. Another thing to keep in mind is that since Pig is used 
mostly as a library and not as a service, where do you keep this cache? One 
answer to that question is a metastore.
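
To make the idea concrete, here is a rough sketch in Java of the hash-and-lookup 
step. Everything in it is an assumption for illustration: PlanCache, normalize, 
key and getOrCompile are hypothetical names, not Pig APIs; the plan is treated 
as an opaque byte[]; and the regex only handles single-quoted literals. It does 
not cover re-binding the stripped load/store locations into a cached plan, nor 
persisting the cache in a metastore.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names exist in Pig.
class PlanCache {
    private final int maxEntries;
    private final Map<String, byte[]> cache;

    PlanCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > PlanCache.this.maxEntries;
            }
        };
    }

    // Replace every single-quoted literal (load/store locations and other
    // quoted strings) with a placeholder, then collapse whitespace.
    static String normalize(String script) {
        return script
                .replaceAll("'(?:\\\\.|[^'\\\\])*'", "'?'")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Hex-encoded md5 digest of the given text.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Cache key: md5 of the script with quoted strings stripped out.
    static String key(String script) {
        return md5Hex(normalize(script));
    }

    // Return the cached plan if present; otherwise run the (assumed) compiler
    // and remember its result.
    synchronized byte[] getOrCompile(String script, Function<String, byte[]> compiler) {
        String k = key(script);
        byte[] plan = cache.get(k);
        if (plan == null) {
            plan = compiler.apply(script);
            cache.put(k, plan);
        }
        return plan;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

With this, two submissions of the same script that differ only in their 
load/store paths map to the same key, so the second one would skip compilation.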

Just a high-level idea; I am sure there are other details which need to be 
hashed out for this.
                
> Precompile option in PIG (Ability to store the plan for queries which are run 
> multiple times)
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2653
>                 URL: https://issues.apache.org/jira/browse/PIG-2653
>             Project: Pig
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Rajesh Balamohan
>
> Based on the size of the PIG script, it takes 1 or 2 minutes in certain cases 
> for the PIG compiler to create the MR plan. If the same script has to be run 
> at a later point in time, it has to go through this process again. 
> It would be nice if PIG could store the result (execution plan) so that it 
> can be reused when the same script is run again. 


        
