[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753878#comment-13753878
 ] 

Achal Soni commented on PIG-3419:
---------------------------------

[~bikassaha] Thanks for the heads-up, Bikas! This JIRA is not concerned with the 
Tez integration for Pig; it only covers the abstraction in Pig that allows for 
alternate ExecutionEngines. I will certainly change this on the Tez 
integration side, though.

Thanks a lot [~cheolsoo] for continuing this! I think everything looks good 
from my end. I can certainly see why we may want to keep this on a different 
branch until everything is finalized. Certain things may still need more work. 
For example, OutputStats is not completely abstracted out, as it still has 
references to POStore, which is an MR implementation construct. 
ScriptState/PPNL/JobStats may still need more abstraction (especially PPNL) and 
reworking to incorporate a new ExecutionEngine abstraction. I think what we 
have done here is the minimum foundation for an abstraction though, and it 
would be appropriate to put into trunk, but these are not my decisions to make. 

With regard to the public methods that were changed, I don't think most of them 
are a big deal, apart from, as Cheolsoo said, PigServer throwing PigException. I 
never thought IOException was a good exception to throw, but since PigServer is 
user-facing code, reverting it back to IOException is not a big deal. The rest 
of the method signature changes shouldn't be worrisome because most of them are 
internal to the project. 

However, the change from JobStats to MRJobStats, while necessary (as each 
ExecutionEngine would have its own type of JobStats that it presents to the 
end user), could be problematic because it is user-facing code and would 
probably break users who were previously relying on JobStats. That, I think, is 
the most important thing to keep in mind. How to tie the PPNL and JobStats 
clearly to the ExecutionEngine should also be thought through.
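
One way to limit that breakage, sketched here only as an illustration and not 
as what the attached patches do, would be to keep JobStats as an 
engine-neutral base type and let MRJobStats extend it. All class shapes, 
fields, and method names below are hypothetical stand-ins, not the real Pig 
classes:

```java
import java.util.Arrays;
import java.util.List;

class JobStatsCompatSketch {

    // Hypothetical engine-neutral base type kept under the old public name,
    // so existing user code that refers to JobStats still compiles.
    abstract static class JobStats {
        private final String jobId;

        protected JobStats(String jobId) {
            this.jobId = jobId;
        }

        public String getJobId() {
            return jobId;
        }

        // Each engine decides how to render its own statistics.
        public abstract String describe();
    }

    // Hypothetical MapReduce-specific subclass holding MR-only details.
    static class MRJobStats extends JobStats {
        private final int maps;
        private final int reduces;

        MRJobStats(String jobId, int maps, int reduces) {
            super(jobId);
            this.maps = maps;
            this.reduces = reduces;
        }

        @Override
        public String describe() {
            return getJobId() + ": maps=" + maps + ", reduces=" + reduces;
        }
    }

    // Stands in for pre-existing user code written against JobStats only.
    static String summarize(List<? extends JobStats> jobs) {
        StringBuilder sb = new StringBuilder();
        for (JobStats job : jobs) {
            sb.append(job.describe()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summarize(Arrays.asList(new MRJobStats("job_1", 4, 2))));
    }
}
```

Under this assumption, only callers that need MR-specific details would have 
to know about MRJobStats; callers that read engine-neutral fields would be 
unaffected.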
                
> Pluggable Execution Engine 
> ---------------------------
>
>                 Key: PIG-3419
>                 URL: https://issues.apache.org/jira/browse/PIG-3419
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.12
>            Reporter: Achal Soni
>            Assignee: Achal Soni
>            Priority: Minor
>         Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes were to reinstate an ExecutionEngine interface to tie 
> together the frontend and backend, to make Pig delegate to the EE when 
> necessary, and to create an MRExecutionEngine that implements this 
> interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, the same way MapReduce chooses the framework 
> to use between local and distributed mode). I also tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in their current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 
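
The ServiceLoader-based selection described above can be sketched roughly as 
follows. This is a minimal illustration, not the code in the attached 
patches: the ExecType and ExecutionEngine interfaces, their methods, and the 
mode strings are all assumptions made for the example. Only 
java.util.ServiceLoader is the real JDK API.

```java
import java.util.ServiceLoader;

class ExecTypeSelectionSketch {

    // Hypothetical stand-in for an execution engine.
    interface ExecutionEngine {
        String name();
    }

    // Hypothetical provider interface: each implementation says which
    // requested mode it handles and builds the matching engine.
    interface ExecType {
        boolean accepts(String requestedMode);

        ExecutionEngine createEngine();
    }

    static final class MRExecType implements ExecType {
        public boolean accepts(String requestedMode) {
            return "mapreduce".equalsIgnoreCase(requestedMode);
        }

        public ExecutionEngine createEngine() {
            return () -> "MRExecutionEngine";
        }
    }

    static final class LocalExecType implements ExecType {
        public boolean accepts(String requestedMode) {
            return "local".equalsIgnoreCase(requestedMode);
        }

        public ExecutionEngine createEngine() {
            return () -> "LocalExecutionEngine";
        }
    }

    // Cycle through the candidates and use the first one that accepts the mode.
    static ExecutionEngine select(Iterable<ExecType> candidates, String requestedMode) {
        for (ExecType type : candidates) {
            if (type.accepts(requestedMode)) {
                return type.createEngine();
            }
        }
        throw new IllegalArgumentException(
            "No ExecType found for mode: " + requestedMode);
    }

    // ServiceLoader is Iterable, so providers registered on the classpath
    // can be passed straight to select().
    static ExecutionEngine selectFromClasspath(String requestedMode) {
        return select(ServiceLoader.load(ExecType.class), requestedMode);
    }
}
```

For selectFromClasspath to find anything, each provider jar would need a 
META-INF/services file named after the fully qualified provider interface 
and listing its implementation classes; without such a file the loader 
yields no candidates and the sketch throws IllegalArgumentException.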

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
