[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-09-17 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769734#comment-13769734
 ] 

Cheolsoo Park commented on PIG-3419:


I will open a jira to add "unstable" annotations. I am also reviewing PIG-3457 
now.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Fix For: 0.12
>
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-09-16 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768649#comment-13768649
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


+1 to marking the interfaces as evolving.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Fix For: 0.12
>
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-09-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767615#comment-13767615
 ] 

Rohini Palaniswamy commented on PIG-3419:
-

Dmitriy,
   I am not arguing about adding interfaces to enable other frameworks. If you 
take the LoadFuncs case, a lot of design and work went into that 
(http://wiki.apache.org/pig/LoadStoreRedesignProposal) in coming up with and 
finalizing the interfaces and there were Loaders and Storers changed or written 
with the design of the new interfaces before the new interfaces were released. 
In this case we have added the interfaces without doing something like that and 
putting it in a release. That was my concern. 

As Bill suggests if we mark and agree that the new interfaces are unstable and 
bound to change and no backward compatibility will be provided, then I am good. 
Because when we actually get to Tez or Spork implementation (not POC), I am 
sure these are bound to change or more new methods are required.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Fix For: 0.12
>
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-09-13 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767227#comment-13767227
 ] 

Bill Graham commented on PIG-3419:
--

Would should at least annotate the new interfaces as evolving so we don't need 
to evolve them in a backward compatible way just yet.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Fix For: 0.12
>
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-09-13 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767220#comment-13767220
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


This is not just for Tez. The point is to enable POC work (in branches, forks, 
etc) and not have each such attempt redo all the work in this ticket. It's the 
same reason we provide things like pluggable LoadFuncs to let people work on 
things they want to load we didn't think of loading.

We should certainly work to stabilize 0.12 and fix issues like PIG-3457

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Fix For: 0.12
>
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-09-13 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767190#comment-13767190
 ] 

Rohini Palaniswamy commented on PIG-3419:
-

I having second thoughts on having this patch in 0.12 in and wondering whether 
we should revert this and keep it only in Tez branch. Two reasons for that:
   * Seeing PIG-3457 which was my initial concern.
   * Changing interfaces to be backward compatible is very tricky and the 
workarounds are hacky or ugly. Faced that with PIG-3255. And this patch 
introduces lot of changes and new interfaces for the purpose of future work 
which is yet to take off from POC stages. The interfaces are bound to evolve 
when actual implementations are done or become different from what is in this 
patch if we end up finding cleaner abstractions. Putting something in a release 
which we are not very sure of does not seem like a good idea.

  Someone who wants to do experimental work can start off with tez branch since 
it is experimental work anyways. Basically I just want to keep experimentation 
code separate from production code since we are talking about releasing Pig 
0.12. Thoughts?

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Fix For: 0.12
>
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-09-11 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764834#comment-13764834
 ] 

Daniel Dai commented on PIG-3419:
-

Opened PIG-3457 for that.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Fix For: 0.12
>
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-09-11 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764811#comment-13764811
 ] 

Daniel Dai commented on PIG-3419:
-

This break the Oozie build, which uses PigStatsUtil.{HDFS_BYTES_WRITTEN, 
MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS, REDUCE_INPUT_RECORDS, 
REDUCE_OUTPUT_RECORDS}. We need to provide a backward compatible way.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Fix For: 0.12
>
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754975#comment-13754975
 ] 

Julien Le Dem commented on PIG-3419:


+1
[~cheolsoo] LGTM!

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-29 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754235#comment-13754235
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


[~billgraham] looping you in for Ambrose.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-29 Thread Achal Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753985#comment-13753985
 ] 

Achal Soni commented on PIG-3419:
-

I agree with all that is said, but there is no need to rename HExecutionEngine 
back. It doesn't semantically make sense and I don't think that anybody was 
directly interacting it outside of the test cases?

Whatever changes to Ambrose and Lipstick should be communicated clearly also. I 
have noted some issues with PPNL before with regard to abstraction -- namely, 
Pig provides the MROperPlan to the listeners, which is not relevant in a 
differen execution engine. Julien suggested this should be fixed in a follow up 
patch. This will most certainly affect Ambrose and Lipstick so we should be 
cautious in that regard.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-29 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753912#comment-13753912
 ] 

Julien Le Dem commented on PIG-3419:


[~cheolsoo]: thanks a lot for looking into this.

Here are my thoughts:

1. let's change it back

2. 4. 5. 6. 7. are either internal to Pig or necessary to add the execution 
engine abstraction.

3.
JobStats still exists but the MR specific part is split into MRJobStats which 
extends JobStats
Same thing for PigStatsUtil and ScriptState. Those classes are not disappearing 
but the MR specific part is abstracted out.
HExecutionEngine could be renamed back to what it was but this is again what is 
becoming the new abstraction.
Unfortunately tools like Ambrose and Lipstick depend on the MR specific parts 
of Pig and look at the internals. This patch is a necessary change so that 
those tools can work independently of the execution engine in the future.
The changes to Ambrose and Lipstick should be minimal though with this patch. 
But yes they would suffer from some incompatibility, but again there is no way 
around it when a tool looks inside the execution engine internals.

I think we should revert 1. and commit the patch.



> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-29 Thread Achal Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753878#comment-13753878
 ] 

Achal Soni commented on PIG-3419:
-

[~bikassaha] Thanks for the heads-up Bikas! This JIRA is not concerned with the 
Tez integration for Pig and is simply the abstraction in Pig to allow for 
alternate ExecutionEngines in Pig. But will certainly change this on the Tez 
integration side of stuff.

Thanks a lot [~cheolsoo] for continuing this! I think everything looks good 
from my end. I can certainly see why we may want to keep this on a different 
branch until everything is finalized. Certain things may still need more work. 
For example, OutputStats is not completed abstracted out, as it still has 
references to POStore which is a MR implementation construct. 
ScriptState/PPNL/JobStats may still need more abstraction (especially PPNL) and 
reworking to incorporate a new ExecutionEngine abstraction. I think what we 
have done here is the minimum foundation for an abstraction though, and it 
would be appropriate to put into trunk, but these are not my decisions to make. 

With regard to public methods that were changed, I don't think most of them are 
a big deal, besides as Cheolsoo said, the PigServer throwing PigException. I 
never thought IOException was a good exception to throw, but I think reverting 
PigServer back to IOException as it is userfacing code is not a big deal. The 
rest of the method signature changes shouldn't be worrisome because most of 
them are internal to the project. 

However, the change from JobStats to MRJobStats, while necessary (as each 
ExecutionEngine would have it's own type of JobStats it would present to the 
end user), could be problematic because it is userfacing code and would 
probably break people who were previously using JobStats. That I think is the 
most important thing to keep in mind. The task of making the PPNL and JobStats 
clearly tied to the ExecutionEngine should be thought through also.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753869#comment-13753869
 ] 

Bikas Saha commented on PIG-3419:
-

Looks like this jira wasnt the appropriate one to comment on. Is there a 
different umbrella jira for Pig on Tez that I can track and post comments on?

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-29 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753863#comment-13753863
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


Cheolsoo thanks so much for helping with this work!

I think #1 and #3 are the issues (#3 will affect Ambrose and probably 
Lipstick). 

We can take care of updating Ambrose if we need to. [~julienledem] do you think 
this is an important enough semantic change to force advanced clients such as 
Ambrose to rewrite / recompile? Or should we roll that part back?

[~bikassaha] thanks for the heads up, we'll need to update the pig-on-tez 
branch. Fortunately it doesn't affect this patch, since it's 
framework-independent and has no TEZ references.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753843#comment-13753843
 ] 

Bikas Saha commented on PIG-3419:
-

Folks, FYI, based on recent feedback we have changed the names used in some of 
the TEZ API's. It a simple refactoring on the Tez side and should be a simple 
refactoring fix on the Pig side too. Jira for reference. TEZ-410.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-28 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753241#comment-13753241
 ] 

Cheolsoo Park commented on PIG-3419:


All unit tests pass.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-28 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752785#comment-13752785
 ] 

Cheolsoo Park commented on PIG-3419:


[~rohini],
{quote}
It makes lot of changes to the Exceptions thrown, removes public methods etc 
and that might cause backward incompatibility during runtime with code compiled 
with previous versions of pig.
{quote}
I am listing all the backward incompatible changes made to public API. 
Hopefully, this helps us estimate the impact.
# PigServer constructor
{code}
-public PigServer(String execTypeString) throws ExecException, IOException {
-this(ExecType.fromString(execTypeString));
+public PigServer(String execTypeString) throws PigException {
+this(addExecTypeProperty(PropertiesUtil.loadDefaultProperties(), 
execTypeString));
+}
{code}
We can revert PigException back to EE and IOE for this constructor (and other 
new constructors). Not hard to fix.
# PigServer.explain()
{code}
 public void explain(String alias,
 String format,
 boolean verbose,
 boolean markAsExecute,
 PrintStream lps,
-PrintStream pps,
-PrintStream eps) throws IOException {
+PrintStream eps,
+File dir,
+String suffix) throws IOException {
{code} 
The method signature changed. Although it's possible that someone uses this 
method in their applications, there is another method that wraps this one (i.e. 
{{public void explain(String alias, PrintStream stream)}}), and that one is 
more likely to be used.
# Name changes of several public classes:
** JobStats to MRJobStats
** PigStatsUtil to MRPigStatsUtil
** ScriptState to MRPScriptState
** HExecutionEngine to MRExecutionEngine
# PigContext.getExecutionEngine()
{code}
-public HExecutionEngine getExecutionEngine() {
+public ExecutionEngine getExecutionEngine() {
{code}
Result of the class name change. 
# SimplePigStats class is moved from {{org.apache.pig.tools.pigstats}} to 
{{org.apache.pig.tools.pigstats.mapreduce}}.
# Launcher class is moved from 
{{org.apache.pig.backend.hadoop.executionengine.mapReduceLayer}} to 
{{org.apache.pig.backend.hadoop.executionengine}}.
# JobControlCompiler constructor
{code}
-public JobControlCompiler(PigContext pigContext, Configuration conf) 
throws IOException {
+public JobControlCompiler(PigContext pigContext, Configuration conf) {
{code}
The IOE was redundant in the first place. So we should remove it.

Here is my estimation:
# Major - But we can fix it.
# Minor - Unlikely used in user code.
# Minor - Unlikely used in user code.
# Minor - Unlikely used in user code.
# Minor - Unlikely used in user code.
# Minor - Unlikely used in user code.
# Minor - Unlikely used in user code.

As long as we fix #1, I think we can go ahead commit the patch to trunk. What 
do you think? 

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
>

[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-27 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751788#comment-13751788
 ] 

Cheolsoo Park commented on PIG-3419:


Kicked off full unit tests now.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-26 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750352#comment-13750352
 ] 

Olga Natkovich commented on PIG-3419:
-

I don't have a very strong opinion on this so whatever you guys decide is fine 
with me.

I think if it does evolve as part of Tez, at least some changes are likely to 
sneak into Tez branch without going to trunk so they might diverge for a while 
but if we are ok to take that chance, that's fine

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-23 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749226#comment-13749226
 ] 

Julien Le Dem commented on PIG-3419:


The advantage of having the Execution engine abstraction in trunk is it allows 
running experimental Pig execution engines implementations like Tez or Spark on 
an official release of Pig without having to build from a specific branch.
The execution engine implementations themselves are fairly independent of Pig 
and do not need to  be maintained in a Pig branch.
If the ExecutionEngine abstraction evolves over time that can be done in Trunk 
and can be merged independently of the Tez implementation itself.


> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749190#comment-13749190
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


Olga, first commit to the spork branch is from *2012*.

https://github.com/dvryaboy/pig  (the default branch on my github is "spork").



> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749106#comment-13749106
 ] 

Olga Natkovich commented on PIG-3419:
-

I think the reason we wanted it on the Tez branch is that it might evolve with 
Tez implementation and so we would merge the updated code back when Tez is 
ready. Since there are no plans for any additional backend, is there a need to 
apply this to trunk sooner rather than later?

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749014#comment-13749014
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


Rohini, I want to reiterate that this patch has NO tez dependencies (if it 
does, that's a bug). The intention is not to make Tez possible. It's to make 
pluggable execution engines possible; and I do not want that functionality to 
be tied to a tez branch that will be unstable and in heavy development for the 
foreseeable future. This work will be immediately useful for the Spork (pig on 
spark) branch, for example.

Also, it allows people to work with new runtimes *without modifying Pig*. So 
Tez-on-Pig doesn't even have to be done as a branch of this project, someone 
can go an experiment completely independently.

For these reasons, I would like it in trunk.

You make a great point about the danger of changing exceptions, public methods, 
etc. I believe that most of these are project-public, and annotated as such. Do 
you have specific methods you are concerned about? Ideally we would change as 
little as possible for the end user.

Dmitriy

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-23 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748907#comment-13748907
 ] 

Rohini Palaniswamy commented on PIG-3419:
-

In the Pig-on-Tez meeting in Linkedin we decided to do Tez work on a branch and 
that Cheolsoo will initiate conversation thread on mailing list for it and take 
up the task of creating the branch. Tez is relatively new and unstable so it 
will be wise to not start with code directly on trunk. Hive is also doing their 
Tez work on a branch. 

  Cheolsoo had a question as to whether we should commit this to trunk and 
branch after that. I would prefer PIG-3419 to be also put in the branch and not 
checked into trunk. It makes lot of changes to the Exceptions thrown, removes 
public methods etc and that might cause backward incompatibility during runtime 
with code compiled with previous versions of pig. All that needs to be figured 
out and fixed. So might not be a good idea to get this patch directly into 
trunk. Thoughts?

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-23 Thread Achal Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748873#comment-13748873
 ] 

Achal Soni commented on PIG-3419:
-

[~cheolsoo] Thanks a lot for running the test suite! It's good to see where the 
patch is failing. I definitely agree that all of these need to be investigated 
before the patch gets anywhere.

I have some ideas about a few of the test cases, looks to be some minor stuff 
with JobStats and the way Explain works now which I have to look into. The rest 
I can't really think of off hte top of my head but I'll give it a shot. 

I'll report back with some more findings as soon as possible.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-23 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748657#comment-13748657
 ] 

Cheolsoo Park commented on PIG-3419:


All, so here is the list of failing tests:
{code}
org.apache.pig.test.TestGrunt.testScriptMissingLastNewLine
org.apache.pig.test.TestGrunt.testCheckScriptSyntaxWithSemiColonUDFErr
org.apache.pig.test.TestGrunt.testExplainDot
org.apache.pig.test.TestGrunt.testExplainOut
org.apache.pig.test.TestGrunt.testExplainBrief
org.apache.pig.test.TestGrunt.testExplainEmpty
org.apache.pig.test.TestGrunt.testExplainScript
org.apache.pig.test.TestInputOutputMiniClusterFileValidator.testValidationNeg
org.apache.pig.test.TestJobStats.testOneTaskReport
org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage1
org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage2
org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage3
org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage4
org.apache.pig.test.TestJobStats.testMedianMapReduceTime
org.apache.pig.test.TestJobStats.testGetOuputSizeUsingFileBasedStorage
org.apache.pig.test.TestMRExecutionEngine.testJobConfGeneration
org.apache.pig.test.TestMRExecutionEngine.testJobConfGenerationWithUserConfigs
org.apache.pig.test.TestMacroExpansion.test20
org.apache.pig.test.TestMacroExpansion.test21
org.apache.pig.test.TestMacroExpansion.test22
org.apache.pig.test.TestMacroExpansion.test23
org.apache.pig.test.TestMacroExpansion.test32
org.apache.pig.test.TestMacroExpansion.test33
org.apache.pig.test.TestMacroExpansion.test34
org.apache.pig.test.TestMacroExpansion.test35
org.apache.pig.test.TestMacroExpansion.testCommentInMacro
org.apache.pig.test.TestMacroExpansion.testNegativeNumber
org.apache.pig.test.TestMacroExpansion.typecastTest
org.apache.pig.test.TestMacroExpansion.testFilter
org.apache.pig.test.TestMapSideCogroup.testFailure2
org.apache.pig.test.TestMergeJoinOuter.testFailure
org.apache.pig.test.TestPigRunner.testEmptyFile
org.apache.pig.test.TestScriptLanguage.testSysArguments
org.apache.pig.test.TestShortcuts.testExplainShortcutNoAlias
org.apache.pig.test.TestShortcuts.testExplainShortcutNoAliasDefined
{code}
I prefer fixing them beforehand to fixing them afterward. Although none of 
these failures is serious (I believe), can we have a couple of more days before 
committing Achal's patch? I will make sure it gets committed into trunk because 
I definitely need it for a Tez branch.

Thoughts?

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-22 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748183#comment-13748183
 ] 

Julien Le Dem commented on PIG-3419:


+1 LGTM
If test-commit passes I think we can commit to TRUNK

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-22 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748137#comment-13748137
 ] 

Cheolsoo Park commented on PIG-3419:


I will kick off the unit tests with the new patch now. :-)
{quote}
if you can tell me what exactly I should be doing for indentation/in what files 
that'd be great.
{quote}
This is not a big deal. Basically, you can run the following command to replace 
tab chars in your patch:
{code}
sed -i .orig '/^+/,/$/ s//<4 whitespaces>/g' 
updated-8-22-2013-exec-engine.patch
{code}
Then, the modified patch can be applied with "-l" option 
(\-\-ignore-whitespace): 
{code}
patch -l < updated-8-22-2013-exec-engine.patch
{code}

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-22 Thread Achal Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747968#comment-13747968
 ] 

Achal Soni commented on PIG-3419:
-

Here is the ReviewBoard for the new patch : 

https://reviews.apache.org/r/13752

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-22 Thread Achal Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747966#comment-13747966
 ] 

Achal Soni commented on PIG-3419:
-

Hi all, 

I've reuploaded the patch with all of the changes that Julien suggested as well 
as accounting for the comments from Mark. The test cases should be ok now 
(hopefully!) as we changed build.xml to package the META-INF folder and I 
changed the compilePp() issue. 

[~cheolsoo] Can you give this a look when you have time, and run the test 
suite? I think everything should be fine now. Also if you can tell me what 
exactly I should be doing for indentation/in what files that'd be great. I seem 
to have some problems with the whitespace/indentation aspect so some pointers 
here would be awesome. 

Let me know if anything else seems wrong. 

Achal

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-22 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747564#comment-13747564
 ] 

Cheolsoo Park commented on PIG-3419:


Attached test_failures.txt.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, finalpatch.patch, 
> mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, 
> test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Mark Wagner (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747131#comment-13747131
 ] 

Mark Wagner commented on PIG-3419:
--

I'd also be in favor putting this in trunk as opposed to a Tez branch. Although 
the motivation for this is Tez, I think we would want this patch in Pig even if 
there wasn't Tez support.

A couple short comments for Achal: 
* It looks like the build targets that include the META-INF are only executed 
when building against hadoopversion=23. The META-INF also don't seem to be 
included in the pig.jar and pig-withouthadoop.jar that go in the root 
directory. I tried copying in the correct jars, but it seems like something is 
still off.
* The changes to the try/catch blocks in MapReduceLauncher break on 23, because 
HadoopShims for 23 doesn't throw an exception where 20 does. Maybe that should 
be fixed in HadoopShims though.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, finalpatch.patch, 
> mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747064#comment-13747064
 ] 

Julien Le Dem commented on PIG-3419:


The point is to be able to implement alternate execution engines without having 
to fork Pig.
I think it should go in trunk.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, finalpatch.patch, 
> mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747065#comment-13747065
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


I'd like this patch in trunk since it's not Tez-specific, and allows people to 
experiment with other runtimes (for example, Spark or Drill).

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, finalpatch.patch, 
> mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747021#comment-13747021
 ] 

Cheolsoo Park commented on PIG-3419:


[~julienledem], I haven't looked at it yet, but I will review it tonight. I 
will also run full unit tests.

Btw, I was meeting Mark, Olga, Rohini, and Daniel at LinkedIn this morning. We 
decided to create a tez branch. Rohini suggested that this patch should go into 
that branch instead of trunk. Can we agree where we should commit this patch 
first? Personally, I think this can go into trunk directly since it's quite 
general. But there were some concerns.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, finalpatch.patch, 
> mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746999#comment-13746999
 ] 

Julien Le Dem commented on PIG-3419:


I have submitted my review. This looks great [~achalsoni81]!
[~cheolsoo] does it look good to you?
Once Achal has updated his patch I'm willing to commit.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, finalpatch.patch, 
> mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Achal Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746842#comment-13746842
 ] 

Achal Soni commented on PIG-3419:
-

I have taken all the suggestions into account and regenerated a new patch that 
is hopefully cleaner, smaller, and reflects most of the suggestions. The patch 
is attached and the review board is the following:

https://reviews.apache.org/r/13714/



> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-20 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13745448#comment-13745448
 ] 

Cheolsoo Park commented on PIG-3419:


[~julienledem],
{quote}
Do we really throw Exception?
{quote}
No, we don't throw Exception to the end user. But currently, PigServer catches 
them all in a single catch block and sort them out using instanceof calls (see 
below). Probably we should make ExecutionEngine throw FEE, EE, and IOE and 
replace instanceof calls with catch blocks in PigServer.
{code}
try {
stats = pigContext.getExecutionEngine().launchPig(lp, jobName, pigContext);
} catch (Exception e) {
// There are a lot of exceptions thrown by the launcher.  If this
// is an ExecException, just let it through.  Else wrap it.
if (e instanceof ExecException){
throw (ExecException)e;
} else if (e instanceof FrontendException) {
throw (FrontendException)e;
} else {
int errCode = 2043;
String msg = "Unexpected error during execution.";
throw new ExecException(msg, errCode, PigException.BUG, e);
}
}
{code}

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-20 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13745398#comment-13745398
 ] 

Julien Le Dem commented on PIG-3419:


[~cheolsoo] 
1. Do we really throw Exception ? If yes, then let's just throw that. If not 
then let's instead have FrontEndException, ExecException, IOException. i.e. 
let's remove the exceptions that are already included by the highest exception 
level.
2. agreed with you. I would expect the execution engine to handle the 
Properties internally and the signature of this method to be:
{noformat}
public void setProperty(String property, String value);
{noformat}

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-20 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13745227#comment-13745227
 ] 

Cheolsoo Park commented on PIG-3419:


I have two more comments in ExecutionEngine and MRExecutionEngine as follows:
# Can you simplify the checked exceptions in the ExecutionEngine interface? For 
example,
From:
{code}
public PigStats launchPig(LogicalPlan lp, String grpName, PigContext pc)
throws PlanException, VisitorException, IOException, ExecException,
JobCreationException, FrontendException, Exception;
{code}
To:
{code}
public PigStats launchPig(LogicalPlan lp, String grpName, PigContext pc) throws 
Exception;
{code}
Looks like there's no point of throwing them again in ExecutionEngine because 
they will be caught as Exception in PigServer anyway. If needed, we should take 
specific actions per exception in ExecutionEngine.
# As for the setProperty method in ExecutionEngine, do we need to pass a 
properties? Can we construct a properties with the given key/value pair and 
call recomputeProperties() internally?
{code}
public void setProperty(Properties properties, String property, String value);
{code}
Also, as for the setProperty method in MRExecutionEngine, looks like it's 
mostly duplicate of recomputeProperties(). Can you just reuse 
recomputeProperties()?

Julien said you're working on a new patch. It would be nice if you could 
incorporate these (of course if you agree with me). Thank you a lot!

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-19 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744218#comment-13744218
 ] 

Cheolsoo Park commented on PIG-3419:


[~achalsoni81], thank you very much for the great work! Although I am not a Pig 
old-timer, I have been playing with your patch and have a few high-level 
comments as follows:
# I love your ExecType interface. What I don't like is that there are two 
things called ExecType:
{code}
org.apache.pig.ExecType
org.apache.pig.backend.executionengine.ExecType
{code}
Since you're introducing the ServiceLoader framework, the enum seems no longer 
needed at all. Furthermore, eliminating it helps simplify the constructor code 
of PigContext and PigServer.
# Currently, it is a bit hard to review your patch because there are too many 
changes. To get it committed faster, I suggest we should avoid unnecessary 
changes and minimize its scope. For example, having the PigServer constructor 
signature unchanged helps avoid a lot of changes in Test*.java files.
# Julien already pointed out this in the RB, but your patch accidentally 
reverts a couple of previous commits. I took them out.

I went ahead and made these changes myself - 
[here|https://github.com/piaozhexiu/apache-pig/commit/9aee192116816ecf993d69a8c2f94abaff7f9739].
 If everyone thinks this is a step forward, I will upload it in a new patch. 
Please let me know.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-14 Thread Achal Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740079#comment-13740079
 ] 

Achal Soni commented on PIG-3419:
-

And here are the reviewboards for each one. 

Main ExecEngine changes: 
http://reviews.apache.org/r/13575/

MR ExecEngine changes:
http://reviews.apache.org/r/13576/

ScriptState/Statistics changes: 
http://reviews.apache.org/r/13577/

Testing Suite changes:
http://reviews.apache.org/r/13579/

- Achal

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-13 Thread Achal Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738881#comment-13738881
 ] 

Achal Soni commented on PIG-3419:
-

Sorry my bad! Was trying to get it out as soon as possible that I brushed over 
some stuff too fast. 

[~dvryaboy] I will make these changes later today and post them up. I did 
actually get around the -y argument, just totally forgot to go back and get rid 
of that. In the meantime, the Review Board is located here for the current 
patch: 

https://reviews.apache.org/r/13541/

Once I update the patch later today I will post it here with the ReviewBoards 
as well. 

 - Achal

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Priority: Minor
> Attachments: pluggable_execengine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-13 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738478#comment-13738478
 ] 

Julien Le Dem commented on PIG-3419:


Hi Achal
for large patches, please create a review here: https://reviews.apache.org

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Priority: Minor
> Attachments: pluggable_execengine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-13 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738288#comment-13738288
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


oh 3 more things :)
I thought you found your way around the -y argument? I still see that in there.
Don't comment out blocks of code, just delete them
Add some documentation about creating new Exec Engines to the xml-based docs, 
or at least post it here. Just having it in javadocs is not sufficient.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Priority: Minor
> Attachments: pluggable_execengine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-13 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738285#comment-13738285
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


Hi Achal,
That's a large patch.
Can you give us a roadmap for reading it -- what are the changes, at a high 
level? It looks like you had to change a bunch of stuff that's not (at first 
glance) directly related to exec mode.

Procedurally:
- please generate the patch using 'git diff -no-prefix' since the apache pig 
master is on svn
- please post the complete patch to Review Board, for ease of commenting
- please make sure that all new files have the apache license headers at the top

Thanks
-D

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Priority: Minor
> Attachments: pluggable_execengine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-12 Thread Achal Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737092#comment-13737092
 ] 

Achal Soni commented on PIG-3419:
-

Here is the current patch as is, with the changes to /src and /test. Please let 
me know if there are any questions or other requests. I expect these changes 
will incite some good discussion about how to achieve a pluggable execution 
engine!




> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Priority: Minor
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes was to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, and making the changes in Pig to delegate 
> to the EE when necessary, and creating an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira