[ 
https://issues.apache.org/jira/browse/PIG-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551817
 ] 

Antonio Magnaghi commented on PIG-32:
-------------------------------------

Attaching to the bug the high-level summary that I sent to the mailing 
list a few days back.

I have discussed this with Ben. One aspect we talked about was extending the 
API to provide a way to collect logging and debugging information.

________________________________________
From: Antonio Magnaghi 
Sent: Monday, December 10, 2007 9:29 AM
To: '[email protected]'
Subject: Abstraction layer: execution engine (PIG-32)

I'm starting to work on the portion of the abstraction layer about the 
execution engine for the separation of front-end from back-end. 

Based on some previous discussions with various folks, including Trevor 
Strohman from the Galago project, I think it is possible to identify some 
requirements/changes that I've summarized below (in addition to what is 
currently posted at http://wiki.apache.org/pig/PigAbstractionLayer).

I would like to get some feedback on these points and whether I have left out 
aspects that'd need to be considered as well.

Thanks,
-a.


Front-End:
Change logical plan representation: goal is to change the representation of 
logical plans so that: 
•       details pertaining to the physical query plan execution are not present 
anymore in the front-end; 
•       a new logical plan submitted to the back-end can reference a portion 
(or alias) of another logical plan

Aspects affected by the changes above are:
1.      need to remove data collectors and logic to manage data-pipes from the 
eval specs and cond's of logical operators. These data structures are used in 
the case of the local execution mode. We can add physical eval specs and cond's 
where data pipes and data collectors are set up. This has the disadvantage of 
creating extra code (similar to the code for logical eval specs and logical 
cond's), but the overall separation of the logical aspects from the physical 
execution should be much cleaner.
2.      need to remove the table of query results, where aliases are mapped to 
intermediate results. This data structure is populated when the logical plan is 
compiled. The concept of intermediate results does not seem to belong in the 
front-end. (Information about the generation of intermediate results will be 
maintained in the back-end)
3.      extend representation of logical operators assigning to them a scope 
and a unique id within the scope. The motivation for doing this would be that 
new logical plans submitted to the back end can reference previous logical 
plans (or parts of them) via a (scope id, node id) pair. Having the concept of 
scope can provide support in the back-end for purging information about 
entities that go out of scope. For instance, the session id could be used as 
scope to garbage collect entities in the back-end no longer needed.
4.      need to add a catalog that maps aliases to logical trees. For instance, 
when a store operation is encountered, the front-end can determine the set of 
dependent logical trees to serialize and send to the back-end or (scope, id) of 
previous plans to reference. 
5.      the serialization process from the front-end to the back-end can 
produce a representation of the logical plan and its dependencies that 
includes the (scope, id) of each operator to send to the back-end.
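
To make points 3 and 4 more concrete, here is a rough sketch in Java. It is 
only an illustration under my own assumptions: the names OperatorKey, 
LogicalOp, AliasCatalog, bind and toShip are hypothetical and are not existing 
Pig classes or methods. The idea is that the catalog remembers which operators 
were already sent, so a later plan ships only the new operators and refers to 
the rest by (scope, id).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical (scope, id) pair identifying one logical operator.
final class OperatorKey {
    final String scope; // e.g. the session id
    final long id;      // unique within the scope

    OperatorKey(String scope, long id) {
        this.scope = Objects.requireNonNull(scope);
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof OperatorKey)) return false;
        OperatorKey other = (OperatorKey) o;
        return id == other.id && scope.equals(other.scope);
    }

    @Override
    public int hashCode() {
        return Objects.hash(scope, id);
    }
}

// Hypothetical logical operator: purely logical, no data pipes or collectors.
final class LogicalOp {
    final OperatorKey key;
    final String name;
    final List<LogicalOp> inputs;

    LogicalOp(OperatorKey key, String name, LogicalOp... inputs) {
        this.key = key;
        this.name = name;
        this.inputs = Arrays.asList(inputs);
    }
}

// Hypothetical catalog mapping aliases to the root of their logical tree.
final class AliasCatalog {
    private final Map<String, LogicalOp> roots = new HashMap<>();
    private final Set<OperatorKey> shipped = new HashSet<>();

    void bind(String alias, LogicalOp root) {
        roots.put(alias, root);
    }

    // Operators that still have to be serialized for this alias, inputs first.
    // Operators sent earlier are omitted; the plan can reference them by key.
    List<LogicalOp> toShip(String alias) {
        LogicalOp root = roots.get(alias);
        if (root == null) {
            throw new IllegalArgumentException("unknown alias: " + alias);
        }
        List<LogicalOp> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private void collect(LogicalOp op, List<LogicalOp> out) {
        if (!shipped.add(op.key)) return; // already known to the back-end
        for (LogicalOp input : op.inputs) collect(input, out);
        out.add(op);
    }
}
```

For example, if alias a is a FILTER over a LOAD and alias b is a GROUP over a, 
storing a ships both the LOAD and FILTER operators, while storing b afterwards 
ships only the GROUP operator and references the FILTER node by its key.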

Back-End:
1.      back-end would maintain table of intermediate results
2.      compilation of logical plan to physical plan would take place in the 
back-end
3.      a local back-end would generate physical trees using the physical eval 
specs and physical cond's (as described above)
4.      a Hadoop back-end would compile logical plan to map/reduce
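
A possible shape for the back-end side, again only as a sketch: the 
IntermediateResults class and the BackEnd interface below are hypothetical 
names of mine, and the "location" strings stand in for whatever handle a real 
back-end would use for materialized data. The table is keyed by scope first, 
so that everything belonging to a scope (e.g. a session) can be purged in one 
step.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical back-end table of intermediate results, grouped by scope.
final class IntermediateResults {
    private final Map<String, Map<Long, String>> byScope = new HashMap<>();

    // Records where the result of operator (scope, nodeId) was materialized.
    void put(String scope, long nodeId, String location) {
        byScope.computeIfAbsent(scope, s -> new HashMap<>()).put(nodeId, location);
    }

    // Returns the recorded location, or null if nothing is known for this key.
    String get(String scope, long nodeId) {
        Map<Long, String> results = byScope.get(scope);
        return results == null ? null : results.get(nodeId);
    }

    // Forgets every result of a scope; returns how many entries were dropped.
    int purgeScope(String scope) {
        Map<Long, String> removed = byScope.remove(scope);
        return removed == null ? 0 : removed.size();
    }
}

// Hypothetical contract shared by a local back-end and a Hadoop back-end.
interface BackEnd {
    // Compiles the logical plan rooted at (scope, nodeId) to a physical plan,
    // runs it, and returns where the result was materialized.
    String execute(String scope, long nodeId, String serializedLogicalPlan);

    // Drops everything the back-end holds for a scope, e.g. at session end.
    void closeScope(String scope);
}
```

A local implementation of BackEnd would build physical trees behind execute, 
and a Hadoop implementation would build map/reduce jobs; the front-end would 
see only the interface.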




> Abstraction Layer to decouple Pig from Back-End
> -----------------------------------------------
>
>                 Key: PIG-32
>                 URL: https://issues.apache.org/jira/browse/PIG-32
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Antonio Magnaghi
>            Assignee: Antonio Magnaghi
>         Attachments: DataStorage.diff, DataStorage20071212.diff
>
>
> I'm opening a new issue to track the development work to support an 
> abstraction layer for Pig as defined at 
> http://wiki.apache.org/pig/PigAbstractionLayer

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
