[ https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364923#comment-15364923 ]

Rohini Palaniswamy commented on PIG-3764:
-----------------------------------------

Looked into bytecode generation this weekend while investigating how best to
fix PIG-3000, instead of just rewriting the plan.

At a very high level:
   - The idea is to generate and compile the input plans of POForeach (nested
     ones as well) and POFilter into a single class with all code fully
     inlined, and to replace them in the plan with the newly generated class
     using a separate Optimizer.
   - Create variables named after operator keys as much as possible, for easy
     debugging.
   - Provide an interface for UDFs to also supply simplified versions of their
     code that avoid wrapping values in a Tuple and DataBag and instead take an
     array or ArrayList directly, so that we can do tight loops. Inline that as
     well instead of making method calls, if possible.
   - Add methods to the existing operators that will be used to generate the
     bytecode, instead of adding new classes like
     GeneratedPigExpression/GeneratedPigExpressionGenerator with methods and
     code for all operations. If that becomes more complicated, I will go with
     the separate-classes approach from Julien's prototype.

Took a look at ByteBuddy, which seemed to provide a good high-level abstraction
over ASM, but none of the other Hadoop projects seemed to have used it. If it
does not work out, we can use ASM directly. Will try to do a prototype later in
the month.

[~julienledem],
   Thoughts?

> Compile physical operators to bytecode
> --------------------------------------
>
>                 Key: PIG-3764
>                 URL: https://issues.apache.org/jira/browse/PIG-3764
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Julien Le Dem
>              Labels: GSOC2014
>
> I started a prototype here:
> https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
> The current physical plan is relatively inefficient at evaluating expressions.
> In the context of a better execution engine (Tez, Spark, ...), compiling 
> expressions to bytecode would provide a significant speedup.
> This is a candidate project for Google Summer of Code 2014. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2014



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)