[jira] [Commented] (PIG-3764) Compile physical operators to bytecode

2016-07-07 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366453#comment-15366453
 ] 

Rohini Palaniswamy commented on PIG-3764:
-

Thanks. I did already look at the prototype in this JIRA's description and also
came across Brennus in DRILL-258. I was definitely going to look into Brennus in
detail before starting on a prototype.

> Compile physical operators to bytecode
> --
>
> Key: PIG-3764
> URL: https://issues.apache.org/jira/browse/PIG-3764
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Julien Le Dem
>  Labels: GSOC2014
>
> I started a prototype here:
> https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
> The current physical plan is relatively inefficient at evaluating expressions.
> In the context of a better execution engine (Tez, Spark, ...), compiling 
> expressions to bytecode would be a significant speedup.
> This is a candidate project for Google summer of code 2014. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2014



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-3764) Compile physical operators to bytecode

2016-07-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365633#comment-15365633
 ] 

Julien Le Dem commented on PIG-3764:


[~rohini] This sounds good. It does not have to be totally inlined, since the
JIT will inline method calls; you do want to avoid virtual calls, though. My
prototype is still out there [1]. One thing it did not take into account is
nulls, but I think that can be branched out separately (evaluate ignoring the
nulls, then evaluate the is-null check).
Generating ASM directly can be unwieldy. That is why I made Brennus [2]: to
factor out a lot of the logic (different operations per type, different stack
frame size per type, all sorts of special cases); see the prototype [1].

1: https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
2: https://github.com/julienledem/brennus
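
To make the "branch nulls out separately" idea concrete, here is a minimal
sketch of one way it could look. It is not taken from the prototype: the class
and method names are hypothetical, and it assumes the usual Pig rule that an
arithmetic expression with a null operand evaluates to null. The value is
computed on primitives as if no nulls existed, and null-ness is computed by a
second, independent expression.

```java
// Hypothetical illustration only; none of these names exist in Pig.
final class NullBranchOut {
    // Value path: evaluated as if no input were null.
    // A null input is assumed to carry an arbitrary placeholder value.
    static long addValue(long a, long b) {
        return a + b;
    }

    // Null path: the result is null if either operand is null.
    // Non-short-circuit '|' keeps this free of branches.
    static boolean addIsNull(boolean aIsNull, boolean bIsNull) {
        return aIsNull | bIsNull;
    }

    // Combines the two paths only at the boundary where a boxed value is needed.
    static Long evalAdd(long a, boolean aIsNull, long b, boolean bIsNull) {
        long value = addValue(a, b);
        return addIsNull(aIsNull, bIsNull) ? null : Long.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(evalAdd(2L, false, 3L, false));
        System.out.println(evalAdd(2L, true, 3L, false));
    }
}
```

The point of the split is that the value path stays a plain primitive
expression that the JIT can optimize, while the null check does not interrupt
it.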



[jira] [Commented] (PIG-3764) Compile physical operators to bytecode

2016-07-06 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365246#comment-15365246
 ] 

Rohini Palaniswamy commented on PIG-3764:
-

We also need to include SIMD optimizations similar to HIVE-10179. For example,
HIVE-11533 uses SIMD-optimized checks for >, < and =:

{code}
// The SIMD optimized form of "a == b" is "(((a - b) ^ (b - a)) >>> 63) ^ 1"
// The SIMD optimized form of "a >= b" is "((a - b) >>> 63) ^ 1"
// The SIMD optimized form of "a > b" is "(b - a) >>> 63"
// The SIMD optimized form of "a <= b" is "((b - a) >>> 63) ^ 1"
// The SIMD optimized form of "a < b" is "(a - b) >>> 63"
// The SIMD optimized form of "a != b" is "((a - b) ^ (b - a)) >>> 63"
{code}
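
As a rough sketch of how those forms behave, the snippet below wraps each one
in a static method returning 1 for true and 0 for false, and applies one of
them over whole columns. The class and method names are made up for
illustration. Note the assumption baked into these forms: they are only correct
when a - b (and b - a) does not overflow a long.

```java
// Hypothetical helper; the expressions are the ones quoted above.
final class BranchFreeCompare {
    // Each method returns 1L for true and 0L for false.
    // Assumes a - b and b - a do not overflow.
    static long eq(long a, long b) { return (((a - b) ^ (b - a)) >>> 63) ^ 1; }
    static long ge(long a, long b) { return ((a - b) >>> 63) ^ 1; }
    static long gt(long a, long b) { return (b - a) >>> 63; }
    static long le(long a, long b) { return ((b - a) >>> 63) ^ 1; }
    static long lt(long a, long b) { return (a - b) >>> 63; }
    static long ne(long a, long b) { return ((a - b) ^ (b - a)) >>> 63; }

    // Column form: no branch in the loop body, which leaves the loop in a
    // shape the JIT may be able to vectorize.
    static void gtColumn(long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = (b[i] - a[i]) >>> 63;
        }
    }

    public static void main(String[] args) {
        long[] a = {1L, 5L, 7L};
        long[] b = {2L, 5L, 3L};
        long[] out = new long[3];
        gtColumn(a, b, out, 3);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```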



[jira] [Commented] (PIG-3764) Compile physical operators to bytecode

2016-07-06 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364923#comment-15364923
 ] 

Rohini Palaniswamy commented on PIG-3764:
-

I looked into bytecode generation this weekend while looking at how best to fix
PIG-3000 instead of just rewriting the plan.

At a very high level:
   - Generate and compile the input plans of POForeach (nested ones as well)
and POFilter into a single class with all code totally inlined, and replace
them in the plan with the new generated class using a separate Optimizer.
   - Create variables with operator key names as much as possible for easy
debugging.
   - Provide an interface for UDFs to also provide simplified versions of their
code that avoid wrapping in Tuple and DataBag and instead pass an array or
ArrayList directly, so that we can do tight loops. Inline that as well instead
of making method calls, if possible.
   - Add methods to existing operators that will be used to generate bytecode,
instead of adding a new class like
GeneratedPigExpression/GeneratedPigExpressionGenerator with methods and code
for all operations. If that becomes more complicated, I will go with the
separate classes idea in Julien's prototype.
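
For the UDF bullet above, a simplified interface might look something like the
sketch below. This is purely an assumption about the shape it could take:
PrimitiveLongUdf and SumUdf are hypothetical names, not existing Pig classes.
The UDF receives a primitive array and a length instead of a Tuple wrapping a
DataBag, so its body is a tight loop with no boxing.

```java
// Hypothetical interface; not part of Pig's EvalFunc API.
interface PrimitiveLongUdf {
    // 'values' may be a reused buffer; only the first 'length' entries are valid.
    long exec(long[] values, int length);
}

// Example UDF written against the simplified interface.
final class SumUdf implements PrimitiveLongUdf {
    @Override
    public long exec(long[] values, int length) {
        long sum = 0L;
        for (int i = 0; i < length; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] buffer = {1L, 2L, 3L, 99L};
        // Only the first three entries are treated as input.
        System.out.println(new SumUdf().exec(buffer, 3));
    }
}
```

Passing an explicit length lets the caller reuse one buffer across records
rather than allocating an array per call.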

I took a look at ByteBuddy, which seems to provide a good high-level
abstraction over ASM, but none of the other Hadoop projects seem to have used
it. If it does not work out, I can use ASM directly. I will try to do a
prototype later in the month.
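
Whichever library ends up emitting it, the generated class would be roughly
equivalent to hand-written Java like the second half of the sketch below. The
first half is a toy stand-in for an interpreted expression tree, not Pig's
actual operator classes. The filter "$0 > 10", the class names and the
scope_NN variable names (modelled on operator keys, with the hyphen replaced so
they are legal identifiers) are all assumptions for illustration.

```java
// Toy interpreted form: one interface call per expression node per record.
interface Expr {
    long eval(long[] row);
}

final class Col implements Expr {
    private final int index;
    Col(int index) { this.index = index; }
    public long eval(long[] row) { return row[index]; }
}

final class Const implements Expr {
    private final long value;
    Const(long value) { this.value = value; }
    public long eval(long[] row) { return value; }
}

final class GreaterThan implements Expr {
    private final Expr left;
    private final Expr right;
    GreaterThan(Expr left, Expr right) { this.left = left; this.right = right; }
    public long eval(long[] row) { return left.eval(row) > right.eval(row) ? 1L : 0L; }
}

// Hand-written stand-in for what a generator might emit for "$0 > 10":
// the whole input plan collapses into one method with no per-node calls.
final class GeneratedFilter {
    static boolean accept(long[] row) {
        long project_scope_10 = row[0]; // named after a hypothetical operator key
        long const_scope_11 = 10L;      // named after a hypothetical operator key
        return project_scope_10 > const_scope_11;
    }

    public static void main(String[] args) {
        long[] row = {11L};
        Expr tree = new GreaterThan(new Col(0), new Const(10L));
        System.out.println(tree.eval(row) == 1L);
        System.out.println(accept(row));
    }
}
```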

[~julienledem],
   Thoughts?



[jira] [Commented] (PIG-3764) Compile physical operators to bytecode

2014-03-05 Thread Kyungho Jeon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920681#comment-13920681
 ] 

Kyungho Jeon commented on PIG-3764:
---

My name is Kyungho Jeon. I am interested in participating in GSoC 2014, and
this project looks very interesting to me. I looked at the prototype you have.
It looks to me as though the goal is to translate (and compile) expressions in
PhysicalPlan to Java code. Can you confirm whether I understand correctly?


