[jira] [Commented] (PIG-3764) Compile physical operators to bytecode
[ https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366453#comment-15366453 ]

Rohini Palaniswamy commented on PIG-3764:

Thanks. I did already look at the prototype in this JIRA description and also came across Brennus in DRILL-258. Was definitely going to look into Brennus in detail before starting on a prototype.

> Compile physical operators to bytecode
> --
>
> Key: PIG-3764
> URL: https://issues.apache.org/jira/browse/PIG-3764
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Julien Le Dem
> Labels: GSOC2014
>
> I started a prototype here:
> https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
> The current physical plan is relatively inefficient at evaluating expressions.
> In the context of a better execution engine (Tez, Spark, ...), compiling
> expressions to bytecode would be a significant speedup.
> This is a candidate project for Google summer of code 2014. More information
> about the program can be found at
> https://cwiki.apache.org/confluence/display/PIG/GSoc2014

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-3764) Compile physical operators to bytecode
[ https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365633#comment-15365633 ]

Julien Le Dem commented on PIG-3764:

[~rohini] This sounds good.

It does not have to be totally inlined, since the JIT will inline method calls; you do want to avoid virtual calls, though.

My prototype is still out there [1]. One thing it did not take into account is nulls, but I think this can be branched out separately (evaluate ignoring the nulls, and then evaluate the "is null").

Generating bytecode with ASM directly can be unwieldy. That's why I made Brennus [2], to factor out a lot of the logic (different operations per type, different stack frame size per type, all sorts of special cases); see the prototype [1].

[1] https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
[2] https://github.com/julienledem/brennus
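The null-handling split mentioned in the comment above (evaluate ignoring the nulls, then evaluate the "is null") could look roughly like the following. This is only a sketch under assumptions: the class NullSplitEval and its methods are hypothetical names, not part of the prototype or of Pig, and it assumes that an arithmetic expression is null whenever any of its inputs is null.

```java
// Sketch only: NullSplitEval and its method names are hypothetical, not Pig classes.
final class NullSplitEval {
    private NullSplitEval() {}

    // Value path for the expression a + b * c: primitive arithmetic, no null checks.
    static long value(long a, long b, long c) {
        return a + b * c;
    }

    // Null path: assumed semantics are that the result is null if any input is null.
    static boolean isNull(boolean aNull, boolean bNull, boolean cNull) {
        return aNull | bNull | cNull;
    }

    // Glue: substitute a dummy 0 for null inputs, run both paths, then combine.
    static Long eval(Long a, Long b, Long c) {
        boolean aNull = (a == null);
        boolean bNull = (b == null);
        boolean cNull = (c == null);
        long v = value(aNull ? 0L : a, bNull ? 0L : b, cNull ? 0L : c);
        return isNull(aNull, bNull, cNull) ? null : Long.valueOf(v);
    }

    public static void main(String[] args) {
        System.out.println(eval(2L, 3L, 4L));
        System.out.println(eval(null, 3L, 4L));
    }
}
```

The value path stays free of null checks, so it remains a candidate for JIT inlining; the null flag is computed on the side and only consulted when the result is produced.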
[jira] [Commented] (PIG-3764) Compile physical operators to bytecode
[ https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365246#comment-15365246 ]

Rohini Palaniswamy commented on PIG-3764:

We also need to include SIMD optimizations similar to HIVE-10179. For example, HIVE-11533 uses SIMD-optimized checks for >, < and =:

{code}
// The SIMD optimized form of "a == b" is "(((a - b) ^ (b - a)) >>> 63) ^ 1"
// The SIMD optimized form of "a >= b" is "((a - b) >>> 63) ^ 1"
// The SIMD optimized form of "a > b" is "(b - a) >>> 63"
// The SIMD optimized form of "a <= b" is "((b - a) >>> 63) ^ 1"
// The SIMD optimized form of "a < b" is "(a - b) >>> 63"
// The SIMD optimized form of "a != b" is "((a - b) ^ (b - a)) >>> 63"
{code}
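The six forms quoted above can be tried out with a small Java sketch. The class BranchFreeCompare and its method names are hypothetical, not Hive or Pig APIs. Each method returns 1 for true and 0 for false, and the sketch assumes that neither a - b nor b - a overflows a signed 64-bit long; with overflow, the sign bit no longer reflects the ordering.

```java
// Sketch only: BranchFreeCompare is a hypothetical name, not a Hive or Pig class.
final class BranchFreeCompare {
    private BranchFreeCompare() {}

    // Each method returns 1L for true and 0L for false, without branching.
    // Assumption: a - b and b - a do not overflow a signed 64-bit long.
    static long eq(long a, long b) { return (((a - b) ^ (b - a)) >>> 63) ^ 1L; }
    static long ne(long a, long b) { return ((a - b) ^ (b - a)) >>> 63; }
    static long ge(long a, long b) { return ((a - b) >>> 63) ^ 1L; }
    static long gt(long a, long b) { return (b - a) >>> 63; }
    static long le(long a, long b) { return ((b - a) >>> 63) ^ 1L; }
    static long lt(long a, long b) { return (a - b) >>> 63; }

    // Tight loop over a column: out[i] is 1 where col[i] > threshold, else 0.
    static void greaterThan(long[] col, long threshold, long[] out) {
        for (int i = 0; i < col.length; i++) {
            out[i] = (threshold - col[i]) >>> 63;
        }
    }

    public static void main(String[] args) {
        long[] out = new long[3];
        greaterThan(new long[] {1L, 5L, 9L}, 5L, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Because the loop body contains only subtraction and a shift, with no conditional jump, it is the kind of loop the JIT may be able to auto-vectorize; whether it actually does depends on the JVM and the hardware.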
[jira] [Commented] (PIG-3764) Compile physical operators to bytecode
[ https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364923#comment-15364923 ]

Rohini Palaniswamy commented on PIG-3764:

Looked into bytecode generation this weekend while looking at how best to fix PIG-3000 instead of just rewriting the plan. At a very high level, the idea is to:
- Generate and compile the input plans of POForeach (nested as well) and POFilter into a single class with all the code totally inlined, and replace them in the plan with the newly generated class using a separate Optimizer.
- Create variables with operator key names as much as possible, for easy debugging.
- Provide an interface for UDFs to also supply simplified versions of their code that avoid wrapping in Tuple and DataBag and take an array or ArrayList directly, so that we can do tight loops. Inline that as well, instead of making method calls, if possible.
- Add methods to the existing operators that will be used to generate bytecode, instead of adding new classes like GeneratedPigExpression/GeneratedPigExpressionGenerator with methods and code for all operations. If that becomes too complicated, I will go with the separate-classes idea in Julien's prototype.

Took a look at ByteBuddy, which seems to provide a good high-level abstraction over ASM, but none of the other Hadoop projects seem to have used it. If it does not work out, we can use ASM directly. Will try to do a prototype later in the month.

[~julienledem], thoughts?
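To make the first two points of the plan above concrete, here is a hand-written sketch of the difference between evaluating an operator tree and evaluating one inlined method for the filter expression ($0 + 1) > $1. Everything in it is an assumption for illustration: FusedFilterSketch, the Expr interface and the operator-key-style variable names (project_10, add_12, ...) are hypothetical, and the "generated" method is written by hand here rather than emitted with ByteBuddy or ASM.

```java
// Sketch only: FusedFilterSketch and every name in it are hypothetical, not Pig classes.
final class FusedFilterSketch {
    private FusedFilterSketch() {}

    // Interpreted style: one object per operator, one interface call per node per row.
    interface Expr {
        long eval(long[] row);
    }

    static Expr column(int idx) { return row -> row[idx]; }
    static Expr constant(long v) { return row -> v; }
    static Expr add(Expr l, Expr r) { return row -> l.eval(row) + r.eval(row); }
    static Expr greaterThan(Expr l, Expr r) {
        return row -> l.eval(row) > r.eval(row) ? 1L : 0L;
    }

    // Stand-in for what a generated class might contain for ($0 + 1) > $1:
    // the whole expression in one method, with locals named after
    // (hypothetical) operator keys to make debugging easier.
    static boolean generatedFilter(long[] row) {
        long project_10 = row[0];
        long const_11 = 1L;
        long add_12 = project_10 + const_11;
        long project_13 = row[1];
        return add_12 > project_13;
    }

    public static void main(String[] args) {
        Expr tree = greaterThan(add(column(0), constant(1L)), column(1));
        long[] row = {5L, 5L};
        System.out.println(tree.eval(row) == 1L);
        System.out.println(generatedFilter(row));
    }
}
```

Both paths compute the same predicate; the fused method simply removes the per-node interface calls, which is the part the proposal aims to eliminate.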
[jira] [Commented] (PIG-3764) Compile physical operators to bytecode
[ https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920681#comment-13920681 ]

Kyungho Jeon commented on PIG-3764:

My name is Kyungho Jeon. I am interested in participating in GSoC 2014, and this project looks very interesting to me.

I looked at the prototype you have. From the prototype, it looks to me like the goal is to translate (and compile) expressions in the PhysicalPlan to Java code. Can you confirm that I understand correctly?