[ 
https://issues.apache.org/jira/browse/PIG-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250854#comment-13250854
 ] 

Jonathan Coveney commented on PIG-2643:
---------------------------------------

Ah, ok. Well, first, since we have a more performant, less verbose syntax 
(mainly not having to say "InvokeForLong" vs "InvokeForString" and so on), I 
think it's worth including because it IS faster and cleaner, though I agree 
that the focus now should be on filling a niche that isn't currently covered. 
As I said before, the work so far was in large part a small project to 
start working with ASM, with a benefit to Pig on the side.

I do like the idea of potentially supporting math function syntax, and then 
generating the scaffolding behind the scenes. I like that idea a lot. I'll mull 
over how it'd be implemented. Perhaps a first pass would be to support a MATH 
keyword, where MATH.operator(stuff, stuff) generates the scaffolding, and then 
it can get more generic? I don't really know how to do this without adding 
keywords... hmm hmm hmm. Would love thoughts in that vein.

But what you bring up is an interesting use case: generating UDFs based on 
some class that already exists. What you're proposing sounds like we could 
generate an accumulator UDF that would apply to any case where you have an 
object that you instantiate on the mapper, then stream everything through it 
and return the result. Ideally we'd provide an interface that objects could 
implement that would serve as the bridge. Perhaps something like

{code}
public Object eval(Object... o) throws IOException;
{code}

That way the object doesn't even have to depend on Pig?
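
To make the bridge idea concrete, here is a minimal sketch. Only the eval signature comes from the comment above; the names Evaluable, RunningSum and StreamingAdapter are hypothetical and do not appear in the patch, and StreamingAdapter is a plain-Java stand-in for whatever accumulator UDF would actually be generated.

```java
import java.io.IOException;

// Hypothetical bridge interface (the name is an assumption). It has no Pig
// imports, so an implementing class does not need Pig on its classpath.
interface Evaluable {
    Object eval(Object... o) throws IOException;
}

// Hypothetical implementor: an object created once, then fed values row by row.
class RunningSum implements Evaluable {
    private long total = 0;

    @Override
    public Object eval(Object... o) throws IOException {
        for (Object value : o) {
            if (!(value instanceof Number)) {
                throw new IOException("Expected a Number but got: " + value);
            }
            total += ((Number) value).longValue();
        }
        return total; // boxed to Long
    }
}

// Stand-in for a generated accumulator UDF: it streams each row through the
// wrapped object and remembers the most recent result.
class StreamingAdapter {
    private final Evaluable target;
    private Object last;

    StreamingAdapter(Evaluable target) {
        this.target = target;
    }

    void accumulate(Object... row) throws IOException {
        last = target.eval(row);
    }

    Object getValue() {
        return last;
    }
}
```

The design point is that all Pig-specific handling (tuples, bags, schemas) would sit in the generated wrapper, so the wrapped object only ever sees plain Java values.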
                
> Use bytecode generation to make a performance replacement for InvokeForLong, 
> InvokeForString, etc
> -------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2643
>                 URL: https://issues.apache.org/jira/browse/PIG-2643
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>            Priority: Minor
>              Labels: codegen
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2643-0.patch
>
>
> This is basically to cut my teeth for much more ambitious code generation 
> down the line, but I think it could be performant and useful.
> The new syntax is:
> {code}a = load 'thing' as (x:chararray);
> define concat InvokerGenerator('java.lang.String','concat','String');
> define valueOf InvokerGenerator('java.lang.Integer','valueOf','String');
> define valueOfRadix 
> InvokerGenerator('java.lang.Integer','valueOf','String,int');
> b = foreach a generate x, valueOf(x) as vOf;
> c = foreach b generate x, vOf, valueOfRadix(x, 16) as vOfR;
> d = foreach c generate x, vOf, vOfR, concat(concat(x, (chararray)vOf), 
> (chararray)vOfR);
> dump d;
> {code}
> There are some differences between this version and Dmitriy's implementation:
> - it is no longer necessary to declare whether the method is static or not. 
> This is gleaned via reflection.
> - as per the above, it is no longer necessary to make the first argument be 
> the type of the object to invoke the method on. If it is not a static method, 
> then the type will implicitly be the type you need. So in the case of concat, 
> it would need to be passed a tuple of two inputs: one for the method to be 
> called against (as it is not static), and then the 'String' that was 
> specified. In the case of valueOf, because it IS static, then the 'String' is 
> the only value.
> - The arguments are type sensitive. Integer means the Object Integer, whereas 
> int (or long, or float, or boolean, etc) refer to the primitive. This is 
> necessary to properly reflect the arguments. Values passed in WILL, however, 
> be properly unboxed as necessary.
> - The return type will be reflected.
> This uses the ASM API to generate the bytecode, and then a custom classloader 
> to load it in. I will add caching of the generated code based on the input 
> strings, etc, but I wanted to get eyes and opinions on this. I also need to 
> benchmark, but it should be native speed (excluding a little startup time to 
> make the bytecode, but ASM is really fast).
> Another nice benefit is that this bypasses the need for the JDK, though it 
> adds a dependency on ASM (which is a super tiny dependency).
> Patch incoming.
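
The reflection steps in the quoted description (resolving the argument type names, then checking whether the method is static and what it returns) can be sketched with plain java.lang.reflect calls. This is not the code from the patch: the class InvokerReflectionSketch, its method names and the small table of supported type names are assumptions made for illustration, and the ASM bytecode generation step is left out entirely.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the patch.
class InvokerReflectionSketch {
    // Assumed mapping from names in the define statement to classes.
    // "int" and "Integer" are kept distinct, as the description requires.
    private static final Map<String, Class<?>> KNOWN = new HashMap<>();
    static {
        KNOWN.put("int", int.class);
        KNOWN.put("long", long.class);
        KNOWN.put("float", float.class);
        KNOWN.put("double", double.class);
        KNOWN.put("boolean", boolean.class);
        KNOWN.put("Integer", Integer.class);
        KNOWN.put("Long", Long.class);
        KNOWN.put("Float", Float.class);
        KNOWN.put("Double", Double.class);
        KNOWN.put("Boolean", Boolean.class);
        KNOWN.put("String", String.class);
    }

    // Turns a spec such as "String,int" into {String.class, int.class}.
    static Class<?>[] parseArgTypes(String spec) {
        if (spec == null || spec.trim().isEmpty()) {
            return new Class<?>[0];
        }
        String[] names = spec.split(",");
        Class<?>[] types = new Class<?>[names.length];
        for (int i = 0; i < names.length; i++) {
            String name = names[i].trim();
            Class<?> type = KNOWN.get(name);
            if (type == null) {
                throw new IllegalArgumentException("Unsupported argument type: " + name);
            }
            types[i] = type;
        }
        return types;
    }

    // Looks up the public method matching the class name, method name and spec.
    static Method resolve(String className, String methodName, String argSpec) {
        try {
            return Class.forName(className).getMethod(methodName, parseArgTypes(argSpec));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot resolve " + className + "." + methodName, e);
        }
    }

    // Static methods need no receiver; instance methods take it as an extra first input.
    static boolean isStatic(Method method) {
        return Modifier.isStatic(method.getModifiers());
    }
}
```

With the three definitions from the example script, resolve("java.lang.String", "concat", "String") yields an instance method, so the invoker needs two inputs, while both valueOf variants are static and take only the declared arguments. The return type is available from Method.getReturnType().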

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
