[jira] [Commented] (PIG-2632) Create a SchemaTuple which generates efficient Tuples via code gen

Jonathan Coveney (Commented) (JIRA) Fri, 06 Apr 2012 15:00:39 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248910#comment-13248910
 ]


Jonathan Coveney commented on PIG-2632:
---------------------------------------

Scott, I'm happy to do the heavy lifting to get 1.0 out, and comments like 
this are greatly appreciated. We talked about using Avro as well, and I think 
it's a good idea, though probably one for the next gen of this patch?

In the short term, I'm more worried about your point about requiring the 
JDK, which has been brought up before. Do you think that would be prohibitive 
for people? I guess it could be an option people could set, but that seems 
annoying. Bytecode generation is probably where this should end up, but...oy 
vey. I think that Avro+Bytecode would be a really awesome version 2.0.
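
For context on the JDK concern, here is a minimal sketch of how code could 
detect whether runtime compilation is possible at all. It assumes the 
generated source is compiled in-process through the standard javax.tools API, 
which this thread does not confirm. The class name CompilerCheck and the 
fallback messages are made up for illustration.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper, not part of the patch.
class CompilerCheck {
    // ToolProvider.getSystemJavaCompiler() returns null when the runtime
    // ships without a compiler, which is the usual case on a bare JRE.
    static boolean canCompileAtRuntime() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    public static void main(String[] args) {
        if (canCompileAtRuntime()) {
            System.out.println("compiler found: generated tuples can be compiled");
        } else {
            System.out.println("no compiler: fall back to the generic Tuple");
        }
    }
}
```

A check like this would let the feature turn itself off on a JRE-only 
install instead of asking people to set an option by hand.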

As for the classloader issue, this is an area where I'm not as strong. Do you 
know of any good resources for reading up on these sorts of issues?
                
> Create a SchemaTuple which generates efficient Tuples via code gen
> ------------------------------------------------------------------
>
>                 Key: PIG-2632
>                 URL: https://issues.apache.org/jira/browse/PIG-2632
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2632-0.patch, PIG-2632-1.patch
>
>
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing 
> the Schema on the frontend, we can code generate Tuples which can be used for 
> fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, 
> and it's ~15% smaller serialized (though this depends heavily on the data). 
> Still need to do get/set tests, but assuming that it's on par with (or even 
> faster than) Tuple, the memory gain is huge.
> Need to clean up the code and add tests.
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema 
> given to UDFs. The next step is to make a SchemaBag, where I think the 
> serialization savings will be really huge.
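
To make the idea in the description concrete, here is a rough sketch of the 
kind of class a generator might emit for a schema of (a:int, b:long). The 
class name, the setters, the null bitmask, and the wire layout are all 
assumptions for illustration and are not taken from the attached patches. 
The point is only that fields known from the schema can be stored unboxed 
and written without per-field type tags.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical generated class for the schema (a:int, b:long).
final class SchemaTuple_int_long {
    private int a;                 // stored as a primitive, no Integer box
    private long b;                // stored as a primitive, no Long box
    private byte nullBits = 0b11;  // bit i set => field i is null; both start null

    void setA(int v)  { a = v; nullBits &= ~0b01; }
    void setB(long v) { b = v; nullBits &= ~0b10; }

    // Generic accessor in the spirit of a get(int) on a tuple; boxes on demand.
    Object get(int i) {
        switch (i) {
            case 0: return (nullBits & 0b01) != 0 ? null : Integer.valueOf(a);
            case 1: return (nullBits & 0b10) != 0 ? null : Long.valueOf(b);
            default: throw new IndexOutOfBoundsException("field " + i);
        }
    }

    // Assumed layout: one byte of null flags, then each non-null field in order.
    void write(DataOutputStream out) throws IOException {
        out.writeByte(nullBits);
        if ((nullBits & 0b01) == 0) out.writeInt(a);
        if ((nullBits & 0b10) == 0) out.writeLong(b);
    }

    static byte[] serialize(SchemaTuple_int_long t) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            t.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        SchemaTuple_int_long t = new SchemaTuple_int_long();
        t.setA(7);
        t.setB(42L);
        // 1 flag byte + 4-byte int + 8-byte long
        System.out.println(serialize(t).length + " bytes");
    }
}
```

Under this assumed layout the type of each field comes from the class 
itself, so nothing about the schema has to be written alongside the values.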

