[ 
https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249120#comment-13249120
 ] 

Scott Carey commented on PIG-2632:
----------------------------------

I completely agree: a 1.0 version doesn't need to be super slick.  

This is a big move in the right direction for making Pig significantly more 
efficient. Both this and Dmitriy's prior work have exposed some warts that make 
such improvements more difficult.  I really think the best way to move this 
forward is to get something out there that just works (although there will need 
to be a way to turn it off).  We do want to make sure that it is as hidden as 
possible so that the implementation details can change going forward.

I am hoping to have time to add some features to Avro that make using it for 
these types of use cases easier.  For example, Avro could operate directly on 
Pig Schemas without explicit translation if it had some rules on how to 
interpret them as Avro schemas, and a framework for understanding such rules.

The JRE/JDK thing is not a big deal for me at all, but is one of those things 
that tends to get someone somewhere complaining.  If the feature can be turned 
off easily, then that may be good enough.

(aside)
If anyone has time to learn how to use ASM and dynamically generate classes for 
use cases like this, I'm sure it would be useful to them in getting a job in 
the future.  Supply/demand and all that :)
                
> Create a SchemaTuple which generates efficient Tuples via code gen
> ------------------------------------------------------------------
>
>                 Key: PIG-2632
>                 URL: https://issues.apache.org/jira/browse/PIG-2632
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2632-0.patch, PIG-2632-1.patch
>
>
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing 
> the Schema on the frontend, we can code generate Tuples which can be used for 
> fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, 
> and it's ~15% smaller serialized (though this depends heavily on the data). 
> Get/set tests are still needed, but assuming that performance is on par with 
> (or even faster than) Tuple, the memory gain is huge.
> Need to clean up the code and add tests.
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema 
> given to UDFs. The next step is to make a SchemaBag, where I think the 
> serialization savings will be really huge.
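
For readers unfamiliar with the idea in the description above, here is a 
minimal sketch of schema-driven code generation. It is not taken from the 
attached patches: the class name SchemaTupleSketch, the generateSource helper, 
and the name-to-type map used as a stand-in for a Pig Schema are all 
hypothetical. It only emits Java source text with one primitive field per 
column, which is roughly where the memory savings over a boxed, list-backed 
Tuple would be expected to come from. Compiling or loading the generated class 
is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the PIG-2632 implementation.
public class SchemaTupleSketch {

    /**
     * Emits Java source for a tuple class that stores each schema column
     * in its own typed field instead of a list of boxed Objects.
     * The schema maps a field name to a Java type name (an assumption made
     * for this sketch; a real Pig Schema carries more information).
     */
    public static String generateSource(String className, Map<String, String> schema) {
        StringBuilder sb = new StringBuilder();
        sb.append("public final class ").append(className).append(" {\n");

        // One typed field per column.
        for (Map.Entry<String, String> f : schema.entrySet()) {
            sb.append("    private ").append(f.getValue()).append(' ')
              .append(f.getKey()).append(";\n");
        }

        // A typed getter and setter per column.
        for (Map.Entry<String, String> f : schema.entrySet()) {
            String name = f.getKey();
            String type = f.getValue();
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            sb.append("    public ").append(type).append(" get").append(cap)
              .append("() { return ").append(name).append("; }\n");
            sb.append("    public void set").append(cap).append("(").append(type)
              .append(" v) { ").append(name).append(" = v; }\n");
        }

        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in declaration order.
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "long");
        schema.put("score", "double");
        System.out.print(generateSource("SchemaTuple_0", schema));
    }
}
```

Running main prints the source of a class with a long field and a double 
field plus their accessors. Turning that text into a loaded class could be 
done with the JDK's compiler API or with a bytecode library such as ASM; 
which route the patches take is not something this sketch asserts.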

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
