[ https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412470#comment-13412470 ]
Jonathan Coveney commented on PIG-2632:
---------------------------------------

Daniel,

1) I agree that there are a lot of great places to use this. Next on my plate is using it with LoadFuncs and Foreaches, and then ideally Bag support (which I do not think would be difficult at all, just need some time). I hadn't thought about lazy tuples -- need to take a look at his code.

2) I submitted a patch fixing the MergeJoin errors, and have time to look at the rest. Do you know if any of the others were fixed by that fix? I hate the flakiness of the full test suite, hard to know what is and isn't a false positive!

> Create a SchemaTuple which generates efficient Tuples via code gen
> ------------------------------------------------------------------
>
>                 Key: PIG-2632
>                 URL: https://issues.apache.org/jira/browse/PIG-2632
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2632-0.patch, PIG-2632-1.patch, PIG-2632-10.patch,
> PIG-2632-10.patch, PIG-2632-3.patch, PIG-2632-4.patch, PIG-2632-5.patch,
> PIG-2632-6.patch, PIG-2632-7.patch, PIG-2632-8.patch, PIG-2632-9.patch,
> PIG-2632-9.patch, schematuple benchmarking.pdf, schematuple benchmarking.pptx
>
>
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing
> the Schema on the frontend, we can code generate Tuples which can be used for
> fun and profit. In rudimentary tests, the memory efficiency is 2-4x better,
> and it's ~15% smaller serialized (heavily heavily depends on the data,
> though). Need to do get/set tests, but assuming that it's on par (or even
> faster) than Tuple, the memory gain is huge.
> Need to clean up the code and add tests.
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema
> given to UDF's. The next step is to make a SchemaBag, where I think the
> serialization savings will be really huge.

--
This message is automatically generated by JIRA.
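
The idea in the issue description (generate a schema-specific Tuple class once the Schema is known) can be illustrated with a rough sketch. The code below is not taken from the PIG-2632 patches and does not use Pig's actual Tuple API: `SimpleTuple`, `GenericTuple`, and `IntLongTuple` are hypothetical names, and `IntLongTuple` is hand-written to resemble what a generator might plausibly emit for an assumed schema (id:int, count:long). It only shows why primitive fields and a fixed field order could save memory and serialized bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SchemaTupleSketch {

    /** Minimal stand-in for a tuple interface; NOT Pig's actual Tuple API. */
    interface SimpleTuple {
        Object get(int fieldNum);
        void set(int fieldNum, Object val);
        int size();
    }

    /** Generic tuple: every field is held as a boxed Object, whatever the schema says. */
    static final class GenericTuple implements SimpleTuple {
        private final Object[] fields;
        GenericTuple(int size) { fields = new Object[size]; }
        public Object get(int i) { return fields[i]; }
        public void set(int i, Object v) { fields[i] = v; }
        public int size() { return fields.length; }
    }

    /**
     * Hypothetical output of a code generator for the assumed schema
     * (id:int, count:long). Fields are stored as primitives, so no
     * per-field wrapper objects are kept alive.
     */
    static final class IntLongTuple implements SimpleTuple {
        private int id;
        private long count;

        public Object get(int i) {
            switch (i) {
                case 0: return id;      // boxed only on access
                case 1: return count;
                default: throw new IndexOutOfBoundsException("field " + i);
            }
        }

        public void set(int i, Object v) {
            switch (i) {
                case 0: id = (Integer) v; break;
                case 1: count = (Long) v; break;
                default: throw new IndexOutOfBoundsException("field " + i);
            }
        }

        public int size() { return 2; }

        /** The schema is fixed, so no per-field type tags are written. */
        byte[] serialize() throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(id);      // 4 bytes
            out.writeLong(count);  // 8 bytes
            out.flush();
            return bytes.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        IntLongTuple t = new IntLongTuple();
        t.set(0, 7);
        t.set(1, 100L);
        System.out.println(t.get(0) + ", " + t.get(1)
                + ", bytes=" + t.serialize().length);
    }
}
```

In this sketch the two fields serialize to 12 bytes (4 for the int, 8 for the long) because the reader is assumed to know the schema already. How the real SchemaTuple lays out fields, handles nulls, or serializes is not stated in this message, so none of that is modeled here.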