[ 
https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249683#comment-13249683
 ] 

jirapos...@reviews.apache.org commented on PIG-2632:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review6784
-----------------------------------------------------------



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/4651/#comment15009>

    instead of checking here if the tuple is generatable, the factory could 
default to the regular TupleFactory if the generation fails. 



trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15010>

    How complex is this class ? Not sure it is worth pulling the whole mahout 
just for this.



trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15011>

    append...() methods should not be null
    (part of your TODO list?)



trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15013>

    handle the case where other == null
    Here you get NullPointerException



trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15014>

    same comment



trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15015>

    If you override equals(), you should override hashCode(). 2 object that are 
equal must return the same hashcode



trunk/src/org/apache/pig/data/TupleFactory.java
<https://reviews.apache.org/r/4651/#comment15052>

    that's where we need a proper classloader



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15016>

    try avoiding accessing the PigContext statically. Add it as a parameter and 
see if it can be passed from somewhere it is actually known.



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15017>

    possibly we want to rename this and/or add something else for this file.



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15018>

    ?
    either get rid of the empty catch block or document why it is Ok.
    
    Here I don't see why it would be Ok that we can not instantiate that class



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15019>

    "org.apache.pig.generated." + classname ?



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15020>

    "org/apache/pig/generated/" + classname + ".class" ?



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15021>

    we should probably not compile in the current dir.
    Put it in the temp dir intead



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15022>

    this could be calling a method in the parent



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15023>

    this could be in the parent



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15024>

    stuff which is not calling a generated field directly should be pulled up.



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15025>

    same comment



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15026>

    same



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15027>

    this is wasteful. Maybe DataByteArray could provide a static 
compare(byte[], byte[]) ?



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15028>

    duplicated code



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15029>

    this can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15030>

    the exception handling can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15031>

    "setPos_" + fieldPos + "((Boolean)val);"



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15032>

    same with a mapping for the java object for the type 



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15033>

    can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15034>

    you can just remove the call to box, it should work



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15035>

    this can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15036>

    can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15037>

    can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15038>

    pull up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15039>

    pull up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15040>

    pull up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15041>

    pull up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15042>

    pull up
    



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15043>

    Ok, temporary



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15044>

    do we need this. Let's just fall back to regular tuple when generation 
fails, so that we don't need to keep those in sync



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15045>

    The factory could be generated as well so that we don't need to use 
reflection here.
    Reflection is slower to create new instances.



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15046>

    The parameter type is enough to differentiate.
    getSchemaTupleFactory(Schema schema)



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15047>

    same



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15048>

    getTupleClass()



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15049>

    getTupleClass()



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15050>

    getTupleClass()



trunk/src/org/apache/pig/data/Tuple.java
<https://reviews.apache.org/r/4651/#comment15051>

    typo


- Julien


On 2012-04-08 22:26:29, Jonathan Coveney wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4651/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-08 22:26:29)
bq.  
bq.  
bq.  Review request for pig and Julien Le Dem.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This work builds on Dmitriy's PrimitiveTuple work. The idea is that, 
knowing the Schema on the frontend, we can code generate Tuples which can be 
used for fun and profit. In rudimentary tests, the memory efficiency is 2-4x 
better, and it's ~15% smaller serialized (heavily heavily depends on the data, 
though). Need to do get/set tests, but assuming that it's on par (or even 
faster) than Tuple, the memory gain is huge.
bq.  
bq.  Need to clean up the code and add tests.
bq.  
bq.  Right now, it generates a SchemaTuple for every inputSchema and 
outputSchema given to UDF's. The next step is to make a SchemaBag, where I 
think the serialization savings will be really huge.
bq.  
bq.  Needs tests and comments, but I want the code to settle a bit.
bq.  
bq.  
bq.  This addresses bug PIG-2632.
bq.      https://issues.apache.org/jira/browse/PIG-2632
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/bin/pig 1310666 
bq.    trunk/build.xml 1310666 
bq.    trunk/ivy.xml 1310666 
bq.    trunk/ivy/libraries.properties 1310666 
bq.    
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
 1310666 
bq.    trunk/src/org/apache/pig/data/BinInterSedes.java 1310666 
bq.    trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/data/PrimitiveTuple.java 1310666 
bq.    trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java 
PRE-CREATION 
bq.    trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/data/Tuple.java 1310666 
bq.    trunk/src/org/apache/pig/data/TupleFactory.java 1310666 
bq.    trunk/src/org/apache/pig/data/TypeAwareTuple.java 1310666 
bq.    trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/impl/PigContext.java 1310666 
bq.    
trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 
1310666 
bq.  
bq.  Diff: https://reviews.apache.org/r/4651/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> Create a SchemaTuple which generates efficient Tuples via code gen
> ------------------------------------------------------------------
>
>                 Key: PIG-2632
>                 URL: https://issues.apache.org/jira/browse/PIG-2632
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2632-0.patch, PIG-2632-1.patch, PIG-2632-3.patch
>
>
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing 
> the Schema on the frontend, we can code generate Tuples which can be used for 
> fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, 
> and it's ~15% smaller serialized (heavily heavily depends on the data, 
> though). Need to do get/set tests, but assuming that it's on par (or even 
> faster) than Tuple, the memory gain is huge.
> Need to clean up the code and add tests.
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema 
> given to UDF's. The next step is to make a SchemaBag, where I think the 
> serialization savings will be really huge.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to