[
https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249683#comment-13249683
]
[email protected] commented on PIG-2632:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review6784
-----------------------------------------------------------
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/4651/#comment15009>
instead of checking here if the tuple is generatable, the factory could
default to the regular TupleFactory if the generation fails.
trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15010>
How complex is this class ? Not sure it is worth pulling the whole mahout
just for this.
trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15011>
append...() methods should not be null
(part of your TODO list?)
trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15013>
handle the case where other == null
Here you get NullPointerException
trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15014>
same comment
trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15015>
If you override equals(), you should override hashCode(). 2 object that are
equal must return the same hashcode
trunk/src/org/apache/pig/data/TupleFactory.java
<https://reviews.apache.org/r/4651/#comment15052>
that's where we need a proper classloader
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15016>
try avoiding accessing the PigContext statically. Add it as a parameter and
see if it can be passed from somewhere it is actually known.
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15017>
possibly we want to rename this and/or add something else for this file.
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15018>
?
either get rid of the empty catch block or document why it is Ok.
Here I don't see why it would be Ok that we can not instantiate that class
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15019>
"org.apache.pig.generated." + classname ?
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15020>
"org/apache/pig/generated/" + classname + ".class" ?
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15021>
we should probably not compile in the current dir.
Put it in the temp dir intead
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15022>
this could be calling a method in the parent
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15023>
this could be in the parent
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15024>
stuff which is not calling a generated field directly should be pulled up.
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15025>
same comment
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15026>
same
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15027>
this is wasteful. Maybe DataByteArray could provide a static
compare(byte[], byte[]) ?
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15028>
duplicated code
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15029>
this can be pulled up
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15030>
the exception handling can be pulled up
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15031>
"setPos_" + fieldPos + "((Boolean)val);"
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15032>
same with a mapping for the java object for the type
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15033>
can be pulled up
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15034>
you can just remove the call to box, it should work
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15035>
this can be pulled up
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15036>
can be pulled up
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15037>
can be pulled up
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15038>
pull up
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15039>
pull up
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15040>
pull up
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15041>
pull up
trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15042>
pull up
trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15043>
Ok, temporary
trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15044>
do we need this. Let's just fall back to regular tuple when generation
fails, so that we don't need to keep those in sync
trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15045>
The factory could be generated as well so that we don't need to use
reflection here.
Reflection is slower to create new instances.
trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15046>
The parameter type is enough to differentiate.
getSchemaTupleFactory(Schema schema)
trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15047>
same
trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15048>
getTupleClass()
trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15049>
getTupleClass()
trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15050>
getTupleClass()
trunk/src/org/apache/pig/data/Tuple.java
<https://reviews.apache.org/r/4651/#comment15051>
typo
- Julien
On 2012-04-08 22:26:29, Jonathan Coveney wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4651/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-04-08 22:26:29)
bq.
bq.
bq. Review request for pig and Julien Le Dem.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. This work builds on Dmitriy's PrimitiveTuple work. The idea is that,
knowing the Schema on the frontend, we can code generate Tuples which can be
used for fun and profit. In rudimentary tests, the memory efficiency is 2-4x
better, and it's ~15% smaller serialized (heavily heavily depends on the data,
though). Need to do get/set tests, but assuming that it's on par (or even
faster) than Tuple, the memory gain is huge.
bq.
bq. Need to clean up the code and add tests.
bq.
bq. Right now, it generates a SchemaTuple for every inputSchema and
outputSchema given to UDF's. The next step is to make a SchemaBag, where I
think the serialization savings will be really huge.
bq.
bq. Needs tests and comments, but I want the code to settle a bit.
bq.
bq.
bq. This addresses bug PIG-2632.
bq. https://issues.apache.org/jira/browse/PIG-2632
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/bin/pig 1310666
bq. trunk/build.xml 1310666
bq. trunk/ivy.xml 1310666
bq. trunk/ivy/libraries.properties 1310666
bq.
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
1310666
bq. trunk/src/org/apache/pig/data/BinInterSedes.java 1310666
bq. trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION
bq. trunk/src/org/apache/pig/data/PrimitiveTuple.java 1310666
bq. trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION
bq. trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
PRE-CREATION
bq. trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION
bq. trunk/src/org/apache/pig/data/Tuple.java 1310666
bq. trunk/src/org/apache/pig/data/TupleFactory.java 1310666
bq. trunk/src/org/apache/pig/data/TypeAwareTuple.java 1310666
bq. trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION
bq. trunk/src/org/apache/pig/impl/PigContext.java 1310666
bq.
trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java
1310666
bq.
bq. Diff: https://reviews.apache.org/r/4651/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Jonathan
bq.
bq.
> Create a SchemaTuple which generates efficient Tuples via code gen
> ------------------------------------------------------------------
>
> Key: PIG-2632
> URL: https://issues.apache.org/jira/browse/PIG-2632
> Project: Pig
> Issue Type: Improvement
> Reporter: Jonathan Coveney
> Assignee: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2632-0.patch, PIG-2632-1.patch, PIG-2632-3.patch
>
>
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing
> the Schema on the frontend, we can code generate Tuples which can be used for
> fun and profit. In rudimentary tests, the memory efficiency is 2-4x better,
> and it's ~15% smaller serialized (heavily heavily depends on the data,
> though). Need to do get/set tests, but assuming that it's on par (or even
> faster) than Tuple, the memory gain is huge.
> Need to clean up the code and add tests.
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema
> given to UDF's. The next step is to make a SchemaBag, where I think the
> serialization savings will be really huge.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira