----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4651/#review8812 -----------------------------------------------------------
Great work! some minor comments. This is getting really good! trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java <https://reviews.apache.org/r/4651/#comment18632> longer term we should probably have a DistributedCacheManager to centralize those things (not now) trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java <https://reviews.apache.org/r/4651/#comment18634> maybe this instead: SchemaTupleBackend.initialize(job, pigContext.getExecType()); and check inside trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java <https://reviews.apache.org/r/4651/#comment18633> is this needed ? trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java <https://reviews.apache.org/r/4651/#comment18657> whitespace trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java <https://reviews.apache.org/r/4651/#comment18663> if you override get, you should really override containsKey as well, or this could hide some hard to debug side effects. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java <https://reviews.apache.org/r/4651/#comment18665> any reason you decided to extend HashMap as opposed to just convert when inserting? trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java <https://reviews.apache.org/r/4651/#comment18658> factor this out in a method. Is there a case when this is already a SchemaTuple? trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java <https://reviews.apache.org/r/4651/#comment18667> why not just convert the tuple here, instead of extending ArrayList? It would seem a little more obvious. If you want a strategy pattern, it does not have to be in List. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java <https://reviews.apache.org/r/4651/#comment18668> are there cases where the tuple is already a SchemaTuple? trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java <https://reviews.apache.org/r/4651/#comment18669> what is this for ? trunk/src/org/apache/pig/builtin/mock/Storage.java <https://reviews.apache.org/r/4651/#comment18670> thanks trunk/src/org/apache/pig/data/AppendableSchemaTuple.java <https://reviews.apache.org/r/4651/#comment18671> remove? trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment18675> what are those for ? It's unlikely we want UDFs to be dependent on SchemaTuples (or their absence) trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java <https://reviews.apache.org/r/4651/#comment18678> is there some code somewhere that does this already ? trunk/test/org/apache/pig/data/TestSchemaTuple.java <https://reviews.apache.org/r/4651/#comment18679> :) - Julien Le Dem On June 29, 2012, 9:55 p.m., Jonathan Coveney wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/4651/ > ----------------------------------------------------------- > > (Updated June 29, 2012, 9:55 p.m.) > > > Review request for pig and Julien Le Dem. > > > Description > ------- > > This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing > the Schema on the frontend, we can code generate Tuples which can be used for > fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, > and it's ~15% smaller serialized (heavily heavily depends on the data, > though). Need to do get/set tests, but assuming that it's on par (or even > faster) than Tuple, the memory gain is huge. > > Need to clean up the code and add tests. > > Right now, it generates a SchemaTuple for every inputSchema and outputSchema > given to UDF's. The next step is to make a SchemaBag, where I think the > serialization savings will be really huge. > > Needs tests and comments, but I want the code to settle a bit. > > > This addresses bug PIG-2632. > https://issues.apache.org/jira/browse/PIG-2632 > > > Diffs > ----- > > trunk/.gitignore 1355561 > trunk/conf/pig.properties 1355561 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java > 1355561 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java > 1355561 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java > 1355561 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java > 1355561 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java > 1355561 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java > 1355561 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java > 1355561 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java > 1355561 > trunk/src/org/apache/pig/builtin/mock/Storage.java 1355561 > trunk/src/org/apache/pig/data/AppendableSchemaTuple.java PRE-CREATION > trunk/src/org/apache/pig/data/BinInterSedes.java 1355561 > trunk/src/org/apache/pig/data/BinSedesTupleFactory.java 1355561 > trunk/src/org/apache/pig/data/DataByteArray.java 1355561 > trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION > trunk/src/org/apache/pig/data/PBooleanTuple.java 1355561 > trunk/src/org/apache/pig/data/PDoubleTuple.java 1355561 > trunk/src/org/apache/pig/data/PFloatTuple.java 1355561 > trunk/src/org/apache/pig/data/PIntTuple.java 1355561 > trunk/src/org/apache/pig/data/PLongTuple.java 1355561 > trunk/src/org/apache/pig/data/PStringTuple.java 1355561 > trunk/src/org/apache/pig/data/PrimitiveFieldTuple.java 1355561 > trunk/src/org/apache/pig/data/PrimitiveTuple.java 1355561 > trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION > trunk/src/org/apache/pig/data/SchemaTupleBackend.java PRE-CREATION > trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION > trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION > trunk/src/org/apache/pig/data/SchemaTupleFrontend.java PRE-CREATION > trunk/src/org/apache/pig/data/TupleFactory.java 1355561 > trunk/src/org/apache/pig/data/TupleMaker.java PRE-CREATION > trunk/src/org/apache/pig/data/TypeAwareTuple.java 1355561 > trunk/src/org/apache/pig/data/utils/BytesHelper.java PRE-CREATION > trunk/src/org/apache/pig/data/utils/MethodHelper.java PRE-CREATION > trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION > trunk/src/org/apache/pig/data/utils/StructuresHelper.java PRE-CREATION > trunk/src/org/apache/pig/impl/PigContext.java 1355561 > trunk/src/org/apache/pig/impl/io/InterRecordReader.java 1355561 > trunk/src/org/apache/pig/impl/io/NullableTuple.java 1355561 > > trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java > 1355561 > trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java > 1355561 > > trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java > 1355561 > > trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalOperator.java > 1355561 > > trunk/src/org/apache/pig/newplan/logical/rules/GroupByConstParallelSetter.java > 1355561 > trunk/src/org/apache/pig/newplan/logical/rules/MergeForEach.java 1355561 > trunk/test/org/apache/pig/data/TestSchemaTuple.java PRE-CREATION > trunk/test/org/apache/pig/data/utils/TestMethodHelper.java PRE-CREATION > trunk/test/org/apache/pig/test/TestDataBag.java 1355561 > trunk/test/org/apache/pig/test/TestLogicalPlanBuilder.java 1355561 > trunk/test/org/apache/pig/test/TestPrimitiveFieldTuple.java 1355561 > trunk/test/org/apache/pig/test/TestPrimitiveTuple.java 1355561 > trunk/test/org/apache/pig/test/TestSchema.java 1355561 > > Diff: https://reviews.apache.org/r/4651/diff/ > > > Testing > ------- > > > Thanks, > > Jonathan Coveney > >
