-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/
-----------------------------------------------------------
(Updated June 20, 2012, 4:28 a.m.)


Review request for pig and Julien Le Dem.


Changes
-------

This patch now incorporates work from https://issues.apache.org/jira/browse/PIG-2673

The goal is to leverage SchemaTuples to make merge joins more performant from a memory perspective (since the current implementation keeps a list of tuples). I also tried to add more tests.

I'd like to get a to-do list of what needs to be done for this to get committed, if possible.


Description
-------

This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema on the frontend, we can code generate Tuples which can be used for fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, and it's ~15% smaller serialized (this depends heavily on the data, though). Need to do get/set tests, but assuming that it's on par with (or even faster than) Tuple, the memory gain is huge. Need to clean up the code and add tests.

Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDFs. The next step is to make a SchemaBag, where I think the serialization savings will be really huge.

Needs tests and comments, but I want the code to settle a bit.


This addresses bug PIG-2632.
https://issues.apache.org/jira/browse/PIG-2632


Diffs (updated)
-----

  trunk/src/docs/src/documentation/content/xdocs/perf.xml 1351931
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1351931
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1351931
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1351931
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1351931
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1351931
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1351931
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1351931
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 1351931
  trunk/src/org/apache/pig/data/BinInterSedes.java 1351931
  trunk/src/org/apache/pig/data/BinSedesTupleFactory.java 1351931
  trunk/src/org/apache/pig/data/DataByteArray.java 1351931
  trunk/src/org/apache/pig/data/TupleFactory.java 1351931
  trunk/src/org/apache/pig/data/TypeAwareTuple.java 1351931
  trunk/src/org/apache/pig/impl/PigContext.java 1351931
  trunk/src/org/apache/pig/impl/io/NullableTuple.java 1351931
  trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1351931
  trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1351931
  trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1351931
  trunk/src/org/apache/pig/tools/pigstats/ScriptState.java 1351931
  trunk/test/org/apache/pig/test/TestDataBag.java 1351931
  trunk/test/org/apache/pig/test/TestSchema.java 1351931

Diff: https://reviews.apache.org/r/4651/diff/


Testing
-------


Thanks,

Jonathan Coveney
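

Illustration
-------

To make the idea in the Description concrete, here is a minimal sketch of what a schema-specific tuple could look like next to a generic one. This is not code from the patch: SchemaTupleSketch, GenericTuple and IntLongTuple are hypothetical names, the schema (a:int, b:long) is assumed for illustration, nulls are ignored, and the byte layout is a toy format rather than Pig's BinInterSedes encoding. It only shows why knowing the schema up front lets fields live as primitives and lets serialization skip per-field type tags.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these classes exist in Pig or in this patch.
class SchemaTupleSketch {

    /** Generic tuple: every field is a boxed Object held in a list. */
    static final class GenericTuple {
        private final List<Object> fields = new ArrayList<>();

        void append(Object value) { fields.add(value); }

        Object get(int i) { return fields.get(i); }

        /** The schema is unknown, so a field count and a type tag per field are written. */
        void write(DataOutputStream out) throws IOException {
            out.writeInt(fields.size());
            for (Object f : fields) {
                if (f instanceof Integer) {
                    out.writeByte(1);
                    out.writeInt((Integer) f);
                } else if (f instanceof Long) {
                    out.writeByte(2);
                    out.writeLong((Long) f);
                } else {
                    throw new IOException("type not handled in this sketch: " + f);
                }
            }
        }
    }

    /** What a code generator might emit for the assumed schema (a:int, b:long). */
    static final class IntLongTuple {
        private int a;   // stored as a primitive, no boxing
        private long b;

        Object get(int i) {
            switch (i) {
                case 0: return a;
                case 1: return b;
                default: throw new IndexOutOfBoundsException("field " + i);
            }
        }

        void set(int i, Object v) {
            switch (i) {
                case 0: a = (Integer) v; break;
                case 1: b = (Long) v; break;
                default: throw new IndexOutOfBoundsException("field " + i);
            }
        }

        /** Field order and types are fixed by the schema, so no tags are needed. */
        void write(DataOutputStream out) throws IOException {
            out.writeInt(a);
            out.writeLong(b);
        }
    }

    /** Serialized size of a generic (int, long) tuple in the toy format. */
    static int genericSize(int a, long b) throws IOException {
        GenericTuple t = new GenericTuple();
        t.append(a);
        t.append(b);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        t.write(new DataOutputStream(bytes));
        return bytes.size();
    }

    /** Serialized size of the schema-specific tuple in the toy format. */
    static int schemaSize(int a, long b) throws IOException {
        IntLongTuple t = new IntLongTuple();
        t.set(0, a);
        t.set(1, b);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        t.write(new DataOutputStream(bytes));
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        // prints "generic=18 bytes, schema=12 bytes"
        System.out.println("generic=" + genericSize(7, 42L)
                + " bytes, schema=" + schemaSize(7, 42L) + " bytes");
    }
}
```

The byte counts here come from the toy format only and say nothing about the 2-4x memory or ~15% serialization figures quoted above, which were measured on the actual patch. A real generated tuple would also need null tracking and would have to implement Pig's Tuple interface.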
