----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/61265/ -----------------------------------------------------------
(Updated Aug. 9, 2017, 3:59 a.m.) Review request for pig, Daniel Dai and Koji Noguchi. Changes ------- During performance testing on large data, found that were lot of spills. Fixed that by avoiding materialization of the full databag in nested foreach similar to how it happens in regular processing. Also in addition to udfs, any repeated POSort and PODistinct is now processed only once. Also took care of clearing those values for garbage collection as soon as the last foreach plan referring to them is done instead of at the end when input was detached. Bugs: PIG-5256 https://issues.apache.org/jira/browse/PIG-5256 Repository: pig Description ------- For each POForEach or POFilter operator in the plan, a corresponding class is created by inlining code for the input plans. Bytecode is generated directly through asm APIs and the generated class files are shipped to the tasks as a jar. On the frontend, the generated classes are dynamically added to the PigContext classloader. The explain command now decompiles the class files in the explain output directory. Refer to the golden files in the test for examples of generated code. License: The newly added fernflower.jar used for decompilation is from IntelliJ (java decompiler used by IntelliJ IDEA) https://github.com/JetBrains/intellij-community/blob/master/plugins/java-decompiler/engine/src/org/jetbrains/java/decompiler/main/Fernflower.java and the license of that is Apache 2.0 ithttps://github.com/JetBrains/intellij-community/blob/master/LICENSE.txt TODO items to be addressed in a separate jira: 1) PIG-5279 - Support for MR and Spark. Currently only done for Tez. Will also add documentation in this jira 2) Support for Accumulator 3) Support for CROSS 4) Run the optimizer on combiner plans 5) Fix for test failures - TestScriptLanguage.runParallelTest2, Jython_CompileBindRun_3 java.lang.LinkageError: loader (instance of org/apache/pig/impl/PigContext$ContextClassLoader): attempted duplicate class definition for name: "org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach_scope_6" Since NodeIdGenerator is ThreadLocal, running same script in parallel with different parameters using embedded Python causes conflict. Requires ThreadLocal classloaders in PigContext. Will address in a separate jira. Other jiras fixed as part of this: 1) PIG-4515: org.apache.pig.builtin.Distinct throws ClassCastException 2) When pig.opt.bytecode=true, UDFs defined as an alias inside nested foreach are executed only once. This was actually the primary goal of doing bytecode generation PIG-3000: Optimize nested foreach PIG-1633: Using an alias withing Nested Foreach causes indeterminate behaviour Diffs (updated) ----- http://svn.apache.org/repos/asf/pig/trunk/build.xml 1804148 http://svn.apache.org/repos/asf/pig/trunk/ivy.xml 1804148 http://svn.apache.org/repos/asf/pig/trunk/ivy/libraries.properties 1804148 http://svn.apache.org/repos/asf/pig/trunk/shade/pom.xml PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigConfiguration.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRUtil.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/optimizer/AsmUtil.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/optimizer/CodeGenerator.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/optimizer/FilterCodeGenerator.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/optimizer/ForEachCodeGenerator.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/BagInputOperator.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Add.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Divide.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/EqualToExpr.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GTOrEqualToExpr.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GreaterThanExpr.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LTOrEqualToExpr.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LessThanExpr.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Mod.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Multiply.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/NotEqualToExpr.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORegexp.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORelationToExprProject.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Subtract.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PODistinct.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFilter.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLimit.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POOptimizedForEach.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSortedDistinct.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezResourceManager.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezPlanContainer.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/RuntimeCodeGenOptimizer.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezCompilerUtil.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/Distinct.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/InvokerGenerator.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DataType.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/GetNextTupleDataBag.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/SimpleAbstractDataBag.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/PigContext.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/io/FileLocalizer.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/plan/OperatorPlan.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/ClassDecompiler.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/JarManager.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/grunt/GruntParser.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestBuiltin.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestByteCodeGen.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestEvalPipelineLocal.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestProjectStarRangeInUdf.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/Util.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/MRC17.gld 1804148 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testFilter/POFilter_scope_12.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testFilter/POForEach_scope_11.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testFilter/plan.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testFilterSplit/POFilter_scope_14.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testFilterSplit/POFilter_scope_9.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testFilterSplit/POForEach_scope_7.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testFilterSplit/plan.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachMapLookUp/POForEach_scope_8.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachMapLookUp/plan.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachSplit/POForEach_scope_20.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachSplit/POForEach_scope_32.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachSplit/POForEach_scope_9.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachSplit/plan.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachUDFEval/POForEach_scope_35.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachUDFEval/plan.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachUDFEval2/POFilter_scope_45.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachUDFEval2/POForEach_scope_18.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachUDFEval2/POForEach_scope_31.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachUDFEval2/POForEach_scope_43.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachUDFEval2/POForEach_scope_49.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testForeachUDFEval2/plan.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POFilter_scope_20.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POFilter_scope_48.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POForEach_scope_14.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POForEach_scope_26.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POForEach_scope_37.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POForEach_scope_42.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POForEach_scope_54.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POForEach_scope_61.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POForEach_scope_67.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POForEach_scope_7.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POForEach_scope_70.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/POForEach_scope_76.java.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/bytecodegen/tez/testNestedForeach/plan.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Distinct-2.gld 1804148 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/tez/TestTezAutoParallelism.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/tez/TestTezCompiler.java 1804148 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/tez/TestTezGraceParallelism.java 1804148 Diff: https://reviews.apache.org/r/61265/diff/6/ Changes: https://reviews.apache.org/r/61265/diff/5-6/ Testing ------- Ran full suite of unit and e2e tests with both pig.opt.bytecode=true. All tests except TestScriptLanguage.runParallelTest2, Jython_CompileBindRun_3 mentioned in description pass. Thanks, Rohini Palaniswamy