----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6651/#review10473 -----------------------------------------------------------
Publishing the first set of comments (got through to MapReduceOper). More later. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java <https://reviews.apache.org/r/6651/#comment22332> does this break if we have 2 cube operators? should we add some identifier to the filename? src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java <https://reviews.apache.org/r/6651/#comment22333> no, let's rely on the general estimator framework rather than special-casing. Having separate chunks of code dealing with this through the codebase will lead to confusion. Now that we allow passing a full plan into the reducer estimator, it will be possible to write an estimator that walks the plan and uses factors based on knowledge of inputs and operators that work on the input to adjust the naive estimation; we should encode any cube operator-specific logic into such an estimator. I think you can go ahead and remove this comment block. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java <https://reviews.apache.org/r/6651/#comment22334> kill dead code. let's note that the minimal number of reducers should be the max partition value, and how to get it, in POCube documentation. That way, whoever implements an improved estimator can follow this advice when they encounter POCube. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22351> you are adding a lot of code to this class. At 2800 lines, it's already too big a piece of code. perhaps some of it can be factored out into CubeCompiler or something similar? src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22335> break this code out into its own function. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22337> can you add a comment describing what the plan looks like before and after this is applied? src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22336> COUNT.class.getName(), COUNT_STAR.class.getName() src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22338> The trick of turning COUNTs into SUMs of COUNTS is already implicitly encoded in COUNT's implementation of Algebraic. Could we use the algebraic interface to do this generically, rather than hardcoding individual functions in the compiler? src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22339> why 0? "input" might be projecting something else, right? Can you reuse "input"? src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22340> @Override src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22341> drop "== true", it's already a boolean. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22342> as discussed, getting the total number of records is problematic. What if we write a sampler that does something along the lines of opening all input splits and reading from each until 2M are reached or input is exhausted? This doesn't let us use the 100K for 2m-2b heuristic, but it's something. this is also where we can look at LoadFunc implementations and see if we can just get an estimated total # of records from it -- if we can, we don't have to do the above compromise, and can apply all the heuristics described by Arnab. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22344> go ahead and write a test. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22346> don't compare booleans to true. just do if (className.equals(...)) { } src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22347> same src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java <https://reviews.apache.org/r/6651/#comment22349> How come? That sounds a little scary. Let's understand this. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java <https://reviews.apache.org/r/6651/#comment22353> can these be added to POCube and extracted from within the contained plans instead? - Dmitriy Ryaboy On Aug. 16, 2012, 10:19 p.m., Prasanth_J wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6651/ > ----------------------------------------------------------- > > (Updated Aug. 16, 2012, 10:19 p.m.) > > > Review request for pig and Dmitriy Ryaboy. > > > Description > ------- > > This is a review board request for > https://issues.apache.org/jira/browse/PIG-2831 > > > This addresses bug PIG-2831. > https://issues.apache.org/jira/browse/PIG-2831 > > > Diffs > ----- > > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java > 8029dec > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java > 1d05a20 > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java > b87c209 > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRPrinter.java > 157caad > > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java > cde340c > > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java > ff65146 > > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCube.java > PRE-CREATION > src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java > 0502917 > src/org/apache/pig/builtin/CubeDimensions.java 5652029 > src/org/apache/pig/builtin/PigStorage.java 21e835f > src/org/apache/pig/builtin/RollupDimensions.java f6c26e4 > src/org/apache/pig/impl/builtin/HolisticCube.java PRE-CREATION > src/org/apache/pig/impl/builtin/HolisticCubeCompoundKey.java PRE-CREATION > src/org/apache/pig/impl/builtin/PartitionMaxGroup.java PRE-CREATION > src/org/apache/pig/impl/builtin/PostProcessCube.java PRE-CREATION > src/org/apache/pig/impl/io/ReadSingleLoader.java PRE-CREATION > src/org/apache/pig/impl/util/Utils.java 270cb6a > src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java > 13439c6 > src/org/apache/pig/newplan/logical/relational/LOCube.java b262efb > > src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java > 127ab7a > src/org/apache/pig/newplan/logical/rules/ColumnPruneHelper.java 369f5c2 > src/org/apache/pig/parser/LogicalPlanBuilder.java 289a76f > src/org/apache/pig/pen/EquivalenceClasses.java 194f8cb > src/org/apache/pig/pen/LineageTrimmingVisitor.java 917073c > src/org/apache/pig/pen/util/DisplayExamples.java 265f8f7 > test/org/apache/pig/impl/builtin/TestHolisticCubeCompundKey.java > PRE-CREATION > test/org/apache/pig/impl/builtin/TestPartitionMaxGroup.java PRE-CREATION > test/org/apache/pig/impl/builtin/TestPostProcessCube.java PRE-CREATION > test/org/apache/pig/test/TestCubeOperator.java 65d56a6 > > Diff: https://reviews.apache.org/r/6651/diff/ > > > Testing > ------- > > Unit tests: All passed > > Pre-commit tests: All passed > ant clean test-commit > > > Thanks, > > Prasanth_J > >