Dmitry Lychagin has posted comments on this change. Change subject: [ASTERIXDB-2286][COMP][FUN][HYR] Parallel Sort Optimization ......................................................................
Patch Set 13: (18 comments) still reviewing, but here's the first batch of comments https://asterix-gerrit.ics.uci.edu/#/c/2393/11/asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj File asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj: Line 206: private static final String GEN_FIELDS_HINT = "gen-fields"; > Done. I changed it to "unpartitioned". I'd set the OrderByClause.isFullParallel to false by default. Then set to 'true' in SQL++ parser (unless there's 'unpartitioned' hint). Then AQL parser can remain the same and will continue to generate the same sort as before. https://asterix-gerrit.ics.uci.edu/#/c/2393/13/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/GlobalSamplingAggregateDescriptor.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/GlobalSamplingAggregateDescriptor.java: Line 173: ByteArrayInputStream byteStream = new ByteArrayInputStream(localSamples.getByteArray(), don't create new instance for each tuple. use ByteArrayAccessibleInputStream Line 175: AOrderedList localSamplesList = listSerde.deserialize(new DataInputStream(byteStream)); don't create DateInputStream instance for each tuple. create it once with ByteArrayAccessibleInputStream then reset that stream's content https://asterix-gerrit.ics.uci.edu/#/c/2393/13/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/base/OperatorAnnotations.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/base/OperatorAnnotations.java: Line 25: public static final String USE_RANGE_CONNECTOR = "USE_RANGE_CONNECTOR"; // --> should we rename USE_RANGE_CONECTOR to USE_STATIC_RANGE and rename the value accordingly? https://asterix-gerrit.ics.uci.edu/#/c/2393/13/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractStableSortPOperator.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractStableSortPOperator.java: Line 82: if (sortOp.getAnnotations().get(OperatorAnnotations.USE_DYNAMIC_RANGE) == Boolean.TRUE) { should the same be required for USE_RANGE_CONNECTOR annotation? We know the range statically in that case, but the OrderedParititonedProperty requirement is the same. Right? https://asterix-gerrit.ics.uci.edu/#/c/2393/13/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/utils/DotFormatGenerator.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/utils/DotFormatGenerator.java: Line 52: private static final LogicalOperatorDotVisitor dotVisitor = new LogicalOperatorDotVisitor(); why was it made static? this could be a problem if we want to call it concurrently. Not sure if we'll ever do that, but it's not a good pattern to follow. https://asterix-gerrit.ics.uci.edu/#/c/2393/13/hyracks-fullstack/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/EnforceStructuralPropertiesRule.java File hyracks-fullstack/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/EnforceStructuralPropertiesRule.java: Line 117: private static final FunctionIdentifier LOCAL_SAMPLING_FUN = AlgebricksBuiltinFunctions.LOCAL_SAMPLING; just a thought. you can pass these two function identifiers through the constructor. that way they won't need to be defined in algebricks Line 144: debug(">>>> Optimizing operator " + op.getPhysicalOperator() + ".\n"); the idea behind calling Logger.isDebugEnabled() is to avoid string concatenation (i.e. create log message only if it will be logged). with your change the message is always created which is suboptimal. can we revert this? Line 165: trace(STR_PROP_FOR + op.getPhysicalOperator() + ": " + op.getDeliveredPhysicalProperties() + "\n"); same issue with potentially unnecessary message construction. revert to using Logger.isTraceEnabled() Line 170: private boolean physOptimizeOp(Mutable<ILogicalOperator> opRef, IPhysicalPropertiesVector required, minor. can you put this method back after getStartChildIndex()? This would make the review easier. Let's reorder methods in this rule in a follow up change if you want to do that. Line 611: if (parentOp.getAnnotations().containsKey(OperatorAnnotations.USE_RANGE_CONNECTOR)) { I think that we once we start requiring OrderedPartitionedProperty for children in AbstractStableSortPOperator for USE_RANGE_CONNECTOR then this branch should go away. What do you think? This branch needs to enforce UNPARTITIONED property, but does it really do it? Line 666: // options for range partitioning: 1. range map computed at run time, 2. no range map?? I think the two options are: 1) range map is known statically (USE_RANGE_CONNECTOR). 2) otherwise the range map is computed at runtime. So the rest of this method should be structured this way. Line 728: ReplicateOperator replicateOperator = new ReplicateOperator(2); set source location (take if from the parent operator) Line 750: private void createAggregateFunction(IOptimizationContext context, List<LogicalVariable> localResultVariables, set source locations for all created expressions and operators Line 796: AggregateOperator aggregateOperator = new AggregateOperator(resultVariables, expressions); set source location Line 810: private static ExchangeOperator createOneToOneExchangeOp(MutableObject<ILogicalOperator> inputOperator, IsolateHyracksOperatorsRule.insertOneToOneExchange() has almost identical code. Can we reuse it here or share the code somehow between the two rules? Line 824: ForwardOperator forwardOperator = new ForwardOperator(rangeMapKey, rangeMapVariable); set source location https://asterix-gerrit.ics.uci.edu/#/c/2393/11/hyracks-fullstack/hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/data/partition/range/FieldRangePartitionComputerFactory.java File hyracks-fullstack/hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/data/partition/range/FieldRangePartitionComputerFactory.java: Line 72: for (int slotNumber = 0; slotNumber < rangeMap.getSplitCount(); ++slotNumber) { > Could you help and point out what I should do? The tuple partition computer You can pass source location from the logical operator when creating FieldRangePartitionComputerFactory instances. store it in a field, then pass it to the exception constructor. -- To view, visit https://asterix-gerrit.ics.uci.edu/2393 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I73e128029a46f45e6b68c23dfb9310d5de10582f Gerrit-PatchSet: 13 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Ali Alsuliman <ali.al.solai...@gmail.com> Gerrit-Reviewer: Ali Alsuliman <ali.al.solai...@gmail.com> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Dmitry Lychagin <dmitry.lycha...@couchbase.com> Gerrit-Reviewer: Ildar Absalyamov <ildar.absalya...@gmail.com> Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu> Gerrit-Reviewer: Taewoo Kim <wangs...@gmail.com> Gerrit-Reviewer: Till Westmann <ti...@apache.org> Gerrit-Reviewer: abdullah alamoudi <bamou...@gmail.com> Gerrit-HasComments: Yes