Dmitry Lychagin has posted comments on this change.

Change subject: [ASTERIXDB-2286][COMP][FUN][HYR] Parallel Sort Optimization
......................................................................


Patch Set 13:

(18 comments)

still reviewing, but here's the first batch of comments

https://asterix-gerrit.ics.uci.edu/#/c/2393/11/asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj
File asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj:

Line 206:     private static final String GEN_FIELDS_HINT = "gen-fields";
> Done. I changed it to "unpartitioned".
I'd set the OrderByClause.isFullParallel to false by default. Then set to 
'true' in SQL++ parser (unless there's 'unpartitioned' hint). Then AQL parser 
can remain the same and will continue to generate the same sort as before.


https://asterix-gerrit.ics.uci.edu/#/c/2393/13/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/GlobalSamplingAggregateDescriptor.java
File 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/aggregates/std/GlobalSamplingAggregateDescriptor.java:

Line 173:             ByteArrayInputStream byteStream = new 
ByteArrayInputStream(localSamples.getByteArray(),
don't create new instance for each tuple. use ByteArrayAccessibleInputStream


Line 175:             AOrderedList localSamplesList = listSerde.deserialize(new 
DataInputStream(byteStream));
don't create DateInputStream instance for each tuple. create it once with 
ByteArrayAccessibleInputStream then reset that stream's content


https://asterix-gerrit.ics.uci.edu/#/c/2393/13/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/base/OperatorAnnotations.java
File 
hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/base/OperatorAnnotations.java:

Line 25:     public static final String USE_RANGE_CONNECTOR = 
"USE_RANGE_CONNECTOR"; // -->
should we rename USE_RANGE_CONECTOR to USE_STATIC_RANGE and rename the value 
accordingly?


https://asterix-gerrit.ics.uci.edu/#/c/2393/13/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractStableSortPOperator.java
File 
hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractStableSortPOperator.java:

Line 82:             if 
(sortOp.getAnnotations().get(OperatorAnnotations.USE_DYNAMIC_RANGE) == 
Boolean.TRUE) {
should the same be required for USE_RANGE_CONNECTOR annotation? We know the 
range statically in that case, but the OrderedParititonedProperty requirement 
is the same. Right?


https://asterix-gerrit.ics.uci.edu/#/c/2393/13/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/utils/DotFormatGenerator.java
File 
hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/utils/DotFormatGenerator.java:

Line 52:     private static final LogicalOperatorDotVisitor dotVisitor = new 
LogicalOperatorDotVisitor();
why was it made static? this could be a problem if we want to call it 
concurrently. Not sure if we'll ever do that, but it's not a good pattern to 
follow.


https://asterix-gerrit.ics.uci.edu/#/c/2393/13/hyracks-fullstack/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/EnforceStructuralPropertiesRule.java
File 
hyracks-fullstack/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/EnforceStructuralPropertiesRule.java:

Line 117:     private static final FunctionIdentifier LOCAL_SAMPLING_FUN = 
AlgebricksBuiltinFunctions.LOCAL_SAMPLING;
just a thought. you can pass these two function identifiers through the 
constructor. that way they won't need to be defined in algebricks


Line 144:         debug(">>>> Optimizing operator " + op.getPhysicalOperator() 
+ ".\n");
the idea behind calling Logger.isDebugEnabled() is to avoid string 
concatenation (i.e. create log message only if it will be logged). with your 
change the message is always created which is suboptimal. can we revert this?


Line 165:             trace(STR_PROP_FOR + op.getPhysicalOperator() + ": " + 
op.getDeliveredPhysicalProperties() + "\n");
same issue with potentially unnecessary message construction. revert to using 
Logger.isTraceEnabled()


Line 170:     private boolean physOptimizeOp(Mutable<ILogicalOperator> opRef, 
IPhysicalPropertiesVector required,
minor. can you put this method back after getStartChildIndex()? This would make 
the review easier. Let's reorder methods in this rule in a follow up change if 
you want to do that.


Line 611:             if 
(parentOp.getAnnotations().containsKey(OperatorAnnotations.USE_RANGE_CONNECTOR))
 {
I think that we once we start requiring OrderedPartitionedProperty for children 
in AbstractStableSortPOperator for USE_RANGE_CONNECTOR then this branch should 
go away. What do you think? This branch needs to enforce UNPARTITIONED 
property, but does it really do it?


Line 666:         // options for range partitioning: 1. range map computed at 
run time, 2. no range map??
I think the two options are: 1)  range map is known statically 
(USE_RANGE_CONNECTOR). 2) otherwise the range map is computed at runtime. So 
the rest of this method should be structured this way.


Line 728:         ReplicateOperator replicateOperator = new 
ReplicateOperator(2);
set source location (take if from the parent operator)


Line 750:     private void createAggregateFunction(IOptimizationContext 
context, List<LogicalVariable> localResultVariables,
set source locations for all created expressions and operators


Line 796:         AggregateOperator aggregateOperator = new 
AggregateOperator(resultVariables, expressions);
set source location


Line 810:     private static ExchangeOperator 
createOneToOneExchangeOp(MutableObject<ILogicalOperator> inputOperator,
IsolateHyracksOperatorsRule.insertOneToOneExchange() has almost identical code. 
Can we reuse it here or share the code somehow between the two rules?


Line 824:         ForwardOperator forwardOperator = new 
ForwardOperator(rangeMapKey, rangeMapVariable);
set source location


https://asterix-gerrit.ics.uci.edu/#/c/2393/11/hyracks-fullstack/hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/data/partition/range/FieldRangePartitionComputerFactory.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/data/partition/range/FieldRangePartitionComputerFactory.java:

Line 72:                 for (int slotNumber = 0; slotNumber < 
rangeMap.getSplitCount(); ++slotNumber) {
> Could you help and point out what I should do? The tuple partition computer
You can pass source location from the logical operator when creating 
FieldRangePartitionComputerFactory instances. store it in a field, then pass it 
to the exception constructor.


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/2393
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I73e128029a46f45e6b68c23dfb9310d5de10582f
Gerrit-PatchSet: 13
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Ali Alsuliman <ali.al.solai...@gmail.com>
Gerrit-Reviewer: Ali Alsuliman <ali.al.solai...@gmail.com>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dmitry.lycha...@couchbase.com>
Gerrit-Reviewer: Ildar Absalyamov <ildar.absalya...@gmail.com>
Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Taewoo Kim <wangs...@gmail.com>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: abdullah alamoudi <bamou...@gmail.com>
Gerrit-HasComments: Yes

Reply via email to