>From Ian Maxon <[email protected]>:

Attention is currently required from: Ali Alsuliman, Ali Alsuliman.

Ian Maxon has posted comments on this change by Ian Maxon. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270?usp=email )

Change subject: [ASTERIXDB-3631][RT] Profile nested groupby clauses
......................................................................


Patch Set 8: -Code-Review

(6 comments)

Patchset:

PS8:
i'm working on a couple more tests to be sure things are as i understand them. 
one thing i noticed was that computeTimings() wasn't actually being called, yet 
the tests i expected would necessitate that being called still seem to work


File 
hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/aggreg/NestedPlansAccumulatingAggregatorFactory.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/af51c350_01a185ad?usp=email
 :
PS8, Line 186:                 profiling(ctx) ? new 
ArrayList<>(subplans.length) : Collections.emptyList();
> You can call profiling(ctx) only once and re-use in the 3 places.
Done


File 
hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/meta/PipelineAssembler.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/afb22395_f389886b?usp=email
 :
PS8, Line 187:                     boolean shouldProfile = profile && 
!(runtimeFactory instanceof EmptyTupleSourceRuntimeFactory)
> If we are here, then profile must be true, same thing for below.
fixed. the old comment is still true though imo. not sure exactly what to do 
with it.


File 
hyracks-fullstack/hyracks/hyracks-control/hyracks-control-common/src/main/java/org/apache/hyracks/control/common/job/profiling/StatsCollector.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/644427e9_e405d1f4?usp=email
 :
PS8, Line 38:     private final Map<String, IOperatorStats> operatorStatsMap = 
new ConcurrentHashMap<>();
> Don't we need to use synchronized blocks or synchronized map? […]
i don't believe so. part of the reason the issue with missing stats was so hard 
to find was because this structure is guarded by the locks around the task, 
which establish the correct sequence of events chronologically. however, that 
doesn't guarantee visibility of updates between threads, so even though the 
operators had long ago added the stats to the map, the thread managing the 
tasks can get a stale read.

if this were a primitive we could simply make this variable volatile. however 
since it's a collection, we can either use a synchronized collection or a 
concurrent collection to achieve the correct cache consistency. my thought is 
to use a concurrent collection because there is no need for synchronization 
here. i also think it has better behavior in the event that my understanding of 
the bug is not correct, i.e. it cannot introduce a deadlock, only the 
possibility of a less-than-complete set of stats. it will never exhibit the 
behavior that causes the query to fail and crash the cc from serialization 
problems (more/less entries than stated in the map size leading to a buffer 
under/overflow)


File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/preclustered/PreclusteredGroupOperatorNodePushable.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/68c2dc3e_afa80ad2?usp=email
 :
PS8, Line 41:     private final RecordDescriptor outRecordDescriptor;
> Just checking; we also have 
> AbstractUnaryInputUnaryOutputIntrospectingOperatorNodePushable. […]
i didn't think of this initially and it's a good point. there's a lot of 
overlap between what's happening here and what happens in MetaOp. however i 
think an important distinction is that the only time the "Introspecting" part 
of the interface has to come into play for this operator is for nested group by 
clauses inside the aggregator. otherwise the normal profiling handles it 
perfectly afaik. so i think it's a question of how much of the profiling code 
we want to put in here.


File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/preclustered/PreclusteredGroupWriter.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/173984a2_c9c82cf5?usp=email
 :
PS8, Line 93:         aggregatorFactory.setOperatorStats(stats);
> Why not the aggregator after it's created? e.g.: 
> this.aggregator.setOperatorStats(stats). […]
i tried this actually, i like it, but it doesn't pencil out. i see now why i 
had to do it this way. the factory is what invokes the pipeline assembler, and 
currently that is the only place we know the relationship between the operators 
and hence can create the properly labeled IOperatorStats instances for them. i 
tried to refactor this some more to get rid of instanceof entirely and avoid 
having that sort of mysterious "setOperatorStats" be in the factory interface.



--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270?usp=email
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings?usp=email

Gerrit-MessageType: comment
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I516dc90b8da3a7086dc80b67946ac14f6ade0973
Gerrit-Change-Number: 20270
Gerrit-PatchSet: 8
Gerrit-Owner: Ian Maxon <[email protected]>
Gerrit-Reviewer: Ali Alsuliman <[email protected]>
Gerrit-Reviewer: Ali Alsuliman <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Ian Maxon <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Attention: Ali Alsuliman <[email protected]>
Gerrit-Attention: Ali Alsuliman <[email protected]>
Gerrit-Comment-Date: Wed, 17 Dec 2025 17:23:13 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: Yes
Comment-In-Reply-To: Ali Alsuliman <[email protected]>

Reply via email to