>From Ian Maxon <[email protected]>: Attention is currently required from: Ali Alsuliman, Ali Alsuliman.
Ian Maxon has posted comments on this change by Ian Maxon. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270?usp=email ) Change subject: [ASTERIXDB-3631][RT] Profile nested groupby clauses ...................................................................... Patch Set 8: -Code-Review (6 comments) Patchset: PS8: i'm working on a couple more tests to be sure things are as i understand them. one thing i noticed was that computeTimings() wasn't actually being called, yet the tests i expected would necessitate that being called still seem to work File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/aggreg/NestedPlansAccumulatingAggregatorFactory.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/af51c350_01a185ad?usp=email : PS8, Line 186: profiling(ctx) ? new ArrayList<>(subplans.length) : Collections.emptyList(); > You can call profiling(ctx) only once and re-use in the 3 places. Done File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/meta/PipelineAssembler.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/afb22395_f389886b?usp=email : PS8, Line 187: boolean shouldProfile = profile && !(runtimeFactory instanceof EmptyTupleSourceRuntimeFactory) > If we are here, then profile must be true, same thing for below. fixed. the old comment is still true though imo. not sure exactly what to do with it. File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-common/src/main/java/org/apache/hyracks/control/common/job/profiling/StatsCollector.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/644427e9_e405d1f4?usp=email : PS8, Line 38: private final Map<String, IOperatorStats> operatorStatsMap = new ConcurrentHashMap<>(); > Don't we need to use synchronized blocks or synchronized map? […] i don't believe so. part of the reason the issue with missing stats was so hard to find was because this structure is guarded by the locks around the task, which establish the correct sequence of events chronologically. however, that doesn't guarantee visibility of updates between threads, so even though the operators had long ago added the stats to the map, the thread managing the tasks can get a stale read. if this were a primitive we could simply make this variable volatile. however since it's a collection, we can either use a synchronized collection or a concurrent collection to achieve the correct cache consistency. my thought is to use a concurrent collection because there is no need for synchronization here. i also think it has better behavior in the event that my understanding of the bug is not correct, i.e. it cannot introduce a deadlock, only the possibility of a less-than-complete set of stats. it will never exhibit the behavior that causes the query to fail and crash the cc from serialization problems (more/less entries than stated in the map size leading to a buffer under/overflow) File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/preclustered/PreclusteredGroupOperatorNodePushable.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/68c2dc3e_afa80ad2?usp=email : PS8, Line 41: private final RecordDescriptor outRecordDescriptor; > Just checking; we also have > AbstractUnaryInputUnaryOutputIntrospectingOperatorNodePushable. […] i didn't think of this initially and it's a good point. there's a lot of overlap between what's happening here and what happens in MetaOp. however i think an important distinction is that the only time the "Introspecting" part of the interface has to come into play for this operator is for nested group by clauses inside the aggregator. otherwise the normal profiling handles it perfectly afaik. so i think it's a question of how much of the profiling code we want to put in here. File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/preclustered/PreclusteredGroupWriter.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/173984a2_c9c82cf5?usp=email : PS8, Line 93: aggregatorFactory.setOperatorStats(stats); > Why not the aggregator after it's created? e.g.: > this.aggregator.setOperatorStats(stats). […] i tried this actually, i like it, but it doesn't pencil out. i see now why i had to do it this way. the factory is what invokes the pipeline assembler, and currently that is the only place we know the relationship between the operators and hence can create the properly labeled IOperatorStats instances for them. i tried to refactor this some more to get rid of instanceof entirely and avoid having that sort of mysterious "setOperatorStats" be in the factory interface. -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270?usp=email To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings?usp=email Gerrit-MessageType: comment Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Change-Id: I516dc90b8da3a7086dc80b67946ac14f6ade0973 Gerrit-Change-Number: 20270 Gerrit-PatchSet: 8 Gerrit-Owner: Ian Maxon <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Ian Maxon <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Attention: Ali Alsuliman <[email protected]> Gerrit-Attention: Ali Alsuliman <[email protected]> Gerrit-Comment-Date: Wed, 17 Dec 2025 17:23:13 +0000 Gerrit-HasComments: Yes Gerrit-Has-Labels: Yes Comment-In-Reply-To: Ali Alsuliman <[email protected]>
