>From Ian Maxon <[email protected]>: Attention is currently required from: Ali Alsuliman, Ali Alsuliman, Michael Blow.
Ian Maxon has posted comments on this change by Ian Maxon. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270?usp=email ) Change subject: [ASTERIXDB-3631][RT] Profile nested groupby clauses ...................................................................... Patch Set 14: (14 comments) File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/jobgen/impl/JobBuilder.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/f5b80772_5945dad7?usp=email : PS13, Line 193: List<Integer> paths = new ArrayList<>(inputs.size()); > I am having a hard time following the logic for `paths` and `Collections. […] it is very convoluted and i don't like it. every time i look on this code after not looking at it for a few months it confuses the hell out of me. indeed the reason is so that the numbering exactly matches how it's constructed in the push runtime. it should be refactored so that we generate the identifiers here, and pass them through, rather than generating them twice independently and trying to match them. i tried to implement your suggestion on refactoring these two methods. however after wrangling a bit and looking at the output, i don't think passing baseId like that will work, to avoid making getExtendedOdidForOperator return int. if i implemented your suggestion correctly it does label the ops, just in the opposite order they need to be. the core issue here is that in the push runtime the numbers are incremented from source to sink, but here the operators are linked from sink to source. so basically we have to perform a dfs to get the proper numbering, currently. the whole stuff about paths() is figuring out where the numbering should start from, if you actually do have multiple inputs. in that case it should start from the path farthest away, i.e. the max of the paths. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/9465631f_bbfa0f43?usp=email : PS13, Line 199: paths.isEmpty() > Can "paths.isEmpty()" ever be empty? seems like it should always have > something. yeah it shouldn't. i just have a habit of stuff like that to avoid NPEs. File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/meta/AlgebricksMetaOperatorDescriptor.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/e5a03f98_8ddf2a71?usp=email : PS13, Line 105: subIds.length() > 2 > Is this reliable to determine it's Subplan? I didn't follow the logic, but it > feels like we are assu […] yeah that's a bad assumption. switched to check the length of the array returned by split() instead. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/8678cf9d_83aac22f?usp=email : PS13, Line 140: public > Does `traverseFactories()` need to be `public` or is it just used here and > can be made `private`? can be private sure https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/ba10bc58_92412a64?usp=email : PS13, Line 153: else > Previously, we were always adding: `microOpStats.put(fact, > makeStatForRuntimeFact`. […] i'm a little foggy on it, but i think the reason is that since now we are piercing into the subplan itself, there's no reason to add the top-level "Subplan - ODID:x.y.z" stats. the individual stats for the operators within the subplan will account for its time. i can dig into it more if you like File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/meta/PipelineAssembler.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/9a273823_ba0f06de?usp=email : PS13, Line 172: profiling(ctx) > You already have `boolean profile` above. You can replace > `ctx.getJobFlags().contains(JobFlag. […] Done File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/meta/SubplanRuntimeFactory.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/aa27a128_2af98336?usp=email : PS13, Line 138: private final boolean profile; > Seems like "profile" can be just a local variable instead of instance variable Done File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/std/NestedTupleSourceRuntimeFactory.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/0c47becd_e33e81ee?usp=email : PS13, Line 77: INestedTupleSourceRuntime nestedWrapped; > Make it `private final` if we can. Done File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/win/WindowAggregatorDescriptorFactory.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/40a444b4_24289840?usp=email : PS13, Line 78: .assemblePipeline(subplan, outputWriter, ctx, pipelineRuntimeMap, Collections.emptyMap()).getLeft() > Seems like you are using `Collections.emptyMap()` to indicate non-profiling. > […] yea good point. not obvious from the function invocation what the implication of the argument is. File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/dataflow/ProfiledFrameWriter.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/1943d570_f86f9712?usp=email : PS13, Line 48: protected > I haven't checked. […] it's intentional, in NestedTupleSourceRuntimeFactory which extends this class, it accesses this field directly File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-common/src/main/java/org/apache/hyracks/control/common/job/profiling/StatsCollector.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/f34f25a0_1bfebbfa?usp=email : PS13, Line 85: output.writeInt(operatorStatsMap.size()); : for (IOperatorStats stat : operatorStatsMap.values()) { : stat.writeFields(output); : } > Remind me to come back to this and whether `operatorStatsMap` needs to be > synchronized or not. good point, i think this should be as well. since this is the other place where the thread that completes the task is accessing this structure. all the other accesses are from the task threads themselves. maybe operatorStatsMap should just use a synchronizedMap? File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/preclustered/PreclusteredGroupOperatorNodePushable.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/16c321c4_4a8291ca?usp=email : PS13, Line 36: PreclusteredGroupOperatorNodePushable > Just checking, when profiling, does this nodepushable become > "ProfiledOperatorNodePushable"? how is […] it does indeed. basically making this operator self profiling lets the stats for it be passed into the nested subplan- so that the overall time in nextFrame() can be subtracted by all of the nested ops inside it. otherwise you just see all of the time taken in the subplan accumulated here File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/preclustered/PreclusteredGroupWriter.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/8856a301_d7062fe8?usp=email : PS13, Line 66: private Runnable timingHandle = Function::identity; > We can just simplify and remove this instance variable. […] sure was just trying too hard to avoid yet another cast File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/util/ProfilingUtils.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/eb1ac863_4e70c014?usp=email : PS13, Line 27: ctx != null > Can `ctx != null` ever be null? I don't think so. We can remove this check. yeah probably not, sure removed -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270?usp=email To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings?usp=email Gerrit-MessageType: comment Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Change-Id: I516dc90b8da3a7086dc80b67946ac14f6ade0973 Gerrit-Change-Number: 20270 Gerrit-PatchSet: 14 Gerrit-Owner: Ian Maxon <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Ian Maxon <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-CC: Michael Blow <[email protected]> Gerrit-Attention: Ali Alsuliman <[email protected]> Gerrit-Attention: Ali Alsuliman <[email protected]> Gerrit-Attention: Michael Blow <[email protected]> Gerrit-Comment-Date: Fri, 06 Feb 2026 16:07:49 +0000 Gerrit-HasComments: Yes Gerrit-Has-Labels: No Comment-In-Reply-To: Ali Alsuliman <[email protected]>
