>From Ian Maxon <[email protected]>:

Attention is currently required from: Ali Alsuliman, Ali Alsuliman, Michael 
Blow.

Ian Maxon has posted comments on this change by Ian Maxon. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270?usp=email )

Change subject: [ASTERIXDB-3631][RT] Profile nested groupby clauses
......................................................................


Patch Set 14:

(14 comments)

File 
hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/jobgen/impl/JobBuilder.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/f5b80772_5945dad7?usp=email
 :
PS13, Line 193: List<Integer> paths = new ArrayList<>(inputs.size());
> I am having a hard time following the logic for `paths` and `Collections. […]
it is very convoluted and i don't like it. every time i look on this code after 
not looking at it for a few months it confuses the hell out of me. indeed the 
reason is so that the numbering exactly matches how it's constructed in the 
push runtime. it should be refactored so that we generate the identifiers here, 
and pass them through, rather than generating them twice independently and 
trying to match them.

i tried to implement your suggestion on refactoring these two methods. however 
after wrangling a bit and looking at the output, i don't think passing baseId 
like that will work, to avoid making getExtendedOdidForOperator return int. if 
i implemented your suggestion correctly it does label the ops, just in the 
opposite order they need to be. the core issue here is that in the push runtime 
the numbers are incremented from source to sink, but here the operators are 
linked from sink to source. so basically we have to perform a dfs to get the 
proper numbering, currently.

the whole stuff about paths() is figuring out where the numbering should start 
from, if you actually do have multiple inputs. in that case it should start 
from the path farthest away, i.e. the max of the paths.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/9465631f_bbfa0f43?usp=email
 :
PS13, Line 199: paths.isEmpty()
> Can "paths.isEmpty()" ever be empty? seems like it should always have 
> something.
yeah it shouldn't. i just have a habit of stuff like that to avoid NPEs.


File 
hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/meta/AlgebricksMetaOperatorDescriptor.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/e5a03f98_8ddf2a71?usp=email
 :
PS13, Line 105: subIds.length() > 2
> Is this reliable to determine it's Subplan? I didn't follow the logic, but it 
> feels like we are assu […]
yeah that's a bad assumption. switched to check the length of the array 
returned by split() instead.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/8678cf9d_83aac22f?usp=email
 :
PS13, Line 140: public
> Does `traverseFactories()` need to be `public` or is it just used here and 
> can be made `private`?
can be private sure


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/ba10bc58_92412a64?usp=email
 :
PS13, Line 153: else
> Previously, we were always adding: `microOpStats.put(fact, 
> makeStatForRuntimeFact`. […]
i'm a little foggy on it, but i think the reason is that since now we are 
piercing into the subplan itself, there's no reason to add the top-level 
"Subplan - ODID:x.y.z" stats. the individual stats for the operators within the 
subplan will account for its time. i can dig into it more if you like


File 
hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/meta/PipelineAssembler.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/9a273823_ba0f06de?usp=email
 :
PS13, Line 172: profiling(ctx)
> You already have `boolean profile` above. You can replace 
> `ctx.getJobFlags().contains(JobFlag. […]
Done


File 
hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/meta/SubplanRuntimeFactory.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/aa27a128_2af98336?usp=email
 :
PS13, Line 138: private final boolean profile;
> Seems like "profile" can be just a local variable instead of instance variable
Done


File 
hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/std/NestedTupleSourceRuntimeFactory.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/0c47becd_e33e81ee?usp=email
 :
PS13, Line 77: INestedTupleSourceRuntime nestedWrapped;
> Make it `private final` if we can.
Done


File 
hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/win/WindowAggregatorDescriptorFactory.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/40a444b4_24289840?usp=email
 :
PS13, Line 78: .assemblePipeline(subplan, outputWriter, ctx, 
pipelineRuntimeMap, Collections.emptyMap()).getLeft()
> Seems like you are using `Collections.emptyMap()` to indicate non-profiling. 
> […]
yea good point. not obvious from the function invocation what the implication 
of the argument is.


File 
hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/dataflow/ProfiledFrameWriter.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/1943d570_f86f9712?usp=email
 :
PS13, Line 48: protected
> I haven't checked. […]
it's intentional, in NestedTupleSourceRuntimeFactory which extends this class, 
it accesses this field directly


File 
hyracks-fullstack/hyracks/hyracks-control/hyracks-control-common/src/main/java/org/apache/hyracks/control/common/job/profiling/StatsCollector.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/f34f25a0_1bfebbfa?usp=email
 :
PS13, Line 85: output.writeInt(operatorStatsMap.size());
             :         for (IOperatorStats stat : operatorStatsMap.values()) {
             :             stat.writeFields(output);
             :         }
> Remind me to come back to this and whether `operatorStatsMap` needs to be 
> synchronized or not.
good point, i think this should be as well. since this is the other place where 
the thread that completes the task is accessing this structure. all the other 
accesses are from the task threads themselves.
maybe operatorStatsMap should just use a synchronizedMap?


File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/preclustered/PreclusteredGroupOperatorNodePushable.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/16c321c4_4a8291ca?usp=email
 :
PS13, Line 36: PreclusteredGroupOperatorNodePushable
> Just checking, when profiling, does this nodepushable become 
> "ProfiledOperatorNodePushable"? how is  […]
it does indeed. basically making this operator self profiling lets the stats 
for it be passed into the nested subplan- so that the overall time in 
nextFrame() can be subtracted by all of the nested ops inside it. otherwise you 
just see all of the time taken in the subplan accumulated here


File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/preclustered/PreclusteredGroupWriter.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/8856a301_d7062fe8?usp=email
 :
PS13, Line 66: private Runnable timingHandle = Function::identity;
> We can just simplify and remove this instance variable. […]
sure was just trying too hard to avoid yet another cast


File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/util/ProfilingUtils.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/eb1ac863_4e70c014?usp=email
 :
PS13, Line 27: ctx != null
> Can `ctx != null` ever be null? I don't think so. We can remove this check.
yeah probably not, sure removed



--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270?usp=email
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings?usp=email

Gerrit-MessageType: comment
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I516dc90b8da3a7086dc80b67946ac14f6ade0973
Gerrit-Change-Number: 20270
Gerrit-PatchSet: 14
Gerrit-Owner: Ian Maxon <[email protected]>
Gerrit-Reviewer: Ali Alsuliman <[email protected]>
Gerrit-Reviewer: Ali Alsuliman <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Ian Maxon <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-CC: Michael Blow <[email protected]>
Gerrit-Attention: Ali Alsuliman <[email protected]>
Gerrit-Attention: Ali Alsuliman <[email protected]>
Gerrit-Attention: Michael Blow <[email protected]>
Gerrit-Comment-Date: Fri, 06 Feb 2026 16:07:49 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: Ali Alsuliman <[email protected]>

Reply via email to