Re: [I] Fix branch interpolation logic in regressions command [otava]

via GitHub Tue, 23 Dec 2025 14:59:21 -0800


Gerrrr commented on issue #118:
URL: https://github.com/apache/otava/issues/118#issuecomment-3688169108

Re branch names, I am slightly against "head branch" because I find it a bit
confusing and, as a user, would have look up what each of those do. I don't
think there is standard terminology, as Gitlab calls those "target branches"
and Bitbucket calls them "destination branches".

For base branch, I equally prefer base branch and main branch.

> Regarding the regressions command, what does the basic version of that do?

Unlike `analyze`, `regressions` always fetches 30 days worth of data on the
base branch and then finds regressions within the specified selector -
https://github.com/apache/otava/blob/3b3d59a787ea03749e972c3509241114b861c99e/otava/main.py#L264-L270.

It makes sense, but I wonder why don't we do this in `analyze` as well as it
seems to generally make sense. If we'd do that, it would only be valuable to
run `regressions` to analyze feature branches... however we could just move
feature branch comparison into `analyze` and remove `regressions` command.

> Then, if I understood what is the intended use case we are talking about,
my next question would be, why do we need variable interpolation for this use
case?

Consider a use case, a benchmark that you run and want to analyze across
multiple versions. One way to approach that is to have multiple identical tests
that only differ in the branch name. Another is to make branch name a variable.

Here is a query that is very similar to one of the use cases I care about:

```
tests:
aggregate_mem:
inherit: [ common ] # avoids repeating metrics definitions and
postgres-related config
query: |
SELECT e.commit,
e.commit_ts,
r.process_cumulative_rate_mean,
r.process_cumulative_rate_stderr,
r.process_cumulative_rate_diff,
r.experiment_id,
r.config_id
FROM results r
INNER JOIN configs c ON r.config_id = c.id
INNER JOIN experiments e ON r.experiment_id = e.id
WHERE e.exclude_from_analysis = false AND
e.branch = '${BRANCH}' AND
e.username = 'ci' AND
c.store = 'MEM' AND
c.cache = true AND
c.benchmark = 'aggregate' AND
c.instance_type = 'ec2i3.large'
ORDER BY e.commit_ts ASC;
```

Then, it is called like that:

```
export BRANCH=... # can be trunk, 3.4, 3.5, etc
otava analyze all --update-postgres --output regressions_only --last 10
```

`ConfigArgParse` does not expand `$BRANCH`, so I had to bring `expandvars`
back. At the same time, I agree with your take from the mailing list that it
would be nice to get rid of `expandvars` completely. Why can't we do it right
away and ConfigArgParse'ify `--branch`? The problem is with feature-branch
comparison that needs to query the time series and expand it twice with 2
different values - one for the base branch and one for the feature branch.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] Fix branch interpolation logic in regressions command [otava]

Reply via email to