rodrigoluizs commented on issue #8381:
URL:
https://github.com/apache/incubator-devlake/issues/8381#issuecomment-2898676920
Thanks for the feedback and questions, @Startrekzky and @klesh!
> **Which tables will the `is_bot` flag be added, only table accounts?**
Yes, the `is_bot` flag would be added only to the `accounts` table. In
addition to that, my plan was to add a new column to `project_pr_metrics`
called `is_authored_by_bot`, since I believe that name better reflects the
context of the pull request entity and makes the intent clearer when querying.
> **Which cases will the `is_bot` take effect? I'm not worrying about the
dashboard queries but the plugin's internal processing logic, for instance, the
calculation in the DORA plugin to generate table `project_pr_metrics` might
also take the bot PRs or commits. If so, the plugin's processing logic needs to
be updated as well after the `is_bot` flag is introduced.**
You’re absolutely right — for this to work reliably, the DORA plugin’s
processing logic that populates `project_pr_metrics` would also need to be
updated to propagate the `is_bot` value from the author account into the
`is_authored_by_bot` field during metric calculation. That way, downstream
queries (like Grafana dashboards) can filter without needing to join back to
the `accounts` table.
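To make the propagation step concrete, here is a minimal sketch in Go. The `Account` and `PrMetric` structs and the `propagateBotFlag` function are hypothetical stand-ins for illustration, not DevLake's actual models or DORA plugin code. It also assumes that a PR whose author is not found in `accounts` is treated as not bot-authored.

```go
package main

import "fmt"

// Account is a simplified, hypothetical stand-in for a row in `accounts`.
type Account struct {
	ID    string
	IsBot bool
}

// PrMetric is a simplified, hypothetical stand-in for a row in
// `project_pr_metrics`.
type PrMetric struct {
	PrID            string
	AuthorID        string
	IsAuthoredByBot bool
}

// propagateBotFlag returns a copy of metrics in which IsAuthoredByBot mirrors
// the IsBot value of the author's account. Rows whose author is not present
// in accounts keep the value they came in with (false by default).
func propagateBotFlag(metrics []PrMetric, accounts map[string]Account) []PrMetric {
	out := make([]PrMetric, len(metrics))
	for i, m := range metrics {
		if acc, ok := accounts[m.AuthorID]; ok {
			m.IsAuthoredByBot = acc.IsBot
		}
		out[i] = m
	}
	return out
}

func main() {
	accounts := map[string]Account{
		"acc-1": {ID: "acc-1", IsBot: true},
		"acc-2": {ID: "acc-2", IsBot: false},
	}
	metrics := []PrMetric{
		{PrID: "pr-1", AuthorID: "acc-1"},
		{PrID: "pr-2", AuthorID: "acc-2"},
	}
	for _, m := range propagateBotFlag(metrics, accounts) {
		fmt.Printf("%s is_authored_by_bot=%v\n", m.PrID, m.IsAuthoredByBot)
	}
}
```

Because the flag is copied onto each metrics row, a dashboard query can filter on `is_authored_by_bot` directly without a join.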
> **For bot detection, we could use the environment variables with default
values to achieve both auto + manual control**
Using environment variables to control the bot name patterns sounds like a
great way to support both automatic detection and manual overrides — I’ll
incorporate that into the plan as well.
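As a rough sketch of how that could look, the Go snippet below reads a comma-separated list of regular expressions from an environment variable and falls back to built-in defaults when it is unset. The variable name `BOT_NAME_PATTERNS`, the default patterns, and the function names are all assumptions made for illustration; none of them exist in DevLake today.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// botPatternsEnv is a hypothetical variable name, not an existing DevLake setting.
const botPatternsEnv = "BOT_NAME_PATTERNS"

// defaultBotPatterns is an illustrative default used for automatic detection.
const defaultBotPatterns = `\[bot\]$,^dependabot,^renovate`

// parseBotPatterns compiles a comma-separated list of case-insensitive regular
// expressions. An empty input falls back to defaultBotPatterns. Because commas
// separate entries, patterns containing a comma are not supported here.
func parseBotPatterns(raw string) ([]*regexp.Regexp, error) {
	if strings.TrimSpace(raw) == "" {
		raw = defaultBotPatterns
	}
	var patterns []*regexp.Regexp
	for _, p := range strings.Split(raw, ",") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		re, err := regexp.Compile("(?i)" + p)
		if err != nil {
			return nil, fmt.Errorf("invalid bot pattern %q: %w", p, err)
		}
		patterns = append(patterns, re)
	}
	return patterns, nil
}

// loadBotPatterns reads the manual override from the environment, if any.
func loadBotPatterns() ([]*regexp.Regexp, error) {
	return parseBotPatterns(os.Getenv(botPatternsEnv))
}

// isBot reports whether an account name matches any configured pattern.
func isBot(name string, patterns []*regexp.Regexp) bool {
	for _, re := range patterns {
		if re.MatchString(name) {
			return true
		}
	}
	return false
}

func main() {
	patterns, err := loadBotPatterns()
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	for _, name := range []string{"dependabot[bot]", "rodrigoluizs"} {
		fmt.Printf("%s -> is_bot=%v\n", name, isBot(name, patterns))
	}
}
```

In this sketch a non-empty override replaces the defaults rather than extending them. Whether overrides should replace or extend the defaults is a design choice worth settling in the RFC.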
> **For Grafana dashboards, if the `is_bot` is added, updating the SQL in
the existing dashboard would be my choice.**
That was also my preferred approach — nice to hear that you agree!
Just a small note: my intention was to filter on the new
`is_authored_by_bot` column in `project_pr_metrics`.
---
### Follow-up
Based on your input, my current understanding of the preferred direction is:
1. **Filtering approach:** Use **Option 2**: introduce a flag (`is_bot` in
   `accounts`, and `is_authored_by_bot` in `project_pr_metrics`).
2. **Bot detection:** Combine **automatic detection** with **manual
   override**, using an environment variable to define bot name patterns.
3. **Dashboard behavior:**
   - Update existing dashboards to support filtering on
     `is_authored_by_bot`.
   - I’d like your feedback on introducing an **`include_bots` variable**
     that controls whether bot-authored changes are included in the queries.
     The idea is to avoid a breaking change and keep the DORA metrics
     unchanged for users who do not explicitly opt in to this new feature.
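The intended opt-in semantics of `include_bots` can be sketched in Go as follows. This is an in-memory illustration only: the real filter would live in the dashboard SQL, and the `PrMetric` struct and `selectMetrics` function are hypothetical names.

```go
package main

import "fmt"

// PrMetric is a simplified, hypothetical stand-in for a row in
// `project_pr_metrics`.
type PrMetric struct {
	PrID            string
	IsAuthoredByBot bool
}

// selectMetrics returns every row when includeBots is true, which matches the
// behavior before the flag existed. Bot-authored rows are dropped only when a
// user opts in by setting includeBots to false.
func selectMetrics(metrics []PrMetric, includeBots bool) []PrMetric {
	if includeBots {
		return metrics
	}
	kept := make([]PrMetric, 0, len(metrics))
	for _, m := range metrics {
		if !m.IsAuthoredByBot {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	metrics := []PrMetric{
		{PrID: "pr-1", IsAuthoredByBot: false},
		{PrID: "pr-2", IsAuthoredByBot: true},
	}
	fmt.Println("include_bots=true: ", len(selectMetrics(metrics, true)), "rows")
	fmt.Println("include_bots=false:", len(selectMetrics(metrics, false)), "rows")
}
```

Defaulting the variable to the "include" state is what keeps existing dashboards producing the same numbers until someone changes it.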
---
Does this align with how you both see it?
Do you agree with the proposed column names — `is_bot` for the `accounts`
table and `is_authored_by_bot` for `project_pr_metrics`?
Additionally, I’d appreciate some clarification on how new environment
variables can be introduced in DevLake, as I’m not very familiar with that part
of the project yet.
Just want to make sure we’re on the same page before moving forward with an
RFC or implementation.