Hello here,

I have run a few PR triage sessions with Airflow and started to gather
stats and analysis of how the triage decisions split between "just accept
what the agent proposes", "need improvement in heuristics" and "genuine
human judgment needed".

You can see the first result here [1], which also includes some proposals
for heuristic improvements that I am applying now.

Two PRs came out of this: [2] updates the SKILL to self-analyse sessions
and to propose heuristics improvements, and [3] is the PR with the
improvements.


Once we gather more data, we might start proposing that some of those
triage skills be converted into fully automated triage actions - for
example by generating deterministic Python scripts reflecting the SKILLs'
actions.
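
To make the idea concrete, here is a minimal sketch of what such a
deterministic script could look like. Everything in it is hypothetical:
the PullRequest fields, the rule order, the 30-day threshold and the
action names are my assumptions for illustration, not what the SKILLs in
airflow-steward actually do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    # Hypothetical, minimal view of a PR; not the airflow-steward data model.
    number: int
    has_merge_conflicts: bool
    ci_failed: bool
    days_since_update: int


# Hypothetical threshold; a real value would come from the triage stats.
STALE_AFTER_DAYS = 30


def triage(pr: PullRequest) -> str:
    """Return a triage action using fixed rules only (no agent involved)."""
    if pr.has_merge_conflicts:
        return "ask-author-to-rebase"
    if pr.ci_failed:
        return "ask-author-to-fix-ci"
    if pr.days_since_update > STALE_AFTER_DAYS:
        return "mark-stale"
    # Anything the fixed rules do not cover stays with a human.
    return "needs-human-review"
```

The rules are checked in a fixed order and the fallback is always a human
review, so the same PR state always yields the same action, which is what
would make it safe to run in CI.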

This way those triage actions could simply be performed as part of CI. We
can keep them updated by having humans run the triage sessions, and
whenever any of the SKILLs is updated, the deterministic Python scripts
might be updated as well.

This might create a nice self-improvement loop where, after every triage
session, you can identify improvements and areas suitable for full
automation - making triage better with every loop.

Almost every day I find new ways this whole process feeds itself, with
the humans who use it teaching the SKILLs to be better.

Jarek

[1] Mode-D stats Gist
https://gist.github.com/potiuk/c419315f2ac318f74a3e63134757723a
[2] PR triage stats persistence
https://github.com/apache/airflow-steward/pull/343
[3] Improvements to PR triage heuristics
https://github.com/apache/airflow-steward/pull/344

J.
