Hello here,

I have run a few PR triage sessions with Airflow and started gathering stats and analysis on how the triage decisions split between "just accept what the agent proposes", "need improvement in heuristics", and "genuine human judgment needed".
You can see the first result here [1], which also includes some proposals for heuristic improvements that I am applying now.

Two PRs came out of this:

[2] - updates the SKILL to self-analyse sessions and to propose heuristics improvements
[3] - the PR with the improvements

Once we gather more data, we might start proposing that some of those triage skills be converted into fully automated triage actions, for example by generating deterministic Python scripts reflecting the SKILLs' actions. This way those triage actions could simply be performed as part of CI. We can keep them updated by having humans run the triage sessions, and whenever any of the SKILLs is updated, the deterministic Python scripts can be updated as well.

This might create a nice self-improvement loop where, after every triage session, you can identify improvements and areas suitable for full automation, making triage better with every loop.

Almost every day I find new ways this whole process feeds itself, with the humans who use it teaching the SKILLs to be better.

Jarek

[1] Mode-D stats Gist: https://gist.github.com/potiuk/c419315f2ac318f74a3e63134757723a
[2] PR triage stats persistence: https://github.com/apache/airflow-steward/pull/343
[3] Improvements to PR triage heuristics: https://github.com/apache/airflow-steward/pull/344

J.
