This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new 4e1e203 feat(mentoring): advance Mentoring mode to experimental and
add intervention-selection eval suite (#272)
4e1e203 is described below
commit 4e1e20307ec25b6f786f8ad92d465a768302a2ce
Author: Justin Mclean <[email protected]>
AuthorDate: Wed May 27 07:13:31 2026 +1000
feat(mentoring): advance Mentoring mode to experimental and add intervention-selection eval suite (#272)
* feat(mentoring): add intervention-selection eval suite; mark mode
experimental
Adds the missing `intervention` eval suite (8 cases) to the
`pr-management-mentor` eval tree, covering steps 3–5 of the runtime
loop: out-of-scope check, maintainer-engaged check, and trigger
matching for all four templates plus the multi-trigger and no-trigger
paths.
Updates `docs/modes.md` to reflect the prototype skill that already
shipped: Mentoring row moves from `proposed / 0 skills` to
`experimental / 1 skill`, and the section body is rewritten to point
at the live skill rather than the "lands in a follow-up PR" forward
reference.
Validation:
test -f docs/mentoring/spec.md ✓
uv run --project tools/skill-validator skill-validate ✓ (no violations)
Generated-by: Claude (Opus 4.7)
* fix bug
* feat(eval): add intervention case for out-of-scope deprecation-removal
request
Signed-off-by: Justin McLean <[email protected]>
---------
Signed-off-by: Justin McLean <[email protected]>
---
.../evals/pr-management-mentor/README.md | 4 ++--
.../case-9-deprecation-decision/expected.json | 5 +++++
.../fixtures/case-9-deprecation-decision/report.md | 26 ++++++++++++++++++++++
3 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/tools/skill-evals/evals/pr-management-mentor/README.md b/tools/skill-evals/evals/pr-management-mentor/README.md
index f63571c..d2ae9b3 100644
--- a/tools/skill-evals/evals/pr-management-mentor/README.md
+++ b/tools/skill-evals/evals/pr-management-mentor/README.md
@@ -2,11 +2,11 @@
Behavioral evals for the `pr-management-mentor` skill.
-## Suites (28 cases total)
+## Suites (29 cases total)
| Suite | Step | Cases | What it covers |
|---|---|---|---|
-| intervention | Intervention selection (steps 3–5 of the runtime loop) | 8 | Template 1 (missing repro); template 2 (missing version); template 3 (convention gap); template 4 (why-pushback → hand-off); multiple triggers simultaneously (ask); maintainer already engaged (silent); no trigger fires (silent); out-of-scope topic (hand-off) |
+| intervention | Intervention selection (steps 3–5 of the runtime loop) | 9 | Template 1 (missing repro); template 2 (missing version); template 3 (convention gap); template 4 (why-pushback → hand-off); multiple triggers simultaneously (ask); maintainer already engaged (silent); no trigger fires (silent); out-of-scope topic (hand-off); out-of-scope deprecation/removal decision carrying draftable bug signals (hand-off still wins) |
| tone-checks | Pre-post checklist | 15 | Clean pass; hard-fail rules 1 (praise), 2 (restating), 3 (AI self-ref), 4 (speaking for maintainer), 5 (hedging), 6 (multiple asks), 7 (missing footer), 8 (author not tagged), 9 (quoted doc), 10 (review prediction); soft-fail rules 11 (meta first line), 12 (too long), 13 (jargon without link), 14 (exclamation in body) |
| hand-off | Hand-off triggers | 5 | No trigger; trigger 1 (max turns reached); trigger 2 (contributor pushback on why-answer); trigger 3 (out-of-scope topic); trigger 4 (contributor asks for human — highest priority) |
diff --git a/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/expected.json b/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/expected.json
new file mode 100644
index 0000000..3af93d1
--- /dev/null
+++ b/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/expected.json
@@ -0,0 +1,5 @@
+{
+ "action": "handoff",
+ "template": null,
+ "reason": "The contributor's latest message asks the maintainers to decide whether to remove the deprecated schedule_interval parameter in a future release — an out-of-scope deprecation/removal decision (hand-off trigger 3) — so the mentor hands off rather than drafting about the log-spam symptom."
+}
diff --git a/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/report.md b/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/report.md
new file mode 100644
index 0000000..c4384ec
--- /dev/null
+++ b/tools/skill-evals/evals/pr-management-mentor/intervention/fixtures/case-9-deprecation-decision/report.md
@@ -0,0 +1,26 @@
+Thread: Issue #41237 — "DeprecationWarning spam for schedule_interval after 2.9→2.10 upgrade — can we just drop it?"
+MaxAgentTurns: 2
+AgentCommentCount: 0
+OutOfScopeTopics: [security, CVE, deprecation, licensing, architecture]
+
+Messages (chronological):
+ 1. contributor (role: contributor, login: dana-r): "Just bumped our cluster
+ from 2.9.3 to 2.10.2 and now every scheduler loop dumps a wall of
+ `RemovedInAirflow3Warning: Param 'schedule_interval' is deprecated, use
+ 'schedule' instead`. We have ~400 DAGs so it's thousands of lines a
+ minute. A typical DAG looks like:
+
+ ```python
+ with DAG('etl_daily', schedule_interval='@daily') as dag:
+ ...
+ ```
+
+ It's not breaking anything, just drowning the logs. Honestly, since it's
+ already deprecated — can we just remove schedule_interval outright in the
+ next minor release instead of warning forever? It causes more confusion
+ than it's worth."
+ 2. contributor (role: contributor, login: dana-r): "Happy to put up a PR to
+ rip it out across the providers if the maintainers are on board."
+
+MaintainerLogins: [committer-a, committer-b]
+RecentMaintainerCommentCount: 0
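
The precedence that case 9 exercises (steps 3–5 of the runtime loop: out-of-scope check, then maintainer-engaged check, then trigger matching) can be pictured with a small Python sketch. This is only an illustration, not the skill's implementation. `ThreadState`, `decision`, `select_intervention`, the `detected_topics` and `fired_templates` fields, and the `"draft"` action name are hypothetical. The sketch also assumes an earlier step has already labelled the thread with topics and matched template triggers, and that the template 4 hand-off carries a null template. Only `"handoff"`, `"silent"`, `"ask"`, and the fixture field names come from the commit.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThreadState:
    """Hypothetical in-memory view of a fixture such as report.md."""
    out_of_scope_topics: set[str]          # from OutOfScopeTopics
    detected_topics: set[str]              # assumed output of an earlier classification step
    recent_maintainer_comment_count: int   # from RecentMaintainerCommentCount
    fired_templates: list[int] = field(default_factory=list)  # template numbers (1-4) whose trigger matched


def decision(action: str, template: Optional[int] = None) -> dict:
    """Build a result shaped like expected.json (without the free-text reason)."""
    return {"action": action, "template": template}


def select_intervention(state: ThreadState) -> dict:
    # Step 3: an out-of-scope topic wins, even when draftable triggers also fired.
    if state.detected_topics & state.out_of_scope_topics:
        return decision("handoff")
    # Step 4: stay silent when a maintainer is already engaged.
    if state.recent_maintainer_comment_count > 0:
        return decision("silent")
    # Step 5: trigger matching.
    if not state.fired_templates:
        return decision("silent")          # no trigger fires
    if len(state.fired_templates) > 1:
        return decision("ask")             # multiple triggers simultaneously
    template = state.fired_templates[0]
    if template == 4:
        return decision("handoff")         # why-pushback leads to hand-off (null template assumed)
    return decision("draft", template)     # "draft" is an assumed action name


# Case 9: a deprecation/removal request that also carries bug-like signals.
case_9 = ThreadState(
    out_of_scope_topics={"security", "CVE", "deprecation", "licensing", "architecture"},
    detected_topics={"deprecation"},
    recent_maintainer_comment_count=0,
    fired_templates=[1, 2],
)
print(select_intervention(case_9))
```

Under these assumptions the result for case 9 has the same `action` and `template` values as `expected.json`, because the out-of-scope check runs before any trigger is considered.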