This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new 87ee836 fix(skill-validator): preserve blank lines inside block
scalars in parse_frontmatter (#238)
87ee836 is described below
commit 87ee836c08952e0e8e273fbc3e2aace57ba5f0cd
Author: André Ahlert <[email protected]>
AuthorDate: Wed May 20 05:19:03 2026 -0300
fix(skill-validator): preserve blank lines inside block scalars in
parse_frontmatter (#238)
`parse_frontmatter` treated any blank line as a terminator for the
current key, so a block scalar with a paragraph break (a blank line
between two indented paragraphs in a `|` or `>` value) silently lost
everything after the first paragraph.
Two consequences:
1. `validate_frontmatter` measures `len(fm["description"]) +
len(fm["when_to_use"])` against `MAX_METADATA_CHARS` (the
Claude Code truncation budget). With the dropped paragraphs the
measurement was too small, so a frontmatter that actually exceeds
the budget could pass the check.
2. `validate_principle_compliance` and `validate_trigger_preservation`
only saw the truncated string, missing forbidden patterns or
trigger phrasing that lived in the dropped paragraphs.
No existing SKILL.md hit the bug, so it was latent. The
`init_skill.py` scaffolding emits multi-line description and
when_to_use blocks, and any author writing a two-paragraph
description would have been silently mis-validated.
Fix: in real YAML, a blank line inside a block scalar is part of the
value, not a terminator. Only a new top-level key finalises the
current value. Append blank lines to the value lines and let
`.strip()` at finalisation discard leading/trailing blanks so
single-line values are unaffected.
Adds a regression test that asserts a `|` block scalar with an
internal blank line keeps both paragraphs.
---
.../src/skill_validator/__init__.py | 11 ++++++----
tools/skill-validator/tests/test_validator.py | 24 ++++++++++++++++++++++
2 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/tools/skill-validator/src/skill_validator/__init__.py
b/tools/skill-validator/src/skill_validator/__init__.py
index a93e034..3b16ffe 100644
--- a/tools/skill-validator/src/skill_validator/__init__.py
+++ b/tools/skill-validator/src/skill_validator/__init__.py
@@ -291,12 +291,15 @@ def parse_frontmatter(text: str) -> dict[str, str] | None:
# Strip trailing whitespace but keep leading (for folded scalars)
line = raw_line.rstrip()
- # Empty line ends a folded scalar
+ # Blank line: in real YAML, a blank line inside a block scalar
+ # is part of the value, not a terminator. Only a new top-level
+ # key finalises the current value. Preserve the blank so
+ # multi-paragraph descriptions are measured and validated in
+ # full; a trailing/leading blank is removed by `.strip()` at
+ # finalisation, so single-line values are unaffected.
if line == "":
if current_key is not None:
- result[current_key] = "\n".join(current_value_lines).strip()
- current_key = None
- current_value_lines = []
+ current_value_lines.append("")
continue
# New top-level key?
diff --git a/tools/skill-validator/tests/test_validator.py
b/tools/skill-validator/tests/test_validator.py
index 8189ce8..baa572a 100644
--- a/tools/skill-validator/tests/test_validator.py
+++ b/tools/skill-validator/tests/test_validator.py
@@ -78,6 +78,30 @@ class TestParseFrontmatter:
assert "First line" in fm["description"]
assert "Second line" in fm["description"]
+ def test_block_scalar_preserves_internal_blank_line(self) -> None:
+ """Blank lines inside a ``|`` block scalar are part of the value.
+
+ Regression: the parser used to treat any blank line as a
+ terminator, silently dropping everything after the first
+ paragraph break. That made ``MAX_METADATA_CHARS`` measurement
+ and principle/trigger validation operate on truncated text.
+ """
+ text = (
+ "---\n"
+ "name: my-skill\n"
+ "description: |\n"
+ " Paragraph one.\n"
+ "\n"
+ " Paragraph two, which used to be dropped.\n"
+ "license: Apache-2.0\n"
+ "---\n"
+ )
+ fm = parse_frontmatter(text)
+ assert fm is not None
+ assert "Paragraph one." in fm["description"]
+ assert "Paragraph two" in fm["description"]
+ assert fm["license"] == "Apache-2.0"
+
def test_missing_frontmatter(self) -> None:
assert parse_frontmatter("# no frontmatter\n") is None