This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new fe71a0f3 chore(ci): run lychee link-check offline; drop dead sandbox network domains (#501)
fe71a0f3 is described below
commit fe71a0f3cada16d3e904ec8a835a5b0d6805561a
Author: Jarek Potiuk <[email protected]>
AuthorDate: Thu Jun 11 23:35:00 2026 +0200
chore(ci): run lychee link-check offline; drop dead sandbox network domains (#501)
* chore(ci): run lychee link-check offline; drop dead sandbox network domains
The `lychee` prek hook links macOS SecureTransport (`native-tls`), whose TLS
handshake fails through the secure-agent sandbox's CONNECT proxy on macOS 26
(`OSStatus -26276`) even though the certs are valid, there is no MITM, and
trustd is reachable — so online external-link checking cannot pass
in-sandbox.
`enableWeakerNetworkIsolation` no longer rescues it on this OS.
Switch the hook to offline mode (`offline = true` in `.lychee.toml`): it now
validates only local cross-file and anchor references, which is the in-repo
reference integrity this hook is really for. External-URL liveness was flaky
and rate-limited anyway (hence the long ASF-infra `exclude` list) and is no
longer checked anywhere.
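As a rough illustration of what "local references only" means, the sketch below checks that relative link targets in a markdown file exist on disk and skips anything with a URL scheme. It is not lychee's implementation: `broken_local_links` and `LINK_RE` are hypothetical names, the regex handles only inline `[text](target)` links, and anchor validation is left out.

```python
import re
from pathlib import Path

# Inline markdown links of the form [text](target). Image links match too,
# since they contain the same bracket/parenthesis pattern.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_local_links(md_file: Path) -> list[str]:
    """Return link targets in `md_file` whose local file does not exist."""
    broken = []
    for target in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
        # Anything with a URL scheme (http:, https:, mailto:, ...) is
        # external and is never fetched in this offline-style check.
        if re.match(r"[a-zA-Z][a-zA-Z0-9+.-]*:", target):
            continue
        path_part = target.split("#", 1)[0]
        if not path_part:
            # Same-file fragment such as (#anchor); anchor validation is
            # omitted from this sketch.
            continue
        if not (md_file.parent / path_part).exists():
            broken.append(target)
    return broken
```

Because nothing here opens a socket, a check of this shape cannot hit the proxy/TLS failure described above.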
With external link-checking gone, the wildcard link-target domains that were
allowlisted purely so lychee could reach them (`*.apache.org`,
`*.anthropic.com`, `*.claude.com`, `*.mitre.org`, `*.nist.gov`, `*.github.io`,
`gist.github.com`, `astral.sh`, `json.schemastore.org`, `lychee.cli.rs`,
`sdkman.io`) are dead weight, so drop them from the sandbox allowlist. Kept:
`*.crates.io` + `static.rust-lang.org` (still needed to build lychee) and
`enableWeakerNetworkIsolation` (gh / gcloud / Go-tool TLS, per the schema).
- .lychee.toml: offline = true; header rewritten to local-references-only
- .claude/settings.json: drop the 11 dead lychee link-target domains
- docs/setup/secure-agent-setup.md (isolation-setup template): same domain
removal; add ~/.rustup + ~/.cargo write/read and static.rust-lang.org so a
fresh in-sandbox setup can actually build lychee's toolchain; replace the
excludedCommands/TLS workaround note with an offline-mode explanation
Generated-by: Claude Code (Opus 4.8)
* fix(sandbox-lint): drop the dead lychee domains from the baseline too
The sandbox-lint M.29 invariant requires `.claude/settings.json` and
`tools/sandbox-lint/expected.json` to stay in lockstep (two files, two
edits, one review surface). The prior commit removed the 11 dead lychee
link-target domains from the live settings but not the baseline, so the
`sandbox-lint` CLI and its `test_baseline_file_matches_live_settings` /
`test_main_exits_zero_on_repo` tests failed in CI. Mirror the same
removal in expected.json.
Generated-by: Claude Code (Opus 4.8)
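The lockstep requirement can be pictured with a small drift check. This is a minimal sketch, not the actual `sandbox-lint` code: the helper names and the `allowedDomains` key are assumptions, since the diff below does not show the name of the key that holds the domain list.

```python
import json


def find_key(node, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def allowlist_drift(live_text, baseline_text, key="allowedDomains"):
    """Compare two JSON documents; `key` is a hypothetical key name.

    Returns (only_in_live, only_in_baseline) as sorted lists. Two empty
    lists mean the live settings and the baseline agree on the allowlist.
    """
    live = set(find_key(json.loads(live_text), key) or [])
    baseline = set(find_key(json.loads(baseline_text), key) or [])
    return sorted(live - baseline), sorted(baseline - live)
```

Applied to the state after the first commit, such a check would report the 11 removed domains as present only in the baseline, which matches the failure described above.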
---
.claude/settings.json | 13 +------------
.lychee.toml | 24 ++++++++++++++++++++++--
docs/setup/secure-agent-setup.md | 34 +++++++++++++++++++++++++++-------
tools/sandbox-lint/expected.json | 13 +------------
4 files changed, 51 insertions(+), 33 deletions(-)
diff --git a/.claude/settings.json b/.claude/settings.json
index 63fa217f..91db81e3 100644
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -43,18 +43,7 @@
"cveawg.mitre.org",
"oauth2.googleapis.com",
"gmail.googleapis.com",
- "*.crates.io",
- "*.apache.org",
- "*.anthropic.com",
- "*.claude.com",
- "*.mitre.org",
- "*.nist.gov",
- "*.github.io",
- "gist.github.com",
- "astral.sh",
- "json.schemastore.org",
- "lychee.cli.rs",
- "sdkman.io"
+ "*.crates.io"
],
"enableWeakerNetworkIsolation": true
}
diff --git a/.lychee.toml b/.lychee.toml
index 98985137..07a5aed2 100644
--- a/.lychee.toml
+++ b/.lychee.toml
@@ -1,9 +1,13 @@
# Lychee link checker config for apache/airflow-steward.
#
-# Validates every link in markdown / rst / .md.j2 files:
+# Runs in OFFLINE mode (see `offline` below): validates only *local*
+# references in markdown / rst / .md.j2 files:
# * cross-file file existence — `[text](other.md)`
# * cross-file fragments — `[text](other.md#anchor)`
-# * external URLs — HTTP 2xx
+# * same-file fragments — `[text](#anchor)`
+# External `http(s)://` URLs are intentionally NOT fetched — see the
+# `offline` note below for why. Remote-link liveness is not checked
+# anywhere; the link check exists to keep in-repo references intact.
#
# Run via prek (locally and in CI) as the `lychee` hook in
# `.pre-commit-config.yaml` — prek installs lychee itself, so no local
@@ -21,6 +25,22 @@
# `lychee-action`. The v0.23.x boolean form (`true`) no longer parses.
include_fragments = "anchor-only"
+# Offline mode — check only local file/anchor references, never fetch
+# remote URLs. Two reasons:
+# 1. Scope: this hook's job is in-repo reference integrity, not
+# external-link liveness (which is flaky and rate-limited — note
+# the long `exclude` list of ASF infra hosts below that existed
+# purely to tame online checking).
+# 2. Sandbox compatibility: the cargo/brew lychee links macOS
+# SecureTransport (`native-tls`), whose TLS handshake fails
+# through the secure-agent sandbox's CONNECT proxy on macOS 26
+# (`OSStatus -26276`) even though certs are valid. Offline mode
+# makes no network calls, so the hook passes cleanly in-sandbox.
+# The network-related settings below (timeout / retry / accept /
+# exclude / cache) are dormant while offline = true, kept for
+# reference / a future opt-in online check.
+offline = true
+
# Concurrency cap — kept moderate to avoid being rate-limited by GitHub.
max_concurrency = 14
diff --git a/docs/setup/secure-agent-setup.md b/docs/setup/secure-agent-setup.md
index 07037183..f5277c2c 100644
--- a/docs/setup/secure-agent-setup.md
+++ b/docs/setup/secure-agent-setup.md
@@ -355,6 +355,18 @@ below, annotated.
{
"sandbox": {
"enabled": true,
+ // The `lychee` link-check hook runs in OFFLINE mode (`offline =
+ // true` in `.lychee.toml`): it validates only local cross-file and
+ // anchor references and never fetches remote URLs, so it makes no
+ // network calls and needs no in-sandbox TLS at runtime. This
+ // sidesteps a macOS-26 issue where the sandbox's CONNECT proxy is
+ // incompatible with SecureTransport (the `native-tls` stack the
+ // cargo/brew lychee links): online link checks fail every external
+ // URL with `OSStatus -26276` even though the certs are valid and
+ // `enableWeakerNetworkIsolation` is set. Building lychee still
+ // needs the rust toolchain (see the `~/.rustup`/`~/.cargo` +
+ // `*.crates.io`/`static.rust-lang.org` entries below); only its
+ // *runtime* network use is eliminated.
"filesystem": {
"denyRead": ["~/"], // default-deny the entire home dir for Bash subprocesses
"allowRead": [
@@ -364,6 +376,8 @@ below, annotated.
"~/.config/gh/", // gh CLI auth (token in hosts.yml)
"~/.cache/", // dev tool caches (uv HTTP cache, prek logs, ruff/mypy caches)
"~/.local/share/uv/", // uv's tool venvs (prek, etc.)
+ "~/.rustup/", // rustup toolchains (the `lychee` rust hook builds against them)
+ "~/.cargo/", // cargo registry + the lychee binary the rust hook installs
"~/.local/bin/", // uv-installed tool entry points
"~/.config/apache-magpie/", // Gmail OAuth refresh token (oauth-draft tool)
"~/.gnupg/", // gpg keys (commit signing)
@@ -371,7 +385,9 @@ below, annotated.
],
"allowWrite": [
"~/.cache/", // uv lock files, prek log + state, ruff/mypy caches
- "~/.local/share/uv/" // uv's tool venvs (prek installs new hook envs here)
+ "~/.local/share/uv/", // uv's tool venvs (prek installs new hook envs here)
+ "~/.rustup/", // rustup writes settings.toml + downloaded toolchains (first run of the `lychee` rust hook)
+ "~/.cargo/" // cargo registry cache + the compiled lychee binary
]
},
"network": {
@@ -382,12 +398,16 @@ below, annotated.
"lists.apache.org", "dist.apache.org", "downloads.apache.org", "archive.apache.org",
"cveprocess.apache.org", "cve.org", "www.cve.org", "cveawg.mitre.org",
"oauth2.googleapis.com", "gmail.googleapis.com",
- // Added with the `lychee` link-check prek hook: the hosts the
- // framework's own docs link to (so lychee passes in-sandbox)
- // plus `*.crates.io` (so the rust hook can `cargo install` lychee).
- "*.crates.io", "*.apache.org", "*.anthropic.com", "*.claude.com",
- "*.mitre.org", "*.nist.gov", "*.github.io", "gist.github.com",
- "astral.sh", "json.schemastore.org", "lychee.cli.rs", "sdkman.io"
+ // `*.crates.io` + `static.rust-lang.org` let the `lychee` rust
+ // hook bootstrap a rustup toolchain and `cargo install` lychee
+ // on first run (rustup downloads the toolchain from
+ // static.rust-lang.org; crate deps come from crates.io). These
+ // are the ONLY hosts lychee needs: it runs offline (see
+ // `.lychee.toml`), so it never fetches the external URLs the
+ // docs link to — the wildcard link-target hosts that used to
+ // live here (`*.apache.org`, `*.nist.gov`, `lychee.cli.rs`, …)
+ // were removed when the hook went offline.
+ "*.crates.io", "static.rust-lang.org"
],
// Lets native-TLS CLI tools (lychee — and, per the schema, gh /
// gcloud / terraform) verify TLS through the sandbox's
diff --git a/tools/sandbox-lint/expected.json b/tools/sandbox-lint/expected.json
index 63fa217f..91db81e3 100644
--- a/tools/sandbox-lint/expected.json
+++ b/tools/sandbox-lint/expected.json
@@ -43,18 +43,7 @@
"cveawg.mitre.org",
"oauth2.googleapis.com",
"gmail.googleapis.com",
- "*.crates.io",
- "*.apache.org",
- "*.anthropic.com",
- "*.claude.com",
- "*.mitre.org",
- "*.nist.gov",
- "*.github.io",
- "gist.github.com",
- "astral.sh",
- "json.schemastore.org",
- "lychee.cli.rs",
- "sdkman.io"
+ "*.crates.io"
],
"enableWeakerNetworkIsolation": true
}