kevinjqliu opened a new issue, #178: URL: https://github.com/apache/datafusion-site/issues/178
## Problem

Some blog post URLs redirect through an internal `/output/` path that should never be exposed to users.

**Working URLs** (slugs without dots):

- https://datafusion.apache.org/blog/2026/03/31/writing-table-providers → 301 → `/blog/2026/03/31/writing-table-providers/` ✅

**Broken URLs** (slugs with version numbers):

- https://datafusion.apache.org/blog/2026/04/18/datafusion-comet-0.15.0 → 301 → `/blog/output/2026/04/18/datafusion-comet-0.15.0/` ❌
- https://datafusion.apache.org/blog/2026/04/02/datafusion-53.0.0 → 301 → `/blog/output/2026/04/02/datafusion-53.0.0/` ❌
- https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0 → 301 → `/blog/output/2024/07/24/datafusion-40.0.0/` ❌

Every blog post with a version number in its slug (e.g. `datafusion-comet-0.15.0`) is affected. The `/output/` URL still serves the page, but it is not the canonical URL and causes subtle issues; for example, giscus CSP was not applying correctly on the `/output/` path (see https://github.com/apache/datafusion-site/issues/80#issuecomment-4416405278).

## Root Cause

There is a mismatch between `publish-site.yml` and `.asf.yaml`:

- `.asf.yaml` on `asf-site` declares `subdir: blog`, telling ASF infrastructure to serve content from a `blog/` subdirectory.
- `publish-site.yml` does **not** set `output: 'blog'`, so the pelican action defaults to `output: 'output'`, putting built content into `output/` instead of `blog/`.

```yaml
# .asf.yaml (expects blog/)
publish:
  whoami: asf-site
  subdir: blog

# publish-site.yml (produces output/)
- uses: apache/infrastructure-actions/pelican@main
  with:
    destination: 'asf-site'
    gfm: 'false'
    # output: 'blog'  <-- MISSING
```

The staging workflow (`stage-site.yml`) already has `output: 'blog'` and works correctly.

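The mismatch described above can be expressed as a small check. This is only a sketch: the dictionaries are hand-copied from the snippets in this issue rather than parsed from the real files, the names `DEFAULT_OUTPUT`, `effective_output`, and `matches_subdir` are hypothetical, and the default of `'output'` is taken from the description above, not verified against the action's source.

```python
# Build directory used by the pelican action when `output` is not set.
# Assumption: taken from the description in this issue.
DEFAULT_OUTPUT = "output"

# Hand-copied from the snippets above; not parsed from the real files.
asf_yaml_publish = {"whoami": "asf-site", "subdir": "blog"}
publish_site_with = {"destination": "asf-site", "gfm": "false"}

# The same `with:` block after the proposed fix.
fixed_with = {**publish_site_with, "output": "blog"}


def effective_output(with_block: dict) -> str:
    """Return the directory the build would be written to."""
    return with_block.get("output", DEFAULT_OUTPUT)


def matches_subdir(with_block: dict, publish_block: dict) -> bool:
    """True when the build directory equals the subdir that .asf.yaml serves."""
    return effective_output(with_block) == publish_block["subdir"]


print(matches_subdir(publish_site_with, asf_yaml_publish))  # False
print(matches_subdir(fixed_with, asf_yaml_publish))         # True
```

A check along these lines could run in CI to catch the two files drifting apart again, though that is a suggestion rather than part of the proposed fix.
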
To bridge this mismatch, `.htaccess` on the `asf-site` branch has rewrite rules that internally map requests from `blog/` to `output/`:

```apache
RewriteCond %{REQUEST_URI} !/output/
RewriteRule ^(.*)$ output/$1 [L]
```

These rules also add a trailing-slash redirect for extensionless URLs, but skip URLs that "look like files":

```apache
RewriteCond %1 !\.[^./]+$
```

The regex `\.[^./]+$` matches any dot followed by non-dot/non-slash characters at the end of the URL. This incorrectly matches `.0` in version-number slugs like `datafusion-comet-0.15.0`, causing the trailing-slash redirect to be skipped. Apache's `mod_dir` then adds the trailing slash itself, but exposes the internal `output/` prefix in the redirect Location header.

## Fix

Add `output: 'blog'` to `publish-site.yml` to match `stage-site.yml`:

```yaml
- uses: apache/infrastructure-actions/pelican@main
  with:
    destination: 'asf-site'
    gfm: 'false'
    output: 'blog'
```

This puts build output into `blog/` on the `asf-site` branch, matching what `.asf.yaml` expects. The `.htaccess` rewrite rules become unnecessary.

## Follow-up (after deploy)

After the first successful deploy with this fix, a separate PR to the `asf-site` branch should:

1. Remove the stale `output/` directory (all content will now be in `blog/`).
2. Simplify `.htaccess` to remove the rewrite rules, keeping only the CSP directive:

   ```apache
   SetEnv CSP_PROJECT_DOMAINS "https://giscus.app"
   ```
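
The "looks like a file" check discussed under Root Cause can be reproduced outside Apache. The sketch below applies the same pattern, `\.[^./]+$`, with Python's `re` module to the slugs listed in this issue. It assumes the string tested by `%1` is the request path without a trailing slash (the earlier `RewriteCond` that sets `%1` is not shown here), and `feed.xml` is a hypothetical example of a genuine file path.

```python
import re
from typing import Optional

# Same pattern as the RewriteCond quoted in this issue: a dot followed by one
# or more characters that are neither "." nor "/", anchored at the end.
LOOKS_LIKE_FILE = re.compile(r"\.[^./]+$")


def file_suffix(path: str) -> Optional[str]:
    """Return the suffix the pattern treats as a file extension, or None."""
    match = LOOKS_LIKE_FILE.search(path)
    return match.group(0) if match else None


paths = [
    "2026/03/31/writing-table-providers",
    "2026/04/18/datafusion-comet-0.15.0",
    "2026/04/02/datafusion-53.0.0",
    "2024/07/24/datafusion-40.0.0",
    "feed.xml",  # hypothetical path of a real file
]

for path in paths:
    suffix = file_suffix(path)
    verdict = f"treated as file ({suffix})" if suffix else "treated as directory"
    print(f"{path}: {verdict}")
```

The three versioned slugs are classified as files because of their trailing `.0`, which is the condition under which the trailing-slash redirect is skipped.
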
