davsclaus opened a new issue, #1631: URL: https://github.com/apache/camel-website/issues/1631
## Summary

Deep analysis of the sporadic 404 errors reported in [INFRA-27679](https://issues.apache.org/jira/browse/INFRA-27679) and #1565. The 404s are caused by three interacting problems, not a single root cause.

## Root Cause Analysis

### 1. Fastly CDN caches 404 responses (the trigger)

@gansheer's testing (#1565) identified that the 404 **only** occurs when `Accept-Encoding: gzip` is in the request header. Modern browsers always send `gzip`; curl does not by default. That's why "it works with curl but fails in browsers."

The origin server returns `Vary: Accept-Encoding`, so Fastly maintains **separate cache entries** per encoding. If the gzip variant gets a 404 (even transiently), Fastly caches and serves it, while the non-gzip variant may be fine.

@zregvart confirmed the origin returns 200 for the same URL: the 404 is purely in the CDN cache.

### 2. Large pushes silently fail to trigger CDN purge (the cause)

Daniel Gruno (INFRA) confirmed: pushes with ~15,000+ changed files produce event payloads exceeding the **256KB message queue limit**, causing the PURGE to be **silently dropped**. The CDN never learns the content changed.

Every website rebuild produces a massive push because the Jenkinsfile does `git rm -r * && cp -R public/ .`, so every file is marked as changed even if the content hasn't.

### 3. The site is enormous (the amplifier)

| Component | Files | Size | Notes |
|---|---|---|---|
| Blog images (PNG/JPG) | 265 (>500KB each) | 377 MB | |
| Blog GIFs | many | 129 MB | top 5 alone = 41MB |
| Blog other | ~600 | ~65 MB | |
| Spring schemas (1.x–4.x) | 481 | 131 MB | 1.x/2.x EOL 10+ years |
| Blueprint schemas (2.x–3.x) | 199 | 86 MB | all EOL |
| Other schemas | ~815 | 19 MB | |
| Antora docs | 9 repos × 2-3 branches | huge | |

Source content alone is ~840MB before Antora generates the documentation. The published site is likely 30,000–50,000+ files.

---

## Action Items

### A. Immediate fixes (high impact, low effort)

#### A1. Request INFRA to stop Fastly from caching 404s

This is the single most impactful fix. In Fastly VCL:

```
sub vcl_fetch {
  if (beresp.status == 404) {
    set beresp.cacheable = false;
  }
}
```

@zregvart already suggested this on the Jira ticket. Even if other issues exist, a transient 404 should never be cached indefinitely: this turns brief glitches into prolonged outages.

#### A2. Add a follow-up "nudge" push after the main deploy

PR #1533 added an empty commit push. Verify it runs reliably after the large push. The Jenkinsfile Deploy stage should:

1. Push the main site content
2. Wait a few seconds
3. Push a tiny follow-up commit (e.g., touch a timestamp file)

This second push stays well under 256KB and reliably triggers the CDN purge.

#### A3. Reduce the diff size by not wiping and re-adding everything

The current deploy logic (Jenkinsfile lines 96–103):

```groovy
sh 'git rm -q -r *'
sh "cp -R $WORKSPACE/camel-website/public/. ."
sh 'git add .'
```

This marks every single file as changed even when content is identical. Instead, use rsync-like logic:

```groovy
sh "rsync -a --delete --exclude='.git' --exclude='.asf.yaml' $WORKSPACE/camel-website/public/ ."
sh 'git add -A'
```

Git would then only see actually-changed files. A blog-only update might change 10 files instead of 15,000.

### B. Reduce site size (high impact, moderate effort)

#### B1. Remove EOL schema files

The `static/schema/` directory is 236MB with 1,495 files. Most are for versions nobody uses:

| Schemas | Files | Size | Status |
|---|---|---|---|
| Spring 1.x–2.x | 130 | 31 MB | EOL for 10+ years |
| Blueprint 2.x | 110 | ~30 MB | EOL |
| Spring 3.x | 87 | 55 MB | EOL |
| Blueprint 3.x | 85 | 55 MB | EOL |
| **Total removable** | **412** | **~170 MB** | |

These XSDs are published to stable URLs that tooling may reference. A safe approach:

- Keep them available at `downloads.apache.org/camel/schema/` or as a GitHub release artifact
- Add `.htaccess` redirects from the old URLs to the archive location
- Remove from the git-published site

#### B2. Optimize blog images

265 images over 500KB total 377MB. Biggest wins:

- **Convert PNGs to WebP**: 283MB of PNGs → ~70MB as WebP (75% reduction)
- **Convert GIFs to MP4/WebM**: A 15MB GIF → ~1MB MP4
- **Compress existing JPGs**: 77MB → likely ~40MB with quality 85

This could cut blog size from 571MB to ~200MB.

#### B3. Remove 3.x documentation

Already tracked in #1302 / PR #1570.

### C. .htaccess improvements (moderate impact, low effort)

#### C1. Remove the mod_deflate section

The `.htaccess` has an elaborate `mod_deflate` section (lines 1599–1678) with a workaround for "mangled Accept-Encoding" headers (lines 1606–1611):

```
SetEnvIfNoCase ^(Accept-EncodXng|X-cept-Encoding|...) ... HAVE_Accept-Encoding
RequestHeader append Accept-Encoding "gzip,deflate" env=HAVE_Accept-Encoding
```

This is a decade-old workaround for broken proxies. With Fastly in front, the CDN handles compression. Remove the entire `mod_deflate` section to eliminate potential interactions between Apache's compression and Fastly's `Vary` handling.

#### C2. Remove ETag stripping

Lines 1828–1833 strip ETags. With Fastly in front, ETags from the origin are useful for cache validation. Removing them forces full re-fetches on every cache miss.

#### C3. Simplify cache expiration

The `mod_expires` section (lines 1849–1953) sets granular expiration by MIME type. Fastly largely ignores `Expires` headers in favor of its own TTL logic. Keep only the `text/html: 0 seconds` rule and let Fastly handle the rest.

### D. Deployment architecture (long-term, higher effort)

#### D1. Split static assets from generated content

Host blog images and schemas from a separate origin (e.g., a static bucket or `downloads.apache.org`). The git-published site would only contain HTML, CSS, JS, and small icons, maybe 10% of current size. This fundamentally solves the "too many files changed" problem.

#### D2. Incremental documentation builds

The Antora playbook pulls 9 repositories × 2–3 branches each. Most branches don't change between builds. Consider caching previous Antora output and only regenerating components whose source branches have new commits.

---

## Recommended Priority

1. **Push INFRA to disable 404 caching in Fastly** (A1): fixes the symptom permanently
2. **Fix Jenkinsfile deploy to use rsync instead of rm+cp** (A3): reduces diff from ~15K files to actual changes
3. **Add a follow-up nudge push** (A2): ensures CDN purge fires even for large pushes
4. **Remove old schemas (1.x, 2.x, 3.x)** (B1): drops ~170MB / 412 files immediately
5. **Strip mod_deflate from .htaccess** (C1): eliminates gzip/Vary interaction edge cases
6. **Optimize blog images** (B2): longer-term but halves the site size

## References

- [INFRA-27679](https://issues.apache.org/jira/browse/INFRA-27679): Jira issue with INFRA team discussion
- #1565: GitHub issue with gzip/Accept-Encoding findings
- #1302: Remove 3.x docs
- #1533: Empty commit CDN flush
- #1573: Avoid squash to avoid large commits
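
As a follow-up to A3, it may help to measure how many files a rebuild really changes before switching the Deploy stage to rsync. The sketch below compares the deployed checkout with a freshly built `public/` tree by content hash and reports added, modified, and deleted paths. The function names `tree_digests` and `diff_trees`, their parameters, and the choice of SHA-256 are assumptions made for illustration; nothing like this exists in the Jenkinsfile today. It skips `.git` and leaves `.asf.yaml` alone, mirroring the `--exclude` flags in the rsync command proposed in A3.

```python
import hashlib
import os
from pathlib import Path


def tree_digests(root, skip_dirs=(".git",)):
    """Map each file path (relative to root, POSIX style) to the SHA-256 of its content."""
    root = Path(root)
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(deployed, built, keep=(".asf.yaml",)):
    """Return (added, modified, deleted) relative paths between two trees.

    `keep` lists files that live only in the deployed checkout and must not
    be reported as deleted (hypothetical default, mirroring the rsync excludes).
    """
    old = tree_digests(deployed)
    new = tree_digests(built)
    added = sorted(new.keys() - old.keys())
    deleted = sorted(p for p in old.keys() - new.keys() if p not in keep)
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, modified, deleted


# Hypothetical usage from a workspace containing both trees:
#   added, modified, deleted = diff_trees("deployed-checkout", "camel-website/public")
#   print(len(added), len(modified), len(deleted))
```

If the three counts together stay in the tens or hundreds for a typical rebuild, that would support the expectation in A3 that a content-aware sync keeps pushes small; a count near the full file total would suggest the build itself is rewriting file contents (for example, embedded timestamps) and that rsync alone would not shrink the diff.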
