davsclaus opened a new issue, #1631:
URL: https://github.com/apache/camel-website/issues/1631

   ## Summary
   
   Deep analysis of the sporadic 404 errors reported in 
[INFRA-27679](https://issues.apache.org/jira/browse/INFRA-27679) and #1565. The 
404s are caused by three interacting problems, not a single root cause.
   
   ## Root Cause Analysis
   
   ### 1. Fastly CDN caches 404 responses (the trigger)
   
   @gansheer's testing (#1565) identified that the 404 **only** occurs when 
`Accept-Encoding: gzip` is in the request header. Modern browsers always send 
`gzip` — curl does not by default. That's why "it works with curl but fails in 
browsers."
   
   The origin server returns `Vary: Accept-Encoding`, so Fastly maintains 
**separate cache entries** per encoding. If the gzip variant gets a 404 (even 
transiently), Fastly caches and serves it, while the non-gzip variant may be 
fine. @zregvart confirmed the origin returns 200 for the same URL — the 404 is 
purely in the CDN cache.
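
The per-encoding behaviour can be checked from the command line by requesting the same URL once per `Accept-Encoding` value. The following is a minimal sketch, assuming `curl` is installed; the function name `check_variants` is made up for illustration and is not part of the website tooling.

```shell
# Print the HTTP status returned for one URL, once per Accept-Encoding value.
# A "200" for identity next to a "404" for gzip matches the symptom described above.
check_variants() {
  url="$1"
  for enc in identity gzip; do
    # -o /dev/null discards the body; -w prints only the status code.
    status=$(curl -s -o /dev/null -w '%{http_code}' \
      -H "Accept-Encoding: ${enc}" "$url")
    printf '%s\t%s\n' "$enc" "$status"
  done
}
```

Running `check_variants` against an affected page URL should show whether only the gzip variant is stuck on a 404.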
   
   ### 2. Large pushes silently fail to trigger CDN purge (the cause)
   
   Daniel Gruno (INFRA) confirmed: pushes with ~15,000+ changed files produce 
event payloads exceeding the **256KB message queue limit**, causing the PURGE 
to be **silently dropped**. The CDN never learns the content changed.
   
   Every website rebuild produces a massive push because the Jenkinsfile does 
`git rm -r * && cp -R public/ .`, so every file is marked as changed even if 
its content has not changed.
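
To get a rough feel for how close a push is to the limit, the byte size of the changed path list can be compared with 256KB. This is only a sketch: the issue does not describe the exact payload format, so it assumes the payload grows with the list of changed paths and treats the result as a lower bound. The name `payload_check` is hypothetical.

```shell
# Limit quoted by INFRA for the event payload (256 KB).
LIMIT_BYTES=$((256 * 1024))

# Read changed paths on stdin and compare their total size with the limit.
# Assumption: the real payload adds framing around each path, so the true
# size is larger than the number reported here.
payload_check() {
  bytes=$(wc -c | tr -d ' ')
  if [ "$bytes" -gt "$LIMIT_BYTES" ]; then
    echo "over ${bytes}"
  else
    echo "under ${bytes}"
  fi
}
```

For example, `git diff --name-only HEAD~1 HEAD | payload_check` reports on the most recent commit of the publish branch.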
   
   ### 3. The site is enormous (the amplifier)
   
   | Component | Files | Size | Notes |
   |---|---|---|---|
   | Blog images (PNG/JPG) | 265 (>500KB each) | 377 MB | |
   | Blog GIFs | many | 129 MB | top 5 alone = 41MB |
   | Blog other | ~600 | ~65 MB | |
   | Spring schemas (1.x–4.x) | 481 | 131 MB | 1.x/2.x EOL 10+ years |
   | Blueprint schemas (2.x–3.x) | 199 | 86 MB | all EOL |
   | Other schemas | ~815 | 19 MB | |
   | Antora docs | 9 repos × 2-3 branches | huge | |
   
   Source content alone is ~840MB before Antora generates the documentation. 
The published site is likely 30,000–50,000+ files.
   
   ---
   
   ## Action Items
   
   ### A. Immediate fixes (high impact, low effort)
   
   #### A1. Request INFRA to stop Fastly from caching 404s
   
   This is the single most impactful fix. In Fastly VCL:
   ```
   sub vcl_fetch {
     if (beresp.status == 404) {
       set beresp.cacheable = false;
     }
   }
   ```
   @zregvart already suggested this on the Jira ticket. Even if other issues 
exist, a transient 404 should never be cached indefinitely, because caching it 
turns a brief glitch into a prolonged outage.
   
   #### A2. Add a follow-up "nudge" push after the main deploy
   
   PR #1533 added an empty commit push. Verify it runs reliably after the large 
push. The Jenkinsfile Deploy stage should:
   1. Push the main site content
   2. Wait a few seconds
   3. Push a tiny follow-up commit (e.g., touch a timestamp file)
   
   This second push stays well under 256KB and reliably triggers the CDN purge.
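
The three steps above can be sketched as a shell function that the Deploy stage could call from an `sh` step. This is an illustration only: the function name, the default pause, and the commit message are assumptions, and it uses an empty commit (as in PR #1533) rather than a timestamp file.

```shell
# Main push, short pause, then a tiny follow-up push to trigger the purge.
# Assumes it runs inside the checked-out publish branch; the remote and
# branch names are supplied by the caller.
deploy_with_nudge() {
  remote="$1"; branch="$2"; pause="${3:-5}"
  git push "$remote" "HEAD:${branch}" || return 1
  sleep "$pause"
  # An empty commit keeps the second push far below the payload limit.
  git commit --allow-empty -q -m "Nudge CDN purge after website deploy" || return 1
  git push "$remote" "HEAD:${branch}"
}
```

Because each step returns early on failure, a failed main push never produces a misleading nudge commit.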
   
   #### A3. Reduce the diff size by not wiping and re-adding everything
   
   The current deploy logic (Jenkinsfile lines 96–103):
   ```groovy
   sh 'git rm -q -r *'
   sh "cp -R $WORKSPACE/camel-website/public/. ."
   sh 'git add .'
   ```
   
   This marks every single file as changed even when content is identical. 
Instead, use rsync-like logic:
   ```groovy
   sh "rsync -a --delete --exclude='.git' --exclude='.asf.yaml' $WORKSPACE/camel-website/public/ ."
   sh 'git add -A'
   ```
   
   Git would then only see actually-changed files. A blog-only update might 
change 10 files instead of 15,000.
   
   ### B. Reduce site size (high impact, moderate effort)
   
   #### B1. Remove EOL schema files
   
   The `static/schema/` directory is 236MB with 1,495 files. Most of that size 
comes from EOL versions:
   
   | Schemas | Files | Size | Status |
   |---|---|---|---|
   | Spring 1.x–2.x | 130 | 31 MB | EOL for 10+ years |
   | Blueprint 2.x | 110 | ~30 MB | EOL |
   | Spring 3.x | 87 | 55 MB | EOL |
   | Blueprint 3.x | 85 | 55 MB | EOL |
   | **Total removable** | **412** | **~170 MB** | |
   
   These XSDs are published to stable URLs that tooling may reference. A safe 
approach:
   - Keep them available at `downloads.apache.org/camel/schema/` or as a GitHub 
release artifact
   - Add `.htaccess` redirects from the old URLs to the archive location
   - Remove from the git-published site
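
The redirect lines for `.htaccess` could be generated from the list of removed files instead of being written by hand. The sketch below emits one `mod_alias` `Redirect` directive per path. It assumes the files under `static/schema/` are published under `/schema/`, and the archive base URL passed in is a placeholder, since the archive location has not been decided.

```shell
# Read site-relative schema paths (e.g. schema/spring/<file>.xsd) on stdin
# and print one permanent redirect per path to the archive location.
emit_schema_redirects() {
  archive_base="$1"
  while IFS= read -r path; do
    [ -n "$path" ] || continue   # skip blank lines
    # Keep the sub-path below schema/ so Spring and Blueprint files stay apart.
    printf 'Redirect 301 /%s %s/%s\n' "$path" "$archive_base" "${path#schema/}"
  done
}
```

The output can be appended to `.htaccess` after review, for example by piping in the list of paths removed from the publish branch.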
   
   #### B2. Optimize blog images
   
   265 images over 500KB total 377MB. Biggest wins:
   
   - **Convert PNGs to WebP**: 283MB of PNGs → ~70MB as WebP (75% reduction)
   - **Convert GIFs to MP4/WebM**: A 15MB GIF → ~1MB MP4
   - **Compress existing JPGs**: 77MB → likely ~40MB with quality 85
   
   This could cut blog size from 571MB to ~200MB.
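
As a starting point for the PNG conversion, a small helper could walk a directory and convert only the large files. This sketch assumes the `cwebp` CLI from libwebp is installed; the quality value of 80 and the function name are arbitrary choices, not measured settings.

```shell
# Convert PNGs larger than 500 KB under a directory to WebP, writing each
# result next to its original. The originals are left in place.
convert_large_pngs() {
  dir="$1"; quality="${2:-80}"
  find "$dir" -type f -name '*.png' -size +500k -print |
    while IFS= read -r png; do
      cwebp -quiet -q "$quality" "$png" -o "${png%.png}.webp"
    done
}
```

The blog posts that reference the converted images would still need their image links updated before the PNGs can be deleted.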
   
   #### B3. Remove 3.x documentation
   
   Already tracked in #1302 / PR #1570.
   
   ### C. .htaccess improvements (moderate impact, low effort)
   
   #### C1. Remove the mod_deflate section
   
   The `.htaccess` has an elaborate `mod_deflate` section (lines 1599–1678) 
with a workaround for "mangled Accept-Encoding" headers (lines 1606–1611):
   ```
   SetEnvIfNoCase ^(Accept-EncodXng|X-cept-Encoding|...) ... HAVE_Accept-Encoding
   RequestHeader append Accept-Encoding "gzip,deflate" env=HAVE_Accept-Encoding
   ```
   
   This is a decade-old workaround for broken proxies. With Fastly in front, 
the CDN handles compression — remove the entire `mod_deflate` section to 
eliminate potential interactions between Apache's compression and Fastly's 
`Vary` handling.
   
   #### C2. Remove ETag stripping
   
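   Lines 1828–1833 strip ETags. With Fastly in front, ETags from the origin are 
useful for cache validation: Fastly can revalidate an expired object with a 
conditional request and receive a cheap `304 Not Modified`. Stripping them 
forces a full re-fetch every time a cached object expires.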
   
   #### C3. Simplify cache expiration
   
   The `mod_expires` section (lines 1849–1953) sets granular expiration by MIME 
type. Fastly largely ignores `Expires` headers in favor of its own TTL logic. 
Keep only the `text/html: 0 seconds` rule and let Fastly handle the rest.
   
   ### D. Deployment architecture (long-term, higher effort)
   
   #### D1. Split static assets from generated content
   
   Host blog images and schemas from a separate origin (e.g., a static bucket 
or `downloads.apache.org`). The git-published site would only contain HTML, 
CSS, JS, and small icons — maybe 10% of current size. This fundamentally solves 
the "too many files changed" problem.
   
   #### D2. Incremental documentation builds
   
   The Antora playbook pulls 9 repositories × 2–3 branches each. Most branches 
don't change between builds. Consider caching previous Antora output and only 
regenerating components whose source branches have new commits.
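
One way to detect unchanged branches is to compare each branch head with the commit SHA recorded after the previous successful build. The following is a sketch of that check only. The helper names and the idea of one SHA file per component branch are assumptions, and how cached Antora output would be reused is left open.

```shell
# Print the current head SHA of a branch in a remote repository.
head_sha() {
  git ls-remote "$1" "refs/heads/$2" | cut -f1
}

# Succeed (exit 0) when a rebuild is needed: the head is unknown, or it
# differs from the SHA stored in the given file by the previous build.
needs_rebuild() {
  current="$1"; sha_file="$2"
  previous=$(cat "$sha_file" 2>/dev/null || true)
  [ -z "$current" ] || [ "$current" != "$previous" ]
}
```

A build script would call `head_sha` once per component branch, regenerate only when `needs_rebuild` succeeds, and write the SHA to the file after the build finishes, so a failed build is retried next time.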
   
   ---
   
   ## Recommended Priority
   
   1. **Push INFRA to disable 404 caching in Fastly** (A1) — fixes the symptom 
permanently
   2. **Fix Jenkinsfile deploy to use rsync instead of rm+cp** (A3) — reduces 
diff from ~15K files to actual changes
   3. **Add a follow-up nudge push** (A2) — ensures CDN purge fires even for 
large pushes
   4. **Remove old schemas (1.x, 2.x, 3.x)** (B1) — drops ~170MB / 412 files 
immediately
   5. **Strip mod_deflate from .htaccess** (C1) — eliminates gzip/Vary 
interaction edge cases
   6. **Optimize blog images** (B2) — longer-term but halves the site size
   
   ## References
   
   - [INFRA-27679](https://issues.apache.org/jira/browse/INFRA-27679) — Jira 
issue with INFRA team discussion
   - #1565 — GitHub issue with gzip/Accept-Encoding findings
   - #1302 — Remove 3.x docs
   - #1533 — Empty commit CDN flush
   - #1573 — Avoid squash to avoid large commits

