moonming opened a new pull request, #2016:
URL: https://github.com/apache/apisix-website/pull/2016

   ## Summary
   
   Reduce the sitemap from ~5,200 URLs to ~2,700 by filtering out redundant 
versioned documentation pages, development docs, and low-value pages. Update 
robots.txt to match.
   
   ## Problem
   
   The sitemap includes every versioned doc page: 7 projects x 6 versions 
(3.10-3.15), plus `next`. For example, `/docs/apisix/getting-started/` (latest) 
and `/docs/apisix/3.14/getting-started/` (old version) both appear. This wastes 
crawl budget and presents search engines with near-duplicate content.
   
   Additionally, `/search`, `/blog/tags/`, and `/blog/page/` were included in 
the sitemap despite being low-value pages.
   
   ## Changes
   
   ### 1. Sitemap merge script (`scripts/update-sitemap-loc.js`)
   
   Added URL filtering during post-build sitemap merge. Excludes:
   - `/docs/<project>/<version>/` - versioned doc pages
   - `/docs/<project>/next/` - unreleased dev docs
   - `/search`, `/blog/tags/`, `/blog/page/`
   
   Unversioned latest doc paths (e.g. `/docs/apisix/getting-started/`) are kept.
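
A minimal sketch of how the exclusion rules listed above could be expressed as a filter. This is not the code from `scripts/update-sitemap-loc.js`: the names `EXCLUDED_PATTERNS`, `LOCALE_PREFIX`, and `shouldKeep`, the regular expressions, the `/zh` locale prefix, and the sample host are all assumptions made for illustration.

```javascript
// Hypothetical sketch; the patterns and names are assumptions, not the PR's code.
const EXCLUDED_PATTERNS = [
  /^\/docs\/[^/]+\/\d+\.\d+(\/|$)/, // /docs/<project>/<version>/
  /^\/docs\/[^/]+\/next(\/|$)/,     // /docs/<project>/next/
  /^\/search(\/|$)/,                // site search
  /^\/blog\/tags(\/|$)/,            // blog tag listings
  /^\/blog\/page(\/|$)/,            // blog pagination
];

// Assumed locale prefix for the ZH site; stripped before matching.
const LOCALE_PREFIX = /^\/zh(?=\/|$)/;

// Returns true when a sitemap <loc> URL should stay in the merged sitemap.
function shouldKeep(loc) {
  const path = new URL(loc).pathname.replace(LOCALE_PREFIX, "") || "/";
  return !EXCLUDED_PATTERNS.some((re) => re.test(path));
}

// Illustrative URLs only.
const sample = [
  "https://apisix.apache.org/docs/apisix/getting-started/",
  "https://apisix.apache.org/docs/apisix/3.14/getting-started/",
  "https://apisix.apache.org/docs/apisix/next/getting-started/",
  "https://apisix.apache.org/blog/tags/ecosystem/",
  "https://apisix.apache.org/search",
];
const kept = sample.filter(shouldKeep);
console.log(kept);
```

Anchoring each pattern at the start of the path and requiring a trailing `/` or end of string means a latest-doc page whose slug merely begins with `next` (for example a hypothetical `/docs/apisix/next-steps/`) is not excluded by accident.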
   
   ### 2. robots.txt (`website/static/robots.txt`)
   
   Added Disallow rules for all versioned doc paths, next docs, search, blog 
tags, and blog pagination across both locales. This keeps the signals sent by 
robots.txt and the sitemap consistent.
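
To illustrate the shape of the rules described above, here is a hedged sketch that generates the Disallow lines programmatically. The PR may well list them by hand in `website/static/robots.txt`. The function `buildDisallowRules`, the `/zh` locale prefix, and the single-project list are assumptions, and only `apisix` is shown because the other project names are not given here.

```javascript
// Hypothetical generator; the real robots.txt may simply be hand-written.
function buildDisallowRules(projects, versions, localePrefixes) {
  const rules = [];
  for (const prefix of localePrefixes) {
    for (const project of projects) {
      // One rule per versioned doc tree, plus the unreleased "next" tree.
      for (const version of versions) {
        rules.push(`Disallow: ${prefix}/docs/${project}/${version}/`);
      }
      rules.push(`Disallow: ${prefix}/docs/${project}/next/`);
    }
    // Low-value pages, repeated for each locale.
    rules.push(
      `Disallow: ${prefix}/search`,
      `Disallow: ${prefix}/blog/tags/`,
      `Disallow: ${prefix}/blog/page/`
    );
  }
  return rules;
}

const rules = buildDisallowRules(
  ["apisix"],                                       // one of the 7 projects
  ["3.10", "3.11", "3.12", "3.13", "3.14", "3.15"], // versions named in the PR
  ["", "/zh"]                                       // assumed locale prefixes
);
console.log(rules.join("\n"));
```

Deriving both the sitemap filter and the robots.txt rules from one list of projects and versions would be one way to keep the two from drifting apart when a new version is released.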
   
   ## Expected result
   
   - EN sitemap: ~2,638 -> ~1,360 URLs (~48% reduction)
   - ZH sitemap: ~2,620 -> ~1,340 URLs (~49% reduction)
   - Remaining URLs are high-value: latest docs, blog posts, main pages
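
As a quick check of the figures above, the reduction percentages and combined totals can be recomputed from the approximate URL counts. The helper `reductionPercent` is a hypothetical name used only for this sketch.

```javascript
// Percentage reduction from `before` to `after`.
function reductionPercent(before, after) {
  return ((before - after) / before) * 100;
}

const en = reductionPercent(2638, 1360);
const zh = reductionPercent(2620, 1340);
const totalBefore = 2638 + 2620; // combined EN + ZH before filtering
const totalAfter = 1360 + 1340;  // combined EN + ZH after filtering

console.log(`EN: ${en.toFixed(1)}%, ZH: ${zh.toFixed(1)}%`);
console.log(`Total: ${totalBefore} -> ${totalAfter}`);
```

The combined totals (5,258 and 2,700) agree with the ~5,200 and ~2,700 figures quoted in the summary.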


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
