Hello,

llms.txt is a proposal that aims to standardize how LLMs access website
information. I think the Apache Camel website is structured well enough that
we can expose its content in a form LLMs can consume easily.
The /llms.txt file is similar to a sitemap, but it is designed for LLM
consumption and points to markdown content.

I see two main benefits:
1. When LLMs are trained, crawlers can easily find and index our
documentation through the standardized llms.txt format.
2. Coding agents like Gemini CLI, Claude Code, Cursor, etc. can use the
llms.txt and markdown pages directly to provide accurate Apache Camel
information.

Implementation attempt on camel-website:
- After Antora generates HTML pages, a Gulp task converts them to markdown
- For every page, the public folder now contains both the HTML file and
its markdown version (e.g. advice-with.html and advice-with.html.md)
- Markdown files are cleaned up: only the main article section is
extracted (no navigation, headers, or footers)
- An /llms.txt file is generated at the root with an overview and structure

Results:
- 5,355+ markdown pages are generated automatically during the build
- Almost all HTML pages can be accessed as markdown by appending .md to the
URL (appending .md after .html is just a proposal; there are no established
best practices for it yet, so any input is welcome)
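
For anyone curious about the llms.txt step, here is a rough TypeScript
sketch of what generating such a file could look like. It is not the code
in the draft PR: the DocPage type, its section grouping and the
buildLlmsTxt function are hypothetical, and the layout (H1 title,
blockquote summary, then H2 sections with link lists) follows my reading
of the llms.txt proposal.

```typescript
// Hypothetical description of one generated markdown page.
interface DocPage {
  title: string;        // page title, e.g. "Simple"
  section: string;      // hypothetical grouping, e.g. "Languages"
  mdUrl: string;        // absolute URL of the .html.md file
  description?: string; // optional one-line note shown after the link
}

// Build an llms.txt document: H1 title, blockquote summary,
// then one H2 section per group containing a markdown link list.
function buildLlmsTxt(title: string, summary: string, pages: DocPage[]): string {
  // Group pages by section; Map keeps first-seen insertion order.
  const sections = new Map<string, DocPage[]>();
  for (const page of pages) {
    const group = sections.get(page.section) ?? [];
    group.push(page);
    sections.set(page.section, group);
  }

  const lines: string[] = [`# ${title}`, "", `> ${summary}`];
  sections.forEach((group, section) => {
    lines.push("", `## ${section}`, "");
    for (const page of group) {
      // Append ": description" only when a description is present.
      const note = page.description ? `: ${page.description}` : "";
      lines.push(`- [${page.title}](${page.mdUrl})${note}`);
    }
  });
  return lines.join("\n") + "\n";
}

// Usage: one page in a "Languages" section.
console.log(
  buildLlmsTxt("Apache Camel", "Open source integration framework.", [
    {
      title: "Simple",
      section: "Languages",
      mdUrl: "https://camel.apache.org/components/next/languages/simple-language.html.md",
    },
  ]),
);
```

Grouping by section keeps the file readable as an overview, while the
individual .md pages carry the actual content.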

This way, both the HTML documentation, for example
https://camel.apache.org/components/next/languages/simple-language.html
and its markdown version
https://camel.apache.org/components/next/languages/simple-language.html.md
will be exposed.
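
A client (a coding agent, for example) could derive the markdown URL from
an HTML URL with a small helper like the hypothetical toMarkdownUrl below.
It is only a sketch of the proposed .html.md convention; stripping query
strings and #fragments before appending .md is my own assumption, since
anchors are not part of the file name.

```typescript
// Hypothetical helper: map an HTML documentation URL (or file path) to its
// markdown version under the proposed ".html.md" convention.
function toMarkdownUrl(htmlUrl: string): string | null {
  // Cut at the first "?" or "#": query and fragment are not part of the file name.
  const cut = htmlUrl.search(/[?#]/);
  const base = cut === -1 ? htmlUrl : htmlUrl.slice(0, cut);
  // Only .html pages get a markdown counterpart; anything else returns null.
  return base.endsWith(".html") ? `${base}.md` : null;
}

// Usage: a deep link to a section still maps to the page's markdown file.
console.log(
  toMarkdownUrl(
    "https://camel.apache.org/components/next/languages/simple-language.html#_operator_support",
  ),
);
```

The same rule also covers build output paths, e.g. advice-with.html maps
to advice-with.html.md.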

This should make Apache Camel documentation much more accessible to AI
tools and easier to include in future LLM training.

Draft Pull Request: https://github.com/apache/camel-website/pull/1437
Any feedback or suggestions are welcome!

Regards,
Federico
