While llms.txt is not a formal standard, it is quickly becoming a common
convention. As you noted, the MCP site publishes llms-full.txt; it also
publishes llms.txt:
https://modelcontextprotocol.io/llms.txt
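
For reference, here is a minimal sketch of how a link-only manifest in the llmstxt.org layout (an H1 title, a blockquote summary, then H2 sections of markdown links) might be generated. The `build_manifest` helper, the title, and the summary text are hypothetical; the two URLs are the ones quoted from the SPIP further down the thread.

```python
from typing import Dict, List, Tuple


def build_manifest(
    title: str,
    summary: str,
    sections: Dict[str, List[Tuple[str, str]]],
) -> str:
    """Render a link-only manifest: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url in links:
            # Each entry is only a pointer to a canonical doc, never its content.
            lines.append(f"- [{name}]({url})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical title and summary; URLs as listed in the SPIP excerpt.
manifest = build_manifest(
    "Apache Spark",
    "Index of canonical Spark documentation entry points.",
    {
        "API docs": [
            ("PySpark (Python)",
             "https://spark.apache.org/docs/latest/api/python/llms.txt"),
            ("Scala",
             "https://spark.apache.org/docs/latest/api/scala/llms.txt"),
        ],
    },
)
print(manifest)
```

The output is only a handful of lines of links, which is the sense in which llms.txt differs from llms-full.txt.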


On Wed, Sep 10, 2025 at 14:48 Bjørn Jørgensen <[email protected]>
wrote:

> The protocol for this llms.txt is not a standard yet.
>
> "*To clarify, llms.txt is not meant to be a duplication of the full
> documentation.*"
> Some sites, such as the Model Context Protocol (MCP)
> <https://modelcontextprotocol.io/tutorials/building-mcp-with-llms> site,
> put their full site content in the llms file:
> https://modelcontextprotocol.io/llms-full.txt
>
>
> On Wed, Sep 10, 2025 at 22:27, Allison Wang
> <[email protected]> wrote:
>
>> Thanks Dongjoon for raising these concerns. I agree with your point that
>> it’s worth making the lightweight manifest scope explicit in the SPIP so we
>> have a systematic guarantee it stays small (under 10MB).
>>
>> To clarify, llms.txt is not meant to be a duplication of the full
>> documentation. Instead, it acts more like an index or table of contents
>> page: a small, curated manifest that points to existing canonical docs.
>> The intent is to help AI-assisted tools and LLMs discover the right entry
>> points, not to repackage the entire documentation set.
>>
>> For example, DuckDB's llms.txt
>> <https://duckdb.org/docs/stable/llms.txt> file is around 30KB in
>> size. Spark’s manifests will likely be a bit larger given the broader scope
>> of APIs and documentation, but they should still remain lightweight
>> link-only markdown files and well under the 10MB limit, even across
>> multiple versions and language scopes.
>>
>> On Wed, Sep 10, 2025 at 8:47 AM Wenchen Fan <[email protected]> wrote:
>>
>>> So this would just be an LLM-facing index page of the Spark docs? Given
>>> the number of APIs Spark provides today, I think this index page should
>>> be useful to humans as well.
>>>
>>> On Wed, Sep 10, 2025 at 10:46 PM Dongjoon Hyun <[email protected]>
>>> wrote:
>>>
>>>> Thank you, Allison and Hyukjin.
>>>>
>>>> IIUC, this proposal is not about a single file. The SPIP already exposes
>>>> multiple files, which may double our documentation and website size
>>>> (or more in the worst case) because it's simply a duplication of the
>>>> content. If we start to use AI tools to generate these llms.txt files,
>>>> they could be much bigger than the original.
>>>>
>>>> *** From SPIP ***
>>>> - [PySpark (Python)](
>>>> https://spark.apache.org/docs/latest/api/python/llms.txt)
>>>> - [Scala](https://spark.apache.org/docs/latest/api/scala/llms.txt)
>>>> - [4.0.0 docs hub](
>>>> https://archive.apache.org/dist/spark/docs/4.0.0/llms.txt)
>>>> ***
>>>>
>>>> Since the size of Apache Spark 4.1.0-preview1 documentation is 1.2GB,
>>>> could you propose a systematic way to always keep the total size of
>>>> newly added llms.txt files under 10MB, Allison? If we don't have full
>>>> control, this duplication will break the ASF Spark website like
>>>> last year. We already had to archive old Spark documents from the
>>>> original website location to "https://archive.apache.org/dist/spark/"
>>>> due to the CI outage.
>>>>
>>>> $ du -h 4.1.0-preview1 | tail -n1
>>>> 1.2G 4.1.0-preview1
>>>>
>>>> The bottom line is that we need to have a clear hard limit for this
>>>> newly proposed duplication for machine-friendly metadata. If we have a
>>>> systematic way to control the upper bound which is less than 10MB per Spark
>>>> version in total (now and forever), it sounds like a good addition.
>>>>
>>>> Thanks,
>>>> Dongjoon.
>>>>
>>>>
>>>> On Tue, Sep 9, 2025 at 7:19 PM Allison Wang <[email protected]>
>>>> wrote:
>>>>
>>>>> Yes, that’s right. It’s essentially just one markdown file to start
>>>>> with, and we can add more language- or version-specific files later if
>>>>> needed.
>>>>>
>>>>> On Tue, Sep 9, 2025 at 4:32 PM Hyukjin Kwon <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> So it's basically adding one text file for LLMs, right? I think it's a
>>>>>> good idea.
>>>>>>
>>>>>> On Tue, 9 Sept 2025 at 10:22, Allison Wang <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I’d like to propose adding llms.txt files to the Spark
>>>>>>> documentation.
>>>>>>>
>>>>>>> As more users rely on AI-assisted tools and LLMs to learn, write
>>>>>>> Spark code, and troubleshoot issues, it’s increasingly important that
>>>>>>> these tools point back to the up-to-date official documentation. This
>>>>>>> will help improve code generation quality and make new Spark features
>>>>>>> easier to discover. The emerging llms.txt convention
>>>>>>> <https://llmstxt.org/> provides a lightweight way to curate
>>>>>>> LLM-friendly manifests of key documentation links.
>>>>>>>
>>>>>>> Would love to hear your feedback!
>>>>>>> SPIP:
>>>>>>> https://docs.google.com/document/d/1tRYdNTrIs8-JTgDthQ-7kcxEG7S91mNUVmUOfevW-cE/edit?tab=t.0#heading=h.wq8o4rl94dvr
>>>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-53528
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Allison
>>>>>>>
>>>>>>
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
>
> +47 480 94 297
>
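
On the size concern raised in the thread, a check along these lines could enforce the bound in a docs build. This is only a sketch: `manifest_bytes`, `within_budget`, and the assumption that every manifest is a file named llms.txt somewhere under one per-version docs directory are mine, not part of the SPIP or of Spark's build. Only the 10MB figure comes from the discussion.

```python
from pathlib import Path

# Bound discussed in the thread: less than 10MB per Spark version in total.
LIMIT_BYTES = 10 * 1024 * 1024


def manifest_bytes(docs_root: Path) -> int:
    """Total size in bytes of every file named llms.txt under docs_root."""
    return sum(p.stat().st_size for p in docs_root.rglob("llms.txt"))


def within_budget(docs_root: Path, limit: int = LIMIT_BYTES) -> bool:
    """True if all llms.txt files together stay strictly under the limit."""
    return manifest_bytes(docs_root) < limit
```

Pointing `within_budget` at the generated docs directory for one version and failing the build when it returns False would give the hard upper bound requested above, regardless of how the manifests are produced.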
