onestardao opened a new issue, #62291: URL: https://github.com/apache/airflow/issues/62291

### Description

I would like to propose a documentation example (and optionally a small example DAG) that demonstrates how to debug RAG / LLM pipelines in Airflow using a structured 16-problem failure map.

The idea is to help users who orchestrate RAG data flows with Airflow DAGs diagnose *where* in the pipeline things go wrong (ingestion, chunking, embeddings, vector store, retrieval, LLM, post-processing) using a consistent checklist, instead of only tuning the LLM prompt.

### Use case/motivation

More and more users are using Airflow to schedule and orchestrate:

* ingestion and chunking of documents,
* batch embedding jobs,
* vector-store maintenance,
* evaluation and monitoring flows for RAG systems.

When a RAG system fails, the underlying issues are often subtle and multi-step (bad chunking, index skew, retriever bias, evaluation gaps) rather than a single broken task.

Today, there is no canonical Airflow guide that:

* names the most common failure patterns across the whole DAG, and
* shows how to attach sensors / logging / extra tasks to quickly localise the failure.

I maintain an MIT-licensed open-source project called **WFGY** (~1.5k GitHub stars). One component is the **WFGY 16-problem ProblemMap**, a checklist of typical RAG / LLM pipeline failure modes (retriever behaviour, vector stores, routing, hallucinations, guardrails, evaluation, etc.). It is already referenced by several curated lists and research efforts.

The proposal is to adapt this checklist to Airflow and turn it into a concrete guide so that:

* Airflow users get a ready-made vocabulary for RAG failures, and
* have a worked example of an “observed” RAG DAG with the right logging / checks attached.

### Related issues

I searched the issue tracker and docs but did not find an existing feature or guide that covers RAG / LLM failure analysis in this structured way. If I missed something, I am happy to adapt this proposal to fit existing work.

### Are you willing to submit a PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
