Thanks for the thoughtful questions, Amogh. These are exactly the right things to consider before committing resources. Let me address each one:
> 1. Where do these tests run? How long would it take to run? Any
> special needs? Cadence?

The proposal is to integrate with **OSS-Fuzz**, Google's continuous fuzzing infrastructure for open source projects. This means:

- Tests run on Google's infrastructure at no cost to the project
- Fuzzing runs continuously 24/7, not blocking CI
- No special hardware or infrastructure needs from our side

Optionally, fuzzers can run locally or in existing CI as quick sanity checks (seconds to minutes), while deep fuzzing happens asynchronously on OSS-Fuzz.

> 2. I see an initial maintenance burden too - who will own it /
> maintain it? Who will triage the reports? (false positives,
> duplicates, low priority bugs)

Once integrated, OSS-Fuzz operates autonomously, and we keep full control over how findings are handled:

- Bugs are reported to the **OSS-Fuzz dashboard**, not directly to our issue tracker
- We can **enable or disable** automatic GitHub issue creation
- Findings are private for 90 days, then become public if unfixed

That 90-day window does create some pressure to address findings - but the alternative is worse. These bugs exist whether or not we fuzz for them. If external researchers or attackers find them first, we get zero lead time; with OSS-Fuzz we hear about them first, with 90 days to respond privately.

I'll handle the **initial integration work** - writing the fuzzers, setting up the OSS-Fuzz project config, and verifying it runs. After that, maintenance is minimal; fuzzers rarely need updates unless the APIs they target change significantly.

> 3. Airflow assumes trusted users, so some findings through the fuzzer
> might not be exploitable at all, but would lead to time spent triaging
> that.

Fair point. We can control for this by scoping fuzzers to code paths where the security boundaries are simple - input parsing, serialization, external protocol handling - and excluding areas where Airflow's trusted-user model means findings wouldn't be actionable.

> 4. DAG runs user code end of the day, fuzzer may find issues in user
> code instead? Can we control that?

Fuzzers work like regression tests - they target Airflow's own code paths, not user DAGs. Just as our test suite imports and exercises specific modules directly, fuzzers do the same:

- Input parsing and validation functions
- Serialization/deserialization (pickle, JSON, etc.)
- Command construction utilities
- Connection parameter handling

No DAG is ever loaded or executed. The fuzzer imports a function, feeds it crafted inputs, and checks for crashes - exactly like a unit test, just with generated inputs instead of handwritten ones.

> 5. Our ecosystem of tons of providers may require us to spend
> significant initial time to cover that surface area and later
> maintain it

Agreed, the provider surface area is large. The proposal is not to fuzz all providers immediately. Instead:

- **Phase 1:** Core Airflow only (serializers, API input handling, scheduler internals)
- **Phase 2:** High-risk providers with shell/exec patterns (SSH, Docker, Kubernetes, Teradata)
- **Phase 3:** Community-driven expansion as we see value

This mirrors how other large projects (Kubernetes, Envoy) adopted fuzzing: start narrow, prove value, expand organically.

**The bottom line:** with OSS-Fuzz handling the infrastructure, the upfront cost is a small PR and the ongoing commitment is minimal. We get 90 days of private lead time on any bugs found - far better than the zero days we'd get if external researchers find them first.
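To make the "like a unit test" point concrete, here is a rough sketch of what a single harness could look like, assuming Atheris (the coverage-guided fuzzing engine OSS-Fuzz uses for Python projects) and using Airflow's DAG deserialization entry point as an illustrative target. The exact functions we fuzz and which exception types we treat as expected would be settled during the proof-of-concept:

```python
# fuzz_serialized_dag.py - illustrative sketch only; the harness name, target
# function, and "expected exception" list below are assumptions for discussion.
import json
import sys

import atheris

with atheris.instrument_imports():
    # Target Airflow's own deserialization code - no user DAG files involved.
    from airflow.serialization.serialized_objects import SerializedDAG


def TestOneInput(data: bytes) -> None:
    fdp = atheris.FuzzedDataProvider(data)
    text = fdp.ConsumeUnicodeNoSurrogates(len(data))

    # Only structured-but-malformed payloads are interesting for this target.
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return
    if not isinstance(payload, dict):
        return

    try:
        SerializedDAG.from_dict(payload)
    except (ValueError, KeyError, TypeError):
        # Graceful rejection of bad input is fine; anything else
        # (crash, hang, unexpected exception) is reported as a finding.
        pass


def main() -> None:
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()


if __name__ == "__main__":
    main()
```

The same file doubles as the quick local/CI sanity check mentioned above: run it with a bounded iteration count (e.g. `python fuzz_serialized_dag.py -runs=10000`, since Atheris passes libFuzzer flags through) and it finishes in seconds, while OSS-Fuzz runs it continuously with coverage guidance.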
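And to give a sense of what "a small PR" means on the OSS-Fuzz side, the project config is on the order of the snippet below, plus a short Dockerfile and build script that install Airflow and register the harnesses. The contact address and exact options here are placeholders, not a final submission:

```yaml
# projects/airflow/project.yaml in the OSS-Fuzz repo - placeholder values only.
homepage: "https://airflow.apache.org"
main_repo: "https://github.com/apache/airflow"
language: python
primary_contact: "maintainer@example.org"  # would be a PMC-agreed contact
fuzzing_engines:
  - libfuzzer
sanitizers:
  - address
```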
Happy to start with a minimal proof-of-concept targeting just the serialization layer if that helps demonstrate value.

Best,
Leslie
