I think it's definitely worth trying. I have seen a number of fuzzing
reports from other ASF projects - and they are sometimes useful and
detect real issues.

I think it would also be great to treat it as a learning exercise -
smaller PRs gradually adding fuzzers, from the most obvious cases to the
more complex ones. Currently it's hard for us to imagine what such
fuzzing could look like for Airflow, and I think we would love to learn.

I can easily imagine it for "lower-level" tools - libraries that operate
on well-defined inputs and produce outputs from processing those inputs
via a CLI or library call. Kind of "pure functions" - no state to start
with, and no state side-effects produced.
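
For such "pure functions" I'd imagine the whole fuzzer is a dozen lines.
A rough sketch using atheris (the Python engine OSS-Fuzz runs), with
zlib.decompress merely standing in for such a library call:

    import sys
    import zlib

    import atheris

    # A "pure function" target: bytes in, result or exception out, no state.
    def test_one_input(data: bytes) -> None:
        try:
            zlib.decompress(data)
        except zlib.error:
            pass  # malformed input is a documented failure mode, not a finding
        # Anything else that escapes here (or a crash) is reported as a bug.

    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()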

Airflow is more of a "living organism" with a lot of state - both to
begin with, and the state gets updated as a result of various inputs. So
I have no good intuition about what such fuzzing could look like - but if
an expert comes and proposes something, we can discuss it, give our
opinion on whether it makes sense, and learn how to - possibly - add more
fuzzing on our own.

Also, I know other ASF projects already rely on OSS-Fuzz by Google, so
there are no objections to using the tool from the ASF point of view -
and it would definitely make it easier to start.

One thing I see as a potential blocker: if we start seeing a lot of
false positives, such fuzzing might become useless - especially if we
have a hard time analysing and understanding the fuzzing reports. But if
we start small and include a learning path for us, I am quite sure we
can mitigate that.

J.


On Fri, Dec 19, 2025 at 9:59 AM Leslie P. Polzer <[email protected]>
wrote:

> Thanks for the thoughtful questions, Amogh. These are exactly the right
> things to consider before committing resources. Let me address each one:
>
> > 1. Where do these tests run? How long would it take to run? Any
> > special needs? Cadence?
>
> The proposal is to integrate with **OSS-Fuzz**, Google's continuous
> fuzzing infrastructure for open source projects.
>
> This means:
>
> - Tests run on Google's infrastructure at no cost to the project
> - Fuzzing runs continuously 24/7, not blocking CI
> - No special hardware or infrastructure needs from our side
>
> Optionally, fuzzers can run locally or in existing CI as quick sanity
> checks (seconds to minutes), while deep fuzzing happens
> asynchronously on OSS-Fuzz.
>
> > 2. I see an initial maintenance burden too - who will own it /
> > maintain it? Who will triage the reports? (false positives,
> > duplicates, low priority bugs)
>
> Once integrated, OSS-Fuzz operates autonomously. We have full control
> over how findings are handled:
>
> - Bugs are reported to the **OSS-Fuzz dashboard**, not directly to our
>   issue tracker
> - We can **enable or disable** automatic GitHub issue creation
> - Findings are private for 90 days, then become public if unfixed
>
> That 90-day window does create some pressure to address findings
> - but the alternative is worse. These bugs exist whether or not we're
> fuzzing. External researchers or attackers finding them first gives us
> zero lead time. OSS-Fuzz guarantees we hear about it first, with 90
> days to respond privately.
>
> I'll handle the **initial integration work** - writing the fuzzers,
> setting up the OSS-Fuzz project config, verifying it runs. After that,
> maintenance is minimal; fuzzers rarely need updates unless the APIs
> they target change significantly.
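>
> For concreteness, that config is a short project.yaml in the
> google/oss-fuzz repository. A sketch, with placeholder values that the
> PMC would decide on (contacts, engines, sanitizers):
>
>     homepage: "https://airflow.apache.org"
>     main_repo: "https://github.com/apache/airflow"
>     language: python
>     primary_contact: "[email protected]"  # placeholder
>     fuzzing_engines:
>       - libfuzzer
>     sanitizers:
>       - address
>       - undefined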
>
> > 3. Airflow assumes trusted users, so some findings through the fuzzer
> > might not be exploitable at all, but would lead to time spent triaging
> > that.
>
> Fair point. We can handle this carefully by scoping fuzzers to target
> code paths where the security boundaries are simple - input parsing,
> serialization, external protocol handling - and exclude areas where
> Airflow's trusted user model means findings wouldn't be actionable.
>
> > 4. DAG runs user code end of the day, fuzzer may find issues in user
> > code instead? Can we control that?
>
> Fuzzers work like regression tests - they target Airflow's own code
> paths, not user DAGs. Just as our test suite imports and exercises
> specific modules directly, fuzzers do the same:
>
> - Input parsing and validation functions
> - Serialization/deserialization (pickle, JSON, etc.)
> - Command construction utilities
> - Connection parameter handling
>
> No DAG is ever loaded or executed. The fuzzer imports a function, feeds
> it crafted inputs, and checks for crashes - exactly like a unit test,
> just with generated inputs instead of handwritten ones.
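>
> To make that concrete, here is roughly what such a harness could look
> like with atheris, the engine OSS-Fuzz uses for Python projects. The
> stdlib json module is a stand-in target here - the real harness would
> import an Airflow (de)serialization entry point, still to be chosen:
>
>     import sys
>
>     import atheris
>
>     with atheris.instrument_imports():
>         import json  # stand-in for an Airflow serialization module
>
>     def test_one_input(data: bytes) -> None:
>         fdp = atheris.FuzzedDataProvider(data)
>         text = fdp.ConsumeUnicodeNoSurrogates(8192)
>         try:
>             parsed = json.loads(text)
>         except (json.JSONDecodeError, RecursionError):
>             return  # expected failures on malformed input - not findings
>         # Round-trip property: whatever parsed must serialize and re-parse.
>         json.loads(json.dumps(parsed))
>
>     atheris.Setup(sys.argv, test_one_input)
>     atheris.Fuzz()
>
> Run directly, such a harness accepts libFuzzer flags (e.g. -runs=10000
> for a bounded smoke run) - that is what the quick local sanity check
> mentioned above would look like.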
>
> > 5. Our ecosystem of tons of providers may require us to spend
> > significant initial time to cover that surface area and later
> > maintain it
>
> Agreed, the surface area is large. The proposal is not to fuzz all providers
> immediately. Instead:
>
> - **Phase 1:** Core Airflow only (serializers, API input handling,
>   scheduler internals)
> - **Phase 2:** High-risk providers with shell/exec patterns (SSH,
>   Docker, Kubernetes, Teradata)
> - **Phase 3:** Community-driven expansion as we see value
>
> This mirrors how other large projects (Kubernetes, Envoy) adopted
> fuzzing: start narrow, prove value, expand organically.
>
> The bottom line: With OSS-Fuzz handling infrastructure, the upfront
> cost is a small PR and minimal ongoing commitment. We get 90 days of
> private lead time on any bugs found - far better than the zero days
> we'd get if external researchers find them first. Happy to start with
> a minimal proof-of-concept targeting just the serialization layer if
> that helps demonstrate value.
>
> Best,
>
> Leslie
>
