travisdowns opened a new pull request, #3778: URL: https://github.com/apache/avro/pull/3778
## What

`CodeGen::guard()` in [lang/c++/impl/avrogencpp.cc](https://github.com/apache/avro/blob/main/lang/c++/impl/avrogencpp.cc#L805) suffixes the generated header's `#ifndef` guard with the output of an `std::mt19937` seeded from `::time(nullptr)`:

```cpp
string CodeGen::guard() {
    string h = headerFile_;
    makeCanonical(h, true);
    return h + "_" + std::to_string(random_()) + "_H";
}
```

So two invocations on the same schema produce headers whose guards differ:

```
#ifndef FOO_AVROGEN_H_3350718792_H
#ifndef FOO_AVROGEN_H_2362587291_H
```

This PR drops the random suffix.

## Why

1. **Generated output is non-deterministic.** Re-running `avrogencpp -i schema.avsc -o foo.h` on the same input produces a different file every time, which is surprising for a codegen and makes side-by-side diff / git review difficult.
2. **It breaks content-addressed build systems.** Bazel's remote cache and the Nix store both key on input-content digests. With a randomised include guard, every consumer of the generated header sees a different input digest on every build, so on byte-identical schemas the entire downstream `.cc` → `.o` → `.a` → binary chain has to recompile and re-link, even though the schema hasn't changed. This was the trigger for filing the PR: a hermetic two-output-base Bazel build comparison flagged `manifest_file.avrogen.h` as a root non-hermetic action and traced ~hundreds of downstream cascade rebuilds back to it.

## Why this is safe

`headerFile_` is already guaranteed-unique per output (it's the path of the file we're about to write). `makeCanonical(h, true)` turns that path into a valid C identifier, which on its own is a fine guard name. The RNG suffix only added entropy, not uniqueness: there's no scenario where two `avrogencpp` runs producing headers at the same output path are supposed to coexist with conflicting guards.

After the change, the guard is `<canonicalised-path>_H`, deterministic across runs.
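
For illustration, here is a minimal standalone sketch of what a deterministic guard could look like. It is not the code in this PR: `canonicalise` is a hypothetical stand-in for `makeCanonical` (assumed here to map non-alphanumeric characters to `_` and upper-case the rest; the real helper may behave differently), and `guardFor` is a hypothetical free function standing in for `CodeGen::guard()`.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for makeCanonical(): replaces every character that is
// not alphanumeric with '_' and, if foldCase is set, upper-cases the rest.
// The real helper in avrogencpp.cc may differ in detail.
std::string canonicalise(std::string s, bool foldCase) {
    for (char &c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (!std::isalnum(u)) {
            c = '_';
        } else if (foldCase) {
            c = static_cast<char>(std::toupper(u));
        }
    }
    return s;
}

// Hypothetical free-function version of CodeGen::guard() without the RNG
// suffix: the result depends only on the output path.
std::string guardFor(const std::string &headerFile) {
    assert(!headerFile.empty());  // assumption: an output path is always set
    return canonicalise(headerFile, true) + "_H";
}
```

Under these assumptions, `guardFor("foo.avrogen.h")` yields `FOO_AVROGEN_H_H` on every call, so repeated runs against the same output path emit the same `#ifndef` line.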
## Test

Build `avrogencpp`, run it twice against the same schema (the schema contents are irrelevant; what matters is same schema, same output path), and check that the outputs are byte-identical. Before the change they differ; after, they match.
