[Discussion][Format] Should we store generated code in repository or not?

Sem Thu, 13 Jun 2024 03:33:34 -0700

Hello!

Since we are switching to proto3 as the language for defining the GAR
format, we need to decide whether we are going to store the generated
code in git or not.


Pros of storing generated code:
1. Stability: even if protoc changes or a plugin is deprecated, we
still have generated, compilable code in the repo;
2. Usability: anyone can go to git and see what the actual code looks
like; also, users and developers do not need to care about protoc/buf
and can just clone the repo, and that's it;
3. CI simplicity: we do not need to incorporate protoc/buf into the
build process.

Cons of storing generated code:
1. Huge git diffs: in my experience, changing a single line in a proto
file may lead to a diff of hundreds of lines in the generated classes;
2. Code generated by protoc is practically unreadable, so it does not
help much in understanding what is going on;
3. Risk of outdated classes: I cannot think of an obvious way to check
that the generated code is up to date.
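
Regarding con 3, one cheap possibility is to commit a digest of the
.proto files next to the generated code and let CI compare it, which
does not require protoc in CI at all. Below is a minimal sketch of that
idea in Python. The function names, the stamp file, and the directory
layout are all hypothetical, not something that exists in our repo.

```python
import hashlib
from pathlib import Path


def proto_digest(proto_dir: Path) -> str:
    """Return a SHA-256 digest over all .proto files under proto_dir."""
    h = hashlib.sha256()
    # Sort so the digest does not depend on filesystem iteration order.
    for path in sorted(proto_dir.rglob("*.proto")):
        # Hash the relative path as well, so a rename changes the digest.
        h.update(path.relative_to(proto_dir).as_posix().encode("utf-8"))
        h.update(b"\0")
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def write_stamp(proto_dir: Path, stamp_file: Path) -> None:
    """Run right after regenerating code; commit the stamp with the code."""
    stamp_file.write_text(proto_digest(proto_dir) + "\n")


def is_up_to_date(proto_dir: Path, stamp_file: Path) -> bool:
    """True if the committed stamp matches the current .proto files."""
    if not stamp_file.exists():
        return False
    return stamp_file.read_text().strip() == proto_digest(proto_dir)
```

This only detects .proto changes that were committed without
regenerating. It does not catch manual edits of the generated files or
a changed protoc version, so it is weaker than a full regeneration
check.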


Sources of possible inspiration:
1.
https://github.com/apache/spark/blob/master/dev/connect-check-protos.py
: a utility in the Apache Spark project that checks whether the
generated code is up to date. We may try to implement the same for
Java/C++ too.
2.
https://github.com/apache/spark/blob/master/dev/connect-gen-protos.sh :
a utility in the Apache Spark project that regenerates the proto classes
for PySpark and applies formatting to reduce the git diff. We may try to
implement the same for Java/C++ too.
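
To make the idea concrete for Java/C++, here is a rough sketch of such a
check in Python: regenerate the code into a temporary directory and
compare it with what is committed. It is written in the spirit of the
Spark utility, not copied from it. The "format" directory, the function
names, and the choice of --java_out are assumptions for illustration
only.

```python
import filecmp
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Set


def _relative_files(root: Path) -> Set[str]:
    """All regular files under root, as POSIX-style relative paths."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def find_stale_files(committed_dir: Path, generate: Callable[[Path], None]) -> List[str]:
    """Regenerate into a temp dir and list files that differ from committed_dir."""
    with tempfile.TemporaryDirectory() as tmp:
        fresh_dir = Path(tmp)
        generate(fresh_dir)
        committed = _relative_files(committed_dir)
        fresh = _relative_files(fresh_dir)
        # Files present on only one side are stale (missing or leftover).
        stale = committed ^ fresh
        # Files present on both sides are compared byte by byte.
        for name in committed & fresh:
            if not filecmp.cmp(committed_dir / name, fresh_dir / name, shallow=False):
                stale.add(name)
        return sorted(stale)


def run_protoc(out_dir: Path) -> None:
    """Hypothetical generator: assumes protoc is on PATH and protos live in ./format."""
    protos = sorted(str(p) for p in Path("format").rglob("*.proto"))
    subprocess.run(
        ["protoc", "--proto_path=format", f"--java_out={out_dir}", *protos],
        check=True,
    )
```

In CI this could be called as find_stale_files(Path("<generated
sources dir>"), run_protoc), failing the build when the returned list is
not empty. If a formatter is applied to the generated code, as the Spark
script does, the same formatter has to run inside the generator,
otherwise every file will be reported as stale.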


How it is done in Apache Spark itself:
1. proto files are incorporated into the Maven build via a Maven
protobuf plugin, so Java classes are not stored in the repo and are
generated during the build;
2. Python classes are stored in the repo and are generated/updated on
request. CI runs a check that they are in sync with the proto files.

Other options:
I talked with some engineers and, as I understood it, the best solution
and an industry standard is to put all the protos in a separate
repository that generates the classes and publishes them as packages.
These packages can then be used as dependencies. The problem is that
this requires splitting our monorepo into parts, which makes it harder
to work with, harder to onboard people, harder to test, etc.

Best regards,
Sem

