Hi Sem, thanks for bringing up this topic. For C++, I prefer to incorporate code generation into the build process. GraphScope [1] uses this strategy for C++ and Python, and it works well.
For Java/Scala, how about we make the format an independent package that the other modules depend on?

[1] https://github.com/alibaba/GraphScope/blob/main/analytical_engine/CMakeLists.txt#L288

Best,
Weibin Zeng

On 2024/06/13 10:32:41 Sem wrote:
> Hello!
>
> Because we are switching to proto3 as the language of the GAR format
> definition, we need to decide whether we are going to store generated
> code in git or not.
>
> Pros of storing generated code:
> 1. Stability: even if protoc is changed or a plugin is deprecated,
> we still have generated and compilable code in the repo;
> 2. Usability: anyone can go to git and see what the actual code looks
> like; also users and developers do not need to care about protoc/buf
> and can just clone the repo and that's it;
> 3. CI simplicity: we do not need to incorporate protoc/buf into the
> build process;
>
> Cons of storing generated code:
> 1. Huge git diffs: in my experience, changing a single line in a proto
> file may lead to a diff of hundreds of lines in generated classes;
> 2. Code generated by protoc is actually unreadable, and it does not
> help a lot in understanding what is going on;
> 3. Risk of outdated classes: I cannot think of a way to check that
> generated code is up to date.
>
>
> Sources of possible inspiration:
> 1.
> https://github.com/apache/spark/blob/master/dev/connect-check-protos.py
> : a utility in the Apache Spark project that checks whether the
> generated code is up to date. We may try to implement the same for
> Java/Cpp too.
> 2.
> https://github.com/apache/spark/blob/master/dev/connect-gen-protos.sh :
> a utility in the Apache Spark project that regenerates proto classes
> for PySpark and applies formatting to reduce the git diff. We may try
> to implement the same for Java/Cpp too.
>
>
> How it is done in Apache Spark itself:
> 1. proto files are incorporated into the Maven build via
> maven-proto-plugin, so Java classes are not stored in the repo and are
> generated during the build
> 2. Python classes are stored in the repo and are generated/updated on
> request. In CI, a check of the sync status is run
>
> Other options:
> I had talks with some engineers and, as I understood, the best solution
> and an industry standard is to put all the protos in a separate
> repository with generation of classes and put these classes into
> packages. After that these packages may be used as dependencies. The
> problem here is that it requires splitting our monorepo into parts:
> harder to work with, harder to onboard people, harder to test, etc.
>
> Best regards,
> Sem
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
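
On the "risk of outdated classes" point raised in the quoted message, one possible check is to regenerate the code into a scratch directory and compare it with the committed tree. The following is only a minimal sketch of that idea, not the Spark script and not anything this project currently has: `find_stale_files` and the `generate` callback are hypothetical names, and the callback stands in for whatever actually invokes protoc or buf.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def tree_digest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_stale_files(committed: Path, generate: Callable[[Path], None]) -> List[str]:
    """Regenerate into a scratch directory and list files that differ.

    `generate` is a hypothetical hook (an assumption of this sketch): it
    should run the real generator and write its output under the given path.
    """
    with tempfile.TemporaryDirectory() as tmp:
        fresh_root = Path(tmp)
        generate(fresh_root)
        fresh = tree_digest(fresh_root)
    current = tree_digest(committed)
    # A file is stale if it is missing on either side or its content differs.
    return sorted(
        name
        for name in fresh.keys() | current.keys()
        if fresh.get(name) != current.get(name)
    )
```

A CI step could call `find_stale_files` on the directory holding the committed generated code and fail the build when the returned list is non-empty. Comparing content hashes rather than timestamps keeps the check independent of checkout order.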
