Hi Sem, thanks for bringing up this topic.

For C++, I prefer to generate the code as part of the build process.
GraphScope [1] uses this strategy for C++/Python, and it works well.
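
To illustrate "generate during the build", here is a minimal Python
sketch of a pre-build step that re-runs protoc only for .proto files
whose generated header is missing or older. The function names and
directory layout are hypothetical, and GraphScope does this in CMake
rather than in Python, so treat it as a sketch under those assumptions.

```python
import subprocess
from pathlib import Path


def stale_protos(proto_dir: Path, out_dir: Path) -> list[Path]:
    """Return .proto files whose generated C++ header is missing or older."""
    stale = []
    for proto in sorted(proto_dir.rglob("*.proto")):
        rel = proto.relative_to(proto_dir)
        # protoc --cpp_out writes foo.pb.h / foo.pb.cc for foo.proto
        header = out_dir / rel.parent / (rel.stem + ".pb.h")
        if not header.exists() or header.stat().st_mtime < proto.stat().st_mtime:
            stale.append(proto)
    return stale


def protoc_command(proto_dir: Path, out_dir: Path, protos: list[Path]) -> list[str]:
    """Build the protoc argument list for the given .proto files."""
    return [
        "protoc",
        f"--proto_path={proto_dir}",
        f"--cpp_out={out_dir}",
        *[str(p) for p in protos],
    ]


def generate(proto_dir: Path, out_dir: Path) -> list[Path]:
    """Regenerate only the stale files; requires protoc on PATH."""
    protos = stale_protos(proto_dir, out_dir)
    if protos:
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(protoc_command(proto_dir, out_dir, protos), check=True)
    return protos
```

With this approach nothing generated is committed; the build system
calls something like generate() before compiling.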

For Java/Scala, how about we make the format an independent package
that the other modules depend on?

[1] 
https://github.com/alibaba/GraphScope/blob/main/analytical_engine/CMakeLists.txt#L288

Best,
Weibin Zeng

On 2024/06/13 10:32:41 Sem wrote:
> Hello!
> 
> Because we are switching to proto3 as the language for the GAR format
> definition, we need to decide whether we are going to store generated
> code in git or not.
> 
> Pros of storing generated code:
> 1. Stability: even if protoc changes or a plugin is deprecated, we
> still have generated, compilable code in the repo;
> 2. Usability: anyone can go to git and see what the actual code looks
> like; also, users and developers do not need to care about protoc/buf
> and can just clone the repo, and that's it;
> 3. CI simplicity: we do not need to incorporate protoc/buf into the
> build process;
> 
> Cons of storing generated code:
> 1. Huge git diffs: in my experience, changing a single line in a proto
> may lead to a diff of hundreds of lines in the generated classes;
> 2. Code generated by protoc is practically unreadable and does not
> help much in understanding what is going on;
> 3. Risk of outdated classes: I cannot think of a way to check that
> the generated code is up to date.
> 
> 
> Sources of possible inspiration:
> 1.
> https://github.com/apache/spark/blob/master/dev/connect-check-protos.py
> : a utility in the Apache Spark project that checks whether the
> generated code is up to date. We may try to implement the same for
> Java/C++ too.
> 2.
> https://github.com/apache/spark/blob/master/dev/connect-gen-protos.sh :
> a utility in the Apache Spark project that regenerates the proto
> classes for PySpark and applies formatting to reduce the git diff. We
> may try to implement the same for Java/C++ too.
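
On the "up to date" check mentioned above: one possible shape, in the
spirit of the linked Spark script but not a port of it, is to
regenerate into a scratch directory and compare it with the committed
tree. The sketch below covers only the comparison; the function name
and the two-directory layout are assumptions for illustration.

```python
import filecmp
from pathlib import Path


def _files(root: Path) -> set[str]:
    """Relative POSIX paths of all regular files under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def out_of_date_files(committed: Path, fresh: Path) -> list[str]:
    """Paths that are missing, extra, or differ between the two trees."""
    committed_files = _files(committed)
    fresh_files = _files(fresh)
    # Files present on only one side are out of date by definition.
    mismatched = committed_files ^ fresh_files
    for name in committed_files & fresh_files:
        # shallow=False forces a byte-by-byte comparison.
        if not filecmp.cmp(committed / name, fresh / name, shallow=False):
            mismatched.add(name)
    return sorted(mismatched)
```

A CI job could fail whenever the returned list is non-empty and print
it, so the contributor knows which files to regenerate.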
> 
> 
> How it is done in Apache Spark itself:
> 1. proto files are incorporated into the Maven build via
> maven-proto-plugin, so Java classes are not stored in the repo and are
> generated during the build;
> 2. Python classes are stored in the repo and are generated/updated on
> request. CI runs a check that they are in sync with the proto files.
> 
> Another option:
> I talked with some engineers and, as I understood it, the best
> solution and an industry standard is to put all the protos in a
> separate repository that generates the classes and publishes them as
> packages. These packages can then be used as dependencies. The problem
> here is that it requires splitting our monorepo into parts: harder to
> work with, harder to onboard people, harder to test, etc.
> 
> Best regards,
> Sem
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
> 
