I did a bit of research on this, and it seems to me that classes generated from protobuf cannot, by themselves, be serialized to other formats such as YAML or JSON.
There is a third-party project, https://github.com/krzko/proto2yaml, that provides such a utility, but it does not look well maintained. There is also a utility provided by Google that converts to and from JSON in Java and Python (most probably C++ too):

1. https://cloud.google.com/java/docs/reference/protobuf/latest/com.google.protobuf.util.JsonFormat
2. https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html

But not to or from YAML.

This question matters to me because we need not only to generate the code but also to settle how serialization and deserialization will work.

What do you think about using proto (binary messages) as the underlying communication format in the code and JSON as the human-readable representation on disk? It looks like only by switching from YAML to JSON do we get all the benefits of using protobuf. With JSON, as I see it, we call `fromJson` once (via the Google protobuf util) to read the JSON info files and create proto classes from them, and then we work with those classes. At the end we call `toJson` to serialize the messages back. To achieve the same with YAML we would need either a third-party library that is not well maintained, or our own serialization/deserialization of proto messages (classes) to/from YAML in three languages (Python, Java, C++).

On Thu, 2024-05-09 at 15:27 +0800, weibin.zen wrote:
> Hi, everyone,
>
> I would like to propose that we consider using an Interface
> Definition Language (IDL) like Protobuf[1] for the GraphAr format
> definition.
> Currently we use YAML to describe the schema and metadata of a graph,
> and store the data in common formats like CSV/Parquet. YAML
> provides human readability, but it cannot provide much
> validation or version control. And the various programming languages
> need to parse the files and check their validity by themselves.
>
> Using an IDL to describe the format would bring benefits like:
>
> • It provides a clear, standardized, language-agnostic format
>   definition that can be version-controlled and shared by libraries,
>   and it keeps the format consistent between implementations.
> • The validation provided by protobuf can be used directly for our
>   schema validation, so the libraries do not need to implement it.
> • Cross-language support: libraries can use the generated structures
>   as graph info directly.
>
> This proposal does not replace YAML with Protobuf. We would still use
> YAML as the final schema and metadata file for user readability, but
> with an IDL to maintain a robust and precise schema definition. It is
> kind of a hybrid strategy that accommodates both human and machine
> needs.
>
> But using an IDL does bring some disadvantages. Sem has listed some
> in a comment on the PR[2]:
>
> • the generated code is huge and unreadable.
> • the generated code may need to be stored in git.
> • debugging is very hard.
>
> Since this would be a huge change, I want to hear your thoughts
> on the proposal.
>
> [1] https://protobuf.dev/
> [2] https://github.com/apache/incubator-graphar/pull/475
>
> Best,
> weibin.zen
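
P.S. To make the JSON round trip I described above more concrete, here is a minimal sketch in Python using `google.protobuf.json_format`. It is only an illustration under a few assumptions: it uses the built-in `Struct` well-known type so that it runs without any `protoc` step, whereas in practice we would pass a class generated from our own .proto files; the field names in the sample JSON are made up and are not the actual GraphAr schema.

```python
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Struct

# Hypothetical info-file content; the field names are invented for this sketch.
INFO_JSON = '{"name": "person", "prefix": "vertex/person/", "chunk_size": 100}'

# "fromJson" step: parse the human-readable JSON into a protobuf message.
# Struct stands in for a generated message class here (assumption).
info = json_format.Parse(INFO_JSON, Struct())

# Work with the message in code.
info["chunk_size"] = 200

# Binary wire format, for passing the message around inside the code.
wire = info.SerializeToString()
restored = Struct()
restored.ParseFromString(wire)

# "toJson" step: serialize the message back to JSON for storage on disk.
json_text = json_format.MessageToJson(restored)
print(json_text)
```

With a generated class, `json_format.Parse` would also reject JSON that contains fields unknown to the schema (unless `ignore_unknown_fields=True` is passed), which is part of the validation benefit mentioned in the proposal.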
