Re: [DISCUSS][format] Using an Interface Definition Language to define GraphAr format

Sem Thu, 09 May 2024 01:50:52 -0700

I did a bit of research on this, and it seems to me that classes
generated from protobuf are not directly serializable into other formats
such as YAML/JSON.

There is a third-party project, https://github.com/krzko/proto2yaml, that
provides such a utility, but it does not look well maintained.

I see that there is a utility provided by Google. It allows
conversion to and from JSON in Java/Python (most probably C++
too):
1.
https://cloud.google.com/java/docs/reference/protobuf/latest/com.google.protobuf.util.JsonFormat
2.
https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html

But not to/from YAML.

For me this question is important, because we need not only to generate
the code but also to settle how serialization/deserialization will work.

What do you think about using proto (binary messages) as the underlying
communication format in the code and JSON as the human-readable
representation on disk? It looks like only by switching from
YAML to JSON do we get all the benefits of using protobuf.

With JSON, I see it like this: we call `fromJson` once (via the Google
protobuf util) to read the JSON info files and create proto classes
from them, and then work with those. At the end we call `toJson` to
serialize the messages back.
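
As a rough illustration of that round trip in Python, here is a minimal
sketch using `google.protobuf.json_format` (the module linked above). It
is only a sketch under assumptions: the well-known `Struct` type stands
in for a generated GraphAr info class, and the field names `label`,
`chunk_size` and `prefix` are hypothetical, not the real schema.

```python
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Struct

# Hypothetical info file content; the field names are illustrative only.
INFO_JSON = '{"label": "person", "chunk_size": 100, "prefix": "vertex/person/"}'


def read_info(text: str) -> Struct:
    """Parse JSON text into a protobuf message (the `fromJson` step)."""
    message = Struct()  # stand-in for a generated message class
    json_format.Parse(text, message)
    return message


def write_info(message: Struct) -> str:
    """Serialize a protobuf message back to JSON text (the `toJson` step)."""
    return json_format.MessageToJson(message)


info = read_info(INFO_JSON)
# Work with the message in code, e.g. change one field.
info.fields["chunk_size"].number_value = 200
round_tripped = write_info(info)
```

With a real generated class, `Struct()` would be replaced by that class
and the fields would be typed attributes instead of generic map entries.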

To achieve the same with YAML we would need either to use a third-party,
not well maintained library or to support our own
serialization/deserialization of proto messages (classes) to/from YAML
for three languages (Python, Java, C++).

On Thu, 2024-05-09 at 15:27 +0800, weibin.zen wrote:
> Hi, everyone,
> 
> I would like to propose that we consider using an Interface
> Definition Language (IDL) like Protobuf[1] for the GraphAr format
> definition.
> Currently we use YAML to describe the schema and metadata of a graph,
> and store data in common formats like CSV/Parquet. YAML
> is human-readable, but it cannot provide much
> validation or version control. And the libraries in the various
> programming languages need
> to parse the files and validate them by themselves.
> 
> Using IDL to describe format would bring benefits like:
> 
> • provide a clear, standardized, language-agnostic format definition
> that can be version-controlled, shared by libraries and make the
> format consistent between implementations.
> • The validation provided by protobuf can be used directly for our
> schema validation; the libraries need not implement it themselves.
> • Cross-language support, libraries can use the generated structure
> as graph info directly.
> 
> 
> This proposal does not replace YAML with Protobuf. We would still use
> YAML as the final schema & metadata file for human readability, but
> with an IDL to maintain a
> robust and precise schema definition. It's a kind of hybrid strategy
> that accommodates both human and machine needs.
> 
> But using an IDL does bring some disadvantages; Sem has listed some in
> a comment on the PR[2]:
> 
> • the generated code is huge and unreadable.
> • the generated code may need to be stored in git.
> • debugging is very hard.
> 
> 
> Since this would be a huge change, I want to hear your thoughts
> about the proposal.
> 
> 
> [1] https://protobuf.dev/
> [2] https://github.com/apache/incubator-graphar/pull/475
> 
> Best
> weibin.zen
