Hi Lixue, a note unrelated to the discussion: you can simply reply to
[email protected] to avoid opening a new thread on the mailing list :)

Thanks

Best Regards
weibin

李雪(有理) <[email protected]> wrote on Sat, May 11, 2024 at 09:41:

> Thank you for your thoughtful feedback and insights. Regarding the
> concerns:
> 1. Implementing a CLI is a good idea. However, manual viewing or review of
> configuration files is still necessary in our current workflow.
> 2. YAML's syntax allows braces, quotes, and commas to be omitted, which
> makes the entire block easier to read and write, especially for
> multi-level nested structures.
> ------------------------------------------------------------------
> From: Weibin Zeng <[email protected]>
> Sent: Friday, May 10, 2024 16:05
> To: dev <[email protected]>
> Subject: Re: Re: [DISCUSS][format] Using an Interface Definition Language to
> define GraphAr format
> Hi Lixue, thanks for the reply.
> Regarding:
> > 1. YAML's format is more human-readable and easier to edit, which is a
> > significant advantage in scenarios where we frequently need to view or
> > modify configuration files. For example, to define a subgraph from an
> > existing graph.
> I do not agree that we should let users edit the YAML/JSON files
> directly. Manual modification of schema files is unreliable and
> unpredictable, and would probably introduce errors whose cause users cannot
> even identify. That is why we are going to provide a CLI to restrict the
> operations on graph data, including projecting a subgraph.
> As for human readability, here is ldbc-sample.graph.yml in YAML and in
> JSON:
> ```
> name: ldbc_sample
> vertices:
>  - person.vertex.yml
> edges:
>  - person_knows_person.edge.yml
> version: gar/v1
> extra_metadata: {}
> ```
> ```
> {
>  "name": "ldbc_sample",
>  "vertices": [
>  "person.vertex.yml"
>  ],
>  "edges": [
>  "person_knows_person.edge.yml"
>  ],
>  "version": "gar/v1",
>  "extra_metadata": {}
> }
> ```
> JSON is readable enough, I think, though not as configurable as YAML. But
> since the files are not supposed to be modified directly, I think JSON is OK.
> > 2. YAML often provides a more concise representation of the same data.
> Can you give an example showing why YAML provides a more concise
> representation of the data?
> > 3. YAML natively supports comments and extensions, making it more
> > flexible.
> I agree that YAML supports more features and is more flexible. But it is so
> flexible that it cannot provide much template validation support. For the
> GraphAr format, we should consider whether the format is sufficient to
> express the schema and configuration of GraphAr. On this point, JSON is
> good enough for me.
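
To make the validation point above concrete, here is a minimal sketch of checking a JSON graph info file with only the Python standard library. The required fields and their types are an assumption read off the ldbc_sample example in this thread, not the GraphAr specification, and `validate_graph_info` is a hypothetical helper, not an existing GraphAr API.

```python
import json
from typing import List

# Assumed rules, inferred from the ldbc_sample example in this thread.
# They are illustrative only and not taken from the GraphAr specification.
REQUIRED_FIELDS = {
    "name": str,
    "vertices": list,
    "edges": list,
    "version": str,
}


def validate_graph_info(text: str) -> List[str]:
    """Return a list of problems found in a JSON graph info document."""
    try:
        info = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(info, dict):
        return ["top-level value must be an object"]

    problems = []
    # Check presence and type of each required field.
    for key, expected in REQUIRED_FIELDS.items():
        if key not in info:
            problems.append(f"missing field: {key}")
        elif not isinstance(info[key], expected):
            problems.append(f"field {key} must be of type {expected.__name__}")
    # Vertex and edge entries are expected to be file names.
    for key in ("vertices", "edges"):
        value = info.get(key)
        if isinstance(value, list) and not all(isinstance(v, str) for v in value):
            problems.append(f"every entry in {key} must be a file name string")
    return problems


GOOD = (
    '{"name": "ldbc_sample", "vertices": ["person.vertex.yml"], '
    '"edges": ["person_knows_person.edge.yml"], "version": "gar/v1", '
    '"extra_metadata": {}}'
)
# A hand-edited file where "vertices" became a string and "edges" was dropped.
BAD = '{"name": "ldbc_sample", "vertices": "person.vertex.yml", "version": "gar/v1"}'

print(validate_graph_info(GOOD))
print(validate_graph_info(BAD))
```

A check like this would have to be repeated in every language binding, which is the duplication the IDL proposal further down the thread aims to remove.
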
> On 2024/05/10 01:47:47 "李雪(有理)" wrote:
> > Thank you for the information and links provided. While I understand the
> > application of JSON in GraphScope Flex and its advantages when integrated
> > with GraphAr, considering our specific use case, I still think that YAML
> > might be a more suitable choice for us. Here are the primary reasons:
> > 1. YAML's format is more human-readable and easier to edit, which is a
> > significant advantage in scenarios where we frequently need to view or
> > modify configuration files. For example, to define a subgraph from an
> > existing graph.
> > 2. YAML often provides a more concise representation of the same data.
> > 3. YAML natively supports comments and extensions, making it more
> > flexible.
> > Therefore, we initially favored YAML over JSON. I hope we can further
> > discuss this topic to find the solution that best fits our project
> > requirements.
> > ------------------------------------------------------------------
> > From: Weibin Zeng <[email protected]>
> > Sent: Thursday, May 9, 2024 18:52
> > To: dev <[email protected]>
> > Subject: Re: [DISCUSS][format] Using an Interface Definition Language to
> > define GraphAr format
> > Sorry, I missed the link:
> > > GraphScope Flex now uses JSON as the communication format for graph
> > > schema and checks it with the REST API [1]
> > [1]
> > https://github.com/alibaba/GraphScope/tree/main/python/graphscope/flex/rest/models
> > On 2024/05/09 10:49:24 Weibin Zeng wrote:
> > > JSON is OK for me. GraphScope Flex now uses JSON as the communication
> > > format for graph schema and checks it with the REST API [1], and since
> > > GraphAr has been integrated into GraphScope, I think switching to JSON
> > > is good for GraphAr.
> > >
> > > On 2024/05/09 08:50:45 Sem wrote:
> > > > I did a little research on this, and it seems to me that classes
> > > > generated from protobuf are not serializable into other formats like
> > > > YAML/JSON.
> > > >
> > > > There is a third-party project, https://github.com/krzko/proto2yaml,
> > > > that provides such a utility, but it does not look well maintained.
> > > >
> > > > I see that there is a utility provided by Google. It allows
> > > > conversion to and from JSON in Java/Python (most probably C++
> > > > too):
> > > > 1.
> > > > https://cloud.google.com/java/docs/reference/protobuf/latest/com.google.protobuf.util.JsonFormat
> > > > 2.
> > > > https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html
> > > >
> > > > But not to/from YAML.
> > > >
> > > > For me this question is important, because we need not only to
> > > > generate the code but also to settle the question of
> > > > serialization/deserialization.
> > > >
> > > >
> > > > What do you think about using proto (binary messages) as the
> > > > underlying communication format in the code and JSON as the
> > > > human-readable representation on disk? It looks like only by switching
> > > > from YAML to JSON do we get all the benefits of using protobuf.
> > > >
> > > > With JSON, I see it like this: we call `fromJson` once (via the Google
> > > > protobuf util) to read the data and create proto classes from the JSON
> > > > info files, and then work with them. At the end we call `toJson` to
> > > > serialize the messages back.
> > > >
> > > > To achieve the same with YAML, we would need to use a third-party,
> > > > poorly maintained library or maintain our own
> > > > serialization/deserialization of proto messages (classes) to/from YAML
> > > > for three languages (Python, Java, C++).
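
The read-once, work-with-typed-objects, write-back flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `GraphInfo` is a hypothetical dataclass standing in for a protobuf-generated class, its fields are copied from the ldbc_sample example in this thread, and `from_json`/`to_json` are stand-ins for the protobuf JSON utilities (in Python, `google.protobuf.json_format.Parse` and `MessageToJson`).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class GraphInfo:
    """Hypothetical stand-in for a protobuf-generated message class."""

    name: str
    vertices: List[str] = field(default_factory=list)
    edges: List[str] = field(default_factory=list)
    version: str = ""
    extra_metadata: Dict[str, Any] = field(default_factory=dict)


def from_json(text: str) -> GraphInfo:
    """Read a JSON info document once and build the typed object."""
    return GraphInfo(**json.loads(text))


def to_json(info: GraphInfo) -> str:
    """Serialize the typed object back to JSON for storage on disk."""
    return json.dumps(asdict(info), indent=2)


source = """
{
  "name": "ldbc_sample",
  "vertices": ["person.vertex.yml"],
  "edges": ["person_knows_person.edge.yml"],
  "version": "gar/v1",
  "extra_metadata": {}
}
"""

info = from_json(source)
# Work with the typed object in code; the file name below is made up.
info.vertices.append("comment.vertex.yml")
restored = from_json(to_json(info))
print(restored)
```

With real generated classes the two helper functions would disappear, since the protobuf runtime already ships the JSON conversion for each supported language.
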
> > > >
> > > > On Thu, 2024-05-09 at 15:27 +0800, weibin.zen wrote:
> > > > > Hi, everyone,
> > > > >
> > > > > I would like to propose that we consider using an Interface
> > > > > Definition Language (IDL) like Protobuf [1] for the GraphAr format
> > > > > definition.
> > > > > Currently we use YAML to describe the schema and metadata of a
> > > > > graph, and store data in common formats like CSV/Parquet. YAML is
> > > > > human-readable, but it cannot provide much validation or version
> > > > > control, and each programming language has to parse the files and
> > > > > validate them on its own.
> > > > >
> > > > > Using an IDL to describe the format would bring benefits like:
> > > > >
> > > > > • A clear, standardized, language-agnostic format definition
> > > > > that can be version-controlled and shared by libraries, keeping the
> > > > > format consistent between implementations.
> > > > > • The validation provided by protobuf can be used directly to
> > > > > validate the schema, so the libraries do not need to implement
> > > > > validation themselves.
> > > > > • Cross-language support: libraries can use the generated
> > > > > structures as graph info directly.
> > > > >
> > > > >
> > > > > This proposal does not replace YAML with Protobuf. We would still
> > > > > use YAML as the final, user-readable schema & metadata file, but use
> > > > > the IDL to maintain a robust and precise schema definition. It is a
> > > > > kind of hybrid strategy that accommodates both human and machine
> > > > > needs.
> > > > >
> > > > > But using an IDL does bring some disadvantages. Sem has listed some
> > > > > in a comment on the PR [2]:
> > > > >
> > > > > • The generated code is huge and unreadable.
> > > > > • The generated code may need to be stored in git.
> > > > > • Debugging is very hard.
> > > > >
> > > > >
> > > > > Since this would be a huge change, I would like to hear your
> > > > > thoughts on the proposal.
> > > > >
> > > > >
> > > > > [1] https://protobuf.dev/
> > > > > [2] https://github.com/apache/incubator-graphar/pull/475
> > > > >
> > > > > Best
> > > > > weibin.zen
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [email protected]
> > > > For additional commands, e-mail: [email protected]
