Hi Lixue, this is unrelated to the discussion: you can reply only to [email protected] to avoid opening a new thread on the mailing list :)
Thanks
Best Regards
weibin

李雪(有理) <[email protected]> wrote on Sat, May 11, 2024 at 09:41:

> Thank you for your thoughtful feedback and insights. Regarding the
> concerns:
> 1. Implementing a CLI is a good idea. However, manually viewing or
> reviewing configuration files is still necessary in our current workflow.
> 2. YAML's syntax allows braces, quotes, and commas to be omitted,
> making the entire block easier to read and write, especially for
> multi-level nested structures.
> ------------------------------------------------------------------
> From: Weibin Zeng <[email protected]>
> Sent: Friday, May 10, 2024 16:05
> To: dev <[email protected]>
> Subject: Re: Re: [DISCUSS][format] Using an Interface Definition Language to
> define GraphAr format
>
> Hi, Lixue, thanks for the reply.
> For
> > 1. YAML's format is more human-readable and easier to edit, which is a
> > significant advantage in scenarios where we frequently need to view or
> > modify configuration files. For example, to define a subgraph from an
> > existing graph.
> I do not agree that we should let users edit the YAML/JSON files
> directly. Manual modification of schema files is unreliable and
> unpredictable, and would probably introduce errors whose cause users
> cannot identify. That is why we are going to provide a CLI to restrict the
> operations on graph data, including projecting a subgraph.
> As for human readability, here is ldbc-sample.graph.yml in YAML and
> JSON:
> ```
> name: ldbc_sample
> vertices:
>   - person.vertex.yml
> edges:
>   - person_knows_person.edge.yml
> version: gar/v1
> extra_metadata: {}
> ```
> ```
> {
>   "name": "ldbc_sample",
>   "vertices": [
>     "person.vertex.yml"
>   ],
>   "edges": [
>     "person_knows_person.edge.yml"
>   ],
>   "version": "gar/v1",
>   "extra_metadata": {}
> }
> ```
> JSON is readable enough, I think, but not as configurable as YAML. But since
> the files are not allowed to be modified directly, I think JSON is OK.
> > 2. YAML often provides a more concise representation of the same data.
> Can you give an example showing why YAML provides a more concise
> representation of the data?
> > 3. YAML natively supports comments and extensions, making it more
> > flexible.
> I agree that YAML supports more features and is more flexible. But it is so
> flexible that it cannot provide much template validation support. For the
> GraphAr format, we should consider whether the format is sufficient to
> express the schema and configuration of GraphAr. On this point, JSON is
> good enough for me.
>
> On 2024/05/10 01:47:47 "李雪(有理)" wrote:
> > Thank you for the information and links provided. While I understand the
> > application of JSON in GraphScope Flex and its advantages when integrated
> > with GraphAr, considering our specific use case, I still think that YAML
> > might be a more suitable choice for us. Here are the primary reasons:
> > 1. YAML's format is more human-readable and easier to edit, which is a
> > significant advantage in scenarios where we frequently need to view or
> > modify configuration files. For example, to define a subgraph from an
> > existing graph.
> > 2. YAML often provides a more concise representation of the same data.
> > 3. YAML natively supports comments and extensions, making it more
> > flexible.
> > Therefore, we initially favored YAML over JSON. I hope we can further
> > discuss this topic to find the solution that best fits our project
> > requirements.
> > ------------------------------------------------------------------
> > From: Weibin Zeng <[email protected]>
> > Sent: Thursday, May 9, 2024 18:52
> > To: dev <[email protected]>
> > Subject: Re: [DISCUSS][format] Using an Interface Definition Language to
> > define GraphAr format
> >
> > Sorry, I missed the link.
> > > GraphScope Flex now uses JSON as the communication format for graph
> > > schema and checks it with the REST API [1]
> > [1] https://github.com/alibaba/GraphScope/tree/main/python/graphscope/flex/rest/models
> >
> > On 2024/05/09 10:49:24 Weibin Zeng wrote:
> > > JSON is OK for me. GraphScope Flex now uses JSON as the communication
> > > format for graph schema and checks it with the REST API [1]. I think
> > > switching to JSON is good for GraphAr, since GraphAr has been
> > > integrated into GraphScope.
> > >
> > > On 2024/05/09 08:50:45 Sem wrote:
> > > > I did some research on this, and it seems to me that classes
> > > > generated from protobuf are not serializable into other formats
> > > > like YAML/JSON.
> > > >
> > > > There is a third-party project, https://github.com/krzko/proto2yaml,
> > > > that provides such a utility, but it does not look well maintained.
> > > >
> > > > I see that there is a utility provided by Google. It allows
> > > > conversion to and from JSON in Java/Python (most probably C++
> > > > too):
> > > > 1.
> > > > https://cloud.google.com/java/docs/reference/protobuf/latest/com.google.protobuf.util.JsonFormat
> > > > 2.
> > > > https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html
> > > >
> > > > But not to/from YAML.
> > > >
> > > > For me that question is important, because we need not only to
> > > > generate the code but also to resolve the question of
> > > > serialization/deserialization.
> > > >
> > > > What do you think about using proto (binary messages) as the underlying
> > > > communication format in the code and JSON for the human-readable
> > > > representation on disk? It looks like only by switching from
> > > > YAML to JSON do we achieve all the benefits of using protobuf.
> > > >
> > > > With JSON, I see it like this: we call `fromJson` once (via the Google
> > > > protobuf util) to read the data and create proto classes from the JSON
> > > > info files, and then work with them. At the end we call `toJson` to
> > > > serialize the messages back.
> > > >
> > > > To achieve the same with YAML we need to use a third-party and not well
> > > > maintained library, or support our own serialization/deserialization
> > > > of proto messages (classes) to/from YAML for three languages (Python,
> > > > Java, C++).
> > > >
> > > > On Thu, 2024-05-09 at 15:27 +0800, weibin.zen wrote:
> > > > > Hi, everyone,
> > > > >
> > > > > I would like to propose that we consider using an Interface
> > > > > Definition Language (IDL) like Protobuf [1] for the GraphAr format
> > > > > definition.
> > > > > Currently we use YAML to describe the schema and metadata of a graph,
> > > > > and store the data in common formats like CSV/Parquet. YAML
> > > > > is human-readable, but it cannot provide much validation or
> > > > > version control. Libraries in various programming languages
> > > > > need to parse the files and validate them by themselves.
> > > > >
> > > > > Using an IDL to describe the format would bring benefits like:
> > > > >
> > > > > • A clear, standardized, language-agnostic format definition
> > > > > that can be version-controlled and shared by libraries, making the
> > > > > format consistent between implementations.
> > > > > • The validation provided by protobuf can be used directly for our
> > > > > schema validation, so libraries do not need to implement it.
> > > > > • Cross-language support: libraries can use the generated structures
> > > > > as graph info directly.
> > > > >
> > > > > This proposal does not replace YAML with Protobuf. We would still use
> > > > > YAML as the final schema and metadata file for user readability, but
> > > > > with an IDL to maintain a robust and precise schema definition. It is
> > > > > a kind of hybrid strategy that accommodates both human and machine
> > > > > needs.
> > > > >
> > > > > But using an IDL does bring some disadvantages. Sem has listed some in
> > > > > the comments of the PR [2]:
> > > > >
> > > > > • The generated code is huge and unreadable.
> > > > > • The generated code may need to be stored in git.
> > > > > • Debugging is very hard.
> > > > >
> > > > > Since this would be a huge change, I want to hear your thoughts
> > > > > about the proposal.
> > > > >
> > > > > [1] https://protobuf.dev/
> > > > > [2] https://github.com/apache/incubator-graphar/pull/475
> > > > >
> > > > > Best
> > > > > weibin.zen
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [email protected]
> > > > For additional commands, e-mail: [email protected]
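
For readers of the archive, the JSON round trip that Sem describes in the thread above (read the info file once, work with typed objects, serialize back at the end) might look roughly like the sketch below. It is only an illustration under stated assumptions: `GraphInfo`, `from_json`, and `to_json` are hypothetical names, and a plain dataclass stands in for a protobuf-generated class so that the example runs with the Python standard library alone. With real protobuf messages in Python, the corresponding calls would be `google.protobuf.json_format.Parse` and `google.protobuf.json_format.MessageToJson`. The field names are taken from the ldbc_sample file quoted in the thread.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class GraphInfo:
    """Hypothetical stand-in for a protobuf-generated graph info class."""
    name: str
    vertices: list
    edges: list
    version: str
    extra_metadata: dict = field(default_factory=dict)


def from_json(text: str) -> GraphInfo:
    """Parse a JSON info file once and return a typed, validated object."""
    data = json.loads(text)
    known = set(GraphInfo.__dataclass_fields__)
    required = known - {"extra_metadata"}
    missing = sorted(required - set(data))
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = sorted(set(data) - known)
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    return GraphInfo(**data)


def to_json(info: GraphInfo) -> str:
    """Serialize the typed object back to human-readable JSON."""
    return json.dumps(asdict(info), indent=2)


# The sample graph info quoted in the thread, in its JSON form.
SAMPLE = """
{
  "name": "ldbc_sample",
  "vertices": ["person.vertex.yml"],
  "edges": ["person_knows_person.edge.yml"],
  "version": "gar/v1",
  "extra_metadata": {}
}
"""

info = from_json(SAMPLE)   # read once
print(info.name, info.version)
print(to_json(info))       # serialize back at the end
```

The point of the sketch is that validation lives in one place (here, `from_json`) instead of being reimplemented by every library that parses the files by hand, which is the benefit the original proposal attributes to a schema-driven format.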
