Hi, all
It's a good time to update our roadmap. I am opening this thread to
collect ideas for it and discuss them. Below is a draft of the items
we have discussed in our community meeting.
1. Format
- Define format with protobuf (discuss and vote on [1][2])
- Support multi-labels for vertex and edge
- Standardize the format v1 specification
2. C++ Library
- Format compatibility with v1
- Make full use of the features of the columnar formats (Parquet/ORC)
to improve read/write performance
- A simple out-of-core compute engine based on GraphAr
3. Java / Scala with Spark Library
- Format compatibility with v1
- Modularize the library: split into info/reader/writer...
- Integrate with ldbc_snb_datagen_spark[3]
4. Python with PySpark
- A new PySpark API that works with both Spark Classic and Spark Connect
5. Other
- ETL CLI for GraphAr data [4]
- More language bindings
- Construct a DataHub with GraphAr format
I am looking forward to hearing your thoughts about the roadmap of GraphAr.
[1] https://lists.apache.org/thread/o5bqbhxvcbm6xqj1j1m2h7bhdnsvgsoy
[2] https://lists.apache.org/thread/swg5qb35qxywt6w0k7oxt2srsvqnqgnh
[3] https://github.com/ldbc/ldbc_snb_datagen_spark
[4] https://github.com/apache/incubator-graphar/issues/463