Hi everyone,
I would like to propose adding support for the Vortex columnar format to GraphAr to improve storage efficiency and query performance, particularly in analytical and AI scenarios. I have opened a discussion on this proposal at https://github.com/apache/incubator-graphar/discussions/887.

Background

Several emerging columnar file formats, such as [Vortex](https://github.com/vortex-data/vortex), [Lance](https://github.com/lance-format/lance), [F3](https://github.com/future-file-format/F3), BtrBlocks, Nimble, and Parquet variants, demonstrate strong performance advantages in specific scenarios. I wonder whether supporting these formats in GraphAr could significantly reduce storage overhead and improve query performance at scale.

Benefits

1. The Vortex columnar format can improve storage efficiency and query performance through better compression and vectorized execution.
2. It enables more flexible column-level encoding strategies, which can better align with analytical graph workloads.
3. Vortex is designed to be GPU-friendly, which matters particularly in AI and analytics scenarios.

Effects of Modifications

1. The storage layer needs a Vortex implementation and format adapters.
2. All language bindings need to adopt the new format.

For reference, the `FileType` enum currently lists the supported formats:

```cpp
enum class FileType : int32_t {
  CSV = 0,
  PARQUET = 1,
  ORC = 2,
  JSON = 3
};
```

Evidence from DuckDB

Vortex has already been integrated into DuckDB, where it demonstrates substantial performance improvements on analytical workloads such as TPC-H. Reported results show significant gains in scan efficiency and query execution time compared to traditional columnar formats. Details are available in this [blog post](https://duckdb.org/2026/01/23/duckdb-vortex-extension).

What do others think about this idea? I’m happy to hear suggestions or alternative approaches.

Thanks,
yao jun
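
P.S. To make the scope of the storage-layer change a bit more concrete, here is a minimal sketch of how the `FileType` enum might be extended. The `VORTEX = 4` value and the helpers `FileTypeName` and `ParseFileType` are hypothetical names chosen for illustration. They are assumptions on my part, not existing GraphAr API, and the real change would also have to cover readers, writers, and every language binding.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// The first four values mirror the existing enum.
// VORTEX = 4 is an assumed addition and is not part of GraphAr today.
enum class FileType : int32_t {
  CSV = 0,
  PARQUET = 1,
  ORC = 2,
  JSON = 3,
  VORTEX = 4  // hypothetical
};

// Hypothetical helper: map a FileType to a lowercase format name,
// for example for use in metadata files.
inline std::string_view FileTypeName(FileType type) {
  switch (type) {
    case FileType::CSV:     return "csv";
    case FileType::PARQUET: return "parquet";
    case FileType::ORC:     return "orc";
    case FileType::JSON:    return "json";
    case FileType::VORTEX:  return "vortex";
  }
  return "unknown";
}

// Hypothetical helper: parse a format name back into a FileType.
// Returns std::nullopt for names that are not recognized.
inline std::optional<FileType> ParseFileType(std::string_view name) {
  if (name == "csv")     return FileType::CSV;
  if (name == "parquet") return FileType::PARQUET;
  if (name == "orc")     return FileType::ORC;
  if (name == "json")    return FileType::JSON;
  if (name == "vortex")  return FileType::VORTEX;
  return std::nullopt;
}
```

Appending the new value at the end keeps the existing integer values stable, so metadata that already stores them would not be affected under this sketch.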
