Similar to the other V4 threads, I am starting this thread to gauge interest in adding index support to Iceberg V4 and to gather a focus group in this area.
There have been a few discussions related to indexing recently.

- Peter Vary and I are working on a proposal (WIP) to only write position deletes in the Flink streaming writer. It would need a primary key index to work reasonably efficiently. [1]
- Xiaoxuan Li has a proposal to leverage index files to improve merge-on-read performance with equality deletes. [2]
- pengzhiwei has a proposal to support full-text index and vector index. [3]

*Idea: index files*

To support those use cases, Iceberg can add support for index files (in addition to data files and delete files). The design should be general enough to support different forms of indexing:

- Primary key index
- Secondary index
- Full-text index
- Vector index

This email is a starting point. It is a large topic, and the ideas need a lot more discussion and maturation before a formal proposal.

Thanks,
Steven

[1] https://docs.google.com/document/d/1Jz4Fjt-6jRmwqbgHX_u0ohuyTB9ytDzfslS7lYraIjk/ (WIP)
[2] https://lists.apache.org/thread/j4zl44g6dllzzyg9ln45pvgoosfhxqrq
[3] https://github.com/apache/iceberg/issues/12636