Similar to other V4 threads, I am starting a thread to gauge interest in
adding index support in Iceberg V4 and gather a focus group in this area.

There have been a few discussions related to indexing recently.

   - Peter Vary and I are working on a proposal (WIP) to write only
   position deletes in the Flink streaming writer. It would need a primary key
   index to work reasonably efficiently. [1]
   - Xiaoxuan Li has a proposal to leverage index files to improve
   merge-on-read performance with equality deletes. [2]
   - pengzhiwei has a proposal to support full-text and vector indexes.
   [3]


*Idea: index files*

To support those use cases, Iceberg can add support for index files (in
addition to data files and delete files). It should be general enough to
support different forms of indexing.

   - Primary key index
   - Secondary index
   - Full text index
   - Vector index


This email is a starting point. It is a large topic, and the ideas need a
lot more discussion and maturation before a formal proposal.

Thanks,
Steven

[1] https://docs.google.com/document/d/1Jz4Fjt-6jRmwqbgHX_u0ohuyTB9ytDzfslS7lYraIjk/ (WIP)
[2] https://lists.apache.org/thread/j4zl44g6dllzzyg9ln45pvgoosfhxqrq
[3] https://github.com/apache/iceberg/issues/12636
