nsivabalan commented on a change in pull request #2245:
URL: https://github.com/apache/hudi/pull/2245#discussion_r523751401
########## File path: docs/_posts/2020-11-11-hudi-indexing-mechanisms.mb ##########
@@ -0,0 +1,92 @@
+---
+title: "Apache Hudi Indexing Mechanisms"
+excerpt: "Detailing different indexing mechanisms in Hudi and when to use each of them"
+author: sivabalan
+category: blog
+---
+
+## 1. Introduction
+Hudi uses an index to find and update the location of incoming records during write operations. The index is a critical piece of Hudi, as it provides the record-level lookups that make write operations efficient. This post describes the different index types and when to use each one.
+
+A Hudi dataset is, in general, either partitioned or non-partitioned. Accordingly, most index types have two implementations: one for partitioned datasets and another for non-partitioned datasets, called the global index.
+
+Hudi currently supports the following index types:
+
+- InMemory
+- Bloom
+- Simple
+- HBase
+
+You can use `hoodie.index.type` to choose any of these indexes.
+
+### 1.1 Motivation
+Different workloads have different access patterns, and Hudi supports different indexing schemes to cater to their needs. Depending on one’s use case, the appropriate indexing scheme can be chosen.
+
+For example: …….

Review comment:
to be filled.
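For reference, the draft says the index is selected with `hoodie.index.type`, and that most index types come in a partitioned and a global flavor. The following is a minimal Python sketch of building that option, assuming the config values used by Hudi releases around 0.6 (`INMEMORY`, `BLOOM`, `GLOBAL_BLOOM`, `SIMPLE`, `GLOBAL_SIMPLE`, `HBASE`). The helper `index_write_options` and its mapping tables are hypothetical and not part of Hudi.

```python
# Hypothetical helper (not part of Hudi): builds the write option that selects
# an index type. Assumes the `hoodie.index.type` values of Hudi releases around
# 0.6: INMEMORY, BLOOM, GLOBAL_BLOOM, SIMPLE, GLOBAL_SIMPLE, HBASE.

# Index types listed in the draft, mapped to their assumed config values.
BASE_INDEX_TYPES = {
    "inmemory": "INMEMORY",
    "bloom": "BLOOM",
    "simple": "SIMPLE",
    "hbase": "HBASE",
}

# Index types that (under the same assumption) have a separate global variant.
GLOBAL_VARIANTS = {"BLOOM": "GLOBAL_BLOOM", "SIMPLE": "GLOBAL_SIMPLE"}


def index_write_options(index: str, global_index: bool = False) -> dict:
    """Return a one-entry dict setting hoodie.index.type for `index`."""
    value = BASE_INDEX_TYPES.get(index.lower())
    if value is None:
        raise ValueError(f"unknown index type: {index!r}")
    if global_index:
        # Only index types with a distinct global variant can be switched.
        if value not in GLOBAL_VARIANTS:
            raise ValueError(f"{value} has no separate global variant")
        value = GLOBAL_VARIANTS[value]
    return {"hoodie.index.type": value}


if __name__ == "__main__":
    print(index_write_options("bloom"))                      # {'hoodie.index.type': 'BLOOM'}
    print(index_write_options("Simple", global_index=True))  # {'hoodie.index.type': 'GLOBAL_SIMPLE'}
```

In a PySpark job, the returned dict would typically be merged with the other required Hudi write options (table name, record key field, precombine field) and passed to `df.write.format("hudi").options(**opts)`. Keeping the global choice as a separate flag mirrors the draft's point that the global index is a second implementation of the same index type.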