Hi there, I just tested the preview of RLI (rfc-08), amazing feature. Soon the fast COW (rfc-68) will be based on RLI to get the parquet offsets and allow targeting parquet row groups.
RLI is a global index, so it assumes a given hudi key is present in at most one parquet file. As a result, in the MDT the RLI entry is a single struct, and there is a 1:1 mapping between a key and a file.

Type:

    |-- recordIndexMetadata: struct (nullable = true)
    |    |-- partition: string (nullable = false)
    |    |-- fileIdHighBits: long (nullable = false)
    |    |-- fileIdLowBits: long (nullable = false)
    |    |-- fileIndex: integer (nullable = false)
    |    |-- instantTime: long (nullable = false)

Content:

    |event_id:1 |{part=3, -6811947225812876253, -7812062179961430298, 0, 1689147210233}|

We would love to use both the RLI and FCOW features, but I'm afraid our keys are not unique in our kafka archives. The same key might be present in multiple partitions, and even in multiple slices within a partition.

I wonder if, in the future, RLI could support multiple parquet files per key (by storing an array of structs, for example). This would make it possible to leverage RLI in more contexts.

Thx
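P.S. To make the ask a bit more concrete, here is a rough Python sketch of the two shapes. It is only an illustration, not Hudi code: `RecordLocation`, `GlobalIndex` and `MultiLocationIndex` are hypothetical names I made up, the fields simply mirror the `recordIndexMetadata` struct above, and the `file_id()` helper assumes the two longs are the high and low halves of a UUID with `fileIndex` as a suffix (my guess from the field names).

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # used to reinterpret a signed 64-bit long as unsigned


@dataclass(frozen=True)
class RecordLocation:
    """Hypothetical mirror of the recordIndexMetadata struct."""
    partition: str
    file_id_high_bits: int
    file_id_low_bits: int
    file_index: int
    instant_time: int

    def file_id(self) -> str:
        # Assumption: high/low bits are the two halves of a UUID,
        # and fileIndex is appended as a "-<n>" suffix.
        value = ((self.file_id_high_bits & MASK64) << 64) | (self.file_id_low_bits & MASK64)
        return f"{uuid.UUID(int=value)}-{self.file_index}"


class GlobalIndex:
    """Current shape: one key maps to exactly one location (last write wins)."""

    def __init__(self):
        self._locations = {}

    def put(self, key, location):
        self._locations[key] = location

    def lookup(self, key):
        location = self._locations.get(key)
        return [location] if location is not None else []


class MultiLocationIndex:
    """Proposed shape: one key maps to an array of locations."""

    def __init__(self):
        self._locations = defaultdict(list)

    def put(self, key, location):
        # Keep each distinct location once, in insertion order.
        if location not in self._locations[key]:
            self._locations[key].append(location)

    def lookup(self, key):
        return list(self._locations.get(key, []))
```

With the first shape, a key seen in two partitions keeps only the latest location; with the second, a lookup returns every file that would need to be targeted for that key.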