Hi Sivabalan,

Thanks for your response. The metadata we need to store is indeed per-file, and it is used primarily during reads. Currently, we store it in the extraMetadata field of the commit files, but this approach requires reading both the active and archived timelines to extract the information during reads.

We are exploring a solution where the metadata is stored in the MetadataTable for faster retrieval and improved performance. This would also help align with Hudi's internals, as the MetadataTable is primarily used for storing indexes and other metadata-related information. In our solution, we would aim to extend the metadata table schema and include something like this:

 |-- qbeastMetadata: struct (nullable = true)
 |    |-- fileName: string (nullable = false)
 |    |-- revision: integer (nullable = false)
 |    |-- blocks: struct (nullable = false)
 |    |    |-- id: integer (nullable = false)
 |    |    |-- min: integer (nullable = false)
 |    |    |-- max: integer (nullable = false)
 |    |    |-- elementCount: integer (nullable = false)

Looking at the code, it seems that the default schema is defined in the HoodieMetadata.avsc file (https://github.com/apache/hudi/blob/master/hudi-common/src/main/avro/HoodieMetadata.avsc), and the classes that manage the table are generated automatically for this schema.

Our question is: what would be the proper way to extend the default schema to include the metadata we need and generate the classes to manage it?

Best regards,
-Josep