Hi everyone,

Our team has started exploring how to integrate Hudi into Qbeast
(qbeast.io). Over the last few months, I've been digging into Hudi's
internals to find the best integration point and the most suitable
components to use.

In Qbeast, we’re introducing a new way of indexing data, which requires
additional metadata for each Parquet file. Currently, we’re storing this
metadata in the extraMetadata field of the commit files, as Hudi allows
user-defined metadata there. However, I’m wondering if this is the best
approach or if it would be better to store this information in the metadata
table.
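
To make the current approach concrete, here is a minimal sketch of the idea
(Python for brevity). It assumes the per-file metadata is serialized to JSON
and stored under a single key, since extraMetadata is a map of string keys to
string values. The key name "qbeast.blocks" and the per-file fields are
hypothetical placeholders, not our actual schema, and the sketch does not
call any Hudi API.

```python
import json

# Hypothetical key under which the per-file metadata is stored.
EXTRA_METADATA_KEY = "qbeast.blocks"


def build_extra_metadata(file_stats, key=EXTRA_METADATA_KEY):
    """Pack per-file metadata into one string-valued entry.

    Commit extraMetadata maps string keys to string values, so the nested
    structure is flattened into a JSON string.
    """
    return {key: json.dumps(file_stats, sort_keys=True)}


def read_extra_metadata(extra_metadata, key=EXTRA_METADATA_KEY):
    """Recover the per-file metadata from a commit's extraMetadata map."""
    return json.loads(extra_metadata.get(key, "{}"))


# Illustrative values only: file names and fields are made up.
file_stats = {
    "part-0001.parquet": {"cube": "A", "rows": 1200},
    "part-0002.parquet": {"cube": "B", "rows": 800},
}

extra = build_extra_metadata(file_stats)
restored = read_extra_metadata(extra)
```

One consequence of this layout is that the payload grows with the number of
files written per commit, which is part of why I'm asking about the metadata
table.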

From my understanding:

   - The extraMetadata field is flexible and user-defined, which makes it
   easy to use for custom metadata.
   - The metadata table seems more "closed" and focused on specific
   system-level functionalities.

Would it make more sense to continue using extraMetadata, or is there a
recommended way to extend the metadata table to include custom fields like
these? Any guidance or best practices would be greatly appreciated!

Thanks in advance!
