justinmclean commented on code in PR #92:
URL: https://github.com/apache/gravitino-site/pull/92#discussion_r2377276192

##########
blog/2025-09-24-gravitino-1-0-0-release-notes.mdx:
##########
@@ -0,0 +1,128 @@
+---
+title: Apache Gravitino 1.0.0 - From Metadata Management to Contextual Engineering
+slug: gravitino-1-0-0-release-notes
+authors: [jerryshao]
+tags: [apache,gravitino,metadata,multicloud,model,security,government]
+---
+
+Apache Gravitino was designed from day one to provide a unified framework for metadata management across heterogeneous sources, regions, and clouds—what we define as the metadata lake (or metalake). Throughout its evolution, Gravitino has extended support to multiple data modalities, including tabular metadata from Apache Hive, Apache Iceberg, MySQL, and PostgreSQL; unstructured assets from HDFS and S3; streaming and messaging metadata from Apache Kafka; and metadata for machine learning models. To further strengthen governance in Gravitino, we have also integrated advanced capabilities, including tagging, audit logging, and end-to-end lineage capture.
+
+After all enterprise metadata has been centralized through Gravitino, it forms a data brain: a structured, queryable, and semantically enriched representation of data assets. This enables not only consistent metadata access but also contextual reasoning and automation across systems. As we approach the 1.0 milestone, our focus shifts from pure metadata storage to metadata-driven contextual engineering—a foundation we call the Metadata-driven Action System.
+
+The release of Apache Gravitino 1.0.0 marks a significant engineering step forward, with robust APIs, extensible connectors, enhanced governance primitives, improved scalability and reliability in distributed environments. In the following sections, I will dive into the new features and architectural improvements introduced in Gravitino 1.0.0.
+
+## Metadata-driven action system
+
+In 1.0.0, we have introduced 3 new components, with which we can build jobs to accomplish the metadata-driven actions, like table compaction, TTL data management, PII identification, etc. These 3 new components are: statistics system, policy system, and job system.
+
+Taking table compaction as an example:
+
+* Firstly, users can define the table compaction policy in Gravitino and associate this policy with the tables that need to be compacted.
+* Then, users can save the statistics of the table to Gravitino.
+* Also, users can define a job template for the compaction.
+* Lastly, users can use the statistics with the defined policy to generate the compaction parameters and use these parameters to trigger a compaction job based on the defined job templates.
+
+### Statistics system
+
+The statistics system is a new component for the statistics store and retrieval. You can define and store the table/partition level statistics in Gravitino, and also fetch them through Gravitino for different purposes.
+For the details of how we design this component, please see [#7268](https://github.com/apache/gravitino/issues/7268). For how to use the statistics system, you can refer to the documentation [here](https://gravitino.apache.org/docs/1.0.0/manage-statistics-in-gravitino/).
+
+### Policy system
+
+The policy system helps you define action rules in Gravitino, like compaction rules or TTL rules. The defined policy can be associated with the entities, which means these rules will be enforced on the dedicated metadata. Users can leverage these enforced polices to decide how to trigger an action on the dedicated metadata.
+
+Please refer to the policy system [documentation](https://gravitino.apache.org/docs/1.0.0/manage-policies-in-gravitino) to know how to use it. If you want to know more about the implementation details of the policy system, please see [#7139](https://github.com/apache/gravitino/issues/7139).

Review Comment:
   Suggested change: For more information on the policy system's implementation details, please refer to [#7139](https://github.com/apache/gravitino/issues/7139).
