markhoerth opened a new issue, #10521:
URL: https://github.com/apache/gravitino/issues/10521
### Describe the feature
Support SET PROPERTY on tables in the jdbc-postgresql catalog (and JDBC
catalogs generally). Custom user-defined properties should be stored in
Gravitino's own metadata layer rather than being pushed down to the
underlying JDBC database.
### Motivation
A natural and valuable integration pattern is writing metadata back to
Gravitino from external tools like dbt, Airflow, and Kubeflow. These
tools need to annotate tables with two distinct types of metadata:
1. Runtime measurements (statistics): quality test results, row counts, and
   freshness timestamps. Gravitino's statistics system handles these
   correctly and works on JDBC catalogs today.
2. Provenance metadata (properties): stable facts about a table's origin
   and pipeline relationships that don't change from run to run. Examples:
   - dbt: model FQN, project name, model type, upstream table lineage
   - Airflow: DAG ID, schedule, owning team
   - Kubeflow: pipeline ID, serving model name
Currently, attempting to set a property on a jdbc-postgresql table fails
with the error `Set property is not supported yet`.
This blocks provenance metadata write-back entirely for JDBC-backed
catalogs, which are among the most common catalog types in enterprise
deployments. Statistics already work correctly; properties are the remaining gap.
### Describe the solution
Custom, user-defined properties (e.g. `dbt.model_fqn`, `dbt.upstream_tables`,
`airflow.dag_id`) have no meaningful mapping to PostgreSQL or any other JDBC
database. They should be stored in Gravitino's own metadata store rather than
pushed down to the underlying database. This is analogous to how Gravitino
stores tags and statistics: in its own layer, independent of the underlying
catalog implementation.
A reasonable approach: distinguish between connector-managed properties
(e.g. storage format, partitioning), which are pushed down to the database,
and user-defined properties, which are stored in Gravitino's metadata store.
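The split described above could be sketched roughly as follows. This is a hypothetical illustration, not Gravitino's actual code: `PropertyRouter`, `Routed`, and the example key `engine` are invented names, and the sketch assumes each JDBC catalog provider can enumerate the property keys its connector actually understands.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Gravitino's implementation: routes a batch of
// property updates either to the JDBC connector (push-down) or to Gravitino's
// own metadata store, based on an allowlist of connector-managed keys.
public final class PropertyRouter {

  /** Two disjoint, read-only maps produced by {@link #route}. */
  public record Routed(Map<String, String> pushDown, Map<String, String> storeInGravitino) {}

  // Keys the underlying JDBC database understands; assumed to be known per provider.
  private final Set<String> connectorManagedKeys;

  public PropertyRouter(Set<String> connectorManagedKeys) {
    this.connectorManagedKeys = Set.copyOf(connectorManagedKeys);
  }

  /** Allowlisted keys are pushed down; every other key is treated as user-defined. */
  public Routed route(Map<String, String> updates) {
    Map<String, String> pushDown = new HashMap<>();
    Map<String, String> stored = new HashMap<>();
    for (Map.Entry<String, String> entry : updates.entrySet()) {
      if (connectorManagedKeys.contains(entry.getKey())) {
        pushDown.put(entry.getKey(), entry.getValue());
      } else {
        stored.put(entry.getKey(), entry.getValue());
      }
    }
    return new Routed(Collections.unmodifiableMap(pushDown), Collections.unmodifiableMap(stored));
  }
}
```

With `new PropertyRouter(Set.of("engine"))`, an update containing `engine`, `dbt.model_fqn`, and `airflow.dag_id` would push only `engine` down and keep the two tool-specific keys in Gravitino. An allowlist means unknown keys default to Gravitino's store instead of being rejected by the database; a reserved key prefix for user-defined properties would be an alternative design.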
### Additional context
The intended write-back architecture for tools like dbt is:
- Statistics: runtime measurements updated each run (quality test results,
  timestamps). Works today on JDBC catalogs.
- Properties: stable provenance metadata (lineage, pipeline origin).
  Blocked by this issue.
- Tags: classification labels (dbt-managed, pii). Works today.
- Policies: governance rules (quality thresholds). Works today.
Workaround: lineage metadata can be partially encoded in policy properties,
but policies are semantically governance rules, not provenance metadata, and
are shared across all associated tables rather than scoped to a single table.
Property write-back works correctly on non-JDBC catalogs (Iceberg, Hive).
This issue affects all JDBC catalog providers, not just PostgreSQL.
Related: #10519, #10520 (closed)