slinkydeveloper opened a new pull request #18349: URL: https://github.com/apache/flink/pull/18349
## What is the purpose of the change

This PR removes the workaround of generating a random `ObjectIdentifier` for anonymous inline tables, that is:

* Tables defined only with `TableDescriptor`, e.g. `TableEnvironment#from(TableDescriptor)`
* `DataStream` from/to conversions
* `TableResult#collect`

These tables no longer go through the `CatalogManager` and no longer have an assigned `ObjectIdentifier`. Also, for every kind of table (permanent, temporary, and anonymous), the resolved schema, options, and everything else inside `ResolvedCatalogTable` are propagated directly through the stack, rather than queried from the `Catalog` at a later stage of planning. Among other things, this is a requirement for fixing https://issues.apache.org/jira/browse/FLINK-25386.

## Brief change log

* `DynamicTableFactory.Context#getObjectIdentifier` is deprecated and replaced with `DynamicTableFactory.Context#getIdentifier`, which returns an `Optional<ObjectIdentifier>`.
* Moved `TableLookupResult` to the upper level and renamed it to `ContextResolvedTable`. This POJO is now used to propagate the `ResolvedCatalogTable`, the `ObjectIdentifier` (if any), and the associated `Catalog` (if any) throughout the stack. See the Javadoc for details.
* Renamed `CatalogQueryOperation` and `CatalogSinkModifyOperation` to `SourceQueryOperation` and `SinkModifyOperation`, and propagated `ContextResolvedTable` through them. The renaming was done because the name `Catalog` is confusing: it implies that the table can be found inside a catalog, which is no longer the case for anonymous tables.
* Methods in `Table`, `TableEnvironment`, and `StatementSet` that create or accept anonymous tables no longer generate an `ObjectIdentifier`.
* Methods in `Table`, `TableEnvironment`, and `StatementSet` that use an `ObjectIdentifier` now perform the lookup at the call site, rather than later during planning.
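To illustrate the `getObjectIdentifier` → `getIdentifier` migration for connector authors, here is a minimal, self-contained sketch. Note that `ObjectIdentifier` below is a simplified stand-in defined for this example only, not Flink's actual class, and `displayName` is a hypothetical helper:

```java
import java.util.Optional;

// Simplified stand-in for Flink's ObjectIdentifier, only to illustrate
// the migration pattern; this is NOT the real Flink class.
record ObjectIdentifier(String catalog, String database, String table) {
    String asSummaryString() {
        return "`" + catalog + "`.`" + database + "`.`" + table + "`";
    }
}

class IdentifierMigrationDemo {
    // A factory can no longer assume an identifier exists: anonymous inline
    // tables yield Optional.empty(), so derive a fallback display name.
    static String displayName(Optional<ObjectIdentifier> identifier) {
        return identifier
                .map(ObjectIdentifier::asSummaryString)
                .orElse("<anonymous>");
    }

    public static void main(String[] args) {
        // Anonymous inline table: no identifier available
        System.out.println(displayName(Optional.empty()));
        // Permanent/temporary table: identifier present
        System.out.println(displayName(Optional.of(
                new ObjectIdentifier("default_catalog", "default_database", "orders"))));
    }
}
```

The key point is that code previously calling `getObjectIdentifier()` unconditionally now has to handle the empty case explicitly.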
Note that this is a *breaking change* in the behaviour of the APIs: before, one could first define the pipeline and later fill the catalog with the tables, while with this change an invocation like `tEnv.from(String tableName)` fails if the table has not previously been registered in the catalog. Nevertheless, this shouldn't be a problem for the majority of users and will improve the usability of our APIs, as they now "fail fast".

* `QueryOperationConverter` now manually creates the `RelNode` for anonymous tables, skipping Calcite's table resolution.
* Every intermediate table representation between the `Operation`s and the final `ExecNode`s now embeds a `ContextResolvedTable`, rather than an `ObjectIdentifier` plus `ResolvedCatalogTable`.

Because this is a substantial refactoring, I isolated the most important changes in separate commits to simplify the review. The last commit is a big one: it essentially includes the propagation of the renamings, the propagation of `ContextResolvedTable` through the stack, and small changes to remove the assumption that `ObjectIdentifier` is always non-null. There is also a hotfix, "Add default implementation for `DynamicTableFactory.Context#getEnrichmentOptions` to alleviate breaking change", not related to this PR.

## Verifying this change

All the changes are verified by existing tests. Some tests have been adapted and hardened; see the changes to `TableEnvironmentTest`.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: yes
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: yes

## Documentation

- Does this pull request introduce a new feature?
no
- If yes, how is the feature documented? not applicable
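The fail-fast lookup semantics described in the change log can be sketched as follows. This is a hypothetical, heavily simplified stand-in (`MiniTableEnvironment` is not Flink's `TableEnvironment`), meant only to show eager lookup at the call site versus deferred lookup during planning:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the lookup semantics described above;
// this is NOT Flink's actual TableEnvironment implementation.
class MiniTableEnvironment {
    private final Map<String, String> catalog = new HashMap<>();

    void createTemporaryTable(String name, String definition) {
        catalog.put(name, definition);
    }

    // After this PR: the catalog lookup happens eagerly at the call site,
    // so a missing table fails here instead of later during planning.
    String from(String tableName) {
        String resolved = catalog.get(tableName);
        if (resolved == null) {
            throw new IllegalArgumentException(
                    "Table '" + tableName + "' was not found in the catalog");
        }
        return resolved;
    }
}

class FailFastDemo {
    public static void main(String[] args) {
        MiniTableEnvironment env = new MiniTableEnvironment();
        try {
            env.from("orders"); // fails immediately: nothing registered yet
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        env.createTemporaryTable("orders", "datagen source");
        System.out.println(env.from("orders")); // now resolves successfully
    }
}
```

Under the old behaviour, the first `from("orders")` call would have succeeded and the error would only surface once the plan was compiled.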