slinkydeveloper opened a new pull request #18349: URL: https://github.com/apache/flink/pull/18349
## What is the purpose of the change

This PR removes the workaround of generating a random `ObjectIdentifier` for anonymous inline tables, that is:

* Tables defined only with `TableDescriptor`, e.g. `TableEnvironment#from(TableDescriptor)`
* `DataStream` from/to conversions
* `TableResult#collect`

These tables no longer go through the `CatalogManager` and no longer have an assigned `ObjectIdentifier`. Also, for every kind of table (permanent, temporary, and anonymous), the resolved schema, options, and everything else inside `ResolvedCatalogTable` are propagated directly through the stack, rather than queried from the `Catalog` at a later stage of planning. Among other things, this is a requirement for fixing https://issues.apache.org/jira/browse/FLINK-25386.

## Brief change log

* `DynamicTableFactory.Context#getObjectIdentifier` is deprecated and replaced with `DynamicTableFactory.Context#getIdentifier`, which returns an `Optional<ObjectIdentifier>`.
* Moved `TableLookupResult` to the upper level and renamed it to `ContextResolvedTable`. This POJO is now used to propagate the `ResolvedCatalogTable`, the `ObjectIdentifier` (if any), and the associated `Catalog` (if any) throughout the stack. See the Javadoc for details.
* Renamed `CatalogQueryOperation` and `CatalogSinkModifyOperation` to `SourceQueryOperation` and `SinkModifyOperation`, and propagated `ContextResolvedTable` through them. The renaming was done because the name `Catalog` is confusing: it implies that the table can be found inside a catalog, which is no longer the case for anonymous tables.
* Methods in `Table`, `TableEnvironment`, and `StatementSet` that create or accept anonymous tables no longer generate an `ObjectIdentifier`.
* Methods in `Table`, `TableEnvironment`, and `StatementSet` that use an `ObjectIdentifier` now perform the lookup at the call site, rather than later during planning.
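To illustrate the `getObjectIdentifier` → `getIdentifier` migration for connector authors, here is a minimal, self-contained sketch. Note that `ObjectIdentifier` below is a simplified stand-in defined for this example only, not Flink's actual class, and `displayName` is a hypothetical helper:

```java
import java.util.Optional;

// Simplified stand-in for Flink's ObjectIdentifier, only to illustrate
// the migration pattern; this is NOT the real Flink class.
record ObjectIdentifier(String catalog, String database, String table) {
    String asSummaryString() {
        return "`" + catalog + "`.`" + database + "`.`" + table + "`";
    }
}

class IdentifierMigrationDemo {
    // A factory can no longer assume an identifier exists: anonymous inline
    // tables yield Optional.empty(), so derive a fallback display name.
    static String displayName(Optional<ObjectIdentifier> identifier) {
        return identifier
                .map(ObjectIdentifier::asSummaryString)
                .orElse("<anonymous>");
    }

    public static void main(String[] args) {
        // Anonymous inline table: no identifier available
        System.out.println(displayName(Optional.empty()));
        // Permanent/temporary table: identifier present
        System.out.println(displayName(Optional.of(
                new ObjectIdentifier("default_catalog", "default_database", "orders"))));
    }
}
```

The key point is that code previously calling `getObjectIdentifier()` unconditionally now has to handle the empty case explicitly.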
Note that this is a *breaking change* in the behaviour of the APIs: before, one could first define the pipeline and later fill the catalog with the tables, while with this change an invocation like `tEnv.from(String tableName)` fails if the table has not previously been registered in the catalog. Nevertheless, this shouldn't be a problem for the majority of users and will improve the usability of our APIs, as they now "fail fast".

* `QueryOperationConverter` now manually creates the `RelNode` for anonymous tables, skipping Calcite's table resolution.
* Every intermediate table representation between the `Operation`s and the final `ExecNode`s now embeds a `ContextResolvedTable`, rather than an `ObjectIdentifier` plus `ResolvedCatalogTable`.

Because this is a substantial refactoring, I isolated the most important changes in separate commits to simplify the review. The last commit is a big one: it essentially includes the propagation of the renamings, the propagation of `ContextResolvedTable` through the stack, and small changes to remove the assumption that `ObjectIdentifier` is always non-null. There is also a hotfix, "Add default implementation for `DynamicTableFactory.Context#getEnrichmentOptions` to alleviate breaking change", not related to this PR.

## Verifying this change

All the changes are verified by existing tests. Some tests have been adapted and hardened; see the changes to `TableEnvironmentTest`.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: yes
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: yes

## Documentation

- Does this pull request introduce a new feature?
no
- If yes, how is the feature documented? not applicable
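The fail-fast lookup semantics described in the change log can be sketched as follows. This is a hypothetical, heavily simplified stand-in (`MiniTableEnvironment` is not Flink's `TableEnvironment`), meant only to show eager lookup at the call site versus deferred lookup during planning:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the lookup semantics described above;
// this is NOT Flink's actual TableEnvironment implementation.
class MiniTableEnvironment {
    private final Map<String, String> catalog = new HashMap<>();

    void createTemporaryTable(String name, String definition) {
        catalog.put(name, definition);
    }

    // After this PR: the catalog lookup happens eagerly at the call site,
    // so a missing table fails here instead of later during planning.
    String from(String tableName) {
        String resolved = catalog.get(tableName);
        if (resolved == null) {
            throw new IllegalArgumentException(
                    "Table '" + tableName + "' was not found in the catalog");
        }
        return resolved;
    }
}

class FailFastDemo {
    public static void main(String[] args) {
        MiniTableEnvironment env = new MiniTableEnvironment();
        try {
            env.from("orders"); // fails immediately: nothing registered yet
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        env.createTemporaryTable("orders", "datagen source");
        System.out.println(env.from("orders")); // now resolves successfully
    }
}
```

Under the old behaviour, the first `from("orders")` call would have succeeded and the error would only surface once the plan was compiled.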