[GitHub] [flink] luoyuxia commented on pull request #22166: [FLINK-31409][hive] Hive dialect should use public interfaces

via GitHub Wed, 26 Apr 2023 04:37:50 -0700


luoyuxia commented on PR #22166:
URL: https://github.com/apache/flink/pull/22166#issuecomment-1523271352


   > Thank you for the update. I think this is in good shape. During reviewing 
the code, I find the methods of `CatalogRegistry` are not smooth to use. For 
example,
   > 
   > 1. the `ResolvedCatalogBaseTable` vs `ContextResolvedTable` mentioned 
above,
   > 2. the `Optional` return type is trivial, we always throw exceptions when 
working with an empty Optional result.
   > 3. it's a long method chain to get the current catalog, but this is a 
frequent call.
   > 
   > I'm thinking about revisiting the design of `CatalogRegistry` into this:
   > 
   > ```java
   > /** A catalog registry for dealing with catalogs. */
   > @PublicEvolving
   > public interface CatalogRegistry {
   >     /** Get the current database. */
   >     String getCurrentDatabase();
   > 
   >     /** Gets the current catalog. */
   >     String getCurrentCatalog();
   > 
   >     /**
   >      * Returns the full name of the given table path, this name may be 
padded with current
   >      * catalog/database name based on the {@code identifier's} length.
   >      *
   >      * @param identifier an unresolved identifier
   >      * @return a fully qualified object identifier
   >      */
   >     ObjectIdentifier qualifyIdentifier(UnresolvedIdentifier identifier);
   > 
   >     /** Gets the current catalog. */
   >     Catalog getCurrentCatalog();
   > 
   >     /**
   >      * Gets a catalog by name.
   >      *
   >      * @param catalogName name of the catalog to retrieve
   >      * @return the requested catalog
   >      * @throws CatalogNotExistException if the catalog does not exist
   >      */
   >     Catalog getCatalogOrError(String catalogName);
   > 
   >     /**
   >      * Gets a fully qualified and resolved table with context by name. If 
the path is not yet fully
   >      * qualified use {@link #qualifyIdentifier(UnresolvedIdentifier)} 
first.
   >      *
   >      * @param objectIdentifier full path of the table to retrieve
   >      * @return resolved table with context that the path points to
   >      * @throws TableNotExistException if the table does not exist
   >      */
   >     ContextResolvedTable getTableOrError(ObjectIdentifier 
objectIdentifier);
   > 
   >     /**
   >      * Return whether the table with a fully qualified table path is 
temporary or not.
   >      *
   >      * @param objectIdentifier full path of the table
   >      * @return the table is temporary or not.
   >      */
   >     boolean isTemporaryTable(ObjectIdentifier objectIdentifier);
   > 
   >     /**
   >      * Retrieves a partition with a fully qualified table path and 
partition spec.
   >      *
   >      * @param tableIdentifier full path of the table to retrieve
   >      * @param partitionSpec full partition spec
   >      * @return partition in the table.
   >      */
   >     Optional<CatalogPartition> getPartition(
   >             ObjectIdentifier tableIdentifier, CatalogPartitionSpec 
partitionSpec);
   >     
   >     /**
   >      * Retrieves a partition with a fully qualified table path and 
partition spec.
   >      *
   >      * @param tableIdentifier full path of the table to retrieve
   >      * @param partitionSpec full partition spec
   >      * @return partition in the table.
   >      * @throws PartitionNotExistException if the partition does not exist
   >      */
   >     CatalogPartition getPartitionOrError(
   >             ObjectIdentifier tableIdentifier, CatalogPartitionSpec 
partitionSpec);
   > }
   > ```
   > 
   > What do you think?
   
   Thanks for the advice, I like the idea that getOrError to make caller easy 
to use.
   So, I add getCatalogOrError to this method since it'll always throw error in 
Hive dialect if the catalog doesn't exist.
   
   But for some other methods `getTableOrError`, the HiveParser actually won't 
always throw exception in [this method 
](https://github.com/apache/flink/blob/3d350485755e47d2b09ac1b2067b6119ef960b5a/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/copy/HiveParserSemanticAnalyzer.java#L2086).
   So, we'll still need another method `getTable` which returns an option.
   
   And for the method `getPartitionOrError`, it will throw 
`PartitionNotExistException` which is not a runtime exception.
   The caller like HiveParser will still need to catch the exception and then 
wrap it to a RunTimeException. For me, it seems not to be much easy to call 
this method, at lest in HiveParser with existing codebase.
   
   Beside, I'm trying to expose less interfaces in this case that it seems not 
as so inconvenient to use to me and some extra interfaces seem to be 
verbose/duplicated to me.
   
   Btw, `ContextResolvedTable` is an `Internal` class, so we still can't return 
`ContextResolvedTable` in method `CatalogRegistry#getResolvedCatalogBaseTable`. 
To simply the name, I rename it to `CatalogRegistry#getCatalogBaseTable`.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] luoyuxia commented on pull request #22166: [FLINK-31409][hive] Hive dialect should use public interfaces

Reply via email to