[ https://issues.apache.org/jira/browse/FLINK-20416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271553#comment-17271553 ]
Sebastian Liu commented on FLINK-20416:
---------------------------------------

Hi [~jark], [~lirui],

After our discussion, I changed the original design of this ticket: it now adds only a HiveCachedCatalog. This has the following benefits:
 * Each catalog owns its cache, so cache strategies (cache interval, update mode, etc.) can be adjusted flexibly.
 * It places no restrictions on the implementations of other catalogs.
 * It improves the performance of Flink OLAP on Hive.

I have also updated the design doc: [https://docs.google.com/document/d/1oL8HUpv2WaF6OkFvbH5iefXkOJB__Dal_bYsIZJA_Gk/edit?usp=sharing]

Looking forward to your further suggestions.

> Need a cached catalog for HiveCatalog
> -------------------------------------
>
>                 Key: FLINK-20416
>                 URL: https://issues.apache.org/jira/browse/FLINK-20416
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Common, Connectors / Hive, Table SQL / API,
>                      Table SQL / Planner
>            Reporter: Sebastian Liu
>            Priority: Major
>              Labels: pull-request-available
>
> In OLAP scenarios, there are usually analytical queries whose running time
> is relatively short. These queries are also sensitive to latency. In the
> current Blink SQL processing, the parse/validate/optimize stages all need
> metadata from the catalog API, but each request to the catalog re-runs the
> underlying metadata query.
>
> We may need a cached catalog that caches the table schema and statistics
> to avoid unnecessary repeated metadata requests.
> Design doc:
> [https://docs.google.com/document/d/1oL8HUpv2WaF6OkFvbH5iefXkOJB__Dal_bYsIZJA_Gk/edit?usp=sharing]
> I have submitted a related PR that adds a generic cached catalog, which can
> delegate to other implementations of {{AbstractCatalog}}:
> {{[https://github.com/apache/flink/pull/14260]}}
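
To illustrate the caching idea described in this issue, here is a minimal sketch of a time-based cache placed in front of a slow metadata lookup. It is not the code from the PR or the design doc and does not use any Flink API: {{TableMeta}}, {{CachedMetaLookup}}, the {{ttlMillis}} parameter and the {{invalidate}} method are all hypothetical names chosen for the example. It assumes the "cache interval" mentioned above behaves like a simple time-to-live.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the schema and statistics a catalog returns (not a Flink class). */
final class TableMeta {
    final String schema;
    final long rowCount;

    TableMeta(String schema, long rowCount) {
        this.schema = schema;
        this.rowCount = rowCount;
    }
}

/** Hypothetical wrapper that caches metadata lookups for a fixed interval. */
final class CachedMetaLookup {
    /** A cached value together with the time it was loaded. */
    private static final class Entry {
        final TableMeta value;
        final long loadedAtMillis;

        Entry(TableMeta value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Function<String, TableMeta> delegate; // the expensive metadata query
    private final long ttlMillis;                       // assumed "cache interval"
    private final LongSupplier clock;                   // injectable so tests can control time
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    CachedMetaLookup(Function<String, TableMeta> delegate, long ttlMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns cached metadata if it is younger than ttlMillis, otherwise reloads it. */
    TableMeta getTable(String tablePath) {
        long now = clock.getAsLong();
        Entry entry = cache.get(tablePath);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            // Concurrent callers may both reload here; acceptable for a sketch.
            entry = new Entry(delegate.apply(tablePath), now);
            cache.put(tablePath, entry);
        }
        return entry.value;
    }

    /** Drops a cached entry, e.g. after the table is altered through this catalog. */
    void invalidate(String tablePath) {
        cache.remove(tablePath);
    }
}
```

With a wrapper like this, repeated parse/validate/optimize calls for the same table within the interval hit the delegate only once. Changes made outside the catalog stay invisible until the entry expires or is invalidated, which is the trade-off an update mode would have to address.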