arina-ielchiieva commented on issue #2060: DRILL-7706: Implement Drill RDBMS Metastore for Tables component
URL: https://github.com/apache/drill/pull/2060#issuecomment-615879117

@paul-rogers good questions, though none of them are addressed in this PR, since it only adds RDBMS support for the Drill Metastore `tables` component. Below I give extended answers to your questions, with guidelines on what could be done to support the use cases you asked about.

`First question`:
Short answer: yes, but some parts need to be implemented first.
Extended answer: Let's assume we want to store the schema for HTTP plugin tables in the Drill Metastore and use it when querying data from this plugin. The `ANALYZE TABLE` command collects data about a table, including schema, statistics, etc. It also allows the user to provide the schema inline. For example: `ANALYZE TABLE table(dfs.tmp.region(schema=>'inline=(id int, country varchar)')) REFRESH METADATA`. You can also call it with the schema only, without statistics, but you will need to disable statistics collection using the session option `planner.statistics.use`. In the future, the `ANALYZE TABLE` command could be updated to do this without setting the option.

Currently, the `ANALYZE TABLE` command works only with file-based tables. So first we would need to extend it to support analysis of tables from storage plugins, perhaps by adding interfaces that each storage plugin would need to implement. The `ANALYZE TABLE` command would then gather data for such tables and transfer it to the Drill Metastore. The Drill Metastore will store it and will be able to provide it when asked (this part is already implemented). Currently, only file-based format plugins work with the Drill Metastore, so the last step would be to integrate Drill Metastore usage into the HTTP plugin or any other plugin.

`Second question`:
Short answer: yes, but you will have to implement new components.
Extended answer: Drill Metastore consists of metastore-api (which contains the Metastore interfaces and general classes) and the Metastore implementations. Currently there is an Iceberg implementation, and this PR adds an RDBMS one. The Drill Metastore interface consists of components. Currently there is only the `tables` component, which stores metadata for Drill tables, including their segments, files, row groups and partitions, if any. The `Views` component is present but not implemented.
https://github.com/apache/drill/blob/master/metastore/metastore-api/src/main/java/org/apache/drill/metastore/Metastore.java

So what if you want to add a new component, for example, `pstore`? Just add the new component to the `DrillMetastore` interface. As you wrote, it would store information `for plugins, UDFs, security credentials and more`, so I think it's better to create a separate component for each information type:

```
Plugins plugins();
Udfs udfs();
Credentials credentials();
```

For each component you would also need to come up with a `unit` which will be used to pass information to the Metastore and back. For example, for the `tables` component there is the `TableMetadataUnit` unit. Then you would need to implement these interfaces in the Drill Iceberg and RDBMS Metastore implementations. In Iceberg, each component would have its own Iceberg table; in RDBMS, one or several database tables. Most of the code is already written; you would just need to add the code specific to each new component. The last step is to integrate Metastore calls into the Drill code where you need to use them. `DrillMetastore` is accessible through `DrillbitContext`.
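
To make the steps above concrete, here is a minimal Java sketch of what adding one new component might look like. It is only an illustration under stated assumptions: `SketchMetastore`, `Plugins`, `PluginUnit`, `InMemoryPlugins` and `InMemorySketchMetastore` are hypothetical names, not the actual Drill Metastore API, and the in-memory map merely stands in for an Iceberg table or RDBMS tables.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical unit for a `plugins` component, playing the role that
// TableMetadataUnit plays for `tables`. The fields are assumptions.
final class PluginUnit {
  private final String name;
  private final String config;

  PluginUnit(String name, String config) {
    this.name = name;
    this.config = config;
  }

  String name() { return name; }
  String config() { return config; }
}

// Hypothetical component interface; the real component API may differ.
interface Plugins {
  void write(PluginUnit unit);
  Optional<PluginUnit> read(String name);
  List<String> names();
}

// Simplified stand-in for the Metastore interface, showing only where
// the accessor for a new component would be declared.
interface SketchMetastore {
  Plugins plugins();
}

// In-memory stand-in for an implementation-specific component
// (for example, one backed by an Iceberg table or by RDBMS tables).
final class InMemoryPlugins implements Plugins {
  private final Map<String, PluginUnit> units = new ConcurrentHashMap<>();

  @Override
  public void write(PluginUnit unit) {
    units.put(unit.name(), unit);
  }

  @Override
  public Optional<PluginUnit> read(String name) {
    return Optional.ofNullable(units.get(name));
  }

  @Override
  public List<String> names() {
    List<String> result = new ArrayList<>(units.keySet());
    Collections.sort(result);
    return result;
  }
}

final class InMemorySketchMetastore implements SketchMetastore {
  private final Plugins plugins = new InMemoryPlugins();

  @Override
  public Plugins plugins() {
    return plugins;
  }
}

final class MetastoreSketchDemo {
  public static void main(String[] args) {
    SketchMetastore metastore = new InMemorySketchMetastore();
    // Store a unit through the component, then list what is stored.
    metastore.plugins().write(new PluginUnit("http", "{\"type\":\"http\"}"));
    System.out.println(metastore.plugins().names());
  }
}
```

Keeping each information type behind its own component interface means an implementation can choose its own storage layout per component without affecting callers, which only see the unit type.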