arina-ielchiieva commented on issue #2060: DRILL-7706: Implement Drill RDBMS Metastore for Tables component
URL: https://github.com/apache/drill/pull/2060#issuecomment-615879117
 
 
@paul-rogers, good questions. None of them are addressed in this PR, though, since it only adds support for the Drill Metastore `tables` component. Below I give extended answers to your questions, with guidelines on what could be done to support the use cases you asked about.
   
`First question`:
Short answer: yes, but some parts need to be implemented first.
Extended answer:
Let's assume we want to store the schema for HTTP plugin tables in the Drill Metastore and use it when querying data from this plugin.
The `ANALYZE TABLE` command collects data about a table, including its schema, statistics, etc. It also allows the user to provide the schema inline. For example: `ANALYZE TABLE table(dfs.tmp.region(schema=>'inline=(id int, country varchar)')) REFRESH METADATA`.
   
You can also run it with only a schema and no statistics, but you will need to disable statistics collection using the session option `planner.statistics.use`. In the future, the `ANALYZE TABLE` command could be updated to do this without setting the option.
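A minimal sketch of the schema-only flow over JDBC follows. The `SchemaOnlyAnalyze` helper is hypothetical. It assumes that setting `planner.statistics.use` to `false` is what disables statistics collection, and it expects an already opened connection to Drill, for example one obtained through the Drill JDBC driver.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds and runs the statements for a schema-only ANALYZE TABLE.
final class SchemaOnlyAnalyze {

  private SchemaOnlyAnalyze() {}

  // Returns the statements in execution order. The first one turns statistics off
  // (assumption: `false` disables collection). The second analyzes the table with
  // an inline schema.
  static List<String> statements(String table, String inlineSchema) {
    return Arrays.asList(
        "ALTER SESSION SET `planner.statistics.use` = false",
        "ANALYZE TABLE table(" + table + "(schema=>'inline=(" + inlineSchema + ")')) REFRESH METADATA");
  }

  // Executes the statements one by one on a connection that is assumed to point at Drill.
  static void run(Connection connection, List<String> statements) throws SQLException {
    try (Statement statement = connection.createStatement()) {
      for (String sql : statements) {
        statement.execute(sql);
      }
    }
  }
}
```

Building the statements separately from running them keeps the SQL easy to inspect and log before anything is sent to the cluster.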
   
Currently, the `ANALYZE TABLE` command works only with file-based tables, so first we would need to extend it to support analysis of tables from storage plugins, possibly by adding interfaces that each storage plugin would need to implement.
The `ANALYZE TABLE` command would then gather data for such tables and pass it to the Drill Metastore, which would store it and provide it on request (this part is already implemented).
Currently, only file-based format plugins work with the Drill Metastore, so the last step would be to integrate Drill Metastore usage into the HTTP plugin or any other plugin.
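The steps above can be sketched as follows. Everything here is hypothetical:

- `AnalyzableStoragePlugin` stands in for the plugin interface that does not exist yet.
- `InMemoryMetastore` stands in for the real Drill Metastore.
- The collected metadata is reduced to a column-name-to-type map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical metadata collected by ANALYZE TABLE for one table: column name -> type.
final class CollectedTableMetadata {
  final String tableName;
  final Map<String, String> schema;

  CollectedTableMetadata(String tableName, Map<String, String> schema) {
    this.tableName = tableName;
    // Defensive, order-preserving, read-only copy of the column list.
    this.schema = Collections.unmodifiableMap(new LinkedHashMap<>(schema));
  }
}

// Hypothetical interface a storage plugin (e.g. the HTTP plugin) would implement
// so that ANALYZE TABLE can gather metadata for its tables.
interface AnalyzableStoragePlugin {
  CollectedTableMetadata analyze(String tableName);
}

// Stand-in for the Drill Metastore: stores metadata and returns it on request.
final class InMemoryMetastore {
  private final Map<String, CollectedTableMetadata> tables = new HashMap<>();

  void store(CollectedTableMetadata metadata) {
    tables.put(metadata.tableName, metadata);
  }

  Optional<CollectedTableMetadata> get(String tableName) {
    return Optional.ofNullable(tables.get(tableName));
  }
}

final class AnalyzeFlow {
  private AnalyzeFlow() {}

  // ANALYZE TABLE side: ask the plugin for metadata and hand it to the Metastore.
  static void analyzeTable(AnalyzableStoragePlugin plugin, InMemoryMetastore metastore, String tableName) {
    metastore.store(plugin.analyze(tableName));
  }

  // Query side: the plugin looks up a previously stored schema, if any.
  static Optional<Map<String, String>> schemaForQuery(InMemoryMetastore metastore, String tableName) {
    return metastore.get(tableName).map(m -> m.schema);
  }
}
```

In this shape, the plugin only needs to know how to describe its own tables. Storing and serving the metadata stays in the Metastore, which matches the split described above.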
   
`Second question`:
Short answer: yes, but you will have to implement new components.
Extended answer:
The Drill Metastore consists of `metastore-api` (which contains the Metastore interfaces and general classes) and the Metastore implementations: currently Iceberg, and this PR adds RDBMS.
The Drill Metastore interface consists of components. Currently there is only the `tables` component, which stores metadata for Drill tables, including their segments, files, row groups, and partitions, if any. The `Views` component is present but not yet implemented.
https://github.com/apache/drill/blob/master/metastore/metastore-api/src/main/java/org/apache/drill/metastore/Metastore.java
   
So what if you want to add a new component, for example, `pstore`? Just add the new component to the `DrillMetastore` interface. As you wrote, it would store information `for plugins, UDFs, security credentials and more`, so I think it is better to create a separate component for each type of information:
```java
// Proposed accessors for the new components
Plugins plugins();
Udfs udfs();
Credentials credentials();
```
For each component you would also need to define a `unit`, which is used to pass information to the Metastore and back. For example, the `tables` component uses `TableMetadataUnit`.
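Below is a sketch of what a unit for a hypothetical `plugins` component might look like. `PluginMetadataUnit` and its fields (`name`, `type`, `config`, `lastModifiedTime`) are assumptions for illustration, not Drill API. An immutable value object with a builder keeps units safe to pass between the Metastore and its callers.

```java
import java.util.Objects;

// Hypothetical unit for a `plugins` component: the value object passed to the
// Metastore when writing, and returned when reading plugin metadata.
final class PluginMetadataUnit {
  private final String name;
  private final String type;
  private final String config; // assumed: plugin configuration serialized as JSON
  private final long lastModifiedTime;

  private PluginMetadataUnit(Builder builder) {
    this.name = Objects.requireNonNull(builder.name, "name"); // name acts as the key
    this.type = builder.type;
    this.config = builder.config;
    this.lastModifiedTime = builder.lastModifiedTime;
  }

  static Builder builder() { return new Builder(); }

  String name() { return name; }
  String type() { return type; }
  String config() { return config; }
  long lastModifiedTime() { return lastModifiedTime; }

  // Returns a builder pre-filled with this unit's values, useful for updates.
  Builder toBuilder() {
    return new Builder().name(name).type(type).config(config).lastModifiedTime(lastModifiedTime);
  }

  static final class Builder {
    private String name;
    private String type;
    private String config;
    private long lastModifiedTime;

    Builder name(String name) { this.name = name; return this; }
    Builder type(String type) { this.type = type; return this; }
    Builder config(String config) { this.config = config; return this; }
    Builder lastModifiedTime(long time) { this.lastModifiedTime = time; return this; }
    PluginMetadataUnit build() { return new PluginMetadataUnit(this); }
  }
}
```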
Then you would need to implement these interfaces in the Drill Iceberg and RDBMS Metastore implementations. In Iceberg, each component would have its own Iceberg table; in RDBMS, one or several database tables. Most of the code is already written; you would just need to add the code specific to each new component.
The last step is to integrate Metastore calls into the Drill code where you need them. `DrillMetastore` is accessible through `DrillbitContext`.
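The remaining steps might look roughly like this for a `plugins` component. `PluginUnit`, `Plugins`, `InMemoryPlugins`, `ExampleMetastore` and `PluginLookup` are all hypothetical names. The in-memory map stands in for the Iceberg table or RDBMS tables a real implementation would use. The Metastore is passed to the caller directly instead of being taken from `DrillbitContext`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical minimal unit for the `plugins` component.
final class PluginUnit {
  final String name;
  final String config;

  PluginUnit(String name, String config) {
    this.name = name;
    this.config = config;
  }
}

// Hypothetical component contract that each Metastore implementation (Iceberg, RDBMS) would provide.
interface Plugins {
  void write(PluginUnit unit);
  Optional<PluginUnit> read(String name);
  List<PluginUnit> readAll();
  void delete(String name);
}

// Stand-in implementation. A real one would persist units to an Iceberg table
// or to one or more RDBMS tables instead of an in-memory sorted map.
final class InMemoryPlugins implements Plugins {
  private final Map<String, PluginUnit> rows = new TreeMap<>();

  @Override public synchronized void write(PluginUnit unit) { rows.put(unit.name, unit); }
  @Override public synchronized Optional<PluginUnit> read(String name) { return Optional.ofNullable(rows.get(name)); }
  @Override public synchronized List<PluginUnit> readAll() { return Collections.unmodifiableList(new ArrayList<>(rows.values())); }
  @Override public synchronized void delete(String name) { rows.remove(name); }
}

// Hypothetical Metastore view exposing the new component.
interface ExampleMetastore {
  Plugins plugins();
}

// Hypothetical caller in Drill code that reads plugin configs through the Metastore.
final class PluginLookup {
  private final ExampleMetastore metastore;

  PluginLookup(ExampleMetastore metastore) { this.metastore = metastore; }

  String configOrDefault(String pluginName, String defaultConfig) {
    return metastore.plugins().read(pluginName).map(u -> u.config).orElse(defaultConfig);
  }
}
```

Callers depend only on the `Plugins` interface, so the same calling code would work unchanged whichever implementation is configured, Iceberg or RDBMS.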
