Thanks for this! Will review this week!

On Thu, Jul 15, 2021 at 5:15 AM 18717838093 <18717838...@126.com> wrote:
> Hi, experts.
>
> Currently, Hudi SQL statements for DML are in most cases executed by the
> Hive Driver using concatenated SQL strings. Concatenated SQL is hard to
> maintain and the code breaks easily. In addition, multiple versions of
> Hive cannot be supported at the moment, which causes a lot of headaches
> for users. So I would like to refactor and refine these two things to get
> a better design that is more convenient for users.
>
> For example, the following functions use the Driver to execute SQL:
>
> HiveSyncTool#syncHoodieTable, creates a database through the Driver.
> HoodieHiveClient#createTable, creates a table through the Driver.
> HoodieHiveClient#addPartitionsToTable, through the Driver.
> HoodieHiveClient#updatePartitionsToTable, through the Driver.
> HoodieHiveClient#updateTableDefinition, alters a table through the Driver.
>
> In contrast, HoodieHiveClient#updateTableProperties,
> HoodieHiveClient#scanTablePartitions, HoodieHiveClient#doesTableExist,
> and other metadata operations use the client API. From a design
> perspective, the two approaches are not aligned. So I think we need a
> single unified interface for everything that talks to HMS, one that does
> not use the Driver to execute DML. To support multiple Hive versions, we
> can add a shim layer that handles the different versions of HMS.
>
> I have a preliminary design in RFC-31 (
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+31%3A+Hive+integration+Improvment).
> I hope everyone can help review it and offer suggestions.
> Thank you very much.
>
> Looking forward to your reply.
>
> minglei
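
To make the proposal above a bit more concrete, here is a minimal sketch of what a unified HMS interface plus a version shim could look like. It is only an illustration, not the design in RFC-31: `MetastoreOperations`, `InMemoryMetastore`, and `MetastoreShims` are hypothetical names, and the in-memory class stands in for a real implementation that would call the metastore client API of one specific Hive version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: one interface for every call that touches the metastore,
// so callers never build SQL strings or go through the Hive Driver.
interface MetastoreOperations {
    void createDatabase(String database);
    void createTable(String database, String table);
    boolean doesTableExist(String database, String table);
    void addPartitions(String database, String table, List<String> partitions);
    List<String> scanTablePartitions(String database, String table);
}

// Stand-in implementation that keeps state in memory. A real shim would
// delegate to the metastore client of one specific Hive version instead.
class InMemoryMetastore implements MetastoreOperations {
    private final Map<String, Map<String, List<String>>> databases = new HashMap<>();

    public void createDatabase(String database) {
        databases.putIfAbsent(database, new HashMap<>());
    }

    public void createTable(String database, String table) {
        tablesOf(database).putIfAbsent(table, new ArrayList<>());
    }

    public boolean doesTableExist(String database, String table) {
        Map<String, List<String>> tables = databases.get(database);
        return tables != null && tables.containsKey(table);
    }

    public void addPartitions(String database, String table, List<String> partitions) {
        partitionsOf(database, table).addAll(partitions);
    }

    public List<String> scanTablePartitions(String database, String table) {
        // Return a copy so callers cannot mutate internal state.
        return new ArrayList<>(partitionsOf(database, table));
    }

    private Map<String, List<String>> tablesOf(String database) {
        Map<String, List<String>> tables = databases.get(database);
        if (tables == null) {
            throw new IllegalStateException("No such database: " + database);
        }
        return tables;
    }

    private List<String> partitionsOf(String database, String table) {
        List<String> partitions = tablesOf(database).get(table);
        if (partitions == null) {
            throw new IllegalStateException("No such table: " + database + "." + table);
        }
        return partitions;
    }
}

// Hypothetical shim loader: picks an implementation by Hive major version.
final class MetastoreShims {
    private MetastoreShims() {}

    static MetastoreOperations forHiveVersion(String version) {
        String major = version.split("\\.")[0];
        switch (major) {
            case "2":
            case "3":
                // Both map to the same stand-in here; real shims would differ.
                return new InMemoryMetastore();
            default:
                throw new IllegalArgumentException("Unsupported Hive version: " + version);
        }
    }
}
```

With something along these lines, the sync code would ask `MetastoreShims.forHiveVersion(...)` for an implementation once and then call only `MetastoreOperations` methods, so the version-specific details stay inside the shim and no SQL concatenation is needed.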