Thanks for this! Will review this week!

On Thu, Jul 15, 2021 at 5:15 AM 18717838093 <18717838...@126.com> wrote:

>
>
> Hi, experts.
>
>
> Currently, Hudi SQL statements for DML are in most cases executed by the
> Hive Driver using concatenated SQL strings. Concatenating SQL this way is
> hard to maintain, and the code breaks easily. In addition, multiple
> versions of Hive cannot be supported at the moment, which causes a lot of
> headaches for users. So I would like to refactor these two areas to get a
> better design that is more convenient for users.
>
>
> For example, the following functions use the driver to execute SQL:
>
> HiveSyncTool#syncHoodieTable, creates a database via the driver.
> HoodieHiveClient#createTable, creates a table via the driver.
> HoodieHiveClient#addPartitionsToTable, adds partitions via the driver.
> HoodieHiveClient#updatePartitionsToTable, updates partitions via the driver.
> HoodieHiveClient#updateTableDefinition, alters a table via the driver.
>
>
>
>
> Other than that, HoodieHiveClient#updateTableProperties,
> HoodieHiveClient#scanTablePartitions, HoodieHiveClient#doesTableExist, etc.
> are metadata operations that use the client API instead. From a design
> point of view, the two approaches are not aligned. So I think we need to
> abstract a unified interface for everything that talks to HMS and stop
> using the Driver to execute DML. To support multiple Hive versions, we can
> add a shim layer that handles the different versions of HMS.
>
>
> I have a preliminary design in RFC-31 (
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+31%3A+Hive+integration+Improvment).
> I hope everyone can help review it and offer some suggestions.
> Thank you very much.
>
>
> Looking forward to your reply.
>
>
> minglei
>
>
>
>
>
>
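To make the proposal in the quoted message a bit more concrete, here is a minimal sketch of what a unified metastore interface with a version shim might look like. Every name in it (MetastoreSyncClient, HmsShim, Hive2Shim, Hive3Shim, ShimLoader, InMemorySyncClient) is hypothetical and is not taken from Hudi or from RFC-31. The in-memory backend only stands in for real HMS client calls, and the difference between the two shims is purely illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical unified interface: every metastore operation goes through it. */
interface MetastoreSyncClient {
    void createDatabase(String db);
    void createTable(String db, String table);
    void addPartitions(String db, String table, List<String> partitions);
    boolean tableExists(String db, String table);
    List<String> listPartitions(String db, String table);
}

/** Hypothetical shim: hides whatever differs between Hive/HMS versions. */
interface HmsShim {
    String name();
    /** Placeholder for a version-specific call; here it only builds a table key. */
    String tableKey(String db, String table);
}

final class Hive2Shim implements HmsShim {
    public String name() { return "hive2"; }
    public String tableKey(String db, String table) { return db + "." + table; }
}

final class Hive3Shim implements HmsShim {
    public String name() { return "hive3"; }
    // Illustrative difference only: this shim prefixes a catalog name.
    public String tableKey(String db, String table) { return "hive." + db + "." + table; }
}

/** Picks a shim from the major version of a version string such as "2.3.9". */
final class ShimLoader {
    private ShimLoader() {}

    static HmsShim forVersion(String hiveVersion) {
        String major = hiveVersion.split("\\.")[0];
        switch (major) {
            case "2": return new Hive2Shim();
            case "3": return new Hive3Shim();
            default:
                throw new IllegalArgumentException("Unsupported Hive version: " + hiveVersion);
        }
    }
}

/** In-memory stand-in for a client that would talk to HMS without building SQL strings. */
final class InMemorySyncClient implements MetastoreSyncClient {
    private final HmsShim shim;
    private final Set<String> databases = new HashSet<>();
    private final Map<String, List<String>> partitionsByTable = new HashMap<>();

    InMemorySyncClient(HmsShim shim) { this.shim = shim; }

    public void createDatabase(String db) { databases.add(db); }

    public void createTable(String db, String table) {
        if (!databases.contains(db)) {
            throw new IllegalStateException("No such database: " + db);
        }
        partitionsByTable.putIfAbsent(shim.tableKey(db, table), new ArrayList<>());
    }

    public void addPartitions(String db, String table, List<String> partitions) {
        List<String> existing = partitionsByTable.get(shim.tableKey(db, table));
        if (existing == null) {
            throw new IllegalStateException("No such table: " + db + "." + table);
        }
        for (String p : partitions) {
            if (!existing.contains(p)) existing.add(p); // skip partitions already registered
        }
    }

    public boolean tableExists(String db, String table) {
        return partitionsByTable.containsKey(shim.tableKey(db, table));
    }

    public List<String> listPartitions(String db, String table) {
        List<String> existing = partitionsByTable.get(shim.tableKey(db, table));
        return existing == null ? new ArrayList<>() : new ArrayList<>(existing);
    }
}
```

In this sketch a caller would obtain a client with new InMemorySyncClient(ShimLoader.forVersion("2.3.9")) and then use only MetastoreSyncClient methods, so the sync logic never needs to know which Hive version is behind it.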
