Hi Butao,

Thanks for sharing your PR! I hadn't come across trinodb/trino-hive-apache
or trinodb/hive-thrift before.

As mentioned in the PR, the current Thrift definitions might not be the
final version, but since we have released a Hive 4 beta, it sounds
reasonable to share this information with external products. I'm curious
whether anyone knows why Hive and Impala use different engine names, and
what the recommended options are.
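To make the two questions concrete, here is a rough Python sketch of how I understand the semantics. It assumes that column stats are keyed by engine name, and it models a hypothetical server-side fallback to "hive" for requests that leave `engine` unset. `StatsStore`, `DEFAULT_ENGINE` and this `TableStatsRequest` dataclass are illustrative stand-ins, not Hive or Thrift APIs. The "impala" engine name is also an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical fallback, modeling the idea of treating an unset engine as "hive".
DEFAULT_ENGINE = "hive"


@dataclass
class TableStatsRequest:
    """Simplified stand-in for the Thrift struct (not the generated Hive class)."""
    db_name: str
    tbl_name: str
    col_names: List[str]
    engine: Optional[str] = None  # required in the Hive 4 IDL; optional in this sketch


class StatsStore:
    """Toy metastore that keys column stats by (db, table, column, engine)."""

    def __init__(self, require_engine: bool) -> None:
        self.require_engine = require_engine
        self._stats: Dict[Tuple[str, str, str, str], dict] = {}

    def _resolve_engine(self, engine: Optional[str]) -> str:
        if engine is None:
            if self.require_engine:
                # Current behavior: clients that never set the field are rejected.
                raise ValueError("Required field 'engine' is unset!")
            # Hypothetical behavior: fall back to the Hive engine name.
            return DEFAULT_ENGINE
        return engine

    def update(self, db: str, tbl: str, col: str, stats: dict,
               engine: Optional[str] = None) -> None:
        self._stats[(db, tbl, col, self._resolve_engine(engine))] = stats

    def get(self, req: TableStatsRequest) -> Dict[str, dict]:
        engine = self._resolve_engine(req.engine)
        # Only stats written under the same engine name are visible.
        return {
            col: self._stats[(req.db_name, req.tbl_name, col, engine)]
            for col in req.col_names
            if (req.db_name, req.tbl_name, col, engine) in self._stats
        }


if __name__ == "__main__":
    store = StatsStore(require_engine=False)
    store.update("default", "test_trino", "id", {"numNulls": 0}, engine="hive")
    store.update("default", "test_trino", "id", {"numNulls": 5}, engine="impala")
    legacy = TableStatsRequest("default", "test_trino", ["id"])  # engine unset
    print(store.get(legacy))  # falls back to the "hive" stats
    print(store.get(TableStatsRequest("default", "test_trino", ["id"], "trino")))
```

In this model, a client that picks a unique name like engine=trino sees none of the stats Hive wrote, while a client that reuses engine=hive (or relies on the fallback) shares them. That is the trade-off behind question (1).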

Thanks,
Okumin



On Fri, Aug 11, 2023 at 10:39 AM Butao Zhang <butaozha...@163.com> wrote:

> Hi, Okumin
>
>
> I have encountered this issue before, and 'validWriteIdList' is another
> incompatible parameter. I have submitted a PR to the trino-hive-apache repo;
> you can refer to https://github.com/trinodb/trino-hive-apache/pull/43.
> IIUC, the 'engine' parameter is used to differentiate between stats
> produced by different engines (Hive, Spark, Presto, Impala), but it seems
> that downstream engines do not want to adopt and implement the new 'engine'
> parameter.
> At present, if an engine (e.g. Trino) uses a customized Thrift API to
> interact with HMS, it must change its Thrift file to match the Thrift
> definition in HMS.
> BTW, maybe we could change the HMS Thrift file to make the 'engine'
> parameter optional, so that other customized Thrift clients would not have
> compatibility issues.
>
> Thanks,
>
> Butao Zhang
>
> ---- Replied Message ----
> | From | Okumin<m...@okumin.com> |
> | Date | 8/10/2023 23:41 |
> | To | <dev@hive.apache.org> |
> | Subject | How to use `engine` introduced by HIVE-22046 |
> Hi Hive developers,
>
> I noticed that HIVE-22046 introduced an incompatibility in the Metastore
> APIs while I was testing integration between Hive 4 and other software. If
> I understand correctly, clients are now required to additionally specify
> the engine name when they get or update column statistics.
>
> - https://issues.apache.org/jira/browse/HIVE-22046
> - https://github.com/apache/hive/pull/741
>
> For example, Trino has a feature that uses column stats, and it fails. Note
> that I am not 100% sure about Trino's implementation or behavior.
>
> ```
> trino> create table hive.default.test_trino (id int);
> Query 20230810_152236_00004_t9n6h failed: Required field 'engine' is unset!
> Struct:TableStatsRequest(dbName:default, tblName:test_trino, colNames:[id],
> engine:null)
> ```
>
> I have two questions about this feature.
>
> (1) Should any engine use a unique engine name?
>
> I guess some software can store or use stats that are compatible with
> Hive. I wonder whether such software can reuse engine=hive, or whether it
> should use a different name like engine=trino.
>
> I see that Impala passes its own engine name to the metastore. At a
> glance, Spark does not seem to use Hive's column stats directly.
>
> - https://issues.apache.org/jira/browse/IMPALA-8842
>
> (2) Should Hive Metastore use engine=hive as a default value?
>
> If other compatible software can reuse engine=hive, one option could be to
> accept requests in the old format and assume their engine is "hive" for
> compatibility. Or should clients explicitly specify engine=hive when using
> Hive 4?
>
> Regards,
> Okumin
>
