Hi Hive developers,

While testing integration between Hive 4 and other software, I noticed
that HIVE-22046 introduced an incompatibility in the Metastore APIs. If I
understand correctly, clients are now required to specify the engine name
whenever they get or update column statistics.

- https://issues.apache.org/jira/browse/HIVE-22046
- https://github.com/apache/hive/pull/741

For example, Trino has a feature that uses column stats, and it fails
against Hive 4 as shown below. Note that I am not 100% sure about Trino's
implementation or behavior.

```
trino> create table hive.default.test_trino (id int);
Query 20230810_152236_00004_t9n6h failed: Required field 'engine' is unset!
Struct:TableStatsRequest(dbName:default, tblName:test_trino, colNames:[id],
engine:null)
```

I have two questions about this feature.

(1) Should every engine use its own unique engine name?

I assume some software can store or read stats that are compatible with
Hive's. In that case, can it reuse engine=hive, or should it use a
distinct name such as engine=trino?

I see that Impala passes its own engine name to the Metastore. At a
glance, Spark does not appear to use Hive's column stats directly.

- https://issues.apache.org/jira/browse/IMPALA-8842

(2) Should Hive Metastore use engine=hive as a default value?

If other compatible software can reuse engine=hive, one option would be
for the Metastore to accept requests in the old format and assume
engine=hive for compatibility. Or should such clients explicitly specify
engine=hive when talking to Hive 4?
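
To make option (2) concrete, here is a minimal sketch of the fallback I
have in mind. It is not actual Hive code: StatsRequest is a hypothetical
stand-in for the Thrift-generated TableStatsRequest, and withDefaultEngine
and DEFAULT_ENGINE are names I made up for illustration.

```java
import java.util.List;

public class EngineDefaultSketch {
    // Assumed default; whether "hive" is the right value is the open question.
    public static final String DEFAULT_ENGINE = "hive";

    // Hypothetical stand-in for the Thrift-generated TableStatsRequest.
    public static final class StatsRequest {
        public final String dbName;
        public final String tblName;
        public final List<String> colNames;
        public final String engine; // null when sent by an old-format client

        public StatsRequest(String dbName, String tblName,
                            List<String> colNames, String engine) {
            this.dbName = dbName;
            this.tblName = tblName;
            this.colNames = colNames;
            this.engine = engine;
        }
    }

    // Returns the request unchanged if the engine is set; otherwise returns
    // a copy whose engine is filled in with the assumed default.
    public static StatsRequest withDefaultEngine(StatsRequest request) {
        if (request.engine != null) {
            return request;
        }
        return new StatsRequest(request.dbName, request.tblName,
                                request.colNames, DEFAULT_ENGINE);
    }

    public static void main(String[] args) {
        StatsRequest legacy =
            new StatsRequest("default", "test_trino", List.of("id"), null);
        System.out.println(withDefaultEngine(legacy).engine);
    }
}
```

With this approach, a client that already sends its own engine name (as
Impala does) would be unaffected, and only requests without an engine
would be treated as engine=hive.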

Regards,
Okumin
