Hi Hive experts,
I would like to extend the Hive SQL language so that partitioned Iceberg
tables can be created like this:
create table iceberg_test(
level string,
event_time timestamp,
message string,
register_time date,
telephone array <string>
)
partition by spec(
level identity,
event_time identity,
event_time hour,
register_time day
)
stored as iceberg;
The problem is that this syntax is very specific to Iceberg, and I do not think
it is a good idea to change the Hive syntax globally to accommodate a single
use case.
The following CREATE TABLE statement could achieve the same thing:
create table iceberg_test(
level string,
event_time timestamp,
message string,
register_time date,
telephone array <string>
)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
TBLPROPERTIES ('iceberg.mr.table.partition.spec'='...');
I am looking for a way to rewrite the original query (which is not valid Hive
syntax) into a new, syntactically correct one.
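To make the intended rewrite concrete, here is a rough sketch of the
transformation as plain string processing, independent of any Hive hook API.
The class name IcebergDdlRewriter, the regular expression, and the encodeSpec
callback are all hypothetical and only for illustration. In particular, the
value format of 'iceberg.mr.table.partition.spec' is not defined here, so the
encoding is left to the caller.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive or Iceberg.
final class IcebergDdlRewriter {

    // Matches a trailing "partition by spec( ... ) stored as iceberg" clause.
    // Assumption: the spec body itself contains no parentheses.
    private static final Pattern SPEC_CLAUSE = Pattern.compile(
        "\\s*partition\\s+by\\s+spec\\s*\\((.*?)\\)\\s*stored\\s+as\\s+iceberg\\s*;?\\s*$",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the statement with the custom clause replaced by
    // STORED BY ... TBLPROPERTIES (...). Statements without the clause
    // are returned unchanged.
    static String rewrite(String sql, Function<String, String> encodeSpec) {
        Matcher m = SPEC_CLAUSE.matcher(sql);
        if (!m.find()) {
            return sql;
        }
        // Collapse the spec body to one line, e.g. "level identity, event_time hour".
        String specBody = m.group(1).trim().replaceAll("\\s+", " ");
        String head = sql.substring(0, m.start());
        // encodeSpec is a placeholder for whatever serialization the storage
        // handler expects; its output must not contain single quotes.
        return head
            + "\nSTORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'"
            + "\nTBLPROPERTIES ('iceberg.mr.table.partition.spec'='"
            + encodeSpec.apply(specBody) + "');";
    }
}
```

A regex like this is fragile compared to a real grammar change, and it only
helps if it can run before Hive parses the statement, which is exactly the
missing piece.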
I looked at the hooks as a possible solution, but I found that:
- HiveDriverRunHook.preDriverRun receives the original, syntactically incorrect
  query, but I found no way to replace it with a syntactically correct one (the
  query appears to be read-only there).
- HiveSemanticAnalyzerHook can rewrite the AST, but it needs a syntactically
  correct query to start with.
Does anyone have other ideas on how to achieve this, either with hooks or in
some other way?
Thanks,
Peter