Re: Hive SQL extension

Peter Vary Thu, 22 Oct 2020 07:22:15 -0700

Let's assume that this feature would be useful for Iceberg tables, but useless 
and even problematic/forbidden for other tables. :)


My thinking is that it could make Hive much more user-friendly if we allowed
extensions to the language.

With the Iceberg integration we plan several extensions which might not be
useful for other tables. Some examples:
- When creating tables we want to send additional information to the storage
  layer, and pushing everything into properties is a pain (not really user
  friendly)
- We would like to allow querying table history for Iceberg tables (previous
  snapshot IDs, timestamps, etc.)
- We would like to allow time travel for Iceberg tables based on the data
  queried above
- We would like to allow the user to see / manage / remove old snapshots

These are all very Iceberg-specific features, and most probably will not
work or be useful for any other type of table, so I think adding them to the
Hive parser would be a stretch.

On the other hand, if we do not provide an SQL interface for accessing these
features, users will turn to Spark/Impala/Presto to work with Iceberg tables.

As for your specific question about handling syntax errors (I have just
started to think about how I would do it, so feel free to suggest better
methods):
- Let's assume that we have a hook which gets the SQL command as input and
  can rewrite it to a new SQL command
- I would write a simplified parser which tries to be as simple as possible
  for the specific command
- Based on the parsing I would return the same command / throw an exception /
  rewrite the command
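
The three steps above can be sketched roughly as follows. This is Python for
brevity (a real Hive hook would be Java), and it is only an illustration under
stated assumptions: the names rewrite, parse_spec, encode_spec and
SugarSyntaxError are hypothetical, and the "column:transform" encoding is a
placeholder, since the actual value of 'iceberg.mr.table.partition.spec' is
elided ('...') in this thread.

```python
import re

# Hypothetical pattern for the sugared clause discussed in this thread:
#   partition by spec(<column> <transform>, ...) stored as iceberg
SUGAR = re.compile(
    r"partition\s+by\s+spec\s*\((?P<spec>[^()]*)\)\s*stored\s+as\s+iceberg",
    re.IGNORECASE,
)
# Cheap trigger that decides whether the command concerns this extension at all.
TRIGGER = re.compile(r"partition\s+by\s+spec\b", re.IGNORECASE)

STORAGE_HANDLER = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler"
SPEC_PROPERTY = "iceberg.mr.table.partition.spec"


class SugarSyntaxError(ValueError):
    """Raised when the sugared clause is present but malformed."""


def parse_spec(body):
    """Parse 'col transform, col transform' into a list of (col, transform)."""
    fields = []
    for item in body.split(","):
        parts = item.split()
        if len(parts) != 2:
            raise SugarSyntaxError(
                f"expected '<column> <transform>', got {item.strip()!r}"
            )
        fields.append((parts[0], parts[1]))
    return fields


def encode_spec(fields):
    # PLACEHOLDER encoding (assumption): the real property format is not
    # given in the thread, so this only marks where it would be produced.
    return ",".join(f"{col}:{transform}" for col, transform in fields)


def rewrite(sql):
    """Return sql unchanged, rewrite it, or raise SugarSyntaxError."""
    if not TRIGGER.search(sql):
        return sql  # not our command: hand it to the normal parser as-is
    match = SUGAR.search(sql)
    if match is None:
        raise SugarSyntaxError(
            "malformed 'partition by spec(...) stored as iceberg' clause"
        )
    fields = parse_spec(match.group("spec"))
    replacement = (
        f"STORED BY '{STORAGE_HANDLER}' "
        f"TBLPROPERTIES ('{SPEC_PROPERTY}'='{encode_spec(fields)}')"
    )
    return sql[: match.start()] + replacement + sql[match.end():]
```

Errors in the sugared part surface as an exception from the rewriter, while
everything outside it still goes through the regular Hive parser. A
regex-based trigger like this would also fire inside string literals or
comments, so it only shows the control flow and is not a robust parser.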

Admittedly this solution works only if we can make every feature work
without changing other parts of Hive, and we just want to add "syntactic
sugar" to it. (Do not underestimate the benefits of syntactic sugar :))

Thanks,
Peter


> On Oct 22, 2020, at 11:44, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> 
> Hi Peter,
> 
> I am nowhere near being an expert but just wanted to share my thoughts.
> 
> If I understand correctly you would like some syntactic sugar in Hive to 
> support partitioning as per Iceberg. I cannot tell if that's really useful or 
> not but from my point of view it doesn't seem a very good idea to introduce 
> another layer of parsing before the actual parser (don't know if there is one 
> already). For instance, how are you gonna handle the situation where there 
> are syntax errors in your sugared part and what the end user should see? 
> 
> No matter how it is added if you give the possibility to the user to write 
> such queries it becomes part of the Hive syntax and as such a job of the 
> parser. 
> 
> Best,
> Stamatis
> 
> 
> On Thu, Oct 22, 2020 at 9:49 AM Peter Vary <pv...@cloudera.com 
> <mailto:pv...@cloudera.com>> wrote:
> Hi Hive experts,
> 
> I would like to extend the Hive SQL language to provide a way to create
> Iceberg partitioned tables like this:
> create table iceberg_test(
>         level string,
>         event_time timestamp,
>         message string,
>         register_time date,
>         telephone array <string>
>     )
>     partition by spec(
>         level identity,
>         event_time identity,
>         event_time hour,
>         register_time day
>     )
>     stored as iceberg;
> 
> The problem is that this syntax is very specific to Iceberg, and I think it
> is not a good idea to change the Hive syntax globally to accommodate a
> specific use-case.
> The following CREATE TABLE statement could achieve the same thing:
> create table iceberg_test(
>         level string,
>         event_time timestamp,
>         message string,
>         register_time date,
>         telephone array <string>
>     )
>     STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
>     TBLPROPERTIES ('iceberg.mr.table.partition.spec'='...');
> 
> I am looking for a way to rewrite the original query (which is not
> syntactically valid in Hive) to a new, syntactically correct one.
> 
> I was checking the hooks as a possible solution, but I have found that:
> - HiveDriverRunHook.preDriverRun can get the original / syntactically not
>   correct query, but I have found no way to rewrite it to a syntactically
>   correct one (the query appears to be read-only there)
> - HiveSemanticAnalyzerHook can rewrite the AST, but it needs a
>   syntactically correct query to start with
> 
> Any other ideas on how to achieve the goals above, either with hooks or in
> any other way?
> 
> Thanks,
> Peter
