Hi all,

I don't know whether this is of interest to you, but other projects could benefit from this as well. For instance, ADQL <https://www.ivoa.net/documents/ADQL/20180112/PR-ADQL-2.1-20180112.html> (Astronomical Data Query Language) is an SQL-like language that defines some higher-level functions that enable powerful geospatial queries. Projects like queryparser <https://github.com/aipescience/queryparser> can translate ADQL into vendor-specific SQL for MySQL or PostgreSQL. In that case the syntactic sugar is implemented as an external layer on top, but it could very well be implemented in a rewrite hook if one were available.
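To make the rewrite-hook idea a bit more concrete, here is a rough Python sketch (not Hive's hook API, which is Java) of the three outcomes Peter lists in the message quoted below: return the command unchanged, throw an exception, or rewrite it. The storage handler class and the property key are copied from his example; the names `rewrite`, `encode_spec` and `SugarSyntaxError` are made up for illustration, and the `column:transform` encoding of the property value is an assumption, since the thread only shows `'...'` there.

```python
import re

# Copied from Peter's example in this thread.
STORAGE_HANDLER = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler"
SPEC_PROPERTY = "iceberg.mr.table.partition.spec"

# Cheap check: does the command use the sugared clause at all?
TRIGGER = re.compile(r"\bpartition\s+by\s+spec\b", re.IGNORECASE)
# Full shape of the sugared tail: partition by spec(...) stored as iceberg
SUGAR = re.compile(
    r"\bpartition\s+by\s+spec\s*\((?P<spec>[^()]*)\)"
    r"\s*stored\s+as\s+iceberg\s*;?\s*$",
    re.IGNORECASE,
)
# One partition field: "<column> <transform>"
FIELD = re.compile(r"^(\w+)\s+(\w+)$")


class SugarSyntaxError(ValueError):
    """Hypothetical error for a sugared clause that is present but malformed."""


def encode_spec(fields):
    # ASSUMPTION: the real format of the property value is not shown in the
    # thread, so this "column:transform" encoding is invented for the sketch.
    return ",".join(f"{col}:{transform}" for col, transform in fields)


def rewrite(sql):
    """Return sql unchanged, raise SugarSyntaxError, or return rewritten SQL."""
    if not TRIGGER.search(sql):
        return sql  # not the sugared syntax: leave it to the regular parser
    match = SUGAR.search(sql)
    if match is None:
        raise SugarSyntaxError(
            "expected: partition by spec(<col> <transform>, ...) stored as iceberg"
        )
    fields = []
    for item in match.group("spec").split(","):
        parsed = FIELD.match(item.strip())
        if parsed is None:
            raise SugarSyntaxError(f"bad partition field: {item.strip()!r}")
        fields.append(parsed.groups())
    # Keep everything before the sugared clause and append the plain Hive form.
    head = sql[: match.start()].rstrip()
    return (
        f"{head}\n"
        f"STORED BY '{STORAGE_HANDLER}'\n"
        f"TBLPROPERTIES ('{SPEC_PROPERTY}'='{encode_spec(fields)}');"
    )
```

Because the sketch only reacts to the one clause it knows about, any error it reports is limited to that clause; everything else is passed through to the normal parser untouched. A regex this naive would also fire on the same words inside a string literal, so a real hook would need a slightly smarter tokenizer.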
Cheers,
Pau.

On Thu, 22 Oct 2020 at 16:21, Peter Vary <pv...@cloudera.com> wrote:

> Let's assume that this feature would be useful for Iceberg tables, but
> useless and even problematic/forbidden for other tables. :)
>
> My thinking is that it could make Hive much more user friendly if we
> allowed extensions to the language.
>
> With the Iceberg integration we plan to do several extensions which might
> not be useful for other tables. Some examples:
>
>    - When creating tables we want to send additional information to the
>    storage layer, and pushing everything into properties is a pain (not
>    really user friendly)
>    - We would like to allow querying table history for Iceberg tables
>    (previous snapshotIds, timestamps, etc.)
>    - We would like to allow time travel for Iceberg tables based on the
>    data queried above
>    - We would like to allow the user to see / manage / remove old
>    snapshots
>
> These are all very Iceberg-specific features, and most probably will not
> work or be useful for any other type of table, so I think adding them to
> the Hive parser would be a stretch.
>
> On the other hand, if we do not provide an SQL interface for accessing
> these features, then users will turn to Spark/Impala/Presto to be able to
> work with Iceberg tables.
>
> As for your specific question about handling syntax errors (I have just
> started to think about how I would do it, so feel free to suggest better
> methods):
>
>    - Let's assume that we have a hook which gets the SQL command as
>    input and can rewrite it to a new SQL command
>    - I would write a simplified parser which tries to be as simple as
>    possible for the specific command
>    - Based on the parsing I would return the same command / throw an
>    exception / rewrite the command
>
> Admittedly this solution works only if we can make every feature work
> without changing other parts of Hive, and we just want to add "syntactic
> sugar" to it.
> (Do not underestimate the benefits of syntactic sugar :))
>
> Thanks,
> Peter
>
>
> On Oct 22, 2020, at 11:44, Stamatis Zampetakis <zabe...@gmail.com> wrote:
>
> Hi Peter,
>
> I am nowhere near being an expert but just wanted to share my thoughts.
>
> If I understand correctly, you would like some syntactic sugar in Hive to
> support partitioning as per Iceberg. I cannot tell if that's really useful
> or not, but from my point of view it doesn't seem a very good idea to
> introduce another layer of parsing before the actual parser (I don't know
> if there is one already). For instance, how are you going to handle the
> situation where there are syntax errors in your sugared part, and what
> should the end user see?
>
> No matter how it is added, if you give the user the possibility to write
> such queries, it becomes part of the Hive syntax and as such a job of the
> parser.
>
> Best,
> Stamatis
>
>
> On Thu, Oct 22, 2020 at 9:49 AM Peter Vary <pv...@cloudera.com> wrote:
>
>> Hi Hive experts,
>>
>> I would like to extend the Hive SQL language to provide a way to create
>> Iceberg partitioned tables like this:
>>
>> create table iceberg_test(
>>   level string,
>>   event_time timestamp,
>>   message string,
>>   register_time date,
>>   telephone array <string>
>> )
>> partition by spec(
>>   level identity,
>>   event_time identity,
>>   event_time hour,
>>   register_time day
>> )
>> stored as iceberg;
>>
>>
>> The problem is that this syntax is very specific to Iceberg, and I think
>> it is not a good idea to change the Hive syntax globally to accommodate a
>> specific use case.
>> The following CREATE TABLE statement could achieve the same thing:
>>
>> create table iceberg_test(
>>   level string,
>>   event_time timestamp,
>>   message string,
>>   register_time date,
>>   telephone array <string>
>> )
>> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
>> TBLPROPERTIES ('iceberg.mr.table.partition.spec'='...');
>>
>>
>> I am looking for a way to rewrite the original query (which is not
>> syntactically correct Hive) into a new, syntactically correct one.
>>
>> I was checking the hooks as a possible solution, but I have found that:
>>
>>    - HiveDriverRunHook.preDriverRun can get the original, syntactically
>>    incorrect query, but I have found no way to rewrite it into a
>>    syntactically correct one (the query appears to be read-only there)
>>    - HiveSemanticAnalyzerHook can rewrite the AST, but it needs a
>>    syntactically correct query to start with
>>
>>
>> Any other ideas on how to achieve the goals above? Either with hooks, or
>> in any other way?
>>
>> Thanks,
>> Peter
>>
>
>

--
----------------------------------
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
----------------------------------