Re: Hive SQL extension

Pau Tallada Fri, 23 Oct 2020 08:52:39 -0700

Hi all,

I do not know if that may be of interest to you, but there are other
projects that could benefit from this.
For instance, ADQL
<https://www.ivoa.net/documents/ADQL/20180112/PR-ADQL-2.1-20180112.html>
(Astronomical Data Query Language) is a SQL-like language that defines some
higher-level functions that enable powerful geospatial queries. Projects
like queryparser <https://github.com/aipescience/queryparser> are able to
translate from ADQL to vendor-SQL for MySQL or PostgreSQL. In this case, the
syntactic sugar is implemented as an external layer on top, but could very
well be implemented in a rewrite hook if available.
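
To make the rewrite-hook idea concrete, here is a minimal Python sketch of
the three outcomes Peter describes in the quoted message (return the command
unchanged, throw an exception, or rewrite it), applied to his
"partition by spec(...) stored as iceberg" example. Everything in it is an
assumption for illustration: the names rewrite and SugarSyntaxError, the set
of accepted transforms (only those that appear in this thread) and, above
all, the name:transform encoding of the property value, which is a
placeholder and not the format the storage handler actually expects. It is
not a Hive API; a real hook in Hive would be written in Java.

```python
import re

# Storage handler and property name are taken from the quoted CREATE TABLE.
STORAGE_HANDLER = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler"
SPEC_PROPERTY = "iceberg.mr.table.partition.spec"

# Hypothetical pattern for the sugared tail of the statement:
#   partition by spec( ... ) stored as iceberg[;]
SUGAR = re.compile(
    r"\s*partition\s+by\s+spec\s*\((?P<spec>.*?)\)"
    r"\s*stored\s+as\s+iceberg\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)

# Only the transforms used in the example in this thread.
KNOWN_TRANSFORMS = {"identity", "hour", "day"}


class SugarSyntaxError(ValueError):
    """Raised when the sugared clause is present but malformed."""


def rewrite(sql: str) -> str:
    """Return sql unchanged, a rewritten statement, or raise an error."""
    match = SUGAR.search(sql)
    if match is None:
        # The clause was started but does not have the expected shape.
        if re.search(r"partition\s+by\s+spec", sql, re.IGNORECASE):
            raise SugarSyntaxError(
                "malformed 'partition by spec(...) stored as iceberg' clause"
            )
        # Not our syntax at all: hand the command to Hive untouched.
        return sql

    fields = []
    for item in match.group("spec").split(","):
        parts = item.split()
        if len(parts) != 2 or parts[1].lower() not in KNOWN_TRANSFORMS:
            raise SugarSyntaxError(f"bad partition field: {item.strip()!r}")
        fields.append(f"{parts[0]}:{parts[1].lower()}")

    # Placeholder encoding, NOT the real property format.
    encoded = ",".join(fields)
    head = sql[: match.start()]
    return (
        f"{head}\n"
        f"    STORED BY '{STORAGE_HANDLER}'\n"
        f"    TBLPROPERTIES ('{SPEC_PROPERTY}'='{encoded}');"
    )
```

A regular expression is only workable here because the sugared clause sits
at the very end of the statement; anything more involved would need a real
grammar, which is essentially the objection Stamatis raises below.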


Cheers,

Pau.

On Thu, 22 Oct 2020 at 16:21, Peter Vary <[email protected]> wrote:

>
> Let's assume that this feature would be useful for Iceberg tables, but
> useless and even problematic/forbidden for other tables. :)
>
> My thinking is that it could make Hive much more user friendly if we
> allowed extensions to the language.
>
> With Iceberg integration we plan to do several extensions which might not
> be useful for other tables. Some examples:
>
>    - When creating tables we want to send additional information to the
>    storage layer, and pushing everything in properties is a pain (not really
>    user friendly)
>    - We would like to allow querying table history for iceberg tables
>    (previous snapshotId-s, timestamps, etc)
>    - We would like to allow time travel for iceberg tables based on the
>    data queried above
>    - We would like to allow the user to see / manage / remove old
>    snapshots
>
>
> These are all very Iceberg-specific features, and most probably will not
> work or be useful for any other type of table, so I think adding them to
> the Hive parser would be a stretch.
>
> On the other hand, if we do not provide a SQL interface for accessing
> these features, then users will turn to Spark/Impala/Presto to be able to
> work with Iceberg tables.
>
> As for your specific question about handling syntax errors (I have just
> started to think about how I would do it, so feel free to suggest better
> methods):
>
>    - Let's assume that we have a hook which can get the sql command as an
>    input and can rewrite it to a new SQL command
>    - I would write a simplified parser which tries to be as simple as
>    possible for the specific command
>    - Based on the parsing I would return the same command / throw an
>    exception / rewrite the command
>
>
> Admittedly, this solution works only if we can make every feature work
> without changing other parts of Hive, and we just want to add "syntactic
> sugar" to it. (Do not underestimate the benefits of syntactic sugar :))
>
> Thanks,
> Peter
>
>
> On Oct 22, 2020, at 11:44, Stamatis Zampetakis <[email protected]> wrote:
>
> Hi Peter,
>
> I am nowhere near being an expert but just wanted to share my thoughts.
>
> If I understand correctly you would like some syntactic sugar in Hive to
> support partitioning as per Iceberg. I cannot tell if that's really useful
> or not but from my point of view it doesn't seem a very good idea to
> introduce another layer of parsing before the actual parser (don't know if
> there is one already). For instance, how are you gonna handle the situation
> where there are syntax errors in your sugared part, and what should the end
> user see?
>
> No matter how it is added if you give the possibility to the user to write
> such queries it becomes part of the Hive syntax and as such a job of the
> parser.
>
> Best,
> Stamatis
>
>
> On Thu, Oct 22, 2020 at 9:49 AM Peter Vary <[email protected]> wrote:
>
>> Hi Hive experts,
>>
>> I would like to extend the Hive SQL language to provide a way to create
>> Iceberg partitioned tables like this:
>>
>> create table iceberg_test(
>>         level string,
>>         event_time timestamp,
>>         message string,
>>         register_time date,
>>         telephone array <string>
>>     )
>>     partition by spec(
>>         level identity,
>>         event_time identity,
>>         event_time hour,
>>         register_time day
>>     )
>>     stored as iceberg;
>>
>>
>> The problem is that this syntax is very specific to Iceberg, and I think
>> it is not a good idea to change the Hive syntax globally to accommodate a
>> specific use-case.
>> The following CREATE TABLE statement could achieve the same thing:
>>
>> create table iceberg_test(
>>         level string,
>>         event_time timestamp,
>>         message string,
>>         register_time date,
>>         telephone array <string>
>>     )
>>     STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
>>     TBLPROPERTIES ('iceberg.mr.table.partition.spec'='...');
>>
>>
>> I am looking for a way to rewrite the original (Hive syntactically not
>> correct) query to a new (syntactically correct) one.
>>
>> I was checking the hooks as a possible solution, but I have found that:
>>
>>    - HiveDriverRunHook.preDriverRun can get the original / syntactically
>>    not correct query, but I have found no way to rewrite it to a
>>    syntactically correct one (it looks like a read only query)
>>    - HiveSemanticAnalyzerHook can rewrite the AST tree, but it needs a
>>    syntactically correct query to start with
>>
>>
>> Any other ideas on how to achieve the goals above? Either with hooks, or
>> in any other way?
>>
>> Thanks,
>> Peter
>>
>
>

-- 
----------------------------------
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
----------------------------------
