Validating Hive statements

2020-02-19 Thread Odon Copon
Hi,
I was wondering what would be the easiest way to validate a Hive script
with multiple query statements offline. I thought it was possible to do
that with the following Java code, but it doesn't look like it works for
all of them:

---
import org.apache.hadoop.hive.ql.parse.ParseDriver;

ParseDriver pd = new ParseDriver();
pd.parse(query);
---

With that tiny snippet, I'm able to validate some queries, but it fails on
other statements like:
 - ADD JAR ;

or:
 - SET =true;

I would rather not omit those, and I'd like to be able to parse a script
with multiple query statements.
Are there any tips you could give me to help with this? Currently, I'm
splitting by semicolon and discarding lines with ADD and SET statements,
but there must be something I'm missing.

Thanks.
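
The "splitting by semicolon" step described above can be sketched roughly as follows. This is only an illustration, not Hive's own splitting logic: the class name StatementSplitter is hypothetical, and it assumes that semicolons inside single- or double-quoted strings must not end a statement and that a backslash escapes the next character inside a string. Backtick-quoted identifiers and comments are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive.
final class StatementSplitter {

    /** Splits a script on ';' characters that are outside quoted strings. */
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;          // 0 means "not inside a quoted string"
        boolean escaped = false; // previous char was a backslash inside a string

        for (char c : script.toCharArray()) {
            if (quote != 0) {
                // Inside a string: only an unescaped matching quote ends it.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == quote) {
                    quote = 0;
                }
                current.append(c);
            } else if (c == '\'' || c == '"') {
                quote = c;
                current.append(c);
            } else if (c == ';') {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current); // trailing statement without ';'
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }
}
```

Each returned string has its terminating semicolon and surrounding whitespace removed, so it can be inspected or handed to a parser one statement at a time.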


Re: Validating Hive statements

2020-02-19 Thread Elliot West
Hi,

If I recall correctly, not all script input is handled by the parser, and
the CLI takes care of some statements before the SQL is parsed; comments
are an example. Also, there is some divergence between Beeline and the Hive
CLI. In HiveRunner we handled this by providing different CLI emulations:

https://github.com/klarna/HiveRunner/tree/master/src/main/java/com/klarna/hiverunner/sql/cli

Elliot.
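
To illustrate the kind of pre-parse handling mentioned above, here is a minimal sketch of dropping comment lines before any SQL reaches the parser. It is not taken from Hive or HiveRunner: the class name CommentStripper is hypothetical, and it assumes that only whole lines starting with "--" (after optional leading whitespace) are removed. Whether a given CLI tolerates leading whitespace or trailing comments is exactly the sort of detail that may differ between Beeline and the Hive CLI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive or HiveRunner.
final class CommentStripper {

    /** Removes every line whose first non-blank characters are "--". */
    public static String stripFullLineComments(String script) {
        List<String> kept = new ArrayList<>();
        // Limit -1 keeps trailing empty lines so the line structure is preserved.
        for (String line : script.split("\\r?\\n", -1)) {
            if (!line.trim().startsWith("--")) {
                kept.add(line);
            }
        }
        return String.join("\n", kept);
    }
}
```

Running this before splitting into statements keeps a semicolon inside a comment line from being mistaken for a statement terminator.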


Re: Validating Hive statements

2020-02-19 Thread Odon Copon
Hi Elliot,
Thanks for your quick response.
Are you saying that things like SETs are handled by the CLI and don't
reach the parser? Is there any example or test I could check to see how
this works?

Thanks.


Re: Validating Hive statements

2020-02-19 Thread Odon Copon
Hi,
I can confirm that, as Elliot mentioned, the CLI takes care of comments
(just tested that) and of splitting the statements, but ADDs and SETs are
still kept, and the parser breaks when trying to parse them.
Is there any other middle step I should be aware of?
Thanks.
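
One possible shape for such a middle step is to look at the first token of each statement and send only non-command statements to the SQL parser. The sketch below is an assumption-laden illustration, not Hive's implementation: StatementRouter and its keyword list are hypothetical and cover only the two cases from this thread (SET and ADD), and the SQL check is passed in as a plain Predicate so the example runs without Hive on the classpath. As far as I know, Hive has its own routing for this in the org.apache.hadoop.hive.ql.processors package (HiveCommand, CommandProcessorFactory), which would be the place to check for the real keyword list and special cases.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper, not part of Hive.
final class StatementRouter {

    // Assumed, incomplete list: first tokens treated as commands rather than SQL.
    private static final Set<String> COMMAND_KEYWORDS =
            new HashSet<>(Arrays.asList("set", "add"));

    /** True if the statement's first token is one of the assumed command keywords. */
    public static boolean isCommand(String statement) {
        String trimmed = statement.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        String firstToken = trimmed.split("\\s+", 2)[0].toLowerCase(Locale.ROOT);
        return COMMAND_KEYWORDS.contains(firstToken);
    }

    /** Accepts commands as-is; everything else must pass the supplied SQL check. */
    public static boolean validate(String statement, Predicate<String> sqlCheck) {
        return isCommand(statement) || sqlCheck.test(statement);
    }
}
```

In a real validator, sqlCheck could wrap the ParseDriver call from the original snippet and return false when pd.parse(query) throws a ParseException. Note that this sketch accepts any statement starting with SET or ADD without checking its arguments.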
