Force users to specify partition indexes in queries

2015-09-29 Thread Smit Shah
Hey,

We have exposed the Hive console to our org via the command line and
through the Hue UI. However, we are currently facing issues when a user
runs a blanket select or insert query against a table without specifying
partition indexes.

I wanted to know if it's possible to force users to provide a partition
index in queries, and to fail the query otherwise. (Our partition indexes
are date and time.) Also, if that's possible, we want to restrict the date
range to 2 days so that no malicious user can affect cluster availability
by querying for months' worth of data.


If this is not doable in Hive at the moment, I am interested in writing a
patch for it. I am not familiar with the Hive codebase, so I'm not sure how
complex this is. Any hints or tips would be great, and if I do need to
write such a patch, I would be happy to contribute it back to the source.


Regards,
Smit


Re: Force users to specify partition indexes in queries

2015-09-29 Thread Gopal Vijayaraghavan

> If this is not doable in Hive at the moment, I am interested in writing
>a patch for it. I am not familiar with hive codebase so not sure how
>complex this is. Any hints or tips would be great and if I do need to
>write such a patch, I would be happy to contribute back to the source.

There are execution hooks in Hive, which can be used to walk the operator
tree and look for a TableScan operator.

Here's an example of me doing something really horrible with a hook:

https://github.com/t3rmin4t0r/captain-hook


You can have your Hive hook look up the table and its filter clause, and
then deny the query.
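
Such a hook has to be registered before it takes effect. A minimal sketch,
assuming a hypothetical hook class named
com.example.hive.RequirePartitionFilterHook that is already on Hive's
classpath (the class name is invented for illustration and is not from this
thread; hive.exec.pre.hooks is the property Hive reads for pre-execution
hooks):

```sql
-- Hypothetical class name, used only for illustration.
-- The value is a comma-separated list of hook classes that run
-- before each query executes.
set hive.exec.pre.hooks=com.example.hive.RequirePartitionFilterHook;
```

Setting the same property in hive-site.xml instead of in a session would
make it the default for every user.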

Cheers,
Gopal




Re: Force users to specify partition indexes in queries

2015-09-29 Thread Ashutosh Chauhan
set hive.mapred.mode = strict;
This will fail any query that doesn't specify a filter on a partitioning
column.
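
As a minimal sketch of the effect, assuming a hypothetical table named
events that is partitioned by a dt column (neither name comes from this
thread):

```sql
-- Assumption: events is a table partitioned by (dt STRING).
set hive.mapred.mode=strict;

-- Rejected in strict mode: no predicate on a partition column.
SELECT count(*) FROM events;

-- Accepted: the filter references the partition column dt.
SELECT count(*) FROM events WHERE dt = '2015-09-29';
```

Strict mode also rejects a few other query shapes, such as ORDER BY without
LIMIT and Cartesian products, so it is worth trying against existing
workloads before turning it on for everyone.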



Re: Force users to specify partition indexes in queries

2015-09-29 Thread Sathi Chowdhury
hive.exec.dynamic.partition.mode=strict seems related, but it only applies
to dynamic-partition inserts, where it requires at least one static
partition.





Re: Force users to specify partition indexes in queries

2015-09-30 Thread Smit Shah
hive.mapred.mode = strict is close to what I need, but I want to restrict
users to one particular partition index, and to a maximum range within it.
I think Hive hooks are what I need, but my problem is that anyone can
override the config in their session.

So if someone discovers that it's enforced using a postAnalyze hook, they
can disable it.

Is there a way around this, or is my only option to patch the Hive server
code?



Re: Force users to specify partition indexes in queries

2015-09-30 Thread Ashutosh Chauhan
For your second use case, there is another config:
*hive.limit.query.max.table.partition*
Set it to the number of partitions you want to allow in a query.
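
A sketch of how this could approximate the two-day limit, assuming the
tables are partitioned by date and hour so that two days correspond to 48
partitions (the hourly granularity is an assumption; the thread only says
the partitions are date and time):

```sql
-- Assumption: hourly partitions, so 2 days x 24 hours = 48 partitions.
-- Queries that would scan more partitions than this on a table are rejected.
set hive.limit.query.max.table.partition=48;
```

This caps how many partitions a query may scan per table. It does not by
itself pin the query to a particular date window.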
