[jira] [Commented] (HIVE-2925) Support non-MR fetching for simple queries with select/limit/filter operations only

Namit Jain (JIRA) Sat, 28 Jul 2012 13:37:37 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424418#comment-13424418
 ]


Namit Jain commented on HIVE-2925:
----------------------------------

Copying a mail I sent to dev@ for review....

Currently, hive does not launch map-reduce jobs for the following queries:

select * from <T> where <condition on partition columns> (limit <n>)?

This behavior is not configurable, and cannot be altered.

HIVE-2925 wants to extend this behavior. The goal is not to spawn map-reduce 
jobs for the following queries:

Select <expr> from <T> where <any condition> (limit <n>)?
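
The difference between the two query shapes can be sketched as a pair of
predicates. This is illustrative Python only, not Hive's planner code; the
QueryShape class and its field names are hypothetical and exist only for this
sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryShape:
    """Hypothetical summary of a query; not an actual Hive data structure."""
    select_star: bool                  # projection is exactly `*`
    filters_partition_cols_only: bool  # WHERE refers only to partition columns
    only_select_filter_limit: bool     # no joins, aggregations, sorts, etc.


def skips_mr_today(q: QueryShape) -> bool:
    # Current behavior described above:
    # select * from <T> where <condition on partition columns> (limit <n>)?
    return (q.only_select_filter_limit
            and q.select_star
            and q.filters_partition_cols_only)


def skips_mr_with_hive_2925(q: QueryShape) -> bool:
    # Goal of HIVE-2925: any select expression and any filter condition.
    return q.only_select_filter_limit
```

Every query accepted by the first predicate is also accepted by the second,
which is why the proposal is an extension of the existing behavior.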

It is currently controlled by a single parameter,
hive.aggressive.fetch.task.conversion, which decides whether or not map-reduce
jobs are spawned for queries of the above type. This can be beneficial for
certain types of queries, since it avoids the expensive step of spawning
map-reduce. However, it can be pretty expensive for others: queries selecting
a very large number of rows, queries with a very selective filter (one that is
satisfied by a very small number of rows, and therefore still involves
scanning a very large table), etc. The user does not have any control over
this. Note that it cannot be done with hooks: the pre-semantic hooks do not
have enough information (type of the query, inputs, etc.), and it is too late
to do anything in the post-semantic hook (the query plan has already been
altered).

I would like to propose the following configuration parameters to control this
behavior.

hive.fetch.task.conversion: true, false, auto

If the value is true, all queries with only selects and filters will be
converted.
If the value is false, no query will be converted.
If the value is auto (which should be the default), additional parameters
control the semantics:

hive.fetch.task.auto.limit.threshold      ---> integer value X1
hive.fetch.task.auto.inputsize.threshold  ---> integer value X2

If either the query has a limit lower than X1, or the input size is smaller
than X2, queries containing only filters and selects will be converted so that
they do not use map-reduce jobs.
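
A minimal sketch of the proposed decision rule, in Python for illustration
only. The function name and arguments are hypothetical; the sketch assumes the
query has already been found to contain only selects and filters, and it
assumes that both thresholds are compared with a strict "less than" and that
input size is measured in bytes (the proposal does not state a unit).

```python
from typing import Optional


def should_convert_to_fetch(mode: str,
                            limit: Optional[int],
                            input_size: int,
                            limit_threshold: int,
                            input_size_threshold: int) -> bool:
    """Decide whether a select/filter-only query skips map-reduce.

    mode                 -- proposed hive.fetch.task.conversion value
    limit                -- the query's LIMIT, or None if it has none
    input_size           -- estimated input size (unit assumed: bytes)
    limit_threshold      -- X1, hive.fetch.task.auto.limit.threshold
    input_size_threshold -- X2, hive.fetch.task.auto.inputsize.threshold
    """
    if mode == "true":
        return True   # always convert
    if mode == "false":
        return False  # never convert
    if mode != "auto":
        raise ValueError(f"unknown conversion mode: {mode!r}")

    # auto: convert if EITHER condition from the proposal holds.
    has_small_limit = limit is not None and limit < limit_threshold
    has_small_input = input_size < input_size_threshold
    return has_small_limit or has_small_input
```

For example, with X1 = 100 and X2 = 1000, an auto-mode query with "limit 10"
over a very large table would be converted, while a query with no limit over
the same table would still run as a map-reduce job.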

                
> Support non-MR fetching for simple queries with select/limit/filter 
> operations only
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2925
>                 URL: https://issues.apache.org/jira/browse/HIVE-2925
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.10.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-2925.D2607.1.patch, HIVE-2925.D2607.2.patch, 
> HIVE-2925.D2607.3.patch, HIVE-2925.D2607.4.patch
>
>
> It's trivial but frequently asked for by end-users. Currently, select queries
> with simple conditions or a limit must run an MR job, which takes some time,
> especially for big tables, and irritates people.
> For that kind of simple query, using a fetch task would make them happy.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira