I like Navis's idea. The timeout can be configurable.
On 7/29/12 6:47 AM, "Navis류승우" <navis....@nexr.com> wrote:

>I was thinking of timeout for fetching, 2000msec for example. How about
>that?
>
>On Sunday, July 29, 2012, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> If where condition is too complex, selecting specific columns seems
>> simple enough and useful.
>>
>> On Saturday, July 28, 2012, Namit Jain <nj...@fb.com> wrote:
>>> Currently, hive does not launch map-reduce jobs for the following
>>> queries:
>>>
>>> select * from <T> where <condition on partition columns> (limit <n>)?
>>>
>>> This behavior is not configurable, and cannot be altered.
>>>
>>> HIVE-2925 wants to extend this behavior. The goal is not to spawn
>>> map-reduce jobs for the following queries:
>>>
>>> Select <expr> from <T> where <any condition> (limit <n>)?
>>>
>>> It is currently controlled by one parameter:
>>> hive.aggressive.fetch.task.conversion, based on which it is decided
>>> whether to spawn map-reduce jobs or not for the queries of the above
>>> type. Note that this can be beneficial for certain types of queries,
>>> since it is avoiding the expensive step of spawning map-reduce.
>>> However, it can be pretty expensive for certain types of queries:
>>> selecting a very large number of rows, the query having a very
>>> selective filter (which is satisfied by a very small number of rows,
>>> and therefore involves scanning a very large table) etc. The user does
>>> not have any control over this. Note that it cannot be done by hooks,
>>> since the pre-semantic hooks do not have enough information: type of
>>> the query, inputs etc. and it is too late to do anything in the
>>> post-semantic hook (the query plan has already been altered).
>>>
>>> I would like to propose the following configuration parameters to
>>> control this behavior.
>>>
>>> hive.fetch.task.conversion: true, false, auto
>>>
>>> If the value is true, then all queries with only selects and filters
>>> will be converted.
>>> If the value is false, then no query will be converted.
>>> If the value is auto (which should be the default behavior), there
>>> should be additional parameters to control the semantics.
>>>
>>> hive.fetch.task.auto.limit.threshold ---> integer value X1
>>> hive.fetch.task.auto.inputsize.threshold ---> integer value X2
>>>
>>> If either the query has a limit lower than X1, or the input size is
>>> smaller than X2, the queries containing only filters and selects will
>>> be converted to not use map-reduce jobs.
>>>
>>> Comments…
>>>
>>> -namit
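
For reference, the semantics Namit proposes for the three values could be sketched roughly as below. This is only an illustration of the rule as stated in the proposal, not Hive code: the function name and its arguments are hypothetical, the input size is assumed to be in bytes (the proposal does not state a unit), and the query is assumed to have already been checked to contain only selects and filters.

```python
from typing import Optional


def should_convert_to_fetch(
    mode: str,                  # proposed hive.fetch.task.conversion: "true", "false" or "auto"
    limit: Optional[int],       # LIMIT of the query, or None if it has no LIMIT clause
    input_size: int,            # estimated input size (assumed to be bytes)
    limit_threshold: int,       # X1: proposed hive.fetch.task.auto.limit.threshold
    inputsize_threshold: int,   # X2: proposed hive.fetch.task.auto.inputsize.threshold
) -> bool:
    """Decide whether a select/filter-only query skips map-reduce.

    Hypothetical helper mirroring the proposal in this thread.
    """
    if mode == "true":
        # Every query with only selects and filters is converted.
        return True
    if mode == "false":
        # No query is converted.
        return False
    if mode != "auto":
        raise ValueError(f"unknown conversion mode: {mode!r}")

    # auto: convert if EITHER the limit is lower than X1
    # OR the input size is smaller than X2.
    has_small_limit = limit is not None and limit < limit_threshold
    has_small_input = input_size < inputsize_threshold
    return has_small_limit or has_small_input
```

With X1 = 100 and X2 = 1000, a query with `limit 10` over a very large table would be converted in auto mode, while a query with no limit over an input of size 5000 would still spawn a map-reduce job. A fetch timeout along the lines Navis suggested would be a separate check on top of this and is not covered here.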