Hi Lee,

The MapReduce framework in general makes it hard to assign fewer mappers than there are blocks in the input data when using FileInputFormat. Is your input set about 42GB with a 64MB block size, or 84GB with a 128MB block size?
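If the goal is to cap the mapper count, one lever worth trying is raising the minimum split size so each map task covers several blocks. This is only a sketch: mapred.min.split.size is a standard Hadoop property, but whether your particular Hive/Hadoop combination honors it for your input format is something to verify, and the value below is illustrative.

    hive> set mapred.min.split.size=268435456;
    hive> set mapred.map.tasks=1000;

268435456 bytes is 256MB, so each map task would cover four 64MB blocks. Note that mapred.map.tasks remains a hint only; when the two disagree, the number of splits usually wins.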
-Todd

On Thu, Dec 17, 2009 at 11:32 AM, Sagi, Lee <ls...@shopping.com> wrote:
> Here is the query that I am running, just in case someone has an idea of
> how to improve it.
>
> SELECT
>     CONCAT(CONCAT('"', PRSS.DATE_KEY), '"'),
>     CONCAT(CONCAT('"', PRSC.DATE_KEY), '"'),
>     CONCAT(CONCAT('"', PRSS.VOTF_REQUEST_ID), '"'),
>     CONCAT(CONCAT('"', PRSC.VOTF_REQUEST_ID), '"'),
>     CONCAT(CONCAT('"', PRSS.PRS_REQUEST_ID), '"'),
>     CONCAT(CONCAT('"', PRSC.PRS_REQUEST_ID), '"'),
>     ...
>     ...
>     ...
> FROM
>     FCT_PRSS PRSS FULL OUTER JOIN FCT_PRSC PRSC ON
>     (PRSS.PRS_REQUEST_ID = PRSC.PRS_REQUEST_ID)
> WHERE (PRSS.date_key >= '2009121600' AND
>        PRSS.date_key < '2009121700') OR
>       (PRSC.date_key >= '2009121600' AND
>        PRSC.date_key < '2009121700')
>
>
> Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 |
> Cell: 718-930-7947
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Thursday, December 17, 2009 11:03 AM
> To: hive-user@hadoop.apache.org
> Subject: Re: Throttling hive queries
>
> You should be able to:
>
> hive> set mapred.map.tasks=1000;
> hive> set mapred.reduce.tasks=5;
>
> In some cases the number of mappers is controlled by the input files
> (pre-Hadoop 0.20).
>
>
> On Thu, Dec 17, 2009 at 1:58 PM, Sagi, Lee <ls...@shopping.com> wrote:
> > Is there a way to throttle hive queries?
> >
> > For example, I want to tell Hive to not use more than 1000 mappers and
> > 5 reducers for a particular query (or session).
> >
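On the query itself, one idea worth testing is pushing the date_key filters into subqueries so the FULL OUTER JOIN only shuffles one day of data from each table. This is a sketch only, using the table and column names from the query above, and it is not exactly equivalent at the edges: a row inside the date window that joins to a row outside it would come back with NULLs instead of its partner's columns.

    SELECT
        ...
    FROM
        (SELECT * FROM FCT_PRSS
          WHERE date_key >= '2009121600' AND date_key < '2009121700') PRSS
        FULL OUTER JOIN
        (SELECT * FROM FCT_PRSC
          WHERE date_key >= '2009121600' AND date_key < '2009121700') PRSC
        ON (PRSS.PRS_REQUEST_ID = PRSC.PRS_REQUEST_ID)

Because a WHERE clause on a FULL OUTER JOIN cannot be pushed below the join, the original query joins the full tables first and filters afterwards, so pre-filtering can cut the join input dramatically if the tables hold many days of history.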