We plan to run a vote and branch 0.5 around early Jan.

However, we do run trunk for some ad-hoc queries (note that trunk and
branch 0.4 can share the metastore and data on HDFS) and branch 0.4
for production queries.

Hive trunk does support CombineFileInputFormat, but the corresponding
fix to Hadoop 0.20 was committed only on Oct 19 (see HADOOP-5759).
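
For reference, a minimal sketch of how this is typically turned on with
trunk, assuming the CombineHiveInputFormat class from HIVE-74 and the
mapred.max.split.size setting are both present in your build:

hive> -- group many small files into fewer splits (hence fewer mappers)
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
hive> -- rough upper bound on the bytes packed into one combined split
hive> set mapred.max.split.size=268435456;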


Zheng



On 12/18/09, Ryan LeCompte <lecom...@gmail.com> wrote:
> Any ideas when 0.5.0 will be released? I could take advantage of this too.
>
> Thanks,
> Ryan
>
> On Fri, Dec 18, 2009 at 2:03 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> Hi Sagi,
>>
>> Sounds like you need CombineFileInputFormat. See:
>>
>> https://issues.apache.org/jira/browse/HIVE-74
>>
>> -Todd
>>
>>
>> On Fri, Dec 18, 2009 at 10:24 AM, Sagi, Lee <ls...@shopping.com> wrote:
>>
>>> Yes, that's true. I have a process that runs every hour and pulls 3
>>> weblog files from each of 10 servers... 10 * 3 * 24 = 720 (not all
>>> hours have all the files)
>>>
>>>
>>> Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 |
>>> Cell: 718-930-7947
>>>
>>>
>>>  ------------------------------
>>> From: Todd Lipcon [mailto:t...@cloudera.com]
>>> Sent: Thursday, December 17, 2009 4:24 PM
>>>
>>> To: hive-user@hadoop.apache.org
>>> Subject: Re: Throttling hive queries
>>>
>>> Hi Sagi,
>>>
>>> Any chance you're running on a directory that has 614 small files?
>>>
>>> -Todd
>>>
>>> On Thu, Dec 17, 2009 at 2:30 PM, Sagi, Lee <ls...@shopping.com> wrote:
>>>
>>>> Todd, here is the job info.
>>>>
>>>>
>>>>
>>>> Counter                                   Map          Reduce       Total
>>>> File Systems
>>>>   HDFS bytes read                         199,115,508  0            199,115,508
>>>>   HDFS bytes written                      0            9,665,472    9,665,472
>>>>   Local bytes read                        0            321,210,205  321,210,205
>>>>   Local bytes written                     204,404,812  321,210,205  525,615,017
>>>> Job Counters
>>>>   Launched reduce tasks                   0            0            1
>>>>   Rack-local map tasks                    0            0            614
>>>>   Launched map tasks                      0            0            37,130
>>>>   Data-local map tasks                    0            0            36,516
>>>> org.apache.hadoop.hive.ql.exec.FilterOperator$Counter
>>>>   PASSED                                  0            10,572       10,572
>>>>   FILTERED                                0            217,305      217,305
>>>> org.apache.hadoop.hive.ql.exec.MapOperator$Counter
>>>>   DESERIALIZE_ERRORS                      0            0            0
>>>> Map-Reduce Framework
>>>>   Reduce input groups                     0            429,557      429,557
>>>>   Combine output records                  0            0            0
>>>>   Map input records                       429,557      0            429,557
>>>>   Reduce output records                   0            0            0
>>>>   Map output bytes                        201,425,848  0            201,425,848
>>>>   Map input bytes                         199,115,508  0            199,115,508
>>>>   Map output records                      429,557      0            429,557
>>>>   Combine input records                   0            0            0
>>>>   Reduce input records                    0            429,557      429,557
>>>>
>>>>
>>>> Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 |
>>>> Cell: 718-930-7947
>>>>
>>>>
>>>>  ------------------------------
>>>> From: Todd Lipcon [mailto:t...@cloudera.com]
>>>> Sent: Thursday, December 17, 2009 12:18 PM
>>>>
>>>> To: hive-user@hadoop.apache.org
>>>> Subject: Re: Throttling hive queries
>>>>
>>>>   Hi Lee,
>>>>
>>>> The MapReduce framework in general makes it hard to assign fewer
>>>> mappers than there are blocks in the input data when using
>>>> FileInputFormat. Is your input set about 42 GB with a 64 MB block
>>>> size, or 84 GB with a 128 MB block size?
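>>>>
>>>> (Spelling out the arithmetic behind that question: with
>>>> FileInputFormat there is roughly one mapper per HDFS block, and both
>>>> guesses give the same split count: 42 GB / 64 MB = 672 and
>>>> 84 GB / 128 MB = 672 mappers.)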
>>>>
>>>> -Todd
>>>>
>>>> On Thu, Dec 17, 2009 at 11:32 AM, Sagi, Lee <ls...@shopping.com> wrote:
>>>>
>>>>> Here is the query that I am running, just in case someone has an
>>>>> idea of how to improve it.
>>>>>
>>>>> SELECT
>>>>>      CONCAT(CONCAT('"', PRSS.DATE_KEY), '"'),
>>>>>      CONCAT(CONCAT('"', PRSC.DATE_KEY), '"'),
>>>>>      CONCAT(CONCAT('"', PRSS.VOTF_REQUEST_ID), '"'),
>>>>>      CONCAT(CONCAT('"', PRSC.VOTF_REQUEST_ID), '"'),
>>>>>      CONCAT(CONCAT('"', PRSS.PRS_REQUEST_ID), '"'),
>>>>>      CONCAT(CONCAT('"', PRSC.PRS_REQUEST_ID), '"'),
>>>>>      ...
>>>>>      ...
>>>>>      ...
>>>>>  FROM
>>>>>      FCT_PRSS PRSS FULL OUTER JOIN FCT_PRSC PRSC ON
>>>>> (PRSS.PRS_REQUEST_ID = PRSC.PRS_REQUEST_ID)
>>>>>  WHERE (PRSS.date_key >= '2009121600' AND
>>>>>        PRSS.date_key < '2009121700') OR
>>>>>       (PRSC.date_key >= '2009121600' AND
>>>>>        PRSC.date_key < '2009121700')
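>>>>>
>>>>> One idea, sketched under an assumption: if each side only needs its
>>>>> own rows for the day, filtering each table in a subquery before the
>>>>> FULL OUTER JOIN cuts the data the join must shuffle. Note this is
>>>>> not strictly equivalent to applying the OR filter after the join,
>>>>> since a matching row outside the date range would no longer survive
>>>>> from the other side.
>>>>>
>>>>> SELECT
>>>>>      ...
>>>>>  FROM
>>>>>      (SELECT * FROM FCT_PRSS
>>>>>        WHERE date_key >= '2009121600'
>>>>>          AND date_key < '2009121700') PRSS
>>>>>      FULL OUTER JOIN
>>>>>      (SELECT * FROM FCT_PRSC
>>>>>        WHERE date_key >= '2009121600'
>>>>>          AND date_key < '2009121700') PRSC
>>>>>      ON (PRSS.PRS_REQUEST_ID = PRSC.PRS_REQUEST_ID)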
>>>>>
>>>>>
>>>>> Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 |
>>>>> Cell: 718-930-7947
>>>>>
>>>>> -----Original Message-----
>>>>> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
>>>>> Sent: Thursday, December 17, 2009 11:03 AM
>>>>> To: hive-user@hadoop.apache.org
>>>>> Subject: Re: Throttling hive queries
>>>>>
>>>>> You should be able to set these from the Hive CLI:
>>>>>
>>>>> hive> set mapred.map.tasks=1000;
>>>>> hive> set mapred.reduce.tasks=5;
>>>>>
>>>>> In some cases the number of mappers is controlled by the number of
>>>>> input files (pre Hadoop 0.20).
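>>>>> A hedged sketch of two related knobs (assuming your build honors
>>>>> them; mapred.map.tasks is only a hint to the framework): a larger
>>>>> minimum split size lets several blocks of a large file feed one
>>>>> mapper, and hive.exec.reducers.max caps reducers for the session.
>>>>> Neither merges many small files; that is what CombineFileInputFormat,
>>>>> discussed elsewhere in this thread, addresses.
>>>>>
>>>>> hive> -- ask for input splits of at least 256 MB within each file
>>>>> hive> set mapred.min.split.size=268435456;
>>>>> hive> -- hard ceiling on the number of reducers Hive will launch
>>>>> hive> set hive.exec.reducers.max=5;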
>>>>>
>>>>> On Thu, Dec 17, 2009 at 1:58 PM, Sagi, Lee <ls...@shopping.com> wrote:
>>>>> > Is there a way to throttle hive queries?
>>>>> >
>>>>> > For example, I want to tell Hive to not use more than 1000 mappers
>>>>> > and 5 reducers for a particular query (or session).
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>

-- 
Sent from Gmail for mobile | mobile.google.com

Yours,
Zheng
