Any ideas when 0.5.0 will be released? I could take advantage of this too.

Thanks,
Ryan

On Fri, Dec 18, 2009 at 2:03 PM, Todd Lipcon <t...@cloudera.com> wrote:

> Hi Sagi,
>
> Sounds like you need CombineFileInputFormat. See:
>
> https://issues.apache.org/jira/browse/HIVE-74
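>
> Once that patch is in, a minimal sketch of turning it on from the CLI
> would be something like this (untested on my end; the class name is the
> one the patch introduces, and mapred.max.split.size is the Hadoop-side
> knob that caps each combined split):
>
> hive> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> hive> set mapred.max.split.size=268435456;
>
> With ~720 small weblog files per day, that should collapse the job from
> one map per file down to a handful of ~256MB splits.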
>
> -Todd
>
>
> On Fri, Dec 18, 2009 at 10:24 AM, Sagi, Lee <ls...@shopping.com> wrote:
>
>> Yes, that's true. I have a process that runs every hour and pulls 3
>> weblog files from each of 10 servers: 10 * 3 * 24 = 720 files per day
>> (not all hours have all the files).
>>
>>
>> Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 |
>> Cell: 718-930-7947
>>
>>
>>  ------------------------------
>> From: Todd Lipcon [mailto:t...@cloudera.com]
>> Sent: Thursday, December 17, 2009 4:24 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Re: Throttling hive queries
>>
>> Hi Sagi,
>>
>> Any chance you're running on a directory that has 614 small files?
>>
>> -Todd
>>
>> On Thu, Dec 17, 2009 at 2:30 PM, Sagi, Lee <ls...@shopping.com> wrote:
>>
>>> Todd, here is the job info.
>>>
>>>
>>>
>>> Counter                           Map          Reduce       Total
>>> File Systems
>>>   HDFS bytes read                 199,115,508  0            199,115,508
>>>   HDFS bytes written              0            9,665,472    9,665,472
>>>   Local bytes read                0            321,210,205  321,210,205
>>>   Local bytes written             204,404,812  321,210,205  525,615,017
>>> Job Counters
>>>   Launched reduce tasks           0            0            1
>>>   Rack-local map tasks            0            0            614
>>>   Launched map tasks              0            0            37,130
>>>   Data-local map tasks            0            0            36,516
>>> org.apache.hadoop.hive.ql.exec.FilterOperator$Counter
>>>   PASSED                          0            10,572       10,572
>>>   FILTERED                        0            217,305      217,305
>>> org.apache.hadoop.hive.ql.exec.MapOperator$Counter
>>>   DESERIALIZE_ERRORS              0            0            0
>>> Map-Reduce Framework
>>>   Reduce input groups             0            429,557      429,557
>>>   Combine output records          0            0            0
>>>   Map input records               429,557      0            429,557
>>>   Reduce output records           0            0            0
>>>   Map output bytes                201,425,848  0            201,425,848
>>>   Map input bytes                 199,115,508  0            199,115,508
>>>   Map output records              429,557      0            429,557
>>>   Combine input records           0            0            0
>>>   Reduce input records            0            429,557      429,557
>>>
>>>
>>> Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 |
>>> Cell: 718-930-7947
>>>
>>>
>>>  ------------------------------
>>> From: Todd Lipcon [mailto:t...@cloudera.com]
>>> Sent: Thursday, December 17, 2009 12:18 PM
>>> To: hive-user@hadoop.apache.org
>>> Subject: Re: Throttling hive queries
>>>
>>>   Hi Lee,
>>>
>>> The MapReduce framework in general makes it hard to assign fewer
>>> mappers than there are blocks in the input data when using
>>> FileInputFormat. Is your input set about 42GB with a 64MB block size,
>>> or 84GB with a 128MB block size?
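>>>
>>> (For reference, a simplified sketch of FileInputFormat's split sizing
>>> in Hadoop 0.20 -- the real code is more involved, but per file it is
>>> roughly
>>>
>>>   splitSize = max(mapred.min.split.size,
>>>                   min(totalBytes / mapred.map.tasks, blockSize))
>>>
>>> and a split never spans files, so you always get at least one map per
>>> file. Raising the minimum, e.g.
>>>
>>>   hive> set mapred.min.split.size=268435456;
>>>
>>> merges blocks within one large file but cannot merge separate small
>>> files -- which is why CombineFileInputFormat exists.)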
>>>
>>> -Todd
>>>
>>> On Thu, Dec 17, 2009 at 11:32 AM, Sagi, Lee <ls...@shopping.com> wrote:
>>>
>>>> Here is the query that I am running, just in case someone has an idea of
>>>> how to improve it.
>>>>
>>>> SELECT
>>>>      CONCAT(CONCAT('"', PRSS.DATE_KEY), '"'),
>>>>      CONCAT(CONCAT('"', PRSC.DATE_KEY), '"'),
>>>>      CONCAT(CONCAT('"', PRSS.VOTF_REQUEST_ID), '"'),
>>>>      CONCAT(CONCAT('"', PRSC.VOTF_REQUEST_ID), '"'),
>>>>      CONCAT(CONCAT('"', PRSS.PRS_REQUEST_ID), '"'),
>>>>      CONCAT(CONCAT('"', PRSC.PRS_REQUEST_ID), '"'),
>>>>      ...
>>>>      ...
>>>>      ...
>>>>  FROM
>>>>      FCT_PRSS PRSS FULL OUTER JOIN FCT_PRSC PRSC ON
>>>> (PRSS.PRS_REQUEST_ID = PRSC.PRS_REQUEST_ID)
>>>>  WHERE (PRSS.date_key >= '2009121600' AND
>>>>        PRSS.date_key < '2009121700') OR
>>>>       (PRSC.date_key >= '2009121600' AND
>>>>        PRSC.date_key < '2009121700')
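>>>>
>>>> One rewrite I have been considering -- it is only equivalent if
>>>> matching PRS_REQUEST_IDs always fall inside the same date_key window,
>>>> and it only helps if date_key is the partition key (I still need to
>>>> verify both) -- is to filter each side before the join instead of
>>>> after it:
>>>>
>>>> SELECT ...
>>>> FROM
>>>>     (SELECT * FROM FCT_PRSS
>>>>       WHERE date_key >= '2009121600'
>>>>         AND date_key <  '2009121700') PRSS
>>>>     FULL OUTER JOIN
>>>>     (SELECT * FROM FCT_PRSC
>>>>       WHERE date_key >= '2009121600'
>>>>         AND date_key <  '2009121700') PRSC
>>>>     ON (PRSS.PRS_REQUEST_ID = PRSC.PRS_REQUEST_ID)
>>>>
>>>> so the scan is pruned to one day of input before the FULL OUTER JOIN
>>>> rather than filtered afterwards.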
>>>>
>>>>
>>>> Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 |
>>>> Cell: 718-930-7947
>>>>
>>>> -----Original Message-----
>>>> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
>>>> Sent: Thursday, December 17, 2009 11:03 AM
>>>> To: hive-user@hadoop.apache.org
>>>> Subject: Re: Throttling hive queries
>>>>
>>>> You should be able to:
>>>>
>>>> hive> set mapred.map.tasks=1000;
>>>> hive> set mapred.reduce.tasks=5;
>>>>
>>>> In some cases the number of mappers is controlled by the number of
>>>> input files (pre-Hadoop 0.20).
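>>>>
>>>> Note that mapred.reduce.tasks is honored exactly, while
>>>> mapred.map.tasks is only a hint to the InputFormat (it can raise the
>>>> map count but cannot push it below one map per file/block). If you
>>>> would rather let Hive size the reducers itself, a rough sketch
>>>> (property names as of recent Hive builds, worth double-checking):
>>>>
>>>> hive> set mapred.reduce.tasks=-1;
>>>> hive> set hive.exec.reducers.bytes.per.reducer=1000000000;
>>>> hive> set hive.exec.reducers.max=5;
>>>>
>>>> That gives roughly one reducer per GB of input, capped at 5.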
>>>>
>>>>
>>>> On Thu, Dec 17, 2009 at 1:58 PM, Sagi, Lee <ls...@shopping.com> wrote:
>>>> > Is there a way to throttle hive queries?
>>>> >
>>>> > For example, I want to tell Hive to use no more than 1000 mappers and
>>>>
>>>> > 5 reducers for a particular query (or session).
>>>> >
>>>>
>>>
>>>
>>
>
