Any ideas when 0.5.0 will be released? I could take advantage of this too.

Thanks,
Ryan
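A note for later readers: the CombineFileInputFormat support Todd points to below was implemented for Hive as CombineHiveInputFormat under HIVE-74. On a build that includes that patch, enabling it per session should look roughly like this (the 256 MB split cap is purely illustrative):

    hive> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    hive> set mapred.max.split.size=268435456;

Each map task then reads a bundle of small files up to the split-size cap, instead of one mapper per file.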
On Fri, Dec 18, 2009 at 2:03 PM, Todd Lipcon <t...@cloudera.com> wrote:

> Hi Sagi,
>
> Sounds like you need CombineFileInputFormat. See:
>
> https://issues.apache.org/jira/browse/HIVE-74
>
> -Todd
>
> On Fri, Dec 18, 2009 at 10:24 AM, Sagi, Lee <ls...@shopping.com> wrote:
>
>> Yes, that's true. I have a process that runs every hour and pulls 3
>> weblog files from each of 10 servers: 10 * 3 * 24 = 720 files per day
>> (not all hours have all the files).
>>
>> Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 |
>> Cell: 718-930-7947
>>
>> ------------------------------
>> From: Todd Lipcon [mailto:t...@cloudera.com]
>> Sent: Thursday, December 17, 2009 4:24 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Re: Throttling hive queries
>>
>> Hi Sagi,
>>
>> Any chance you're running on a directory that has 614 small files?
>>
>> -Todd
>>
>> On Thu, Dec 17, 2009 at 2:30 PM, Sagi, Lee <ls...@shopping.com> wrote:
>>
>>> Todd, here is the job info:
>>>
>>> Counter                     Map          Reduce       Total
>>> File Systems
>>>   HDFS bytes read           199,115,508  0            199,115,508
>>>   HDFS bytes written        0            9,665,472    9,665,472
>>>   Local bytes read          0            321,210,205  321,210,205
>>>   Local bytes written       204,404,812  321,210,205  525,615,017
>>> Job Counters
>>>   Launched reduce tasks     0            0            1
>>>   Rack-local map tasks      0            0            614
>>>   Launched map tasks        0            0            37,130
>>>   Data-local map tasks      0            0            36,516
>>> org.apache.hadoop.hive.ql.exec.FilterOperator$Counter
>>>   PASSED                    0            10,572       10,572
>>>   FILTERED                  0            217,305      217,305
>>> org.apache.hadoop.hive.ql.exec.MapOperator$Counter
>>>   DESERIALIZE_ERRORS        0            0            0
>>> Map-Reduce Framework
>>>   Reduce input groups       0            429,557      429,557
>>>   Combine output records    0            0            0
>>>   Map input records         429,557      0            429,557
>>>   Reduce output records     0            0            0
>>>   Map output bytes          201,425,848  0            201,425,848
>>>   Map input bytes           199,115,508  0            199,115,508
>>>   Map output records        429,557      0            429,557
>>>   Combine input records     0            0            0
>>>   Reduce input records      0            429,557      429,557
>>>
>>> Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 |
>>> Cell: 718-930-7947
>>>
>>> ------------------------------
>>> From: Todd Lipcon [mailto:t...@cloudera.com]
>>> Sent: Thursday, December 17, 2009 12:18 PM
>>> To: hive-user@hadoop.apache.org
>>> Subject: Re: Throttling hive queries
>>>
>>> Hi Lee,
>>>
>>> The MapReduce framework in general makes it hard for you to assign
>>> fewer mappers than there are blocks in the input data when using
>>> FileInputFormat. Is your input set about 42GB with a 64M block size,
>>> or 84G with a 128M block size?
>>>
>>> -Todd
>>>
>>> On Thu, Dec 17, 2009 at 11:32 AM, Sagi, Lee <ls...@shopping.com> wrote:
>>>
>>>> Here is the query that I am running, just in case someone has an
>>>> idea of how to improve it:
>>>>
>>>> SELECT
>>>>     CONCAT(CONCAT('"', PRSS.DATE_KEY), '"'),
>>>>     CONCAT(CONCAT('"', PRSC.DATE_KEY), '"'),
>>>>     CONCAT(CONCAT('"', PRSS.VOTF_REQUEST_ID), '"'),
>>>>     CONCAT(CONCAT('"', PRSC.VOTF_REQUEST_ID), '"'),
>>>>     CONCAT(CONCAT('"', PRSS.PRS_REQUEST_ID), '"'),
>>>>     CONCAT(CONCAT('"', PRSC.PRS_REQUEST_ID), '"'),
>>>>     ...
>>>> FROM
>>>>     FCT_PRSS PRSS
>>>>     FULL OUTER JOIN FCT_PRSC PRSC
>>>>         ON (PRSS.PRS_REQUEST_ID = PRSC.PRS_REQUEST_ID)
>>>> WHERE (PRSS.date_key >= '2009121600' AND PRSS.date_key < '2009121700')
>>>>    OR (PRSC.date_key >= '2009121600' AND PRSC.date_key < '2009121700')
>>>>
>>>> Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 |
>>>> Cell: 718-930-7947
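One idea for the query itself, since suggestions were invited: if the concat UDF in this Hive build is variadic (DESCRIBE FUNCTION concat will show its signature), each nested pair can collapse into a single call. A sketch of the same SELECT list under that assumption:

    SELECT
        CONCAT('"', PRSS.DATE_KEY, '"'),
        CONCAT('"', PRSC.DATE_KEY, '"'),
        ...

The result is identical; it just drops one level of nesting per column and reads more clearly.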
>>>> -----Original Message-----
>>>> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
>>>> Sent: Thursday, December 17, 2009 11:03 AM
>>>> To: hive-user@hadoop.apache.org
>>>> Subject: Re: Throttling hive queries
>>>>
>>>> You should be able to:
>>>>
>>>> hive> set mapred.map.tasks=1000
>>>> hive> set mapred.reduce.tasks=5
>>>>
>>>> In some cases the number of mappers is controlled by the input files
>>>> (pre Hadoop 0.20).
>>>>
>>>> On Thu, Dec 17, 2009 at 1:58 PM, Sagi, Lee <ls...@shopping.com> wrote:
>>>> > Is there a way to throttle hive queries?
>>>> >
>>>> > For example, I want to tell Hive not to use more than 1000 mappers
>>>> > and 5 reducers for a particular query (or session).
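Circling back to that original question: the reducer half can also be capped without pinning an exact count. Recent Hive builds ship hive.exec.reducers.max in hive-default.xml (worth verifying on your version); when mapred.reduce.tasks is left at -1, Hive infers the reducer count from the input size and clamps it to that maximum. A sketch:

    hive> set mapred.reduce.tasks=-1;
    hive> set hive.exec.reducers.max=5;

The mapper half is the hard part, as Todd says: with FileInputFormat the split count follows the input blocks and files, which is why a combining input format is the real fix for many small files.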