Oh, cool, Jeff, this is really helpful!

I have actually been dealing with relatively *small* records (of about 500 MB)
and running very simple programs akin to wordcount, so I will play around with
io.sort.record.percent and see what results I get.

Thanks!
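For context on why io.sort.record.percent matters with small records, here is a rough sketch of how a 0.20-era map task is understood to split its io.sort.mb buffer. io.sort.record.percent of it is reserved for per-record accounting, assumed here to cost 16 bytes per record. The rest holds serialized key/value data, and a spill starts when either area passes io.sort.spill.percent. The class name SortBufferModel and the 16-byte figure are assumptions for illustration, not Hadoop code.

```java
/**
 * Rough model (an assumption, not Hadoop source) of the 0.20-era map-side
 * sort buffer: io.sort.mb is split between per-record accounting space
 * (io.sort.record.percent) and serialized key/value data (the remainder).
 */
public class SortBufferModel {
    // Assumed accounting cost per record: four 4-byte ints.
    static final int ACCOUNTING_BYTES_PER_RECORD = 16;

    final long totalBytes;      // io.sort.mb converted to bytes
    final long accountingBytes; // share reserved for record accounting
    final long dataBytes;       // share left for serialized keys/values
    final long recordCapacity;  // records the accounting area can track

    SortBufferModel(int ioSortMb, double recordPercent) {
        totalBytes = (long) ioSortMb << 20;
        long acct = (long) (totalBytes * recordPercent);
        acct -= acct % ACCOUNTING_BYTES_PER_RECORD; // keep whole record slots only
        accountingBytes = acct;
        dataBytes = totalBytes - acct;
        recordCapacity = acct / ACCOUNTING_BYTES_PER_RECORD;
    }

    /**
     * Records collected before the first spill starts, assuming every record
     * serializes to avgRecordBytes and a spill begins as soon as either area
     * passes spillPercent of its capacity.
     */
    long recordsBeforeSpill(int avgRecordBytes, double spillPercent) {
        long byAccounting = (long) (recordCapacity * spillPercent);
        long byData = (long) (dataBytes * spillPercent) / avgRecordBytes;
        return Math.min(byAccounting, byData);
    }

    public static void main(String[] args) {
        SortBufferModel defaults = new SortBufferModel(100, 0.05);
        System.out.println("accounting slots: " + defaults.recordCapacity);
        System.out.println("data bytes: " + defaults.dataBytes);
        // Small, wordcount-like records (20 bytes) vs. larger ones (1000 bytes).
        System.out.println("20-byte records before spill: "
                + defaults.recordsBeforeSpill(20, 0.8));
        System.out.println("1000-byte records before spill: "
                + defaults.recordsBeforeSpill(1000, 0.8));
    }
}
```

With the defaults (100 MB, 0.05, 0.8), this model gives 327,680 accounting slots and starts a spill after 262,144 records whenever the average record is under about 304 bytes, so small records exhaust the accounting area long before the data area. Raising io.sort.record.percent shifts memory toward accounting in that case.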

-----Original Message-----
From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
Sent: Tuesday, December 22, 2009 8:35 PM
To: common-user@hadoop.apache.org
Subject: Re: io.sort.mb configuration?

Hey Mark,

While you're grokking this aspect of MapReduce's configuration, you may want
to check out https://issues.apache.org/jira/browse/MAPREDUCE-64, which is on
its way into trunk right now. Chris Douglas from Yahoo! has posted a very
nice explanation of how buffers are managed during the shuffle and which
parameters affect the behavior.

Regards,
Jeff

On Tue, Dec 22, 2009 at 12:30 PM, Mark Vigeant <mark.vige...@riskmetrics.com> wrote:

> Thank you for the responses, guys!
>
> First, to Patrick: I didn't set it in the code, but setting it there is a
> really good idea, so I'll try that and play around with it.
>
> Long: I should have clarified that I am using 0.20.1, so this is a bit
> different. I set the parameter in mapred-site.xml, and for some reason it's
> just not taking effect. Thank you anyway, though!
>
> -Mark
>
> -----Original Message-----
> From: Long Van Nguyen Dinh [mailto:munt...@gmail.com]
> Sent: Tuesday, December 22, 2009 12:17 PM
> To: common-user@hadoop.apache.org
> Subject: Re: io.sort.mb configuration?
>
> Hadoop has a default file (hadoop-default.xml, in version 0.19) for all
> configuration. Don't change the values in that file (edits there won't
> take effect); instead, copy the parameter to hadoop-site.xml, where you set
> up the cluster, and set the value you want there.
>
> Long Van
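The precedence described here can be modeled roughly as layered resources: defaults load first, a site file (hadoop-site.xml in 0.19, mapred-site.xml in 0.20) loads later and overrides them, and a value that an earlier resource marks final cannot be overridden by anything loaded after it. This is a simplified, hypothetical model (LayeredConfig is not Hadoop's Configuration class), and it assumes that per-job settings, as Patrick mentions, behave like one more layer applied last.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of layered configuration: resources are
 * applied in order, later ones override earlier ones, and a key that an
 * earlier resource marked final is never overridden afterwards.
 */
public class LayeredConfig {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finalKeys = new HashSet<>();

    /** Apply one resource (e.g. a defaults file, a site file, or job settings). */
    public LayeredConfig addResource(Map<String, String> props, Set<String> markedFinal) {
        for (Map.Entry<String, String> e : props.entrySet()) {
            String key = e.getKey();
            if (finalKeys.contains(key)) {
                continue; // locked by an earlier resource
            }
            values.put(key, e.getValue());
            if (markedFinal.contains(key)) {
                finalKeys.add(key);
            }
        }
        return this;
    }

    /** Integer lookup with a fallback when no layer sets the key. */
    public int getInt(String key, int fallback) {
        String v = values.get(key);
        return v == null ? fallback : Integer.parseInt(v.trim());
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of("io.sort.mb", "100");
        Map<String, String> site = Map.of("io.sort.mb", "250");
        Map<String, String> jobOverride = Map.of("io.sort.mb", "100");

        // Site file overrides the default.
        int a = new LayeredConfig()
                .addResource(defaults, Set.of())
                .addResource(site, Set.of())
                .getInt("io.sort.mb", -1);

        // A later per-job value masks the site setting.
        int b = new LayeredConfig()
                .addResource(defaults, Set.of())
                .addResource(site, Set.of())
                .addResource(jobOverride, Set.of())
                .getInt("io.sort.mb", -1);

        // Marking the site value final keeps it in force.
        int c = new LayeredConfig()
                .addResource(defaults, Set.of())
                .addResource(site, Set.of("io.sort.mb"))
                .addResource(jobOverride, Set.of())
                .getInt("io.sort.mb", -1);

        System.out.println(a + " " + b + " " + c); // prints "250 100 250"
    }
}
```

In this model, the second case matches what Patrick hints at: if some layer applied after the site file carries io.sort.mb = 100, the 250 in the site file never takes effect unless it is marked final.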
>
> On Tue, Dec 22, 2009 at 11:40 AM, Patrick Angeles
> <patrickange...@gmail.com> wrote:
> > You can also set that param per-job. Maybe you called some code that did
> > that behind the scenes?
> >
> > On Tue, Dec 22, 2009 at 11:10 AM, Mark Vigeant <mark.vige...@riskmetrics.com> wrote:
> >
> >> Hey Everyone-
> >>
> >> I've been playing around with Hadoop and HBase for a while, and I noticed
> >> that when running a program to upload data into an HTable, I saw the
> >> output:
> >>
> >> INFO mapred.MapTask: io.sort.mb = 100
> >>
> >> That is the default value, but in the mapred configuration on all machines
> >> in my cluster I set this value to 250. Could it be that my program is not
> >> accessing the configuration properly? Is that too large a value? Or is it
> >> most likely just a foolish syntax error on my part?
> >>
> >> Thank you very much, all input is appreciated.
> >>
> >> Mark Vigeant
> >> RiskMetrics Group, Inc.
> >>
> >>
> >> This email message and any attachments are for the sole use of the intended
> >> recipients and may contain proprietary and/or confidential information which
> >> may be privileged or otherwise protected from disclosure. Any unauthorized
> >> review, use, disclosure or distribution is prohibited. If you are not an
> >> intended recipient, please contact the sender by reply email and destroy the
> >> original message and any copies of the message as well as any attachments to
> >> the original message.
> >>
> >
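On whether 250 is too large: this sketch mirrors, as far as we understand 0.20's MapTask, the range checks applied before the sort buffer is allocated. io.sort.mb must fit in 11 bits (at most 2047), io.sort.record.percent must lie between 0.01 and 1, and io.sort.spill.percent between 0 and 1. It adds a rough, hypothetical check that the buffer is smaller than the task JVM's maximum heap, since the buffer is allocated inside that heap (to our knowledge mapred.child.java.opts defaults to -Xmx200m in 0.20). The class and method names are illustrative, not Hadoop's API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of sanity checks for the map-side sort settings. The range checks
 * mirror our understanding of 0.20's MapTask; the heap comparison is a rough,
 * hypothetical rule of thumb, not a Hadoop check.
 */
public class SortSettingsCheck {

    static List<String> validate(int ioSortMb, double recordPercent, double spillPercent) {
        List<String> problems = new ArrayList<>();
        if ((ioSortMb & 0x7FF) != ioSortMb) { // must fit in 11 bits: 0..2047
            problems.add("Invalid io.sort.mb: " + ioSortMb);
        }
        if (recordPercent > 1.0 || recordPercent < 0.01) {
            problems.add("Invalid io.sort.record.percent: " + recordPercent);
        }
        if (spillPercent > 1.0 || spillPercent < 0.0) {
            problems.add("Invalid io.sort.spill.percent: " + spillPercent);
        }
        return problems;
    }

    /** Rough check: the sort buffer lives in the task heap, so it must be smaller than -Xmx. */
    static boolean fitsInTaskHeap(int ioSortMb, int maxHeapMb) {
        return ioSortMb < maxHeapMb;
    }

    public static void main(String[] args) {
        System.out.println(validate(250, 0.05, 0.80));  // []
        System.out.println(validate(4096, 0.05, 0.80)); // [Invalid io.sort.mb: 4096]
        System.out.println(fitsInTaskHeap(250, 200));   // false with a 200 MB heap
        System.out.println(fitsInTaskHeap(250, 512));   // true
    }
}
```

Under these assumptions, 250 passes the range checks; whether it works in practice depends mainly on how much heap each map task is given.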
>
