OK, thank you, Jeff. Good to know. I was actually expecting to rely on this for
a wide range of issues (the most common being overriding task JVM parameters).
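(For example, once the CLI handles generic options correctly, I'd expect
something like ./bin/mahout kmeans -Dmapred.child.java.opts=-Xmx2048m ... to be
the way to override the task JVM options -- assuming mapred.child.java.opts is
still the right property name in your Hadoop version.)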

On Wed, Dec 29, 2010 at 11:29 AM, Jeff Eastman <[email protected]> wrote:

> I've found the problem: the MahoutDriver uses a Map to organize the command
> line arguments, and this reorders them so that the -D arguments may no longer
> come first. They then get treated as job-specific options, which causes the
> failures. I'm working on a fix.
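>
> To illustrate the symptom (a sketch only, not the actual MahoutDriver code):
>
> import java.util.HashMap;
> import java.util.LinkedHashMap;
> import java.util.Map;
>
> public class ArgOrderSketch {
>   public static void main(String[] args) {
>     // A HashMap makes no ordering guarantee, so when the argument list is
>     // rebuilt from it, "-Dmapred.reduce.tasks=10" can land after the
>     // job-specific options and the generic-options pass never sees it.
>     Map<String, String> unordered = new HashMap<String, String>();
>     unordered.put("-Dmapred.reduce.tasks=10", "");
>     unordered.put("--input", "in");
>     unordered.put("--output", "out");
>     System.out.println(unordered.keySet()); // iteration order not guaranteed
>
>     // An insertion-ordered structure (a LinkedHashMap, or a plain list with
>     // the -D arguments emitted first) preserves the required ordering.
>     Map<String, String> ordered = new LinkedHashMap<String, String>();
>     ordered.put("-Dmapred.reduce.tasks=10", "");
>     ordered.put("--input", "in");
>     ordered.put("--output", "out");
>     System.out.println(ordered.keySet()); // -D argument stays first
>   }
> }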
>
> Jeff
>
> -----Original Message-----
> From: Jeff Eastman [mailto:[email protected]]
> Sent: Tuesday, December 28, 2010 5:19 PM
> To: [email protected]
> Subject: RE: where i can set -Dmapred.map.tasks=X
>
> That's where I'm beginning to look too. It seems the driver code is working
> correctly (I thought I had tested that) but the CLI isn't.
>
> The original post was about -Dmapred.map.tasks, but I noticed that
> -Dmapred.reduce.tasks didn't work either.
>
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:[email protected]]
> Sent: Tuesday, December 28, 2010 5:15 PM
> To: [email protected]
> Subject: Re: where i can set -Dmapred.map.tasks=X
>
> Oh, so you are trying to set the number of reduce tasks. I missed that; the
> original post was about the number of map tasks. Sorry.
>
> No, no idea why that error pops up in the Mahout command line. I would need
> to dig into Mahout's CLI code -- I don't think I've dug that deep there
> before.
>
> On Tue, Dec 28, 2010 at 5:06 PM, Jeff Eastman <[email protected]> wrote:
>
> > It's very odd: when I run k-means from Eclipse and add
> > -Dmapred.reduce.tasks=10 as the first argument, the driver loves it and
> > job.getNumReduceTasks() is correctly set to 10. When I run the same
> > command line using bin/mahout, however, it fails with "Unexpected
> > -Dmapred.reduce.tasks=10 while processing Job-Specific Options."
> >
> > The CLI invocation is: ./bin/mahout kmeans -Dmapred.reduce.tasks=10 -I ...
> >
> >
> >
> > -----Original Message-----
> > From: Dmitriy Lyubimov [mailto:[email protected]]
> > Sent: Tuesday, December 28, 2010 4:55 PM
> > To: [email protected]
> > Subject: Re: where i can set -Dmapred.map.tasks=X
> >
> > PPS: it doesn't tell you which property FileInputFormat actually uses for
> > it, and I don't remember off the top of my head either, but I assume you
> > could set them with -D as well.
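> >
> > (From memory, the property names in that Hadoop release should be
> > mapred.min.split.size and mapred.max.split.size, so the -D form would be
> > something like ./bin/mahout <some job> -Dmapred.min.split.size=134217728 ...
> > -- but please double-check those names against your Hadoop version.)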
> >
> > On Tue, Dec 28, 2010 at 4:54 PM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > In particular, QJob is one of the drivers that uses that, in the
> > > following way:
> > >
> > > if (minSplitSize > 0) {
> > >   SequenceFileInputFormat.setMinInputSplitSize(job, minSplitSize);
> > > }
> > >
> > > An interesting peculiarity of that parameter is that, in the current
> > > Hadoop release, for anything derived from FileInputFormat it ensures
> > > that all splits are at least that big, and that the last split can be up
> > > to 1.1 times that big. I am not quite sure why the last split gets
> > > special treatment, but that's how it goes there.
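> > >
> > > (If I read the split logic right, the practical effect is that, for
> > > example, a 260 MB file with a 128 MB split size becomes two splits of
> > > 128 MB and 132 MB, rather than 128 MB + 128 MB + a tiny 4 MB leftover;
> > > the 1.1 slop just avoids a very small trailing split.)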
> > >
> > > -Dmitriy
> > >
> > >
> > > On Tue, Dec 28, 2010 at 4:48 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > >
> > >> Jeff,
> > >>
> > >> It's the MAHOUT-376 patch; I don't think it has been committed yet. The
> > >> driver class there is SSVDCli; for your convenience you can find it here:
> > >>
> > >> https://github.com/dlyubimov/ssvd-lsi/tree/givens-ssvd/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd
> > >>
> > >> But like I said, I did not try to use it with the -D option, since I
> > >> wanted to give an explicit option to increase the split size if needed
> > >> (and a help entry for it). Another reason is that the solver has a
> > >> series of jobs, and only those reading the source matrix have anything
> > >> to do with the split size.
> > >>
> > >>
> > >> -d
> > >>
> > >>
> > >> On Tue, Dec 28, 2010 at 4:39 PM, Jeff Eastman <[email protected]>
> > wrote:
> > >>
> > >>> What's the driver class? If the -D parameters are working for you, I
> > >>> want to compare it to the clustering drivers.
> > >>>
> > >>> -----Original Message-----
> > >>> From: Dmitriy Lyubimov [mailto:[email protected]]
> > >>> Sent: Tuesday, December 28, 2010 4:37 PM
> > >>> To: [email protected]
> > >>> Subject: Re: where i can set -Dmapred.map.tasks=X
> > >>>
> > >>> As far as I understand, this option is not forced. I suspect it
> > >>> actually means 'minimum degree of parallelism', so if you expect to
> > >>> use it to reduce the number of mappers, I don't think it is going to
> > >>> work that way. The ones that do enforce anything are the min split
> > >>> size and max split size in the file input, so I guess you can try
> > >>> those. I rely on them (and expose them as job-specific options) in the
> > >>> stochastic SVD.
> > >>>
> > >>> But usually forcing the split size up creates a 'supersplits' problem,
> > >>> where a lot of data is moved around just to supply data to the
> > >>> mappers, which is perhaps why this option is meant to increase
> > >>> parallelism only, but probably not to decrease it.
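> > >>>
> > >>> To make that concrete, the driver-side calls would look roughly like
> > >>> this (a sketch with a made-up class name, using the new-API
> > >>> FileInputFormat; the sizes are only examples):
> > >>>
> > >>> import org.apache.hadoop.conf.Configuration;
> > >>> import org.apache.hadoop.mapreduce.Job;
> > >>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> > >>>
> > >>> public class SplitSizeSketch {
> > >>>   public static void main(String[] args) throws Exception {
> > >>>     Job job = new Job(new Configuration(), "split-size-sketch");
> > >>>     // Don't cut splits smaller than 128 MB: fewer, larger splits,
> > >>>     // hence fewer mappers.
> > >>>     FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);
> > >>>     // Don't let splits grow past 512 MB: caps how coarse they get.
> > >>>     FileInputFormat.setMaxInputSplitSize(job, 512L * 1024 * 1024);
> > >>>   }
> > >>> }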
> > >>>
> > >>> -d
> > >>>
> > >>> On Tue, Dec 28, 2010 at 4:05 PM, Jeff Eastman <[email protected]>
> > >>> wrote:
> > >>>
> > >>> > This is supposed to be a generic option. You should be able to
> > >>> > specify Hadoop options such as this on the command-line invocation
> > >>> > of your favorite Mahout routine, but I'm having a similar problem
> > >>> > setting -Dmapred.reduce.tasks=10 with Canopy and k-Means. This
> > >>> > happens both with and without a space after the -D.
> > >>> >
> > >>> > Can someone point me to a Mahout command where this does work? Both
> > >>> > drivers extend AbstractJob and do the usual option-processing
> > >>> > push-ups. I don't have the Hadoop source locally, so I can't debug
> > >>> > the generic options parsing.
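> > >>> >
> > >>> > For what it's worth, the way generic options are normally consumed
> > >>> > looks roughly like this when ToolRunner is in the loop (a sketch
> > >>> > with a made-up class name, not Mahout's AbstractJob code):
> > >>> >
> > >>> > import org.apache.hadoop.conf.Configuration;
> > >>> > import org.apache.hadoop.conf.Configured;
> > >>> > import org.apache.hadoop.mapreduce.Job;
> > >>> > import org.apache.hadoop.util.Tool;
> > >>> > import org.apache.hadoop.util.ToolRunner;
> > >>> >
> > >>> > public class GenericOptsSketch extends Configured implements Tool {
> > >>> >   public int run(String[] args) throws Exception {
> > >>> >     // ToolRunner has already run GenericOptionsParser over the
> > >>> >     // original arguments, so -Dmapred.reduce.tasks=10 (given before
> > >>> >     // the job-specific options) is already in getConf(); only the
> > >>> >     // remaining arguments arrive here.
> > >>> >     Job job = new Job(getConf(), "generic-opts-sketch");
> > >>> >     System.out.println(job.getConfiguration().get("mapred.reduce.tasks"));
> > >>> >     return 0;
> > >>> >   }
> > >>> >
> > >>> >   public static void main(String[] args) throws Exception {
> > >>> >     System.exit(ToolRunner.run(new Configuration(), new GenericOptsSketch(), args));
> > >>> >   }
> > >>> > }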
> > >>> >
> > >>> > -----Original Message-----
> > >>> > From: beneo_7 [mailto:[email protected]]
> > >>> > Sent: Monday, December 27, 2010 10:45 PM
> > >>> > To: [email protected]
> > >>> > Subject: where i can set -Dmapred.map.tasks=X
> > >>> >
> > >>> > I read in Mahout in Action that I should set -Dmapred.map.tasks=X,
> > >>> > but it did not work for Hadoop.
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
