The LinkedHashMap and LinkedHashSet classes preserve the order of addition to the Map/Set; TreeMap and TreeSet keep their contents in sorted order instead.
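One way to check which collection keeps insertion order is to add the same keys, in the same order, to a few Map implementations and print the iteration order. In this sketch the class name `MapOrderDemo` and the option keys are made up for illustration. LinkedHashMap returns the keys in the order they were added. TreeMap returns them sorted, which here moves the `-D` key behind the `--` keys. HashMap makes no ordering promise at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {

    // Adds the same (hypothetical) option keys in the same order and
    // returns the keys in the map's own iteration order.
    static List<String> iterationOrder(Map<String, String> map) {
        map.put("-Dmapred.reduce.tasks", "10");
        map.put("--input", "in");
        map.put("--output", "out");
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        // Insertion order: [-Dmapred.reduce.tasks, --input, --output]
        System.out.println(iterationOrder(new LinkedHashMap<String, String>()));
        // Sorted order ('-' sorts before 'D'): [--input, --output, -Dmapred.reduce.tasks]
        System.out.println(iterationOrder(new TreeMap<String, String>()));
        // Unspecified order
        System.out.println(iterationOrder(new HashMap<String, String>()));
    }
}
```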

On Wed, Dec 29, 2010 at 11:50 AM, Jeff Eastman <[email protected]> wrote:
> The patch to MahoutDriver involves the code in the for loop at lines 203-216.
> If arg.startsWith("-D"), the arg needs to be added to argsList at position 1;
> otherwise, at the end. I will commit a patch for this tonight as I have not
> got my Narus CLA signed yet.
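Here is a sketch of the reordering described above. It is a hypothetical, standalone helper, not the actual MahoutDriver loop; the class `GenericArgsFirst` and the method `reorder` are invented names. The sketch assumes that index 0 of the list holds the program name, so "position 1" is the first slot after it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GenericArgsFirst {

    // Hypothetical helper (not MahoutDriver's code): index 0 holds the program
    // name, every "-D" argument is placed right after it in its original order,
    // and all other arguments are appended at the end.
    static List<String> reorder(String programName, List<String> args) {
        List<String> argsList = new ArrayList<String>();
        argsList.add(programName);
        int nextGenericSlot = 1; // index just after the last -D argument placed so far
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                argsList.add(nextGenericSlot++, arg);
            } else {
                argsList.add(arg);
            }
        }
        return argsList;
    }

    public static void main(String[] args) {
        // "-i"/"-o" are placeholder job options for illustration.
        List<String> reordered = reorder("kmeans",
                Arrays.asList("-i", "in", "-Dmapred.reduce.tasks=10", "-o", "out", "-Dmapred.map.tasks=4"));
        System.out.println(reordered);
        // prints [kmeans, -Dmapred.reduce.tasks=10, -Dmapred.map.tasks=4, -i, in, -o, out]
    }
}
```

Inserting every `-D` argument at exactly index 1 would reverse their relative order. The sketch avoids that by advancing the insertion slot after each one. It also assumes the `-Dkey=value` form with no space: a separate `-D key=value` pair would be split apart by this loop.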
>
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:[email protected]]
> Sent: Wednesday, December 29, 2010 11:46 AM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: where i can set -Dmapred.map.tasks=X
>
> ok, thank you, Jeff. Good to know. I actually expected to rely on this for a
> wide range of issues (the most common being overriding task JVM parameters).
>
> On Wed, Dec 29, 2010 at 11:29 AM, Jeff Eastman <[email protected]> wrote:
>
>> I've found the problem: the MahoutDriver uses a Map to organize the command
>> line arguments and this reorders them so that the -D arguments may not be
>> first. This causes them to be treated as job-specific options, causing the
>> failures. I'm working on a fix.
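A simplified model shows why position matters. It assumes, as described above, that generic `-D` options are recognized only while they lead the argument list. Everything from the first other argument onward goes to the job-specific parser. `LeadingGenericOptions` and `parse` are hypothetical names, and this is not Hadoop's GenericOptionsParser. The `-i in` arguments are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeadingGenericOptions {

    // Result of the simplified parse: properties taken from leading -D args,
    // plus every argument left over for the job-specific option parser.
    static final class Parsed {
        final Map<String, String> properties = new LinkedHashMap<String, String>();
        final List<String> remaining = new ArrayList<String>();
    }

    // Consumes "-Dkey=value" arguments only while they appear at the front of
    // the list; parsing stops at the first argument that is not one of them.
    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        int i = 0;
        while (i < args.length && args[i].startsWith("-D") && args[i].indexOf('=') > 2) {
            String keyValue = args[i].substring(2);
            int eq = keyValue.indexOf('=');
            parsed.properties.put(keyValue.substring(0, eq), keyValue.substring(eq + 1));
            i++;
        }
        // Everything from here on, including any later -D argument, is left over.
        parsed.remaining.addAll(Arrays.asList(args).subList(i, args.length));
        return parsed;
    }

    public static void main(String[] args) {
        Parsed first = parse(new String[] {"-Dmapred.reduce.tasks=10", "-i", "in"});
        System.out.println(first.properties + " " + first.remaining);
        // prints {mapred.reduce.tasks=10} [-i, in]

        Parsed later = parse(new String[] {"-i", "in", "-Dmapred.reduce.tasks=10"});
        System.out.println(later.properties + " " + later.remaining);
        // prints {} [-i, in, -Dmapred.reduce.tasks=10]
    }
}
```

In the second call, the `-D` argument reaches the job-specific side untouched. There, a strict option parser would reject it as unexpected.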
>>
>> Jeff
>>
>> -----Original Message-----
>> From: Jeff Eastman [mailto:[email protected]]
>> Sent: Tuesday, December 28, 2010 5:19 PM
>> To: [email protected]
>> Subject: RE: where i can set -Dmapred.map.tasks=X
>>
>> That's where I'm beginning to look too. It seems the driver code is working
>> correctly (I thought I had tested that) but the CLI isn't.
>>
>> The original post was for -Dmapred.map.tasks but I noticed the reduce.tasks
>> didn't work either.
>>
>> -----Original Message-----
>> From: Dmitriy Lyubimov [mailto:[email protected]]
>> Sent: Tuesday, December 28, 2010 5:15 PM
>> To: [email protected]
>> Subject: Re: where i can set -Dmapred.map.tasks=X
>>
>> Oh, so you are trying to set the number of reduce tasks. i missed that; the
>> original post was about the number of map tasks. sorry.
>>
>> No, no idea why that error pops up on the mahout command line. i would need
>> to dig into mahout's cli code; i don't think i dug that deep there before.
>>
>> On Tue, Dec 28, 2010 at 5:06 PM, Jeff Eastman <[email protected]> wrote:
>>
>> > It's very odd: when I run k-means from Eclipse and add
>> > -Dmapred.reduce.tasks=10 as the first argument, the driver loves it and
>> > job.getNumReduceTasks() is correctly set to 10. When I run the same command
>> > line using bin/mahout, however, it fails with "Unexpected
>> > -Dmapred.reduce.tasks=10 while processing Job-Specific Options".
>> >
>> > The CLI invocation is: ./bin/mahout kmeans -Dmapred.reduce.tasks=10 -I ...
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Dmitriy Lyubimov [mailto:[email protected]]
>> > Sent: Tuesday, December 28, 2010 4:55 PM
>> > To: [email protected]
>> > Subject: Re: where i can set -Dmapred.map.tasks=X
>> >
>> > PPS: it doesn't tell you which property FileInputFormat actually uses for
>> > it, and i don't remember off the top of my head either. but i assume you
>> > could set them with -D as well.
>> >
>> > On Tue, Dec 28, 2010 at 4:54 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> >
>> > > In particular, QJob is one of the drivers that uses it, in the following
>> > > way:
>> > >
>> > > if (minSplitSize > 0) {
>> > >   SequenceFileInputFormat.setMinInputSplitSize(job, minSplitSize);
>> > > }
>> > >
>> > > An interesting peculiarity of that parameter is that in the current hadoop
>> > > release, for anything derived from FileInputFormat, it ensures that all
>> > > splits except the last are at least that big, while the last split can be
>> > > up to 1.1 times that big. I am not quite sure why the last split gets
>> > > special treatment, but that's how it goes there.
>> > >
>> > > -Dmitriy
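The last-split behaviour can be illustrated with a small model of how Hadoop's FileInputFormat sizes the splits of a single file. This is a sketch under two assumptions. First, the split size is the block size clamped between the minimum and maximum split sizes. Second, a slop factor of 1.1 means a remainder up to 10% larger than one split is not broken off into a tiny extra split. The model ignores block locations, multiple files, and non-splittable formats. `SplitSizing` and its methods are invented names, and sizes are in MB.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSizing {

    // Assumed slop factor: a trailing remainder up to 1.1 splits long stays whole.
    static final double SPLIT_SLOP = 1.1;

    // Assumed split-size rule: block size clamped to [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Lengths of the splits cut from one file of the given length.
    static List<Long> splitLengths(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Long> splits = new ArrayList<Long>();
        long bytesRemaining = fileLength;
        // Cut full-size splits while more than 1.1 splits' worth remains.
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            bytesRemaining -= splitSize;
        }
        // The remainder (at most 1.1 * splitSize) becomes the last split.
        if (bytesRemaining != 0) {
            splits.add(bytesRemaining);
        }
        return splits;
    }

    public static void main(String[] args) {
        // 64 MB blocks with the minimum split size raised to 100 MB.
        System.out.println(splitLengths(250, 64, 100, Long.MAX_VALUE)); // prints [100, 100, 50]
        System.out.println(splitLengths(210, 64, 100, Long.MAX_VALUE)); // prints [100, 110]
    }
}
```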
>> > >
>> > >
>> > > On Tue, Dec 28, 2010 at 4:48 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> > >
>> > >> Jeff,
>> > >>
>> > >> it's the MAHOUT-376 patch; i don't think it is committed. the driver class
>> > >> there is SSVDCli. for your convenience, you can find it here:
>> > >>
>> > >> https://github.com/dlyubimov/ssvd-lsi/tree/givens-ssvd/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd
>> > >>
>> > >> but like i said, i did not try to use it with the -D option, since i
>> > >> wanted to give an explicit option to increase the split size if needed
>> > >> (with help text for it). Another reason is that the solver runs a series of
>> > >> jobs, and only those reading the source matrix have anything to do with
>> > >> the split size.
>> > >>
>> > >>
>> > >> -d
>> > >>
>> > >>
>> > >> On Tue, Dec 28, 2010 at 4:39 PM, Jeff Eastman <[email protected]> wrote:
>> > >>
>> > >>> What's the driver class? If the -D parameters are working for you, I want
>> > >>> to compare it to the clustering drivers.
>> > >>>
>> > >>> -----Original Message-----
>> > >>> From: Dmitriy Lyubimov [mailto:[email protected]]
>> > >>> Sent: Tuesday, December 28, 2010 4:37 PM
>> > >>> To: [email protected]
>> > >>> Subject: Re: where i can set -Dmapred.map.tasks=X
>> > >>>
>> > >>> as far as i understand, this option is not enforced. I suspect it actually
>> > >>> means 'minimum degree of parallelism', so if you expect to use it to
>> > >>> reduce the number of mappers, i don't think that will work very well.
>> > >>> The ones that do enforce anything are the min split size and max split
>> > >>> size of file input, so i guess you can try those. I rely on them (and
>> > >>> expose them as a job-specific option) in stochastic svd.
>> > >>>
>> > >>> but forcing the split size to increase usually creates a 'supersplits'
>> > >>> problem, where a lot of data is moved around just to supply data to the
>> > >>> mappers. That is perhaps why this option is meant to increase parallelism
>> > >>> only, and probably not to decrease it.
>> > >>>
>> > >>> -d
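The "minimum degree of parallelism" reading can be illustrated with a model of the older org.apache.hadoop.mapred FileInputFormat rule. The model assumes the requested map count only sets a goal split size, equal to the total input size divided by the requested count. That goal is then capped at the block size and floored at the minimum split size. The names below are hypothetical, sizes are in MB, and the split count is approximated by ceiling division (ignoring the 1.1 slop on the last split).

```java
public class MapTaskHint {

    // Assumed old-API rule: the requested map count only sets a goal size,
    // which is capped by the block size and floored by the minimum split size.
    static long splitSize(long totalSize, int requestedMaps, long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(requestedMaps, 1);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Approximate number of splits (map tasks) via ceiling division.
    static long approxSplits(long totalSize, long splitSize) {
        return (totalSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long total = 1000; // MB of input
        long block = 64;   // MB per block
        // Asking for fewer maps cannot push the split size past the block size.
        System.out.println(approxSplits(total, splitSize(total, 2, 1, block)));   // prints 16
        // Asking for more maps does shrink the splits.
        System.out.println(approxSplits(total, splitSize(total, 50, 1, block)));  // prints 50
        // Raising the minimum split size is what actually reduces the map count.
        System.out.println(approxSplits(total, splitSize(total, 2, 250, block))); // prints 4
    }
}
```

With 1000 MB of input in 64 MB blocks, asking for 2 maps still yields about 16 and asking for 50 yields 50. Only raising the minimum split size to 250 MB brings the count down to 4.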
>> > >>>
>> > >>> On Tue, Dec 28, 2010 at 4:05 PM, Jeff Eastman <[email protected]> wrote:
>> > >>>
>> > >>> > This is supposed to be a generic option. You should be able to specify
>> > >>> > Hadoop options such as this on the command line invocation of your
>> > >>> > favorite Mahout routine, but I'm having a similar problem setting
>> > >>> > -Dmapred.reduce.tasks=10 with Canopy and k-Means. This happens both with
>> > >>> > and without a space after the -D.
>> > >>> >
>> > >>> > Can someone point me to a Mahout command where this does work? Both
>> > >>> > drivers extend AbstractJob and do the usual option processing pushups. I
>> > >>> > don't have Hadoop source locally, so I can't debug the generic options
>> > >>> > parsing.
>> > >>> >
>> > >>> > -----Original Message-----
>> > >>> > From: beneo_7 [mailto:[email protected]]
>> > >>> > Sent: Monday, December 27, 2010 10:45 PM
>> > >>> > To: [email protected]
>> > >>> > Subject: where i can set -Dmapred.map.tasks=X
>> > >>> >
>> > >>> > i read in Mahout in Action that I should set -Dmapred.map.tasks=X,
>> > >>> > but it did not work for hadoop
>> > >>> >
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>



-- 
Lance Norskog
[email protected]
