The LinkedHashMap and LinkedHashSet classes preserve the order of addition to the Map/Set.
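The ordering difference can be sketched in Java. This assumes command-line options are stored as map keys; MahoutDriver's real data structure may differ. A `LinkedHashMap` iterates in insertion order. A `TreeMap` iterates in sorted key order, which here happens to put `-Dmapred.reduce.tasks` last. The helper `putGenericOptionsFirst` is hypothetical and is not MahoutDriver's code. It follows Jeff's description in the thread: a `-D` argument goes right after the program name, and everything else goes to the end. It also keeps a separate `-D key=value` pair together.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ArgOrderDemo {

    // Hypothetical helper: keeps element 0 (assumed to be the program name,
    // e.g. "kmeans") in place, moves every -D argument directly after it in
    // its original relative order, and appends all other arguments at the end.
    static List<String> putGenericOptionsFirst(List<String> args) {
        List<String> result = new ArrayList<>();
        if (args.isEmpty()) {
            return result;
        }
        result.add(args.get(0));
        int insertAt = 1; // next free slot for a generic (-D) argument
        for (int i = 1; i < args.size(); i++) {
            String arg = args.get(i);
            if (arg.equals("-D") && i + 1 < args.size()) {
                // "-D key=value" form: move the flag and its value together.
                result.add(insertAt++, arg);
                result.add(insertAt++, args.get(++i));
            } else if (arg.startsWith("-D")) {
                result.add(insertAt++, arg);
            } else {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> linked = new LinkedHashMap<>();
        Map<String, String> tree = new TreeMap<>();
        for (Map<String, String> m : List.of(linked, tree)) {
            // Insert options in command-line order.
            m.put("-Dmapred.reduce.tasks", "10");
            m.put("--input", "in");
            m.put("--clusters", "clusters");
        }
        System.out.println(linked.keySet()); // [-Dmapred.reduce.tasks, --input, --clusters]
        System.out.println(tree.keySet());   // [--clusters, --input, -Dmapred.reduce.tasks]

        List<String> cli = List.of("kmeans", "--input", "in", "-Dmapred.reduce.tasks=10");
        System.out.println(putGenericOptionsFirst(cli)); // [kmeans, -Dmapred.reduce.tasks=10, --input, in]
    }
}
```

Inserting at an advancing index, rather than always at position 1, keeps several `-D` arguments in the order the user typed them.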
On Wed, Dec 29, 2010 at 11:50 AM, Jeff Eastman wrote:
> The patch to MahoutDriver involves the code in the for loop at lines 203-216.
> If arg.startsWith("-D"), the arg needs to be added to argsList at
> position 1; otherwise, at the end. I will commit a patch for this tonight, as I have
> not got my Narus CLA signed yet.
>
> -----Original Message-----
> From: Dmitriy Lyubimov
> Sent: Wednesday, December 29, 2010 11:46 AM
> Subject: Re: where i can set -Dmapred.map.tasks=X
>
> OK, thank you, Jeff. Good to know. I actually expected to rely on this for a
> wide range of issues (the most common being overriding task JVM parameters).
>
> On Wed, Dec 29, 2010 at 11:29 AM, Jeff Eastman wrote:
>
>> I've found the problem: the MahoutDriver uses a Map to organize the command
>> line arguments, and this reorders them so that the -D arguments may not be
>> first. This causes them to be treated as job-specific options, causing the
>> failures. I'm working on a fix.
>>
>> Jeff
>>
>> -----Original Message-----
>> From: Jeff Eastman
>> Sent: Tuesday, December 28, 2010 5:19 PM
>> Subject: RE: where i can set -Dmapred.map.tasks=X
>>
>> That's where I'm beginning to look too. It seems the driver code is working
>> correctly (I thought I had tested that), but the CLI isn't.
>>
>> The original post was about -Dmapred.map.tasks, but I noticed that
>> -Dmapred.reduce.tasks didn't work either.
>>
>> -----Original Message-----
>> From: Dmitriy Lyubimov
>> Sent: Tuesday, December 28, 2010 5:15 PM
>> Subject: Re: where i can set -Dmapred.map.tasks=X
>>
>> Oh, so you are trying to set the number of reduce tasks. I missed that; the
>> original post was about the number of map tasks. Sorry.
>>
>> No, I have no idea why that error pops up in the Mahout command line.
>> I would need to dig into Mahout's CLI code; I don't think I dug that deep there
>> before.
>>
>> On Tue, Dec 28, 2010 at 5:06 PM, Jeff Eastman wrote:
>>
>> > It's very odd: when I run k-means from Eclipse and add
>> > -Dmapred.reduce.tasks=10 as the first argument, the driver loves it and
>> > job.getNumReduceTasks() is set correctly to 10. When I run the same command
>> > line using bin/mahout, however, it fails with "Unexpected
>> > -Dmapred.reduce.tasks=10 while processing Job-Specific Options."
>> >
>> > The CLI invocation is: ./bin/mahout kmeans -Dmapred.reduce.tasks=10 -I ...
>> >
>> > -----Original Message-----
>> > From: Dmitriy Lyubimov
>> > Sent: Tuesday, December 28, 2010 4:55 PM
>> > Subject: Re: where i can set -Dmapred.map.tasks=X
>> >
>> > PPS: it doesn't tell you which property FileInputFormat actually uses for it,
>> > and I don't remember off the top of my head either, but I assume you
>> > could use those with -D as well.
>> >
>> > On Tue, Dec 28, 2010 at 4:54 PM, Dmitriy Lyubimov wrote:
>> >
>> > > In particular, QJob is one of the drivers that uses that, in the
>> > > following way:
>> > >
>> > >   if (minSplitSize > 0)
>> > >     SequenceFileInputFormat.setMinInputSplitSize(job, minSplitSize);
>> > >
>> > > An interesting peculiarity of that parameter is that in the current Hadoop
>> > > release, for anything derived from FileInputFormat, it ensures that all
>> > > splits but the last are at least that big, and the last split can be up to
>> > > 1.1 times that big. I am not quite sure why the last split gets special
>> > > treatment, but that's how it goes there.
>> > >
>> > > -Dmitriy
>> > >
>> > > On Tue, Dec 28, 2010 at 4:48 PM, Dmitriy Lyubimov wrote:
>> > >
>> > >> Jeff,
>> > >>
>> > >> It's the MAHOUT-376 patch; I don't think it is committed.
>> > >> The driver class there is SSVDCli; for your convenience, you can find it here:
>> > >>
>> > >> https://github.com/dlyubimov/ssvd-lsi/tree/givens-ssvd/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd
>> > >>
>> > >> But like I said, I did not try to use it with the -D option, since I wanted
>> > >> to give an explicit option (with help text) to increase the split size if
>> > >> needed. Another reason is that the solver runs a series of jobs, and only
>> > >> those reading the source matrix have anything to do with the split size.
>> > >>
>> > >> -d
>> > >>
>> > >> On Tue, Dec 28, 2010 at 4:39 PM, Jeff Eastman wrote:
>> > >>
>> > >>> What's the driver class? If the -D parameters are working for you, I want
>> > >>> to compare it to the clustering drivers.
>> > >>>
>> > >>> -----Original Message-----
>> > >>> From: Dmitriy Lyubimov
>> > >>> Sent: Tuesday, December 28, 2010 4:37 PM
>> > >>> Subject: Re: where i can set -Dmapred.map.tasks=X
>> > >>>
>> > >>> As far as I understand, this option is not enforced. I suspect it actually
>> > >>> means 'minimum degree of parallelism', so if you expect to use it to
>> > >>> reduce the number of mappers, I don't think it will work very well.
>> > >>> The ones that do enforce anything are the min split size and max split size
>> > >>> in file input, so I guess you can try those. I rely on them (and expose them
>> > >>> as a job-specific option) in stochastic SVD.
>> > >>>
>> > >>> But forcing the split size to increase usually creates a 'supersplits'
>> > >>> problem, where a lot of data is moved around just to supply data to
>> > >>> mappers, which is perhaps why this option is meant only to increase
>> > >>> parallelism, probably not to decrease it.
>> > >>>
>> > >>> -d
>> > >>>
>> > >>> On Tue, Dec 28, 2010 at 4:05 PM, Jeff Eastman wrote:
>> > >>>
>> > >>> > This is supposed to be a generic option. You should be able to specify
>> > >>> > Hadoop options such as this on the command-line invocation of your
>> > >>> > favorite Mahout routine, but I'm having a similar problem setting
>> > >>> > -Dmapred.reduce.tasks=10 with Canopy and k-Means. This is both with and
>> > >>> > without a space after the -D.
>> > >>> >
>> > >>> > Can someone point me to a Mahout command where this does work? Both
>> > >>> > drivers extend AbstractJob and do the usual option-processing push-ups.
>> > >>> > I don't have the Hadoop source locally, so I can't debug the generic
>> > >>> > options parsing.
>> > >>> >
>> > >>> > -----Original Message-----
>> > >>> > From: beneo_7
>> > >>> > Sent: Monday, December 27, 2010 10:45 PM
>> > >>> > Subject: where i can set -Dmapred.map.tasks=X
>> > >>> >
>> > >>> > I read in Mahout in Action that I should set -Dmapred.map.tasks=X,
>> > >>> > but it did not work for Hadoop.

--
Lance Norskog
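The split-size behaviour discussed in the thread can be approximated with a standalone sketch. It is written from an understanding of Hadoop's 0.20-era `FileInputFormat`, not copied from it, so treat the details as assumptions. In the old `mapred` API, the map-task hint gives a goal size (`fileLength / mapTasks`). That goal is clamped as `max(minSize, min(goalSize, blockSize))`. Full-size splits are then cut while more than 1.1 (`SPLIT_SLOP`) split sizes remain. The block size caps the split, so a small `mapred.map.tasks` value cannot produce splits larger than a block. This fits the reading of that option as a lower bound on parallelism. Only a larger minimum split size forces bigger splits. The newer `mapreduce` API, which QJob's `setMinInputSplitSize` call belongs to, is understood to replace the goal size with a configured maximum split size. The class and method names below are illustrative only, and the sketch handles a single file.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSizing {

    static final long MB = 1024L * 1024L;

    // Assumed value of Hadoop's FileInputFormat.SPLIT_SLOP: the last split may
    // be up to 10% larger than the computed split size.
    static final double SPLIT_SLOP = 1.1;

    // Old mapred-API style clamp: the goal size (from the map-task hint) is
    // capped by the block size, but a larger minimum split size always wins.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Split lengths, in bytes, for a single file of fileLength bytes.
    static List<Long> splitLengths(long fileLength, int mapTasksHint, long minSize, long blockSize) {
        long goalSize = fileLength / Math.max(1, mapTasksHint);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        // Cut full-size splits while more than SPLIT_SLOP split sizes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        // The remainder (at most SPLIT_SLOP * splitSize) becomes the last split.
        if (remaining != 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    // Converts byte lengths to whole megabytes for readable output.
    static List<Long> inMb(List<Long> lengths) {
        List<Long> out = new ArrayList<>();
        for (long len : lengths) {
            out.add(len / MB);
        }
        return out;
    }

    public static void main(String[] args) {
        long block = 64 * MB;
        // Hint of 1 map task: splits are still capped at the block size.
        System.out.println(inMb(splitLengths(200 * MB, 1, 1, block)));        // [64, 64, 64, 8]
        // Hint of 10 map tasks: ten 20 MB splits, i.e. more parallelism.
        System.out.println(inMb(splitLengths(200 * MB, 10, 1, block)));
        // A 128 MB minimum split size is what actually forces larger splits.
        System.out.println(inMb(splitLengths(200 * MB, 1, 128 * MB, block))); // [128, 72]
        // A 70 MB file stays one split: 70 / 64 is within the 1.1 slop.
        System.out.println(inMb(splitLengths(70 * MB, 1, 1, block)));         // [70]
    }
}
```

The first case shows why asking for a single map task still yields four mappers. The third case shows the trade-off Dmitriy describes: fewer, larger splits that span more than one block.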
