On the note of this thread, it is also my understanding that Maui/Torque on Linux is not able to preempt jobs. That is to say, once a job has been assigned resources and starts running, there is no way to "suspend/pause" it. I also don't think there is a way (nor is it necessarily a good idea) to kill a running job just because a higher-priority one comes along.
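For what it's worth, these are the knobs Maui exposes for suspend-based preemption -- the same directives that show up in the maui.cfg quoted further down. The QOS names below are placeholders of my own, and I have not seen this actually suspend a running job through Torque on Linux, which is exactly the limitation I mean above:

    # maui.cfg -- suspend-based preemption (sketch only, untested on my side)
    PREEMPTPOLICY  SUSPEND
    QOSCFG[low]    QFLAGS=PREEMPTEE                 # jobs in this QOS may be suspended
    QOSCFG[high]   PRIORITY=1000 QFLAGS=PREEMPTOR   # jobs here may preempt PREEMPTEE jobs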
I actually have a similar situation here, and am working out the best way to deal with it. So far, my best thought is to put a limit on maximum wall time, something like 12 or 24 hours. If a user wants to run a job that will take longer, they'll have to split it up or make it checkpoint itself, and then submit it multiple times. If the queue grows while the user is running, their "continuation" job will be at the bottom of the priority list when their wall time is up. This is far from an ideal solution, but it's the best I've come up with so far.

In my situation, I have around 7 users who have combined varying amounts of $$ to build a cluster. Each of them wants to be sure they get access to at least as much as they have paid for. This sounds perfect for fairshare -- except that I know one or two of them, contributing at the 10% or less level, typically submit jobs that run for 48-96 hours each, while one of the largest contributors runs jobs of 2-4 hours each and needs relatively timely turnaround. So one of my fears is that the small-percentage user will queue a bunch of jobs during an idle period and grab most or all of the cluster, and then the queue will grow for the next 2-3 days (upsetting users) before anyone else gets to run. On the other hand, having idle nodes while jobs sit in the queue is also not conducive to appreciative users :(. Suggestions? (I've sketched what I currently have in mind at the bottom of this message, below the quoted thread.) Thanks!

--Jim

On Feb 8, 2008 4:32 AM, Bas van der Vlies <[EMAIL PROTECTED]> wrote:
> Steve Young wrote:
> > Hi Bas,
> >     Thanks for the reply! I was beginning to wonder if I was the only
> > person on this list ;-).
>
> It is not a high-traffic list.
>
> > I had tried to set a maxproc and/or maxmem on my classes. However, it just
> > didn't seem to work well. If no one else was using the cluster, capping a
> > user off at a certain number of processors while the rest of the processors
> > remained idle didn't seem to make the cluster efficient at utilizing all the
> > resources. If, let's say, a user (A) met that maxproc/maxmem limit and was
> > still able to have jobs backfilled after that, that would seem to me a good
> > way to solve my problem. This way I could guarantee them a certain number of
> > processors while allowing jobs over that limit to be backfilled, thus making
> > the cluster efficient by utilizing all the resources possible. If another
> > user (B) were to submit, the same thing should happen for them: the first
> > user's (A) backfilled jobs would get preempted to run the second user's jobs
> > up to the maxproc limit, and the rest would be backfilled. However, with
> > Maui, when you hit your limit you're done -- no other jobs will get
> > backfilled.
> >
> > Since that didn't work out as I'd like, I thought it would be great if
> > fairshare could be calculated based on the number of users in the queue. If
> > there are 2 users, then 1 / 2 would give them each a 50% fairshare; if there
> > were 4 users, fairshare would be calculated as 1 / 4, giving them each 25%.
> > I'm sure there are drawbacks to this as well.
>
> This is not possible with Maui: fairshare cannot be made to depend
> dynamically on the number of users.
>
> > Basically, all I want to do is give each user a fair share of resources
> > while still running as many jobs as possible in order to keep utilization
> > close to 100%, especially when there are jobs in the Q state. It doesn't
> > seem to make sense to cap everyone at a certain limit and queue the rest of
> > their jobs while resources sit idle.
> > I've struggled with trying to solve this situation for quite a while now.
> > This kind of surprises me, as I don't think what I'd like to see happen is
> > anything out of the ordinary. Isn't this what most clusters want to achieve?
>
> That is true, and it is why most clusters use a scheduler with backfill: it
> tries to run jobs on the idle nodes that are being held for a big parallel
> job. This is impossible with a FIFO scheduler.
>
> > On another note, I am confused by setting a default of 25% fairshare for
> > everyone. If you have fewer than 4 people using the system, then you're
> > going to have idle resources. And when you exceed 4 people utilizing the
> > cluster, what happens to fairshare then? Does Maui start setting fairshare
> > equally for each user up to 25%? Say you have 5 users: does fairshare try to
> > keep everyone at 20%, or does it give the first four 25% and make the fifth
> > wait for some of the first four's jobs to complete?
>
> If you give every user e.g. 25%, then Maui tries to give every user a
> fairshare of 25%. Set FSUSERWEIGHT to 1, and read the fairshare policy
> documentation as well.
>
> userA and userB have not run any jobs yet, so they both have a fairshare
> priority of:
>   (25 - 0) * 1 = 25
>
> So all jobs of userA and userB have the same priority, and the first one in
> the queue will run.
>
> Now suppose userA has run several jobs and has used 30% of the system; his
> fairshare priority becomes:
>   (25 - 30) * 1 = -5
>
> So if userB submits a job, it will be the first one to run. It is very
> dynamic; fairshare is not something static. Fairshare makes sure that users
> who have not run many jobs get a higher priority than a user who has run a
> lot of jobs.
>
> It just depends on your workload. If userA submits a lot of long jobs and
> fills up the whole system, then you either have to wait until the first job
> finishes or use a preemption mechanism to suspend jobs. But we do not use
> that.
>
> At our site we use MAXPS plus fairshare on both groups and users, and this
> setup works as desired.
>
> > Again, thanks for the reply =). I hope to hear more from other sites about
> > how they accomplished this.
> >
> > -Steve
> >
> > On Feb 7, 2008, at 3:14 AM, Bas van der Vlies wrote:
> >
> >> On Feb 6, 2008, at 9:55 PM, Steve Young wrote:
> >>
> >>> Hi,
> >>>     OK, I guess this is my 3rd and last attempt to ask how to do this.
> >>> All I would really like to know is how to make a person's fairshare =
> >>> 1 / <number of users who have submitted jobs>. With a fairshare like this,
> >>> it would seem that everyone would get an equal amount of resources.
> >>>
> >>> Aside from running a FIFO scheduler, this would seem to be the next step
> >>> for most sites: allowing anyone to run any number of jobs while assuring
> >>> that each user gets an equal fair share of resources. This example would
> >>> fit nicely in the "Case Studies" appendix of the Maui admin manual. I'd be
> >>> happy to help put a case study together to show this if someone could help
> >>> explain how to do it =). Thanks,
> >>>
> >>> -Steve
> >>>
> >> Steve,
> >>
> >> I do not know if this answers your question. When, for example, a user has
> >> a fairshare target of 25% and has used all of it, he gets a negative
> >> priority. You can then set the following in maui.cfg:
> >> http://www.clusterresources.com/products/maui/docs/a.fparameters.shtml#rejectnegpriojobs
> >>
> >> What we have done is give every user a MAXPS of e.g. 600 hours; he can then
> >> run:
> >> * 100 jobs of 6 hours each
> >> * or 10 jobs of 60 hours each
> >> * ...
> >>
> >> This is another way to keep a user from monopolizing the system. It is
> >> also very dynamic: if a user submits 11 jobs with a walltime of 60 hours,
> >> the system will schedule 10 of them, and after 6 hours the 11th job can
> >> run, because by then 10 * 6 = 60 hours have been used up and are freed
> >> again within the user's MAXPS.
> >>
> >> At our site we use MAXPS and FAIRSHARE to balance the system and prevent
> >> users from monopolizing it.
> >>
> >> Regards
> >>
> >>> On Jan 31, 2008, at 12:18 PM, Steve Young wrote:
> >>>
> >>>> Hi,
> >>>>     So perhaps my first question didn't make any sense ;-). Basically, I
> >>>> am trying to figure out how to prevent a user from tying up all the
> >>>> resources by submitting a large number of jobs to the queue. If no one
> >>>> else has requested any resources, then as many jobs as can run should
> >>>> run. However, if someone else submits jobs, I'd like them to get their
> >>>> fair share of resources without having to wait for the first user's jobs
> >>>> to complete and free up resources.
> >>>>     Perhaps I've answered my own question with "fairshare" and should be
> >>>> looking more closely at that. What I don't understand is, if I were to
> >>>> set a fairshare of, say, 25% per user, how would this affect the queue if
> >>>> no one else was running jobs? Essentially, I'd want this user to get 100%
> >>>> of the resources until someone else submits. Here is the policy I'd like
> >>>> to simulate:
> >>>>
> >>>> A user is guaranteed up to 32 CPUs or 32 GB of RAM. Above that, jobs will
> >>>> get backfilled. Backfilled jobs will get suspended as more users utilize
> >>>> the queue.
> >>>>
> >>>> Perhaps I have this all wrong and should be thinking about it
> >>>> differently. I'd love to see some examples of what others have done to
> >>>> remedy this situation. Thanks in advance,
> >>>>
> >>>> -Steve
> >>>>
> >>>>
> >>>> On Jan 28, 2008, at 2:55 PM, Steve Young wrote:
> >>>>
> >>>>> Hi,
> >>>>>     I am trying to figure out how to make the following work within our
> >>>>> cluster using Torque/Maui:
> >>>>>
> >>>>> I'd like a user to be able to submit as many jobs as they like. However,
> >>>>> they should only be allowed up to 32 CPUs or 32 GB of memory. After
> >>>>> that, if there are idle resources, the rest of their jobs can be
> >>>>> backfilled onto idle nodes.
> >>>>>
> >>>>> If another user submits jobs, they should get the same policy and
> >>>>> preempt any backfilled jobs (if that's required to meet the 32-CPU or
> >>>>> memory limit).
> >>>>>
> >>>>> So basically, I think this should be fairly common: I want to run as
> >>>>> many jobs as possible on idle resources but only guarantee the jobs that
> >>>>> fall under the MAXPROC/MAXMEM policy. I've implemented the
> >>>>> MAXPROC/MAXMEM policy, but it appears backfill won't work for the
> >>>>> remaining jobs, so I am assuming backfill has to abide by the
> >>>>> MAXPROC/MAXMEM policy I have in place. Can anyone give me some pointers
> >>>>> on the proper way to implement this? Thanks in advance!
> >>>>>
> >>>>> -Steve
> >>>>>
> >>>>> [root@ maui]# cat maui.cfg   (edited for some content)
> >>>>> # maui.cfg 3.2.6p14
> >>>>>
> >>>>> # Resource Manager Definition
> >>>>> RMCFG[JAKE]          TYPE=PBS
> >>>>>
> >>>>> RMPOLLINTERVAL       00:00:30
> >>>>>
> >>>>> SERVERPORT           42559
> >>>>> SERVERMODE           NORMAL
> >>>>>
> >>>>> # Admin: http://clusterresources.com/mauidocs/a.esecurity.html
> >>>>>
> >>>>> LOGDIR               /var/log/maui
> >>>>> LOGFILE              maui.log
> >>>>> LOGFILEMAXSIZE       100000000
> >>>>> #LOGLEVEL            3
> >>>>> LOGLEVEL             2
> >>>>> LOGFILEROLLDEPTH     5
> >>>>> STATDIR              /var/log/maui/stats
> >>>>> SERVERHOMEDIR        /usr/maui/
> >>>>> TOOLSDIR             /usr/maui/tools/
> >>>>> LOGDIR               /var/log/maui/
> >>>>> STATDIR              /usr/maui/stats/
> >>>>> #LOCKFILE            /usr/maui/maui.pid
> >>>>> SERVERCONFIGFILE     /usr/maui/maui.cfg
> >>>>> CHECKPOINTFILE       /var/log/maui/maui.ck
> >>>>>
> >>>>> # Misc configs
> >>>>> ENABLEMULTINODEJOBS  TRUE
> >>>>> JOBMAXOVERRUN        00:01:00
> >>>>> #SYSTEMDEFAULTJOBWALLTIME 1:00:00:00
> >>>>> USEMACHINESPEED      ON
> >>>>> #PREEMPTPOLICY       CHECKPOINT
> >>>>> PREEMPTPOLICY        SUSPEND
> >>>>> CREDWEIGHT           1
> >>>>> CLASSWEIGHT          1
> >>>>> QOSWEIGHT            1
> >>>>> RESCTLPOLICY         ANY
> >>>>>
> >>>>> # Job Priority: http://clusterresources.com/mauidocs/5.1jobprioritization.html
> >>>>> QUEUETIMEWEIGHT      1
> >>>>>
> >>>>> # FairShare: http://clusterresources.com/mauidocs/6.3fairshare.html
> >>>>> FSPOLICY             DEDICATEDPS
> >>>>> FSDEPTH              7
> >>>>> FSINTERVAL           86400
> >>>>> FSDECAY              0.80
> >>>>>
> >>>>> # Throttling Policies: http://clusterresources.com/mauidocs/6.2throttlingpolicies.html
> >>>>> # NONE SPECIFIED
> >>>>>
> >>>>> # Backfill: http://clusterresources.com/mauidocs/8.2backfill.html
> >>>>> BACKFILLPOLICY       BESTFIT
> >>>>> RESERVATIONPOLICY    CURRENTHIGHEST
> >>>>> #RESERVATIONPOLICY   NEVER
> >>>>> RESERVATIONDEPTH     50
> >>>>> RESDEPTH             32
> >>>>>
> >>>>> # Node Allocation: http://clusterresources.com/mauidocs/5.2nodeallocation.html
> >>>>> NODEACCESSPOLICY     SHARED
> >>>>> #NODEALLOCATIONPOLICY MINRESOURCE
> >>>>> #NODEALLOCATIONPOLICY MAXBALANCE
> >>>>> NODEALLOCATIONPOLICY FASTEST
> >>>>> #NODEAVAILABILITYPOLICY UTILIZED
> >>>>> NODEAVAILABILITYPOLICY COMBINED
> >>>>> NODEMAXLOAD          1.0
> >>>>> NODELOADPOLICY       ADJUSTSTATE
> >>>>>
> >>>>> # QOS: http://clusterresources.com/mauidocs/7.3qos.html
> >>>>> QOSCFG[qm]           PRIORITY=100 QFLAGS=PREEMPTEE
> >>>>> QOSCFG[md]           PRIORITY=100 QFLAGS=PREEMPTEE
> >>>>> QOSCFG[faculty]      PRIORITY=1000 QFLAGS=PREEMPTOR
> >>>>> QOSFEATURES[qm]      hamilton g03
> >>>>> QOSFEATURES[md]      hamilton
> >>>>>
> >>>>> # Standing Reservations: http://clusterresources.com/mauidocs/7.1.3standingreservations.html
> >>>>> # SRSTARTTIME[test]  8:00:00
> >>>>> # SRENDTIME[test]    17:00:00
> >>>>> # SRDAYS[test]       MON TUE WED THU FRI
> >>>>> # SRTASKCOUNT[test]  20
> >>>>> # SRMAXTIME[test]    0:30:00
> >>>>>
> >>>>> # Creds: http://clusterresources.com/mauidocs/6.1fairnessoverview.html
> >>>>> # USERCFG[DEFAULT]   FSTARGET=25.0
> >>>>> # USERCFG[john]      PRIORITY=100 FSTARGET=10.0-
> >>>>> # GROUPCFG[staff]    PRIORITY=1000 QLIST=hi:low QDEF=hi
> >>>>> #
> >>>>> # Groups
> >>>>> #
> >>>>> GROUPCFG[faculty]    PRIORITY=1000 QLIST=faculty QDEF=faculty
> >>>>> GROUPCFG[hamilton]   PRIORITY=10
> >>>>> GROUPCFG[users]      PRIORITY=10
> >>>>> #
> >>>>> # Classes (queue's)
> >>>>> #
> >>>>> #CLASSCFG[main]      QLIST=md:qm
> >>>>> CLASSCFG[main]       QLIST=md:qm:mercury MAXPROC=32,64 MAXMEM=32768,65536
> >>>>> CLASSCFG[hamilton]   QLIST=md:qm
> >>>>>
> >>>>> torque config
> >>>>> -------------------
> >>>>>
> >>>>> [root@ maui]# qmgr
> >>>>> Max open servers: 4
> >>>>> Qmgr: print server
> >>>>> #
> >>>>> # Create queues and set their attributes.
> >>>>> #
> >>>>> #
> >>>>> # Create and define queue main
> >>>>> #
> >>>>> create queue main
> >>>>> set queue main queue_type = Execution
> >>>>> set queue main Priority = 100
> >>>>> set queue main resources_default.neednodes = main
> >>>>> set queue main resources_default.walltime = 24:00:00
> >>>>> set queue main enabled = True
> >>>>> set queue main started = True
> >>>>> #
> >>>>> # Create and define queue hamilton
> >>>>> #
> >>>>> create queue hamilton
> >>>>> set queue hamilton queue_type = Execution
> >>>>> set queue hamilton resources_default.neednodes = hamilton
> >>>>> set queue hamilton resources_default.walltime = 24:00:00
> >>>>> set queue hamilton enabled = True
> >>>>> set queue hamilton started = True
> >>>>> #
> >>>>> # Set server attributes.
> >>>>> #
> >>>>> set server scheduling = True
> >>>>> set server default_queue = main
> >>>>> set server log_events = 511
> >>>>> set server mail_from = adm
> >>>>> set server query_other_jobs = True
> >>>>> set server resources_default.ncpus = 1
> >>>>> set server resources_default.walltime = 24:00:00
> >>>>> set server scheduler_iteration = 60
> >>>>> set server node_check_rate = 150
> >>>>> set server tcp_timeout = 6
> >>>>> set server job_nanny = True
> >>
> >> --
> >> Bas van der Vlies
> >> [EMAIL PROTECTED]
> >>
>
> --
> Bas van der Vlies                    e-mail: [EMAIL PROTECTED]
> SARA - Academic Computing Services   phone:  +31 20 592 8012
> Kruislaan 415                        fax:    +31 20 6683167
> 1098 SJ Amsterdam
>
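P.S. Here, as promised above, is roughly what I have in mind for my own cluster, combining the two ideas from the top of this message (the wall-time cap and per-user fairshare targets). This is only a sketch: none of it is tested here, the user names are made up, and the percentages are just the contribution levels I mentioned.

Cap the maximum wall time on the queue so that long jobs have to checkpoint and resubmit (Torque, via qmgr):

    set queue main resources_max.walltime = 24:00:00

Give each user a fairshare target roughly proportional to what they contributed (Maui, in maui.cfg):

    # fairshare targets per contribution -- usera/userb are placeholder names
    FSPOLICY        DEDICATEDPS
    FSUSERWEIGHT    1
    USERCFG[usera]  FSTARGET=10.0    # contributed ~10% of the cluster
    USERCFG[userb]  FSTARGET=40.0    # contributed ~40% of the cluster

If I understand Bas's explanation correctly, the fairshare component of a job's priority is then roughly (FSTARGET - actual usage) * FSUSERWEIGHT, so a heavy user drifts toward a negative priority and the lighter users' jobs run first. I'd still be glad to hear how others have handled the long-job vs. short-job mix.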