Thanks a lot for the brief discussion and the pointers. I will try this and report back.

On Fri, Nov 21, 2014 at 3:54 AM, Miller, Timothy <mille...@wfu.edu> wrote:
> Here are some observations/facts that may prove useful, but I'd like to
> stress that I don't develop code for either of these products, and my
> deductions (such as "blocking") may not be strictly technically accurate.
>
>    1. MPBSJOBUPDATE is the point in the Maui scheduler iteration where it
>    talks to the Torque server (maybe the MOM too) to get updated
>    information about a job.
>    2. Maui is single threaded and blocks while it is "working" (whether
>    that is a scheduler iteration or a client querying the server).
>    3. TORQUE is also single threaded. Job submissions, MOM status updates,
>    etc. cause it to block.
>    4. TORQUE has much better mechanisms for managing the times when it is
>    blocking, so you see it a whole lot less than in Maui. But it can still
>    happen. Also, most of the time when it does block, it is for really
>    short periods (sub-second).
>
> The direct problem with your Maui commands appears to be that MAUI_SERVER
> is blocking while it is in a scheduling iteration.
>
> My guesses as to the root cause would be:
>
>    1. You have a PBS_MOM process on a compute node that is hung, timing
>    out, or slow to respond to PBS_SERVER, causing responses to MAUI_SERVER
>    to wait.
>    2. You have a user submitting jobs or otherwise blocking PBS_SERVER,
>    causing responses to MAUI_SERVER to wait.
>
> Try playing with your MOM timeout values and see if the duration of your
> hangs decreases.
>
> I hope that helps,
> -Tim
>
> On Nov 18, 2014, at 20:13, Pankaj Dorlikar <pankaj.dorli...@gmail.com>
> wrote:
>
> Thanks, but we have done some customization in Maui for the 3.2.6p1
> version. Also, it is tightly integrated with Torque and Gold. Can you
> suggest something else?
>
> On Nov 19, 2014 6:38 AM, "Timothy E Miller" <mille...@wfu.edu> wrote:
> >
> > Have you tried using the latest version in SVN? Its version is in the
> > 3.3.x class. I know there were many fixes we found over 3.2.6p1 a couple
> > of years ago when we upgraded.
> >
> > -Tim
> >
> > Sent from my iPhone
> >
> > On Nov 18, 2014, at 20:03, Pankaj Dorlikar <pankaj.dorli...@gmail.com>
> > wrote:
> >
> > Hi,
> >
> > Is there any solution to this issue? Currently, we are working on Maui
> > only.
> >
> > ---------- Forwarded message ----------
> > From: Pankaj Dorlikar <pankaj.dorli...@gmail.com>
> > Date: Sat, Nov 15, 2014 at 11:09 AM
> > Subject: maui commanit wait
> > To: mauiusers <mauiusers@supercluster.org>
> >
> > Hi,
> >
> > We have Maui 3.2.6p1 and Torque 2.5.8 on CentOS 6.2 x86_64 on more than
> > 100 nodes. However, Maui commands are taking considerable time to
> > produce output: showq, checknode, etc. simply wait for ~30-40 seconds
> > before printing anything. At any point in time, we have more than 60
> > jobs running and around 120 jobs queued. While a command is waiting,
> > we can see that Maui is stopped at MPBSJOBUPDATE. What could be the
> > solution to this?
> >
> > --
> > Pankaj V. Dorlikar
> >
> > --
> > Pankaj V. Dorlikar
> > _______________________________________________
> > mauiusers mailing list
> > mauiusers@supercluster.org
> > http://www.supercluster.org/mailman/listinfo/mauiusers
>

--
Pankaj V. Dorlikar
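
Tim's suggestion to change the MOM timeout values and see whether the hangs get shorter needs a before/after measurement. Below is a minimal sketch, not something posted in the thread, that times a Maui client command several times and reports the spread. It assumes showq is on the PATH and is run with no arguments; the function names (time_command, sample, summarize, main) and the default run count, pause, and timeout are hypothetical choices for illustration.

```python
import subprocess
import sys
import time


def time_command(cmd, timeout=120.0):
    """Run cmd once and return (elapsed_seconds, return_code).

    return_code is None if the command did not finish within timeout.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # only the wait time matters here
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = result.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.monotonic() - start, code


def sample(cmd, runs=5, pause=1.0, timeout=120.0):
    """Time cmd `runs` times, sleeping `pause` seconds between runs."""
    timings = []
    for i in range(runs):
        elapsed, _code = time_command(cmd, timeout=timeout)
        timings.append(elapsed)
        if i < runs - 1:
            time.sleep(pause)
    return timings


def summarize(timings):
    """Return run count, min, max, and mean of a non-empty list of timings."""
    return {
        "runs": len(timings),
        "min": min(timings),
        "max": max(timings),
        "mean": sum(timings) / len(timings),
    }


def main(argv=None):
    """Time the command given in argv, or showq (assumed default) if none."""
    argv = sys.argv[1:] if argv is None else argv
    cmd = argv or ["showq"]
    stats = summarize(sample(cmd))
    print("{}: {runs} runs, min {min:.1f}s, max {max:.1f}s, mean {mean:.1f}s"
          .format(" ".join(cmd), **stats))
```

Calling main() once before and once after a timeout change, at a similar job load, gives two sets of numbers to compare. Spacing the runs out with a larger pause makes it more likely that some of them land inside a scheduling iteration, which is where the blocking described above would show up.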