Thanks a lot for the brief discussion and pointers. I will try this and report back.
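
To check whether the hangs get shorter after changing the MOM timeouts, something like the following could record how long a command takes to return. This is only a rough sketch: `time_command` and `sample` are names made up here, and the only assumption about Maui is that `showq` is on the PATH.

```python
import subprocess
import time


def time_command(argv, timeout=120.0):
    """Run argv once and return (elapsed_seconds, return_code).

    return_code is None if the command did not finish within `timeout`.
    Output is discarded; only the wall-clock duration matters here.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = proc.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.monotonic() - start, code


def sample(argv, runs=5, pause=1.0):
    """Time argv `runs` times, sleeping `pause` seconds between runs."""
    results = []
    for i in range(runs):
        results.append(time_command(argv))
        if i < runs - 1:
            time.sleep(pause)
    return results
```

For example, `sample(["showq"], runs=10, pause=30.0)` run before and after a timeout change would give two sets of timings to compare.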

On Fri, Nov 21, 2014 at 3:54 AM, Miller, Timothy <mille...@wfu.edu> wrote:

> Here are some observations/facts that may prove useful but, I'd like to
> stress, I don't develop code for either of these products and my deductions
> (such as "blocking") may not be strictly technically accurate.
>
> 1. MPBSJOBUPDATE is the point of the Maui scheduler iteration where it
> talks to the Torque server (maybe the MOM too) to get updated information
> about a job.
> 2. Maui is single threaded and blocks when it is "working" (whether that
> is a scheduler iteration or a client querying the server).
> 3. TORQUE is also single threaded.  Job submissions, MOM status updates,
> etc. cause it to block.
> 4. TORQUE has much better mechanisms to manage the times when it is
> blocking so you see it a whole lot less than in Maui.  But, it still can
> happen.  Plus, most of the time it does block, it is for really short
> periods of time (sub-second).
>
> The direct problem with your Maui commands appears to be that MAUI_SERVER
> is blocking while it is in a scheduling iteration.
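
As a back-of-the-envelope illustration of the blocking described above (a toy model, not how Maui is actually implemented): if the per-job updates in one iteration run one after another, their delays add up, and a client command arriving at the start of the iteration waits for the whole sum. The function names and the delay values are hypothetical; only the job count (60 running plus 120 queued) comes from the original report.

```python
def iteration_time(update_delays):
    """Duration of one scheduling iteration when the per-job updates
    are done sequentially by a single thread: the delays simply add up."""
    return sum(update_delays)


def worst_case_client_wait(update_delays):
    """Wait seen by a client command that arrives just as the iteration
    starts: it is served only after the whole iteration has finished."""
    return iteration_time(update_delays)


# Hypothetical delays in seconds. 180 jobs = 60 running + 120 queued.
healthy = [0.01] * 180            # every update answers in 10 ms
one_hung = [0.01] * 179 + [30.0]  # one update waits on an assumed 30 s timeout

print(f"healthy:  {worst_case_client_wait(healthy):.1f} s")
print(f"one hung: {worst_case_client_wait(one_hung):.1f} s")
```

With these made-up numbers a single slow response dominates the wait, which is in the same range as the 30-40 second delays reported.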
>
> My guesses as to the root cause would be:
>
> 1.  You have a PBS_MOM process on a compute node that is hung/timing out
> or slow to respond to PBS_SERVER causing responses to MAUI_SERVER to wait.
> 2.  You have a user submitting jobs or otherwise blocking PBS_SERVER,
> causing responses to MAUI_SERVER to wait.
>
> Try playing with your MOM timeout values and see if the duration of your
> hangs decreases.
>
> I hope that helps,
> -Tim
>
> On Nov 18, 2014, at 20:13, Pankaj Dorlikar <pankaj.dorli...@gmail.com>
> wrote:
>
> Thanks, but we have done some customization in Maui for the 3.2.6p1 version.
> Also, it is tightly integrated with Torque and Gold. Can you suggest
> something else?
> On Nov 19, 2014 6:38 AM, "Timothy E Miller" <mille...@wfu.edu> wrote:
> >
> > Have you tried using the latest version in SVN?  Its version is in the
> > 3.3.x series. I know there were many fixes we found over 3.2.6p1 a couple
> > of years ago when we upgraded.
> >
> > -Tim
> >
> > On Nov 18, 2014, at 20:03, Pankaj Dorlikar <pankaj.dorli...@gmail.com> wrote:
> >
> > Hi,
> >
> > Is there any solution to this issue? Currently, we are working with Maui
> > only.
> > ---------- Forwarded message ----------
> > From: Pankaj Dorlikar <pankaj.dorli...@gmail.com>
> > Date: Sat, Nov 15, 2014 at 11:09 AM
> > Subject: maui commanit wait
> > To: mauiusers <mauiusers@supercluster.org>
> >
> >
> > Hi,
> >
> > We have Maui 3.2.6p1 and Torque 2.5.8 on CentOS 6.2 x86_64 on more than
> > 100 nodes. However, Maui commands are taking a considerable time to
> > produce output: showq, checknode, etc. simply wait ~30-40 seconds before
> > printing anything. At any point in time, we have more than 60 jobs
> > running and around 120 jobs queued. While a command is waiting, we can
> > see that Maui is stopped at MPBSJOBUPDATE. What could be the solution to
> > this?
> >
> > --
> > Pankaj V. Dorlikar
> > _______________________________________________
> > mauiusers mailing list
> > mauiusers@supercluster.org
> > http://www.supercluster.org/mailman/listinfo/mauiusers
>



-- 
Pankaj V. Dorlikar