Re: Large number of QMgrs on a Solaris box...

Egner, Dan Mon, 08 Sep 2003 12:35:17 -0700

Antony,

Here is some info that I put on the BMC support web site at
http://service.bmc.com/bmc_mq/kb/view_article/0,,4960%2B4977%2B4990%2B4991%2B10423,00.html


Overview

FDCs and the error logs are both found in /var/mqm/errors.  AMQERR01.LOG is
the current error log.  FDC filenames look like AMQ10450.0.FDC, where 10450
is the pid of the process that threw the FDC.  This resolution describes
when tuning is needed due to a lack of semaphore undo structures.  The FDCs
will have these characteristics:

Probe Id: XY324192
Probe Description :- AMQ6119: An internal MQSeries error has occurred
('22 - Invalid argument' from semop.)
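
As a rough sketch of the filename convention and FDC signature described
above, the following Python pulls the pid out of an FDC filename and checks
whether a piece of FDC text carries the semop "Invalid argument" message.
The function names are hypothetical, and the handling of "|" border
characters and wrapped lines is an assumption about how the file is laid out.

```python
import re

# Filename pattern taken from the example above: AMQ<pid>.<n>.FDC
FDC_NAME = re.compile(r"^AMQ(\d+)\.(\d+)\.FDC$")

def fdc_pid(filename):
    """Return the pid embedded in an FDC filename, or None if it does not match."""
    m = FDC_NAME.match(filename)
    return int(m.group(1)) if m else None

def is_semop_einval(fdc_text):
    """True if the text contains the AMQ6119 semop 'Invalid argument' message."""
    # Assumption: drop '|' border characters and collapse whitespace so that
    # a description wrapped across lines still matches.
    flat = " ".join(fdc_text.replace("|", " ").split())
    return "('22 - Invalid argument' from semop.)" in flat
```

For example, fdc_pid("AMQ10450.0.FDC") gives 10450, which can then be
matched against the process list to see which MQSeries process raised it.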

This problem is seen most frequently with Solaris.  It is not common with
HP-UX and I have never seen the problem with AIX.

Solution

Recommendation:
1. For Solaris, increase the values of SEMUME and SEMMNU in /etc/system and
reboot.

2. Make sure you do not delete (ipcrm) MQSeries semaphores while MQSeries is
running. This is not likely to be the problem but I have seen it.
======
DETAILS:
Here is how to arrive at the conclusion above.
On a Solaris machine, type:

man semop

And see:

     EINVAL    The semid argument is not a valid semaphore  iden-
               tifier, or the number of individual semaphores for
               which the  calling  process  requests  a  SEM_UNDO
               would exceed the limit.

EINVAL breaks down as E = error and INVAL = invalid.  It is errno 22, which
is the '22 - Invalid argument' reported in the FDC.
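
The mapping between the number in the FDC and the symbolic name can be
checked from any scripting language that exposes errno.  A minimal sketch in
Python, assuming a platform where EINVAL is 22 (true of Solaris and Linux):

```python
import errno
import os

# Numeric value of EINVAL; the FDC reports this number before the message.
print(errno.EINVAL)

# Symbolic name for errno 22 as known to this platform.
print(errno.errorcode[22])

# Message text for EINVAL as the local C library reports it.
print(os.strerror(errno.EINVAL))
```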

So the FDC is saying that the semop call returned EINVAL.  It looks like the
number of SEM_UNDOs being requested is too large for the current kernel
settings, so the number of SEM_UNDOs allowed needs to be increased.

Looking at
http://www.sun.com/sun-on-net/itworld/UIR960101perf.html we see that the
kernel parm to increase is semsys:seminfo_semume.  The kernel parms can be
increased by changing /etc/system and then rebooting. How much should you
increase it?  I usually double it until I get to 4096 and then increase by
smaller amounts.
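
The doubling rule of thumb could be sketched like this in Python.  This is
one interpretation only: the 4096 ceiling comes from the text above, while
the step of 512 used past that point and both function names are
assumptions made for illustration.

```python
def next_undo_limit(current, ceiling=4096, small_step=512):
    """Suggest the next value: double up to the ceiling, then add small_step."""
    if current < ceiling:
        # Double, but do not jump past the ceiling in one move.
        return min(current * 2, ceiling)
    # Past the ceiling, grow in smaller increments (step size is an assumption).
    return current + small_step

def etc_system_line(param, value):
    """Format one /etc/system entry, e.g. 'set semsys:seminfo_semume = 512'."""
    return f"set semsys:seminfo_{param} = {value}"

# Example: starting from a SEMUME of 256.
print(etc_system_line("semume", next_undo_limit(256)))
```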

In the past I have seen that increasing just SEMUME helped with the
EINVAL problem.  But to be safe, you may wish to increase both SEMUME and
SEMMNU.

I usually double the values to start with.  It is difficult to come up with
exact recommendations.  However, here is some info that will help. See
http://www-3.ibm.com/software/ts/mqseries/txppacs/supportpacs/mp00.pdf
Page 60 of 71.  Here is the quote:
"SEMMNU ,  semaphore undo's was increased from 256 to 2048. Without this
only approx 90 clients could be MQCONNected."
So based on that I would say that in addition to SEMUME (number of semaphore
undos per process), we also have to consider SEMMNU (total number of undos
in the system).  It sounds like SEMMNU=256 handles approximately 90 MQCONNs.
You could extrapolate from there.
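
A minimal sketch of that extrapolation in Python, assuming the relationship
is roughly linear (the quote gives only one data point, so treat the results
as a starting estimate, not a measured limit).  The function names are
hypothetical.

```python
# Data point quoted above: SEMMNU=256 handled approximately 90 MQCONNs.
REF_SEMMNU = 256
REF_CONNS = 90

def estimate_semmnu(connections):
    """SEMMNU needed for a number of MQCONNs, rounded up (linear assumption)."""
    return -(-connections * REF_SEMMNU // REF_CONNS)

def estimate_connections(semmnu):
    """Approximate MQCONNs a given SEMMNU would support (linear assumption)."""
    return semmnu * REF_CONNS // REF_SEMMNU

print(estimate_connections(2048))  # rough capacity at SEMMNU=2048
print(estimate_semmnu(500))        # rough SEMMNU for 500 connections
```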

ANOTHER POSSIBILITY: Notice that the error explanation for EINVAL includes
two possibilities.  This article describes how to fix one possible problem.
The other possibility is that the semaphore identifier being used by
MQSeries is invalid.  This probably means that something or someone has
deleted the semaphore.  Please check for this: are there any jobs that use
ipcrm to clean up semaphores?  It is ok to delete semaphores that are not
MQSeries semaphores, or to clean up MQSeries semaphores while MQSeries is
down.  But if you delete (ipcrm) semaphores while MQSeries is running, this
error will occur.
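
One way to look for such jobs is to scan crontab entries or cleanup scripts
for ipcrm calls.  The Python below is only a sketch: find_ipcrm_lines is a
hypothetical helper, it does a plain substring match, and which files to
feed it is left to whoever knows where the site's jobs are defined.

```python
def find_ipcrm_lines(script_text):
    """Return (line_number, line) pairs for non-comment lines that mention ipcrm."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # whole-line comment
        # Ignore anything after a trailing '#' comment marker.
        if "ipcrm" in stripped.split("#", 1)[0]:
            hits.append((number, stripped))
    return hits
```

Any hit is worth reading by hand to see whether it can run while MQSeries is
up and whether it limits itself to non-MQSeries semaphores.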

Dan Egner
IBM WebSphere MQ V5.3 System Administration Certified
Product Support
BMC Software, Inc

-----Original Message-----
From: Antony Boggis [mailto:[EMAIL PROTECTED]
Sent: Monday, September 08, 2003 2:02 PM
To: [EMAIL PROTECTED]
Subject: Large number of QMgrs on a Solaris box...


I am in the midst of troubleshooting an issue where I am getting huge
numbers of FDC files generated on one of our development Solaris machines.
Some of these FDC files are REALLY large too.

The machine in question is an 8 CPU box with 32GB RAM.

There are currently 70+ queue managers *DEFINED* on the box, many in a
"local" cluster (ie the CLUSTER of queue managers is all on the same box -
this is a DEVELOPMENT machine... in the real world the cluster members would
be on different physical machines).

At the moment I am showing 27 active queue managers.

The kernel parameters have been updated to the following values:

set shmsys:shminfo_shmmax = 4294967295
set shmsys:shminfo_shmseg = 2048
set shmsys:shminfo_shmmni = 2048
set semsys:seminfo_semaem = 16384
set semsys:seminfo_semmni = 1024
set semsys:seminfo_semmap = 1026
set semsys:seminfo_semmns = 16384
set semsys:seminfo_semmsl = 10000
set semsys:seminfo_semopm = 100
set semsys:seminfo_semmnu = 2048
set semsys:seminfo_semume = 256
set msgsys:msginfo_msgmap = 1026
set msgsys:msginfo_msgmax = 4096
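
Settings in this form can be read back with a short script, which makes it
easier to compare boxes or spot the two undo-related values.  A minimal
sketch in Python, assuming decimal values and the "set module:name = value"
layout shown above; parse_etc_system is a hypothetical helper name.

```python
import re

# Matches lines such as: set semsys:seminfo_semume = 256
SET_LINE = re.compile(r"^\s*set\s+(\w+):(\w+)\s*=\s*(\d+)\s*$")

def parse_etc_system(text):
    """Map 'module:name' to its integer value; non-matching lines are skipped."""
    settings = {}
    for line in text.splitlines():
        m = SET_LINE.match(line)
        if m:
            module, name, value = m.groups()
            settings[f"{module}:{name}"] = int(value)
    return settings

sample = """
set semsys:seminfo_semmnu = 2048
set semsys:seminfo_semume = 256
"""
print(parse_etc_system(sample))
```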

In addition to the FDC files in /var/mqm/errors, the AMQERR0x.LOG files are
also filled with:

AMQ6119: An internal MQSeries error has occurred ('22 - Invalid argument'
from semop.)

If anyone can shed some more light on the problem, I'd appreciate it. I am in
the process of reviewing the Solaris kernel params to see if that (likely)
is the root of the problem.

Regards,

tonyB.

Instructions for managing your mailing list subscription are provided in
the Listserv General Users Guide available at http://www.lsoft.com
Archive: http://vm.akh-wien.ac.at/MQSeries.archive
