Re: CPU Usage in a cluster environment.

Potkay, Peter M (PLC, IT) Wed, 26 May 2004 07:16:25 -0700

When you say the CLUSSNDR channel was running, were you looking at the manually defined CLUSSNDR or the AutoDefined CLUSSNDR? The manual one may have been running, but it was not being used to send any messages. The AutoDefined CLUSSNDR is the one that was doing (or attempting to do) the work. If the CLUSSRCVR was paused, then the partner CLUSSNDR (the AutoDefined one) must have been retrying.

If a channel is retrying, that knocks it down a couple of notches inside the Cluster Workload Algorithm. If there were multiple QMs hosting the destination queue, and all the AutoDefined CLUSSNDRs to them were RETRYING, then the algorithm would be spinning on all these messages, trying all the alternate paths over and over.

Were any messages going to the DLQ on the target QMs? I would think that after the CLUSSRCVR went through its MessageRetryCount x MessageRetryInterval, it would put one message to the DLQ, before repeating the process on the next message it got.
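To put rough numbers on the MessageRetryCount x MessageRetryInterval point, here is a minimal back-of-the-envelope sketch. The retry count of 10 and interval of 1000 ms are assumed values for illustration (check the actual MRRTY and MRTMR settings on the CLUSSRCVR), the function names are hypothetical, and the model assumes every message goes through the full retry cycle one at a time.

```python
def seconds_until_dlq(retry_count: int, retry_interval_ms: int) -> float:
    """Time the receiving channel spends retrying one message before
    giving up on it (count x interval, converted to seconds)."""
    return retry_count * retry_interval_ms / 1000.0


def drain_time_days(messages: int, retry_count: int, retry_interval_ms: int) -> float:
    """Rough time to work through a backlog if every message is retried
    in full, strictly one after another."""
    return messages * seconds_until_dlq(retry_count, retry_interval_ms) / 86400.0


# Assumed settings, not taken from the system under discussion.
per_message = seconds_until_dlq(10, 1000)
backlog_days = drain_time_days(500_000, 10, 1000)
print(f"{per_message:.0f} s per message, about {backlog_days:.1f} days for the backlog")
```

Under those assumptions each message holds the channel for about 10 seconds, so a backlog of this size would take weeks to clear that way.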

-----Original Message-----
From: Antony Boggis [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 25, 2004 12:11 PM
To: [EMAIL PROTECTED]
Subject: Re: CPU Usage in a cluster environment.

Yes, the "backed up" system was the qmgr with >5,000,000 messages on the cluster xmit q. I would not expect any "rerouting" to happen since the queue managers were "alive" and reachable. There was just no room for any more messages. No amount of re-routing would have changed that.

What I thought interesting is that both the cluster sender channel instances from this system to the remote systems (where the dest q was full) were in a Running state, but the partner cluster receiver channels were Paused.

However, your point about the cluster workload exit being called for every message every time a channel retry interval passes may be a clue. The system is now "cleared", but I will try some more testing again soon.

Paul Clarke also suggested running a trace. A good point. I shall try that also.

tonyB.

From: MQSeries List [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
Sent: Tuesday, May 25, 2004 12:15 AM
To: [EMAIL PROTECTED]
Subject: Re: CPU Usage in a cluster environment. [Deutsche Boerse Systems:Virus checked]

Antony,

If the "backed up" system is the queue manager with the many
messages in the SYSTEM.CLUSTER.TRANSMIT.QUEUE, then
the "cluster rerouting" may be causing the CPU usage.

If there are messages on the SYSTEM.CLUSTER.TRANSMIT.QUEUE and
the chosen destination (channel) is in retry, then MQ will try at every
retry interval of the channel to find an alternate route to the destination queue.
This is done by reading the message from the SYSTEM.CLUSTER.TRANSMIT.QUEUE,
followed by a put to the target queue. This will drive the cluster workload mechanism
and make MQ choose a new destination (if any). Unfortunately, if there is only one
destination, then this is just a waste of CPU and log space (if messages are persistent).
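The cost of that loop can be illustrated with a toy model. This is only a sketch of the behaviour as described above, not MQ's actual implementation: the Channel class, the selection rule and the channel names are all hypothetical, and it simply counts how many workload decisions are made when every message on the transmit queue is re-read and re-put once per retry interval.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str    # hypothetical channel name
    status: str  # "RUNNING" or "RETRYING"


def choose_channel(channels):
    """Toy workload choice: prefer a RUNNING channel, else fall back to the first."""
    running = [c for c in channels if c.status == "RUNNING"]
    return running[0] if running else channels[0]


def reroute_pass(queue_depth, channels):
    """One pass over the transmit queue: each message is read and re-put,
    which costs one workload decision. Returns (decisions, messages_moved)."""
    decisions = 0
    moved = 0
    for _ in range(queue_depth):
        decisions += 1
        if choose_channel(channels).status == "RUNNING":
            moved += 1
    return decisions, moved


def simulate(queue_depth, channels, retry_intervals):
    """Repeat the pass once per retry interval.
    Returns (total_decisions, remaining_queue_depth)."""
    total = 0
    depth = queue_depth
    for _ in range(retry_intervals):
        decisions, moved = reroute_pass(depth, channels)
        total += decisions
        depth -= moved
    return total, depth


all_retrying = [Channel("TO.QM2", "RETRYING"), Channel("TO.QM3", "RETRYING")]
print(simulate(1000, all_retrying, 5))
```

In this model, when every candidate channel is retrying, the number of decisions grows with queue depth times the number of retry intervals while the queue depth never shrinks, which would be consistent with an otherwise idle queue manager burning CPU.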

I do not know if this is really the reason, because I do not know what amqzlaa0_nd is
doing, so this is just a guess...

Regards, Stefan

Antony Boggis <[EMAIL PROTECTED]>
Sent by: MQSeries List <[EMAIL PROTECTED]>
24.05.2004 21:32
Please respond to MQSeries List

To: [EMAIL PROTECTED]
cc: (bcc: Stefan Raabe/DBS/GDB)
Subject: CPU Usage in a cluster environment. [Deutsche Boerse Systems:Virus checked]

Environment: Solaris 5.8 (24 CPUs, 98Gb RAM), WMQ 5.3 CSD05.

I have a cluster of 4 queue managers. This past weekend we were running some tests sending volumes of messages. After a period of time we had some application issues (not MQ) on two of the receiving queue managers. Eventually several local queues filled and, as expected, the cluster receiver channels on those queue managers went into a PAUSED state and messages (to the tune of > 500,000) piled up on the sending system's SYSTEM.CLUSTER.TRANSMIT.QUEUE. This all comes as no surprise and I'd expect to hear you all say "working as designed".

My question is this... on the "backed up" system, which to all intents and purposes is now idle (no applications are sending messages), why is my CPU usage (process: amqzlaa0_nd) pretty much maxing out one of the system's CPUs (>4% CPU usage on this 24 CPU box)? 'top' shows a CPU time for amqzlaa0_nd of 41.5H.

The sending system is actually showing the cluster sender channels as RUNNING, but the receiving end is showing a status of PAUSED.

Regards,

Antony Boggis.

