Well, I think I have proved what everyone was saying. A channel speed of Normal is what is getting me here. Neil Casey and Brian McCarty provided the exact explanation as to why: non persistent messages sent over a normal speed channel cause persistent messages to be written to the channel sync queue, which requires disk. I set up some tests in my lab environment. Here are the results.
****************************************************************************
Server1: SpokeQM1, Win2000 SP2, MQ 5.3 CSD03
RemoteQ called FinalQ that points to FinalQ on SpokeQM2, via transmit queue HubQM1

Server2: HubQM1, Win2000 SP2, MQ 5.2.1 CSD05
QMAlias called SpokeQM2, which sends messages to transmit queue SpokeQM2.XMITQ

Server3: SpokeQM2, Win2000 SP2, MQ 5.2.1 CSD05
Local queue called FinalQ
****************************************************************************

Test #1:
Channel SpokeQM1.HubQM1 has a speed of NORMAL. Start putting 1K Non-Persistent (NP) messages every 250 milliseconds to the remote queue def on SpokeQM1.

Results #1:
As expected, constant disk writes on the server that houses HubQM1.

Test #2:
Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 1K NP messages every 250 milliseconds to the remote queue def on SpokeQM1.

Results #2:
As expected, no disk activity at all on the server that houses HubQM1. Actually, there was disk activity when the channel started/ended, but for the whole duration while the channel was running, no I/O.

Test #3:
Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 70,000 byte NP messages every 250 milliseconds to the remote queue def on SpokeQM1.

Results #3:
No disk activity at all on the server that houses HubQM1. ??? These messages are larger than the 64K queue buffer, so why are the messages flying thru the hub with no I/O? I am happy with these results, just that it is unexpected. Could it be that the Sending MCA to SpokeQM2 has the XMIT queue open ready for messages, with an outstanding GET? But I thought this was a feature new to 5.3 only.

Test #4:
Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 5000 byte NP messages every 250 milliseconds to the remote queue def on SpokeQM1. Every 45 seconds or so, I send over a Persistent 5000 byte message on the same channel.
Results #4:
As expected, no disk activity at all on the server that houses HubQM1, except every 45 seconds when the P message comes over.

Test #5:
Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 5000 byte NP messages every 250 milliseconds to the remote queue def on SpokeQM1. As the messages are flowing, yank the 2 cables that connect this server to the SAN (Veritas was disabled so it would not try and fail over).

Results #5:
No effect at all. Even though the server had no hard disk, these messages still kept flying thru the server as if nothing at all was wrong.

Test #6:
Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 5000 byte NP messages every 250 milliseconds to the remote queue def on SpokeQM1. At the same time, start putting 5000 byte P messages over the same channel. As the messages are flowing, yank the 2 cables that connect this server to the SAN (Veritas was disabled so it would not try and fail over).

Results #6:
Everything backs up. Both NP and P messages are backed up in the XMITQ on SpokeQM1. As soon as the cables are plugged back in, the messages start flowing again.

Test #7:
Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 5000 byte NP messages every 250 milliseconds to the remote queue def on SpokeQM1. At the same time, start putting 5000 byte P messages over A DIFFERENT CHANNEL between SpokeQM1 and HubQM1. As the messages are flowing, yank the 2 cables that connect this server to the SAN (Veritas was disabled so it would not try and fail over).

Results #7:
Everything backs up on the channel that was dealing with P messages. The channel that had only NP messages was not affected at all. As soon as the cables are plugged back in, the messages start flowing again on the secondary channel. The primary channel that had NP messages never blinked.

So now I am kinda stuck. Back in the production environment, what to do?
I can set the channel between SpokeQM1 and the HUB to fast, as it is a dedicated channel for this application anyway. I'll just let them know of the possibility (very remote) that the channel may lose their message. SAN blips are a lot more frequent than MQ losing NP messages over a FAST channel.

But what do I do with the CLUSRCVR channels? They are a shared resource for the whole company. Do I let this one application dictate that these channels get switched to FAST, at the risk of other apps having NP messages lost? Granted, we have a pretty reliable network here, but man, what a waste of time trying to hunt for messages that get lost over the fast channel.

(For anyone just jumping into the thread now, please don't suggest that I just make the messages persistent. That is not the answer. Read the thread from the beginning to see why.)

What do most people out there have their Cluster channel speeds at?

Dennis, you also mentioned moving the logs off of the SAN, but then that kinda defeats the purpose of having these servers HA.

-----Original Message-----
From: Miller, Dennis [mailto:[EMAIL PROTECTED]
Sent: Friday, May 30, 2003 6:59 PM
To: [EMAIL PROTECTED]
Subject: Re: How a MQSeries Hub does its thing with persistent / non-persistent messages

In order to get the behaviour you want, the task processing NP messages must not use or be dependent on SAN I/O whatsoever. Now you don't have absolute control of the I/O that MQ uses. For example, NP messages can spill to disk if they are large or many, and sometimes MQ will use disk scratchpads for various reasons. The good news is that IBM has put a lot of effort into optimizing the throughput of NP messages, so it avoids disk I/O unless absolutely necessary.

I still think you are probably experiencing log I/O because the channels are doing your NP messages under syncpoint. Change your Ha-Ha channels to NPMSPEED=fast and see if it makes a difference. Ultimately, I think you need to move your logs off the SAN.
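
As a rough illustration of the NPMSPEED change suggested above, the MQSC sketch below switches the dedicated lab channel to FAST. It is only a sketch under stated assumptions: the thread does not confirm the channel types or the exact case of the channel name, so the SDR/RCVR pair and the quoted name 'SpokeQM1.HubQM1' are assumptions. Each command is meant to be entered in runmqsc on the queue manager named in the comment above it.

```
* On SpokeQM1 (sending end); channel type SDR is an assumption
ALTER CHANNEL('SpokeQM1.HubQM1') CHLTYPE(SDR) NPMSPEED(FAST)

* On HubQM1 (receiving end); channel type RCVR is an assumption
ALTER CHANNEL('SpokeQM1.HubQM1') CHLTYPE(RCVR) NPMSPEED(FAST)

* On SpokeQM1: restart the channel so the altered definition is picked up
STOP CHANNEL('SpokeQM1.HubQM1')
START CHANNEL('SpokeQM1.HubQM1')

* On either end: check the setting alongside the batch attributes
DISPLAY CHANNEL('SpokeQM1.HubQM1') NPMSPEED BATCHSZ BATCHINT
```

Both ends are altered here because, as far as I understand it, the fast option is only used when both ends of the channel agree on it; otherwise the channel runs at normal speed. Putting NPMSPEED(NORMAL) in the same ALTER commands reverts the change.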
> -----Original Message-----
> From: Potkay, Peter M (PLC, IT) [SMTP:[EMAIL PROTECTED]
> Sent: Friday, May 30, 2003 2:32 PM
> To: [EMAIL PROTECTED]
> Subject: Re: How a MQSeries Hub does its thing with persistent / non-persistent messages
>
> For Ha-Has, I made a dedicated channel for this app from SPOKE1 to HUBQM.
> The only messages going over this channel are non persistent. Thousands of
> messages are zooming across this channel every hour. The XMIT queue never
> got deeper than 2. The speed is normal. A bin change hits our SAN, which the
> HUB needs, and the XMIT queue backed up to 22 for a couple of seconds! Since
> then there have been no changes to the SAN, and the XMIT queue again has not
> gotten over 2. This to me reinforces the fact that a disk outage on the HUB
> is affecting non persistent messages somehow. And I am beginning to think
> there is no way around it. :(
>
> About the messages being non persistent / persistent and the channel speed:
>
> Even though the messages are non persistent, I still care about them. I have
> always been of the mindset that whether a message is persistent or not has
> more to do with how difficult it is for the apps to reproduce the message if
> it got lost. If it is a big deal, then make it persistent. It will survive
> anything and eventually be processed. Messages that tend to sit in queues
> for a long time are susceptible to QMs going down, and thus should be made
> persistent if they need to survive.
>
> The messages in this app are inquiry style. They are invalid 5 seconds after
> the fact. Even if they were persistent and survived a QM restart, they would
> still be invalid, so why incur the performance penalties of persistence?
> Now, that's not to say we don't care if they get lost or not. I always shake
> my head when I hear people say "I made it non persistent because I don't
> care if it gets lost or not". If you don't care, why did you bother to send
> it in the first place?!?!?
> What if MQ was losing 50% of the nonpersistent
> messages? I couldn't tell the app "Hey just resend them, they are only
> inquiry messages anyway!" Nor could I say, "Every message in this company is
> going to be persistent. We don't want to bother with lost messages ever".
> It's my job to config MQ to be as reliable as possible.
>
> An application that sends non persistent inquiry messages that will be
> invalid in 5 seconds has a reasonable assumption that MQ will do everything
> it can to deliver them. Just because they don't need to survive a QM restart
> doesn't mean they are less important.
>
> I feel the happy medium between "Make all messages persistent" and "Don't
> expect all your messages to always make it to the other side" is to set the
> message channel speed to normal, as long as conditions warrant it. If you
> got a BATCHINT of 100 and a BATCHSIZE of 200 and your XMIT queues regularly
> back up, and the occasional non persistent message is being held back until
> the batch commits, then no way, the speed should be fast, and live with the
> fact that it may get lost.
>
> But I bet that is not how many of anybody's channels run. I bet most of us
> have XMIT queues that are normally empty, and the BATCHINT is still set to
> the default of 0. In this case, setting the speed to normal will have very
> little effect on overall performance, but will ensure that no messages ever
> get lost.
>
> I wonder why IBM chose to have the default setting of the channel speed set
> to fast? Seems to me it would be better to make the default normal. This
> would perform just fine for most people and would help MQ's rep of never
> losing messages. You have no idea what a pain it was discovering that MQ was
> losing messages over a particular fast channel. Days of blaming the apps
> for losing the messages, hunting in DLQs all over the place, XMIT queues,
> application queues, etc.
> The real kick in the pants is that when a message
> is lost like this, there is ZERO record of the fact. You are left scratching
> your head. The man hours wasted on hunting for a message lost like this are
> just not worth it. I'll gladly take a tiny performance hit in a tiny
> percentage of the messages I send over an already very fast product.
>
> Any people looking to pump up the performance of a channel above and beyond
> this could then tweak the channel to fast, only after realizing messages
> could get lost. Maybe when it was time to decide what value to use as a
> default, the logic was "We have a choice of making our product faster out of
> the box or making our message delivery more assured out of the box". And the
> choice was to make it fast, in case customers are running performance
> comparisons against other messaging systems like SonicMQ or MSMQ. Who knows,
> this is only a guess.
>
> -----Original Message-----
> From: John Scott [mailto:[EMAIL PROTECTED]
> Sent: Friday, May 30, 2003 1:04 PM
> To: [EMAIL PROTECTED]
> Subject: Re: How a MQSeries Hub does its thing with persistent / non-persistent messages
>
> I think I joined the thread part way through. Now I'm playing catchup. I've
> read your original message, to which I'll add my 2p (English money) in
> reverse(ish) order:
>
> Q3. I then defined a local queue on QMHUB and used one of the spoke QMs to
> send non-persistent message to it. 1 GIG worth actually. Now these are not
> written to disk, cause they are not persistent, so where are they, in
> memory? I see the queue file grew by over a GIG, so doesn't that mean they
> are on disk, even though they are non persistent?
>
> A3: I would expect these to remain in memory until you exceed the amount of
> memory allocated to hold these messages, after which MQ must store them on
> disk, surely?
>
> Q1. On day 1, is there any data being written to disk by QMHUB as the
> messages fly thru?
> I assume no, since they are not persistent (but see Q3
> below [ above - J]).
>
> A1: See question 3. Messages may get logged to disk if MQ runs out of
> allocated cache memory.
>
> Q2. On day 2, even though we have 2 CPUs, we still have only 1 QM, so I
> assume all the non persistent messages throughput must be affected by the
> persistent messages. My reasoning is, as the persistent messages go in and
> out of the QMAliases, and in and out of the XMIT queues, it has to "stop"
> and log, right? And if it has to stop and log, then it can't be handling the
> non persistent ones at the same time right? They have to wait?
>
> A2: Elsewhere you mentioned changing the channels to normal rather than fast.
> To me this means non-persistent messages get sent in sequence and are
> acknowledged by the channels. Now your messages sit in transmission queues
> and get read in turn and sent (again waiting for acks from the receiving
> end). You want this to stop losing messages (though exactly why I'm not sure
> since they're not persistent and will die if the QM is restarted - another
> discussion).
>
> Since you now have non-persistent & persistent messages mixed in a XMITQ, I would expect
> the persistent ones to "get in the way" of non-persistent ones since they'll
> be read in batches and sent and acknowledged by the receiving end.
>
> However, these are only mixed by XMITQ. Thus if you have all persistent
> going to SPOKEQM1 and all non-persistent going to SPOKEQM2, I would not
> expect SPOKEQM2's messages to be delayed by SPOKEQM1's persistent messages.
>
> Regards
> John.
>
> -----Original Message-----
> From: Potkay, Peter M (PLC, IT) [mailto:[EMAIL PROTECTED]
> Sent: 30 May 2003 15:18
> To: [EMAIL PROTECTED]
> Subject: Re: How a MQSeries Hub does its thing with persistent / non-persistent messages
>
> The HUB has dozens of channels to and from each spoke.
> My question is if one
> pair of spokes is exchanging Nonpersistent messages and another pair starts
> sending persistent, will they hurt each other?
>
> I don't think dedicating channels to be persistent or not between a spoke QM
> and the HUB will make a difference, since either way, the HUB QM has to deal
> with dozens of channels. It may make a difference on how fast a
> message gets from a particular spoke to the HUB, but not what happens once
> it is already there.
>
> <SNIP/>
>
> **********************************************************************
>
> Click here to visit the Argos home page http://www.argos.co.uk
>
> The information contained in this message or any of its attachments may be
> privileged and confidential, and is intended exclusively for the addressee.
> The views expressed may not be official policy, but the personal views of
> the originator.
> If you are not the intended addressee, any disclosure, reproduction,
> distribution, dissemination or use of this communication is not authorised.
> If you have received this message in error, please advise the sender by
> using your reply facility in your e-mail software.
> All messages sent and received by Argos Ltd are monitored for virus, high
> risk file extensions, and inappropriate content. As a result users should be
> aware that mail may be accessed.
>
> **********************************************************************
>
> Instructions for managing your mailing list subscription are provided in
> the Listserv General Users Guide available at http://www.lsoft.com
> Archive: http://vm.akh-wien.ac.at/MQSeries.archive
>
> This communication, including attachments, is for the exclusive use of
> addressee and may contain proprietary, confidential or privileged
> information. If you are not the intended recipient, any use, copying,
> disclosure, dissemination or distribution is strictly prohibited.
> If you are not the intended recipient, please notify the sender
> immediately by return email and delete this communication and destroy all copies.