I had a similar problem. It was even harder to detect, because that time the bottleneck was the SRDF link, and iowait does not show you anything meaningful when all those low-level buffers are full. To investigate, I had to write a very simple C application. In pseudocode:
1. In a tight loop, forever:
   1.0. Print "write started" + timestamp.
   1.1. Write 100 MB of some garbage (it had better be random) into file A.
   1.2. Close file A.
   1.3. Delete file A.
   1.4. Print "write finished" + timestamp.

Then capture the output for 24 hours and see whether this simple program shows the same problems (such as the 2-minute delays). If your description is correct, it will. Then give it to your storage team and ask them for a cure. No MQ involved.

Hopefully this will help,
Pavel

From: Jim Ford
Sent: 10/15/2003 03:56 PM
Subject: Re: MQ "Problem" - Advice Needed

That would be a solution. It seems unnecessary for me to have to do any further legwork on this just to get them to take ownership of something that's so obviously their problem. Maybe I just need to vent. Arrrgh!!! There. That's better.

From: "Thomas, Don"
Sent: 10/15/2003 02:30 PM
Subject: Re: MQ "Problem" - Advice Needed

Doctor, doctor, it hurts when I do this. Well, don't do that anymore.

But seriously, try to find other applications that are also experiencing these pauses; then they would look rather foolish asking everyone to defend their apps. It's pretty apparent that whatever they are doing is hogging all of the disk I/O, and MQPUT is definitely a disk-intensive operation.

Don Thomas
EDS - PASC

-----Original Message-----
From: Jim Ford
Sent: Wednesday, October 15, 2003 3:20 PM
Subject: Re: MQ "Problem" - Advice Needed

Actually, they agreed to do that. The pauses stopped. But they can't see how it could be their fault, so now I'm required to defend MQSeries in general, and MQPUT in particular.
From: Rick Tsujimoto
Sent: 10/15/2003 02:08 PM
Subject: Re: MQ "Problem" - Advice Needed

Jim,

If they're willing, have them turn off replication. Show them the audit numbers from your apps. Then turn replication back on and show them the audit numbers again.

From: Jim Ford
Sent: 10/15/2003 02:34 PM
Subject: MQ "Problem" - Advice Needed

We have periodic "pauses" on some of our Solaris servers. CPU usage drops to nothing for a couple of minutes, then things begin to function normally again.

Many of our MQ apps on Solaris were written in the last two years and maintain exhaustive audit trails. Those audit trails showed that the applications were waiting on an MQPUT for the entire time. Everything on the machine pauses, by the way, but it's these MQ apps that keep the audit trail.

So I did some investigating, and eventually discovered that the pauses seemed to coincide with something our storage administrator was running: a series of commands, issued on the mainframe, that causes our SAN to replicate to our hotsite. I ran the Unix iostat command during the pauses, and sure enough, the disk service times had gone way up, then dropped back to a couple of milliseconds afterwards.

So I told the storage team about it (2 months ago!). They claim not to see any problems, and have now decided that it's an MQ problem. They want to meet with me and have me "explain what the MQPUT command is, and how it works." It seems pretty obvious that they don't intend to work on this, and are in the process of putting up a stonewall by shooting the messenger.

It's much like dealing with a difficult vendor, except that the fact that we work for the same company in some ways makes it even more difficult.
I'd be tempted to raise hell with a vendor, but that could be a career-limiting move in this case. So... I'm interested in hearing whether anyone has seen a similar problem. I'm also interested in any advice on how to get the SAN team to do the right thing and take ownership of the problem.

Instructions for managing your mailing list subscription are provided in the Listserv General Users Guide available at http://www.lsoft.com
Archive: http://vm.akh-wien.ac.at/MQSeries.archive