Hi,

Unfortunately I have a production MQ problem today! Actually, I
believe it's not an MQ problem but a network problem. However, when
messages cannot be delivered, it's an "MQ problem".

I would like to get some advice from the list before I spend time
getting all the different groups to run a trace.

Problem details :

Box 1 
Windows NT4 running MQ 5.3

Box 2
z/OS 1.4 running MQ 5.3

Messages are sent from NT to z/OS, one way only. No changes have been
made to MQ at either end. When the problem happens, the channel status
at QM1 shows running. However, the detailed status shows that it stops
at 25 messages. For example, when the xmitq holds 100 messages and I
start the channel and display qlocal(xmit queue), it shows 75
messages. If I display chs, it shows msg(25) and curmsgs(26). On the
mainframe end, the rcvr shows running as well (with msg 0), but after
around 5 minutes Adopt MCA kicks in and re-establishes the
connection. On the Windows end, the error log shows TCP/IP error
10054 (connection forcibly closed by remote host). Usually that's due
to a network problem.
 
I tried changing the batchsz to 1, with no luck: the only difference
is that ql(xmit) shows 99 and the MSG values in chs all become 1. All
the other error messages are the same.

I then tried PING CHANNEL and it worked. I then dumped the xmitq
messages and found that they are around 1300-1500 bytes, so I used
PING CHANNEL with DATALEN(1500). This time I got "AMQ9208..error
receiving from host..." in runmqsc.

I then did a command prompt PING without any problem. I did PING again
with lengths 1500 and 10000 and it was still OK (the response was fast
too). Then I did a TSO PING from z/OS with lengths 1500 and 10000 as
well, without any problem.

As usual, I contacted the network team about the problem, and as usual
they came back with ping statistics saying "no problem" on the network
at all, regardless of what -l they used in ping.

So the problem came back to me. I tried defining another channel, but
got the same symptom. As I strongly believe it's a network problem
between the 2 sites, I used another qmgr on a UNIX box as a middle
man: I send the messages from the NT box to the middleman UNIX qmgr
and then on to the mainframe qmgr. Hurray! Now everything works fine
and the messages have all gone through. So I believe there is nothing
wrong with the qmgrs on NT and z/OS.

As that's just a temporary solution, I now have to find the root
cause. I tried the MQ PING again from the NT box to some other boxes
with a data length of 1500, without any problem. I tried multiple
times to the z/OS and figured out that the problem happens when
DATALEN is greater than or equal to 1427.
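
For anyone who wants to repeat the search for the threshold without
trying sizes by hand, here is a minimal sketch in Python. It assumes
a hypothetical probe(datalen) function that returns True when PING
CHANNEL with that DATALEN succeeds (how it drives runmqsc is left
out), and it assumes the behaviour is monotonic: every size below the
threshold works and every size from the threshold up fails. The
fake_probe below is only a stand-in that mimics what I saw; 16 is
just a small size assumed to work.

```python
from typing import Callable


def smallest_failing_datalen(probe: Callable[[int], bool], lo: int, hi: int) -> int:
    """Binary-search the smallest DATALEN for which probe() fails.

    Assumes probe(lo) succeeds, probe(hi) fails, and that the
    success/failure boundary is a single threshold between them.
    """
    if not probe(lo):
        raise ValueError("lower bound must be a size that works")
    if probe(hi):
        raise ValueError("upper bound must be a size that fails")
    # Invariant: lo always works, hi always fails.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical stand-in for a real PING CHANNEL probe: it simply
# reproduces the behaviour described above (fails at 1427 and up).
def fake_probe(datalen: int) -> bool:
    return datalen < 1427


print(smallest_failing_datalen(fake_probe, 16, 1500))  # 1427
```

With a real probe this needs about 11 pings to cover 16 to 1500
instead of trying each size in turn.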

Has anyone got any idea? I know the Ethernet MTU is 1500, so it looks
like something to do with MTU size, but shouldn't that affect the
command prompt ping as well? What's the difference between the
command prompt ping and the MQ ping, other than that MQ goes through
the MQ port and verifies the channel name?
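
For reference, the arithmetic behind the MTU suspicion can be
sketched as below. This is only a sketch under assumptions: IPv4 with
no IP options (20-byte header), an 8-byte ICMP echo header, and a
20-byte TCP header with no options. It shows that a command prompt
ping with -l 1500 is already too big for a single 1500-byte packet,
so it only gets through by being fragmented, whereas TCP stacks
typically send with Don't Fragment set. I have not confirmed that
this is the cause here.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_icmp_payload(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def max_tcp_payload(mtu: int) -> int:
    """Largest TCP segment payload that fits in one packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ping_needs_fragmentation(payload: int, mtu: int) -> bool:
    """True if a ping of this payload size cannot fit in one packet."""
    return payload > max_icmp_payload(mtu)


mtu = 1500
print(max_icmp_payload(mtu))                  # 1472
print(max_tcp_payload(mtu))                   # 1460
print(ping_needs_fragmentation(1500, mtu))    # True
print(ping_needs_fragmentation(10000, mtu))   # True
```

If that reasoning holds, a Windows ping with the Don't Fragment flag
(ping -f -l 1472 and then smaller sizes) would be a closer match to
what the channel does than a plain ping -l 1500.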

Thanks in advance,

Ian
