The results are coming in and they seem unanimous. Whenever I detect a
zero-record file from the center in question (there have been many
occurrences), it is invariably corrupted. There have been only two
occurrences of zero-record files from one of the other centers, and
those two files were not corrupted. There must be something happening
because of timing or some other factor.
 
I tried TRACERTE to the MVS systems at each of the centers. In each
case, there were 8 hops. The difference is in the timing: the center with
no failures responded almost instantaneously, while there were
noticeable delays on the path to the problem center.
 
This leaves me with two questions:
 
1. Why am I seeing any instances of zero record files, especially when
there had been thousands of records in some of them in earlier checks? 
2. Why is this OK for one set of files, but not another?
 
Actually, there is a third question: Which component(s) do I open the
PMR(s) against? 
 

Regards, 
Richard Schuh 

 

 

________________________________

From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On
Behalf Of Mike Walter
Sent: Thursday, December 06, 2007 11:52 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: FTP Append



Good diagnostic technique! 

It's unclear to me if the monitoring is done from z/VM, or on the PCs in
the centers.

If it's done on z/VM, then you're only looking at one side of the
equation.  Can you develop a monitor that runs on the PCs to see what
*they* see every minute?

Extra credit will be given if that monitor is also run on the PC
immediately before and after each FTP event so that what arrives on z/VM
can be compared with what was on the PC _just before_, and _just after_
every FTP event. 

My apologies if the thoughts in this have already been tried.  In an
apparent attempt to maintain some privacy around what's being done,
the posts have sometimes been difficult to interpret.

Mike Walter 
Hewitt Associates 
Any opinions expressed herein are mine alone and do not necessarily
represent the opinions or policies of Hewitt Associates. 



"Schuh, Richard" <[EMAIL PROTECTED]> 

Sent by: "The IBM z/VM Operating System" <IBMVM@LISTSERV.UARK.EDU>
Date: 12/06/2007 01:38 PM
Please respond to: "The IBM z/VM Operating System" <IBMVM@LISTSERV.UARK.EDU>
To: IBMVM@LISTSERV.UARK.EDU
cc:
Subject: Re: FTP Append

I created a routine that checks the number of records in the files from
the center in question once per minute and reports any that change to or
from zero records. It has given surprising results. One time when there
was a corruption, it reported 0 records immediately before and a nonzero
record count after. On other occasions, it has recorded similar changes
without there being any corruption. Conversely, there have been 2 cases
of corruption with no indication that the corrupted file was ever empty.
I have since
changed the routine to monitor the files from all centers in an effort
to see if these state changes are normal. If I see them from the other
centers, I will have to conclude that they, while strange, are normal. 
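
For readers who want to try something similar, the once-a-minute check
described above might be sketched as follows. This is only an
illustration in Python: the actual routine is not shown in the thread,
and the function name zero_transitions, the dictionaries of record
counts, and the file names in the demo are all assumptions.

```python
def zero_transitions(previous, current):
    """Report files whose record count changed to or from zero.

    previous, current: dicts mapping a file name to its record count at
    two successive polls (hypothetical inputs; how the counts are
    gathered is outside this sketch).
    Returns a list of (name, count_before, count_after) tuples.
    """
    changes = []
    for name in sorted(current):
        before = previous.get(name)
        after = current[name]
        if before is None:
            continue  # first sighting of this file, nothing to compare
        # A transition happened if exactly one of the two counts is zero.
        if (before == 0) != (after == 0):
            changes.append((name, before, after))
    return changes


if __name__ == "__main__":
    # Hypothetical file names and counts from three one-minute polls.
    poll_1 = {"CENTERA LOG": 1200, "CENTERB LOG": 800}
    poll_2 = {"CENTERA LOG": 0, "CENTERB LOG": 815}
    poll_3 = {"CENTERA LOG": 40, "CENTERB LOG": 830}
    print(zero_transitions(poll_1, poll_2))
    print(zero_transitions(poll_2, poll_3))
```

Keeping the previous poll and comparing it with the current one is what
lets the check report both directions of the change (to zero and back).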

Back in 2004, I posted an item about files disappearing from SFS when
FTP was appending to them
(http://listserv.uark.edu/scripts/wa.exe?A2=ind0405&L=IBMVM&P=R29292&D=0&H=0&I=-3&O=T&T=0&m=49139).
There was only one response and the
problem went away without ever having been correctly diagnosed and
fixed. This problem seems to be very much the same as the one in the
2004 post, because we did note that the files that disappeared were
first reported as being empty. This time, the problem, if it is related,
is more persistent than before, happening once every few days instead of
once every 3.5 years :-(

The question is what is causing this: is it something in SFS, or is it
being done by TCPIP? How can I make the determination?

Regards,
Richard Schuh 

Original post in the current thread. 

Date:         Wed, 28 Nov 2007 14:12:04 -0800
Reply-To:     The IBM z/VM Operating System
Sender:       The IBM z/VM Operating System
From:         "Schuh, Richard"
Subject:      FTP Append

We have been using FTP to append to daily files from our centers around
the world for eight years now. The way that we have been doing it is
that data is accumulated by a PC at each center. When a threshold is
reached, the PC initiates an FTP session with our VM system and appends
the data to a file whose name and type reflect the location of the
originating system, the type of log file and the date of the collection.
These files reside in the same SFS directory. 

Lately, the files from one of the centers intermittently get corrupted
by overwriting the already written data. For example, data, which is
timestamped, might be collected for three hours and after the next
transmission, the start of the file will bear the timestamp of 03:00:01.
Sometimes, it happens early; other times late (20:39:00 is one recent
example). 
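
One way to catch this kind of overwrite mechanically is to check that
each new copy of a file still begins with everything the previous copy
contained. The following is a minimal Python sketch of that idea, not
anything used in the original setup; the function name first_divergence
and the sample records are made up for illustration.

```python
def first_divergence(earlier, later):
    """Check that `later` still starts with every record of `earlier`.

    earlier, later: lists of records (strings) from two successive
    snapshots of the same file. For a file that is only ever appended
    to, `earlier` must be a prefix of `later`.
    Returns None if that holds, otherwise the index of the first record
    of `earlier` that is missing or different in `later`.
    """
    for i, record in enumerate(earlier):
        if i >= len(later) or later[i] != record:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical timestamped records.
    before = ["00:00:01 a", "01:30:00 b", "02:59:59 c"]
    appended = before + ["03:00:01 d"]   # normal append
    overwritten = ["03:00:01 d"]         # earlier data replaced
    print(first_divergence(before, appended))
    print(first_divergence(before, overwritten))
```

A result of None means the append-only property held between the two
snapshots; an index points at the first record that was lost or changed.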

The people who are in charge of this process have checked and rechecked
to verify (1) that all centers are running the same level of software,
(2) that there is nowhere that the PUT command is used in place of
APPEND, and (3) that any non-zero return code from any command terminates the
transmission and the error is logged on the PC. So far, no non-zero
return code has been reported; no error log created. 
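
The PC-side software is not described in this thread, so the following
is purely illustrative: a minimal Python sketch of what "append, and
stop on any error" could look like using the standard ftplib module.
The function name append_chunk, the logger name, and the idea of passing
in an already logged-in FTP object are assumptions, not a description of
the software in use at the centers.

```python
import ftplib
import io
import logging

log = logging.getLogger("ftp_append")  # hypothetical logger name


def append_chunk(ftp, remote_name, data):
    """Append `data` (bytes) to `remote_name`; return True on success.

    `ftp` is assumed to be an already logged-in ftplib.FTP object.
    storbinary raises an ftplib error on a 4xx or 5xx reply, so a
    failed transfer is logged here rather than passing silently.
    """
    try:
        # APPE is the FTP append command (RFC 959); STOR would replace
        # the existing file instead of adding to it.
        ftp.storbinary(f"APPE {remote_name}", io.BytesIO(data))
    except ftplib.all_errors as exc:
        log.error("append to %s failed: %s", remote_name, exc)
        return False
    return True
```

A caller would stop sending as soon as append_chunk returns False, which
corresponds to point (3) above.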

Has anyone seen this sort of behavior? What might cause it? We have
nearly 20 log files being created on VM using this method and software.
Why is only one file being victimized? 

I have tried FTP to a test file that is locked in XEDIT by a user other
than the owner of the directory. The result was a meaningful error
message accompanied by a non-zero return code. Doing the same from the
owning user gives the expected bad results. The updates of whichever
user ends first get wiped out by the last to do the FINIS. It is only
the update that gets wiped out, not the entire file. The latter test was
just done for completeness of the experiment. In real life, (a) the
owner is a service machine that runs disconnected and never manipulates
these files until they are at least a day old, and (b) the only ones who
can write into the directory are the owner, the PCs doing the FTPs,
which act under the auspices of the only user explicitly authorized to
write in the directory, and file pool administrators. 

We are running z/VM 5.2.0 at service level 701 (CP, CMS22, and TCP/IP
all at the same service level).



