The results are coming in, and they seem unanimous. Whenever I detect a file from the center in question with zero records (there have been many occurrences), it is invariably corrupted. There have been only two occurrences of zero-record files from one of the other centers, and those two files were not corrupted. Something must be happening because of timing or some other factor.

I tried TRACERTE to the MVS systems at each of the centers. In each case, there were 8 hops. The difference is in time: the center where there is no failure responded almost instantaneously, while there are noticeable delays on the path to the problem center.

This leaves me with two questions:

1. Why am I seeing any instances of zero-record files, especially when some of them held thousands of records in earlier checks?
2. Why is this OK for one set of files, but not another?

Actually, there is a third question: which component(s) do I open the PMR(s) against?
Regards,
Richard Schuh

________________________________
From: The IBM z/VM Operating System On Behalf Of Mike Walter
Sent: Thursday, December 06, 2007 11:52 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: FTP Append

Good diagnostic technique!

It's unclear to me whether the monitoring is done from z/VM or on the PCs in the centers. If it's done on z/VM, then you're only looking at one side of the equation. Can you develop a monitor that runs on the PCs to see what *they* see every minute? Extra credit will be given if that monitor is also run on the PC immediately before and after each FTP event, so that what arrives on z/VM can be compared with what was on the PC _just before_ and _just after_ every FTP event.

My apologies if these ideas have already been tried. In an apparent attempt to maintain some privacy around what's being done, the posts have sometimes been difficult to interpret.

Mike Walter
Hewitt Associates

"Schuh, Richard" wrote on 12/06/2007 01:38 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: FTP Append

I created a routine that checks the number of records in the files from the center in question once per minute and reports any that change to or from zero records. It has given surprising results. One time when there was a corruption, it reported 0 records immediately before and a non-zero count immediately after. On other occasions, it has recorded similar changes without any corruption. Conversely, there have been 2 cases of corruption with no indication that the corrupted file was ever empty. I have since changed the routine to monitor the files from all centers in an effort to see if these state changes are normal.
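For anyone who wants to reproduce this kind of once-per-minute check outside CMS (for example, on the PCs as Mike suggests), here is a minimal Python sketch. It is not the routine described above. It assumes the log files are visible as ordinary files with one record per line, and the names `count_records`, `classify`, `changes`, `snapshot`, and `monitor` are hypothetical. Besides transitions to and from zero, it also flags any drop in record count and any file that vanishes between polls; those two extra checks are additions, on the reasoning that an append-only file should only ever grow.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

Change = Tuple[str, int, Optional[int], str]  # (name, old, new, reason)


def count_records(path: Path) -> int:
    """Count records, assuming one newline-delimited record per line."""
    with path.open("rb") as f:
        return sum(1 for _ in f)


def classify(old: int, new: int) -> Optional[str]:
    """Describe a suspicious change in record count, or return None."""
    if old > 0 and new == 0:
        return "became empty"
    if old == 0 and new > 0:
        return "no longer empty"
    if new < old:
        return "shrank"  # an append-only file should never lose records
    return None


def changes(prev: Dict[str, int], curr: Dict[str, int]) -> List[Change]:
    """Compare two snapshots and list every flagged file, sorted by name."""
    flagged: List[Change] = []
    for name, new in curr.items():
        old = prev.get(name)
        if old is None:
            continue  # file not seen in the previous snapshot
        reason = classify(old, new)
        if reason is not None:
            flagged.append((name, old, new, reason))
    for name in prev.keys() - curr.keys():
        flagged.append((name, prev[name], None, "disappeared"))
    return sorted(flagged)


def snapshot(directory: Path, pattern: str) -> Dict[str, int]:
    """Record counts for every matching file in the directory."""
    return {p.name: count_records(p) for p in directory.glob(pattern) if p.is_file()}


def monitor(directory: Path, pattern: str = "*", interval: float = 60.0,
            report: Callable[[str], None] = print) -> None:
    """Poll forever, reporting flagged changes once per interval (seconds)."""
    prev = snapshot(directory, pattern)
    while True:
        time.sleep(interval)
        curr = snapshot(directory, pattern)
        for name, old, new, reason in changes(prev, curr):
            report(f"{time.strftime('%H:%M:%S')} {name}: {old} -> {new} ({reason})")
        prev = curr
```

Calling something like `monitor(Path("/logs"), "CENTER1.*")` would print one line per flagged file each minute; the directory and pattern there are placeholders. Note that a one-minute poll can still miss a file that is emptied and refilled between samples, which may explain corruptions that were never seen as empty.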
If I see them from the other centers, I will have to conclude that they, while strange, are normal.

Back in 2004, I posted an item about files disappearing from SFS when FTP was appending to them (http://listserv.uark.edu/scripts/wa.exe?A2=ind0405&L=IBMVM&P=R29292&D=0&H=0&I=-3&O=T&T=0&m=49139). There was only one response, and the problem went away without ever having been correctly diagnosed and fixed. This problem seems very similar to the one in the 2004 post, because we did note then that the files that disappeared were first reported as being empty. This time the problem, if it is related, is more persistent than before, happening once every few days instead of once every 3.5 years :-(

The question is: what is causing this? Is it something in SFS, or is it being done by TCPIP? How can I make the determination?

Regards,
Richard Schuh

Original post in the current thread:

Date: Wed, 28 Nov 2007 14:12:04 -0800
From: "Schuh, Richard"
Subject: FTP Append

We have been using FTP to append to daily files from our centers around the world for eight years now. Our method is this: data is accumulated by a PC at each center.
When a threshold is reached, the PC initiates an FTP session with our VM system and appends the data to a file whose name and type reflect the location of the originating system, the type of log file, and the date of the collection. These files reside in the same SFS directory.

Lately, the files from one of the centers have intermittently been corrupted: data that was already written gets overwritten. For example, timestamped data might be collected for three hours, and after the next transmission the start of the file will bear the timestamp 03:00:01. Sometimes it happens early; other times late (20:39:00 is one recent example).

The people in charge of this process have checked and rechecked to verify (1) that all centers are running the same level of software, (2) that the PUT command is never used in place of APPEND, and (3) that any non-zero return code from any command terminates the transmission and the error is logged on the PC. So far, no non-zero return code has been reported and no error log has been created.

Has anyone seen this sort of behavior? What might cause it? We have nearly 20 log files being created on VM using this method and software. Why is only one file being victimized?

I have tried FTP to a test file that is locked in XEDIT by a user other than the owner of the directory. The result was a meaningful error message accompanied by a non-zero return code. Doing the same from the owning user gives the expected bad results: the updates of whichever user finishes first are wiped out by the last one to do the FINIS. Only the update is wiped out, not the entire file. The latter test was done just for completeness of the experiment.
In real life, (a) the owner is a service machine that runs disconnected and never manipulates these files until they are at least a day old, and (b) the only ones who can write into the directory are the owner, the PCs doing the FTPs (which act under the auspices of the only user explicitly authorized to write in the directory), and the file pool administrators.

We are running z/VM 5.2.0 at service level 701 (CP, CMS22, and TCP/IP are all at the same service level).
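Mike's suggestion of comparing the file just before and just after each FTP event reduces to a simple invariant: after an APPEND, the old contents must still be an exact prefix of the new contents, followed by exactly the batch that was sent. The Python sketch below checks that invariant. It assumes records are lines and that the pre-transfer copy, the post-transfer copy, and the batch the PC sent are all available as lists of strings; how those copies are captured is left open, and the function names `first_lost_record` and `verify_append` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def first_lost_record(before: Sequence[str], after: Sequence[str]) -> Optional[int]:
    """Index of the first earlier record not found at the same position afterwards.

    Returns None when `before` is still an exact prefix of `after`.
    """
    for i, record in enumerate(before):
        if i >= len(after) or after[i] != record:
            return i
    return None


def verify_append(before: Sequence[str], sent: Sequence[str],
                  after: Sequence[str]) -> Tuple[bool, str]:
    """Check that `after` is exactly `before` followed by `sent`."""
    expected: List[str] = list(before) + list(sent)
    if list(after) == expected:
        return True, "ok"
    lost = first_lost_record(before, after)
    if lost is not None:
        # Earlier data no longer matches: an overwrite rather than a bad append.
        return False, f"earlier data overwritten or lost from record {lost}"
    # Earlier data intact, but the tail is not exactly the batch that was sent.
    return False, f"appended data differs: expected {len(expected)} records, found {len(after)}"
```

For the symptom described in the original post, where the start of the file suddenly bears a later timestamp such as 03:00:01, this check reports a loss at record 0, which separates a true overwrite from an append that merely delivered the wrong data.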