I created a routine that checks, once per minute, the number of records in the files from the center in question and reports any file whose count changes to or from zero. It has given surprising results. On one occasion when there was a corruption, it reported 0 records immediately before and a nonzero count after. On other occasions, it has recorded similar changes without any corruption occurring. Conversely, there have been 2 cases of corruption with no indication that the corrupted file was ever empty. I have since changed the routine to monitor the files from all centers, to see whether these state changes are normal. If I see them from the other centers, I will have to conclude that they are normal, however strange.

Back in 2004, I posted an item about files disappearing from SFS while FTP was appending to them (http://listserv.uark.edu/scripts/wa.exe?A2=ind0405&L=IBMVM&P=R29292&D=0&H=0&I=-3&O=T&T=0&m=49139). There was only one response, and the problem went away without ever having been correctly diagnosed and fixed. The current problem seems very much like the one in the 2004 post, because we noted then that the files that disappeared were first reported as being empty. This time the problem, if it is related, is more persistent than before, happening once every few days instead of once every 3.5 years :-(
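The actual monitoring routine is not shown in this post. As a minimal sketch of the check it describes (report any file whose record count moves to or from zero between two samples), the logic could look like the following Python. The function name and the file identifiers are hypothetical, and how the record counts are obtained from SFS is deliberately left out.

```python
from typing import Dict, List, Tuple


def zero_transitions(
    previous: Dict[str, int], current: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Compare two samples of {file id: record count}.

    Returns (file id, old count, new count) for every file whose count
    changed from zero to nonzero or from nonzero to zero.
    """
    events = []
    for file_id, new_count in sorted(current.items()):
        old_count = previous.get(file_id)
        if old_count is None:
            # First sighting of this file: there is no earlier sample
            # to compare against, so nothing is reported.
            continue
        if (old_count == 0) != (new_count == 0):
            events.append((file_id, old_count, new_count))
    return events


# Hypothetical file ids and counts, for illustration only.
earlier = {"CENTERA LOG": 120, "CENTERB LOG": 0, "CENTERC LOG": 5}
later = {"CENTERA LOG": 0, "CENTERB LOG": 7, "CENTERC LOG": 9}
for file_id, old, new in zero_transitions(earlier, later):
    print(f"{file_id}: {old} -> {new} records")
```

Run once per minute with the previous sample kept between runs, this reports only the state changes discussed above and stays silent while a file simply grows.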
The question is: what is causing this? Is it something in SFS, or is it being done by TCPIP? How can I make the determination?

Regards,
Richard Schuh

Original post in the current thread:

Date: Wed, 28 Nov 2007 14:12:04 -0800
Reply-To: The IBM z/VM Operating System
Sender: The IBM z/VM Operating System
From: "Schuh, Richard"
Subject: FTP Append

We have been using FTP to append to daily files from our centers around the world for eight years now. Data is accumulated by a PC at each center. When a threshold is reached, the PC initiates an FTP session with our VM system and appends the data to a file whose name and type reflect the location of the originating system, the type of log file, and the date of collection. These files all reside in the same SFS directory.

Lately, the files from one of the centers have intermittently been corrupted: the already written data gets overwritten. For example, data, which is timestamped, might be collected for three hours, and after the next transmission the start of the file bears a timestamp of 03:00:01. Sometimes it happens early in the day; other times late (20:39:00 is one recent example). The people in charge of this process have checked and rechecked to verify that (1) all centers are running the same level of software, (2) the PUT command is nowhere used in place of APPEND, and (3) any non-zero return code from any command terminates the transmission and is logged as an error on the PC.
So far, no non-zero return code has been reported and no error log has been created. Has anyone seen this sort of behavior? What might cause it? We have nearly 20 log files being created on VM using this method and software. Why is only one file being victimized?

I have tried FTP to a test file that is locked in XEDIT by a user other than the owner of the directory. The result was a meaningful error message accompanied by a non-zero return code. Doing the same with the file open in XEDIT under the owning user gives the expected bad results: the updates of whichever user ends first get wiped out by the last to do the FINIS. It is only the update that gets wiped out, not the entire file. The latter test was done just for completeness of the experiment. In real life, (a) the owner is a service machine that runs disconnected and never manipulates these files until they are at least a day old, and (b) the only ones who can write into the directory are the owner; the PCs doing the FTPs, which act under the auspices of the only user explicitly authorized to write in the directory; and file pool administrators.

We are running z/VM 5.2.0 at service level 701 (CP, CMS22, and TCP/IP all at the same service level).
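The corruption symptom described in this post (a file that should only grow suddenly starts at a later timestamp such as 03:00:01) can be stated as an append-only invariant: between two samples of the same file, the record count must not shrink and the timestamp of the first record must not change. The Python below is only a sketch of that comparison under those assumptions. The `Sample` type and `append_violations` function are hypothetical names, the timestamps are treated as opaque strings, and reading the file from SFS is not shown.

```python
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    records: int                 # number of records seen in the file
    first_stamp: Optional[str]   # timestamp of the first record, None if empty


def append_violations(before: Sample, after: Sample) -> List[str]:
    """List the ways `after` breaks the append-only invariant relative to `before`."""
    problems = []
    if after.records < before.records:
        problems.append(
            f"record count shrank from {before.records} to {after.records}"
        )
    if before.first_stamp is not None and after.first_stamp != before.first_stamp:
        problems.append(
            f"first timestamp changed from {before.first_stamp} to {after.first_stamp}"
        )
    return problems


# Illustrative values only: three hours of data replaced by one transmission.
print(append_violations(Sample(180, "00:00:01"), Sample(12, "03:00:01")))
# A normal append produces no violations.
print(append_violations(Sample(180, "00:00:01"), Sample(192, "00:00:01")))
```

Sampling every file this way would pin down the minute in which an overwrite happened, which could then be matched against the FTP server and SFS activity for that interval.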