- The file is written with the Linux gzip program.
- No, I can't reproduce the error with the exact same file that
failed; that's what is really puzzling.

How do you make sure that no process is reading the file before it is fully flushed to disk?

Possible way of testing for this kind of error: before you open a file, use os.stat to determine its size, and write out the size and the file path into a log file. Whenever an error occurs, compare the actual size of the file with the logged value. If they are different, then you have tried to read from a file that was growing at that time.
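A minimal sketch of that check, assuming a tab-separated log file; the function names and the log format are made up for illustration and are not part of your program:

```python
import os
import time


def log_size_before_open(path, log_path):
    """Record the file's current size and path in a log, and return the size."""
    size = os.stat(path).st_size
    with open(log_path, "a") as log:
        # One line per open attempt: timestamp, size in bytes, file path.
        log.write(f"{time.time():.3f}\t{size}\t{path}\n")
    return size


def size_changed_since(path, logged_size):
    """Return True if the file's size now differs from the logged size."""
    return os.stat(path).st_size != logged_size
```

Call log_size_before_open() just before opening the file, and size_changed_since() from the error handler. If it returns True, the file was still being written when you read it.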

Suggestion: from the other process, write the data to a temporary file first (for example, "file.gz.tmp"). Once that file is flushed and closed, use os.rename() to give it its final name. On POSIX systems, the rename() operation is atomic as long as both names are on the same filesystem.
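A rough sketch of the write-then-rename pattern, assuming the writer is a Python process (in your case it is the gzip program, so the same idea would go into whatever script invokes it). The function name and the ".tmp" suffix are only illustrative:

```python
import gzip
import os


def write_gzip_then_rename(final_path, data):
    """Write gzip data to a temporary name, then rename it into place."""
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as raw:
        # Closing the GzipFile writes the gzip trailer (CRC and length)
        # but leaves the underlying file object open.
        with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            gz.write(data)
        raw.flush()
        os.fsync(raw.fileno())  # ask the OS to push the bytes to storage
    # Readers only ever see the final name once the file is complete.
    os.rename(tmp_path, final_path)
```

The temporary file has to live in the same directory (or at least on the same filesystem) as the final one, otherwise the rename is no longer a single operation.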


   There seems to be no clear pattern; it just fails randomly. The file
is also only opened for reading by this program,
   so in theory there is no way that it can be corrupted.
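Yes, there is. Gzip stores a CRC-32 of the uncompressed data in a trailer at the end of the stream. So if the file is not fully flushed to disk, you read only a fragment of the stream: the trailer is missing or incomplete, and the read fails with an end-of-file or CRC error.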
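You can see what a half-written file looks like to a reader without touching NFS at all. The following is only an in-memory illustration (the helper name and the sample payload are invented): it compresses some data, cuts the result in half, and tries to read both versions.

```python
import gzip
import io
import zlib


def check_gzip_bytes(data):
    """Try to decompress gzip data held in memory.

    Returns "ok" on success, otherwise the name of the exception raised.
    """
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
            f.read()
    except (EOFError, OSError, zlib.error) as exc:
        return type(exc).__name__
    return "ok"


payload = b"some log line\n" * 10000
complete = gzip.compress(payload)
# Keep only the first half, as a reader would see a file still being written.
truncated = complete[: len(complete) // 2]

print(check_gzip_bytes(complete))
print(check_gzip_bytes(truncated))
```

The complete stream reads fine, while the truncated one raises an error even though no byte that was read is wrong. The same file read again later, once it is complete, would succeed, which would match the fact that you cannot reproduce the failure.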

   I also checked with lsof whether any processes have it open, but
nothing appears.
lsof doesn't work very well over NFS. Other processes on different computers (!) can be writing the file, and lsof only lists the processes on the system it is executed on.

- can't really try on the local disk, might take ages unfortunately
(we are rewriting this system from scratch anyway)


--
http://mail.python.org/mailman/listinfo/python-list
