On Tue, Dec 14, 2004 at 08:53:58AM -0600, Dave Kleikamp wrote:
> On Tue, 2004-12-14 at 08:07 -0600, John Goerzen wrote:
> > 1. Some files contained data that no process could possibly have been
> > writing to them. It appeared to be blocks of NULLs in some cases. Some
> > files were .so files, and I lack the expertise to know specifically
> > which chunks were bad, but I did know that they were corrupt (ldconfig
> > told me). Some of the files may have contained blocks from other files
> > also (but I'm not certain of this either).
>
> This can normally happen for files that are created shortly before the
> crash. Blocks of nulls may be file holes. I don't think .so files are
> created sequentially, so it may be possible that there are portions of
> the file that had not yet been committed to disk. It's possible that
> newly-created files may contain stale data, but I don't think this
> happens often in practice (but I'm not sure).

Is there any practical way I could try to address this? I would rather
have the files truncated, or even re-linked to /lost+found or something,
than have them contain bad data.

I also never seemed to encounter this behavior with either ext2 or ext3.
Was I just lucky, or is there something fundamentally different about
JFS?

> > 2. Some files were truncated. This is not unexpected in a crash
> > situation, but there were many more files like this than I would have
> > expected.
>
> This would be normal if the files were newly-created. The transaction
> that creates the file will be committed to disk earlier than the
> transaction(s) which extend the file when data is written.

They probably were, or were at least recently updated.

> > 3. The total number of files touched by #1 and #2 far exceeds the number
> > of files open for writing at the instant the system went down.
>
> Files aren't committed when they are closed. pdflush usually makes sure

Ahh, that was a misconception on my part then.

> dirty data is committed to disk within 30 seconds. I believe the

It is quite possible that all the corrupted files were created or
modified within the 10 seconds prior to the crash. As for the .so files,
I was running apt-get dist-upgrade at the time, so they were being
created/modified at the time of the crash; it was just the magnitude of
the problem that was startling, especially given that dpkg first unpacks
things with a temporary filename, then renames them to their permanent
name to try to avoid any corruption in cases like this. I don't know if
it does an fsync(), though.

> Just to make sure there isn't anything abnormal going on, did fsck run
> through all of its phases, or just phase 0? Use the -f or -n flags to
> force it to check everything.

I can't say I recall.

> > Is this behavior expected from JFS? Is there anything I can do to help
> > out next time?
>
> As long as the affected files were all created or extended
> within /proc/sys/vm/dirty_writeback_centisecs, this is expected
> behavior.

Thanks,
John

_______________________________________________
Jfs-discussion mailing list
[EMAIL PROTECTED]
http://www-124.ibm.com/developerworks/oss/mailman/listinfo/jfs-discussion
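
On the fsync() question raised above: for reference, here is a minimal
sketch of the "write to a temporary name, fsync, then rename" pattern
that the thread describes dpkg as attempting. It is an illustration
only. Whether dpkg actually calls fsync() is not established in this
thread, and the helper names atomic_write() and
writeback_interval_seconds() are hypothetical, made up for this example.
The second helper just reads the writeback interval from the /proc file
Dave mentions and converts it to seconds.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` via a temp file, fsync, and rename.

    Hypothetical helper: a sketch of the pattern, not dpkg's code.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # push Python's buffer to the kernel
            os.fsync(tmp.fileno())   # ask the kernel to commit the data
        os.rename(tmp_path, path)    # replaces the old name on POSIX
    except BaseException:
        # Something failed before the rename completed; drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also fsync the directory so the rename itself reaches the disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def writeback_interval_seconds(
        proc_path="/proc/sys/vm/dirty_writeback_centisecs"):
    """Return the value stored in `proc_path`, converted to seconds."""
    with open(proc_path) as f:
        return int(f.read().strip()) / 100.0
```

Without the fsync() on the file, the rename can be committed before the
data blocks are, which would match the truncated or partly empty files
described in points 1 and 2. The cost is that every fsync() forces a
wait on the disk, which adds up over a large upgrade.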
