On Tue, Dec 14, 2004 at 08:53:58AM -0600, Dave Kleikamp wrote:
> On Tue, 2004-12-14 at 08:07 -0600, John Goerzen wrote:
> > 1. Some files contained data that no process could possibly have been
> > writing to them.  It appeared to be blocks of NULLs in some cases.  Some
> > files were .so files, and I lack the expertise to know specifically
> > which chunks were bad, but I did know that they were corrupt (ldconfig
> > told me).  Some of the files may have contained blocks from other files
> > also (but I'm not certain of this either).
> 
> This can normally happen for files that are created shortly before the
> crash.  Blocks of nulls may be file holes.  I don't think .so files are
> created sequentially, so it may be possible that there are portions of
> the file that had not yet been committed to disk.  It's possible that
> newly-created files may contain stale data, but I don't think this
> happens often in practice (but I'm not sure).
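To illustrate the file-hole point (a minimal Python sketch, not JFS code — the filenames and offsets are made up): a file written non-sequentially can legitimately contain an unwritten gap, and that gap reads back as NUL bytes, which is exactly what a checker like ldconfig would trip over if the intervening blocks were never committed before a crash:

```python
import os
import tempfile

# A writer that jumps ahead (as a linker producing a .so might) leaves
# a "hole" between the two written regions.
path = os.path.join(tempfile.mkdtemp(), "sparse-demo.so")
with open(path, "wb") as f:
    f.write(b"ELF header...")   # first chunk, at offset 0
    f.seek(65536)               # skip ahead, non-sequentially
    f.write(b"section data")    # second chunk, far past the first

# Reading inside the gap returns NUL bytes.
with open(path, "rb") as f:
    f.seek(20000)               # somewhere inside the hole
    chunk = f.read(16)
print(chunk)                    # 16 NUL bytes
```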

Is there any practical way I could try to address this?  I would rather
have the files truncated, or even re-linked to /lost+found or something,
than have them contain bad data.  I also never seemed to encounter this
behavior with either ext2 or ext3.  Was I just lucky, or is there
something fundamentally different about JFS?

> > 2. Some files were truncated.  This is not unexpected in a crash
> > situation, but there were many more files like this than I would have
> > expected.
> 
> This would be normal if the files were newly-created.  The transaction
> that creates the file will be committed to disk earlier than the
> transaction(s) which extend the file when data is written.

They probably were, or were at least recently updated.

> > 3. The total number of files touched by #1 and #2 far exceeds the number
> > of files open for writing at the instant the system went down.
> 
> Files aren't committed when they are closed.  pdflush usually makes sure

Ahh, that was a misconception on my part then.

> dirty data is committed to disk within 30 seconds.  I believe the

It is quite possible that all the corrupted files were created or
modified within the ten seconds prior to the crash.

As for the .so files, I was running apt-get dist-upgrade at the time, so
they were being created or modified when the crash hit; it was just the
magnitude of the problem that was startling, especially given that dpkg
first unpacks files under a temporary name and then renames them to
their permanent name precisely to avoid corruption in cases like this.

I don't know whether it does an fsync(), though.
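For what it's worth, the unpack-then-rename trick only guarantees "old contents or new contents" if the new data actually reaches the disk before the rename does. A hedged sketch of the full sequence (hypothetical helper, not dpkg's actual code):

```python
import os
import tempfile

def atomic_install(path, data):
    """Write data to a temp file in the target directory, fsync it,
    then rename over the destination. Sketch of the crash-safe
    variant of the dpkg unpack/rename pattern."""
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d, prefix=".tmp-install-")
    try:
        os.write(fd, data)
        os.fsync(fd)          # data must hit disk *before* the rename
    finally:
        os.close(fd)
    os.rename(tmp, path)      # atomic replacement on POSIX filesystems
    # Also fsync the directory, so the rename itself survives a crash.
    dfd = os.open(d, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Without the fsync() calls, the rename can be committed by the journal while the file's data blocks are still only in memory, which would explain renamed-into-place files full of NULs.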

> Just to make sure there isn't anything abnormal going on, did fsck run
> through all of its phases, or just phase 0?  Use the -f or -n flags to
> force it to check everything.

I can't say I recall.

> > Is this behavior expected from JFS?  Is there anything I can do to help
> > out next time?
> 
> As long as the affected files were all created or extended
> within /proc/sys/vm/dirty_writeback_centisecs, this is expected
> behavior.
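For reference, the vm.* writeback tunables under /proc/sys/vm/ are expressed in centiseconds; a small sketch of the conversion (the default values shown are my assumption — check your own kernel, and note that dirty_expire_centisecs is the age threshold while dirty_writeback_centisecs is the flusher wakeup interval):

```python
def centisecs_to_seconds(cs):
    # The vm.* writeback tunables are in hundredths of a second.
    return cs / 100.0

# Assumed typical 2.6-era defaults, for illustration only:
print(centisecs_to_seconds(500))   # dirty_writeback_centisecs -> 5.0 s
print(centisecs_to_seconds(3000))  # dirty_expire_centisecs    -> 30.0 s
```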

Thanks,
John

_______________________________________________
Jfs-discussion mailing list
[EMAIL PROTECTED]
http://www-124.ibm.com/developerworks/oss/mailman/listinfo/jfs-discussion