yikes following up to self 2x :-(

I've at least found a work-around, though I'm still a bit confused about
why this is happening. At Theo's suggestion, I added the line

touch /var/testfile                to  /etc/rc.d/rc.sysinit

On booting after a crash, this generated a "no space left on device" 
error message.
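
A slightly more talkative version of that probe could look like the
sketch below. The function name probe_write and the scratch file name
are made up for illustration; nothing like this exists in the stock
rc.sysinit.

```shell
# Hypothetical helper, not part of the stock rc.sysinit.
# Tries to create a scratch file in the given directory and reports the
# result. touch's own error text (e.g. "No space left on device") is left
# on stderr so it still reaches the console.
probe_write() {
    dir=$1
    probe="$dir/.rc_probe.$$"    # scratch file name is an assumption
    if touch "$probe"; then
        rm -f "$probe"
        echo "probe_write: $dir is writable"
        return 0
    else
        echo "probe_write: cannot create a file in $dir" >&2
        return 1
    fi
}
```

Calling it as "probe_write /var" at a few points in the script would
show where in the boot sequence the filesystem first refuses writes.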

The hack to get operational was simply to add a couple of 'sync' 
commands after fsck.jfs has been run.
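
In shell terms the hack amounts to something like the following. The
wrapper name run_then_sync is hypothetical, and the way fsck is invoked
differs between rc.sysinit versions, so treat this as a sketch of the
idea rather than a patch.

```shell
# Hypothetical wrapper: run a command (for example the existing fsck step),
# flush dirty buffers to disk twice, then hand back the command's own
# exit status so later checks on that status still work.
run_then_sync() {
    "$@"
    rc=$?
    sync
    sync
    return $rc
}
```

The exit status is passed through because the boot script typically
inspects what fsck returned before deciding whether to continue.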

The offending line in /etc/rc.d/rc.sysinit is presumably

ca. line 612:     >/var/run/utmp

I suppose it could also be a simple timing problem, although Linux
sync(2) is *supposed* to guarantee that writes have occurred. Or
does fsck.jfs not call sync(2)? I doubt timing is the issue, as
several seconds have passed from the time that fsck was run until
the lockup.

I didn't have a lot of time to delve into system code to find out
whether e2fsck / fsck.jfs make calls to sync the disk.

Are the jfs[IO|Commit|Sync] processes all running by this time?

Just checked: the answer is no. I guess jfs.o is demand-loaded by the first
call to mount a jfs filesystem, not by fsck.jfs.
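
A check along these lines could be dropped into the boot script to
confirm that. The helper name module_loaded is made up, and the sketch
assumes jfs is built as a module (as it is here), so that it shows up
in /proc/modules once loaded.

```shell
# Hypothetical helper: succeed if the named module appears in the modules
# list. Each line of /proc/modules starts with the module name followed by
# a space. The list file can be overridden, which is handy for testing.
module_loaded() {
    name=$1
    list=${2:-/proc/modules}
    grep -q "^$name " "$list" 2>/dev/null
}
```

Something like "module_loaded jfs || echo 'jfs not loaded yet'" right
after the fsck step and again after the mounts would show when the
module actually arrives.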

I'm guessing that this explains the problem? jfs.o is loaded at
the time of the 'hang' but has only been so for a second or so. 

If / and /boot were JFS, the module would have been loaded earlier,
and the syncs would not be needed?

So is it a race? jfs has been loaded, but hasn't really gotten up
and running yet?

Although my hesitance to jfs-format / and /boot may be the cause of
this problem, I'll hold off on converting them until I have some
idea what's going on here.

forrest


On Mon, 10 Dec 2001 12:47:56 -0500
forrest whitcher <[EMAIL PROTECTED]> did inscribe thusly:

> Failed boot again, having removed the quotaon line from /etc/rc.d/rc.sysinit;
> I'm pretty sure that wasn't the problem.
> 
> Here are the circumstances that cause the failure:
> 
>  - prior crash 
>       i.e. my / and /boot filesystems have just undergone fsck
>       Note: these are still ext2fs format
> 
> on one boot failure the console popped up a message that 
> a file could not be written to /var/run/.... -no space left on device
> 
> The set of code following the quotas message squares with this
> 
> Any ideas why it fails only after a crash???? the only things I 
> can think of that would affect the post-crash situation are:
> 
> 
> the / & /boot filesystems have just been fsck'd
> 
> Some files on jfs-formatted /var were not properly closed, and in spite 
> of boot-time fsck / log replay , they hang in the code that's posted below
> 
> 
> Any ideas from what follows here? 
> 
> Hoping for some ideas *before* I start metering this & intentionally
> crashing to find the spot that's hanging
> 
> 
> --- Excerpted from rc.sysinit
> 
> # Clean out /etc.
> rm -f /fastboot /fsckoptions /forcefsck /halt /poweroff
> 
> # Do we need (w|u)tmpx files? We don't set them up, but the sysadmin might...
> _NEED_XFILES=
> [ -f /var/run/utmpx -o -f /var/log/wtmpx ] && _NEED_XFILES=1
> 
> # Clean up /var
> # I'd use find, but /usr may not be mounted.
> for afile in /var/lock/* /var/run/*; do
>    if [ -d "$afile" ]; then
>       [ "`basename $afile`" != "news" -a "`basename $afile`" != "sudo" -a \
>         "`basename $afile`" != "mon" ] && rm -f $afile/*
>    else
>       rm -f $afile
>    fi
> done
> 
> # Reset pam_console permissions
> [ -x /sbin/pam_console_apply ] && /sbin/pam_console_apply -r
> 
> {
> # Clean up utmp/wtmp
> >/var/run/utmp
> touch /var/log/wtmp
> chgrp utmp /var/run/utmp /var/log/wtmp
> chmod 0664 /var/run/utmp /var/log/wtmp
> if [ -n "$_NEED_XFILES" ]; then
>   >/var/run/utmpx
>   touch /var/log/wtmpx
>   chgrp utmp /var/run/utmpx /var/log/wtmpx
>   chmod 0664 /var/run/utmpx /var/log/wtmpx
> fi
> 
> --- end Excerpt
> 

<snip>
_______________________________________________
Jfs-discussion mailing list
[EMAIL PROTECTED]
http://www-124.ibm.com/developerworks/oss/mailman/listinfo/jfs-discussion
