On 4/13/20 2:29 PM, Stephen Berman wrote:
On Sun, 12 Apr 2020 18:47:08 +0100 Ken Moffat <zarniwh...@ntlworld.com> wrote:

On Sat, Apr 11, 2020 at 06:43:22PM +0200, Stephen Berman wrote:
On Fri, 10 Apr 2020 15:51:22 -0500 Bruce Dubbs <bruce.du...@gmail.com> wrote:

On 4/10/20 3:29 PM, Stephen Berman wrote:
I've built current development LFS using jhalfs and when I invoke (via
sudo or logged in as root) `shutdown -h now', the system appears to hang
while trying to detach the cdrom block device.  Here are the last two
lines printed to the terminal after issuing that command:
Bringing down the loopback interface..........[OK]
sr 5:0:0:0: tag#21 timing out command, waited 120s
and every 2 minutes, the last line repeats with a different tag#.  So
far I haven't had the patience to wait more than six minutes, then I
power off the machine with the start button.  I know this is the cdrom
because on booting there are these messages:
[    6.633004] scsi 5:0:0:0: CD-ROM            HL-DT-ST DVDRAM GH24NSD1 LW00 PQ: 0 ANSI: 5
[    6.679083] sr 5:0:0:0: [sr0] scsi3-mmc drive: 48x/12x writer dvd-ram cd/rw xa/form2 cdda tray
[    6.679101] cdrom: Uniform CD-ROM driver Revision: 3.20
[    6.689325] sr 5:0:0:0: Attached scsi CD-ROM sr0
[    6.689399] sr 5:0:0:0: Attached scsi generic sg1 type 5
In addition, the message "timing out command, waited %lus\n" comes from
the function scsi_softirq_done in linux-5.5.9/drivers/scsi/scsi_lib.c.
This only happens with `shutdown -h' or `shutdown -hP', not with
`shutdown -r'.  Moreover, on the same computer I also have LFS 8.4 with
kernel 4.20.12, and there `shutdown -h' works fine.  So it seems to be
an issue with kernel 5.5.9.  When I built the latter I used `make
oldconfig' with the config file of kernel 4.20.12, accepting the
defaults for all new options.  Comparing the two config files, I didn't
notice any evidently relevant difference, e.g. involving SCSI options.
I suppose it's also possible there is some other difference between LFS
8.4 and the current development version that could be involved, but I
have no idea what to look for.  Does anyone here have any ideas or
suggestions for how to track down what's causing the hang and stop it?
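One way to make the config comparison described above less dependent on
reading both files by eye is to filter them down to storage-related options
and diff the result.  The following is only a sketch: the function name
config_diff and its default pattern are assumptions for illustration, not
something used in this thread.

```shell
# config_diff OLD NEW [PATTERN]
# Hypothetical helper (not from the thread): show kernel config options
# whose names match PATTERN and whose settings differ between OLD and NEW.
config_diff() {
    old=$1
    new=$2
    # Assumed default: option names that look related to the optical drive.
    pattern=${3:-'SCSI|ATA|CDROM|BLK_DEV_SR'}
    tmp_old=$(mktemp) || return 2
    tmp_new=$(mktemp) || { rm -f "$tmp_old"; return 2; }
    # Keep both "CONFIG_X=y" and "# CONFIG_X is not set" lines.
    grep -E "^(# )?CONFIG_[A-Z0-9_]*($pattern)" "$old" | sort > "$tmp_old"
    grep -E "^(# )?CONFIG_[A-Z0-9_]*($pattern)" "$new" | sort > "$tmp_new"
    # diff prints "<" for lines only in OLD and ">" for lines only in NEW;
    # it exits 0 if the filtered sets are identical and 1 if they differ.
    diff "$tmp_old" "$tmp_new"
    status=$?
    rm -f "$tmp_old" "$tmp_new"
    return "$status"
}
```

It could be called as, say, `config_diff /boot/config-4.20.12
/boot/config-5.5.9' (both paths are assumed here).  The default pattern is
deliberately broad, so it will also list some unrelated options.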
Since it is bringing down the loopback interface it is running the bootscript
S90localnet properly.  The only other script is S99halt and that only does
'halt -d -f -i -p'.

-d     Don't write the wtmp record.
-f     Force halt or reboot, don't call shutdown(8).
-i     Shut  down  all network interfaces just before halt or reboot.
-p     When  halting  the system, switch off the power.

Try using 'poweroff' or 'init 0' and see if anything changes.  You can also
try using an older kernel with the current build to validate that it is a
kernel problem.
Thanks for the suggestions.  I tried `poweroff' and the effect was the
same as `shutdown -h', hanging on detaching the cdrom device.  (I didn't
try `init 0' -- as the LFS book says, "init 0 is an alias for the halt
command", so shouldn't it have the same result?)  But I did, on this
system, build and install kernel 4.20.12 from LFS 8.4 -- and when I
booted it and then did `shutdown -h now', the system shut down and the
machine powered off, just as in LFS 8.4.  So that pretty clearly points
the finger at kernel 5.5.9.

I've tried searching the web but found nothing about this problem.  I'm
not sure how best to proceed.  It would be tedious and time-consuming to
build all released kernels between 4.20.12 and 5.5.9, though I might try
one or two, or maybe the current 5.6.2.  If you or anyone else has more
advice, I'm all ears.

Steve Berman
Hi Steve,

it seems a very uncommon problem,
Apparently so, yet I don't think I have unusual hardware; the SCSI drive
in question is an LG DVDRAM model GH24NSD1.

                                   so some questions just in case:

Do you have a CD in the drive when you try to shutdown ?
No.

Is the CD using an old or obscure driver (I'm thinking of old PATA
drives; I guess anything older than that is unlikely to still be
usable)?
AFAICT the drive uses the standard SCSI driver, at least I'm not aware
of, nor see in the kernel config file, anything specific to this model
or LG in general.

          Hmm, that prompts me to ask if the CD drive is working with
this kernel ?
Yes, I can play CDs and DVDs (with VLC) and mount and read CDRs on it.

Recent kernels have seen a lot of overhauls in the kernel
infrastructure.  Perhaps your .config for versions after 4.20.12 has
lost something it needs for the CDROM.  Seems unlikely, since it
shows up, but testing that it works will dispel that suggestion.
Consider it dispelled.

For finding which commit caused a problem, you really need to run
git bisect.  I hope you are aware that the stable kernels maintained
by Greg KH use a different git tree from Linus' upstream, and that
while in Linus' tree there is a progression from 4.20.0 to 5.6.0, in
Greg's stable trees the progressions are 4.20.0 to 4.20.last, 5.0.0
to 5.0.last, etc.
I didn't know this, thanks for pointing it out.

Picking random versions can help identify where a problem occurred,
but it needs a plan.  I'll suggest you try 5.6.latest first (5.6.3
at the moment), just in case.  Assuming that is still broken, try
5.6.0 itself to confirm the breakage.  Then try 4.20.0 to confirm
that was ok (we know 4.20.12 was ok, but that is not in Linus's
tree).  You could then bisect in Linus's tree.
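Once a good and a bad mainline release have been confirmed, starting the
bisection might look roughly like the sketch below.  The wrapper name
start_bisect is hypothetical, and the example tags assume that Linus's tree
names its releases v4.20 and v5.6 (without a trailing .0).

```shell
# start_bisect TREE GOOD BAD
# Hypothetical wrapper (not from the thread) around "git bisect start".
# TREE is a clone of Linus's mainline tree; GOOD and BAD are revisions,
# for example the tags v4.20 and v5.6 once both have been tested.
start_bisect() {
    tree=$1
    good=$2
    bad=$3
    # Refuse to start unless both revisions exist in this tree.
    git -C "$tree" rev-parse --verify --quiet "$good^{commit}" >/dev/null || {
        echo "unknown revision: $good" >&2
        return 1
    }
    git -C "$tree" rev-parse --verify --quiet "$bad^{commit}" >/dev/null || {
        echo "unknown revision: $bad" >&2
        return 1
    }
    # "git bisect start BAD GOOD" marks both ends and checks out a
    # revision between them for testing.
    git -C "$tree" bisect start "$bad" "$good"
}
```

For example, `start_bisect ~/src/linux v4.20 v5.6' (the directory is an
assumption).  Git then reports roughly how many revisions remain to be
tested.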
I have now built the just released 5.6.4 stable kernel and the problem
persists with it.  I'll try to get to the mainline kernels you suggest
in the next day or two.  (FWIW, on the same machine I also have openSUSE
Tumbleweed installed, which is currently running (a modified) 5.6.2
kernel, and here the machine powers off normally (but I do it via
KDE, not directly with `shutdown -h now'.))

Alternatively, if you can identify in which stable series the
problem first appeared, you might bisect in that series.  But that
is only recommended if the series is still supported (so, 5.4, and
perhaps 5.5) and in any case GKH seems content if his kernels are
bug compatible with Linus'.

I'd better warn you that for us mere mortals git bisection does not
always provide a clear answer, although "timeouts during shutdown"
sounds like the sort of problem that will give a clear 'good or bad'
answer.
I hope so, but if I understand the bisection process correctly, I'm
afraid it may be too time-consuming, since the machine where this
problem occurs is my main work computer.  Is the process this?:
1. Configure the kernel.  Is it necessary to do `make oldconfig' using the
    4.20.12 config file, or could I just run make with that config file
    on later kernel sources?
2. Build and install the kernel and modules.
3. Reboot and then do `shutdown -h now'.
4. Reboot.
5. Lather, rinse, repeat.
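The build-and-install part of the steps listed above could be sketched as a
small shell function, run from the top of the kernel tree after git bisect
has checked out a revision.  Everything here is an assumption for
illustration: the names run and bisect_step, the x86 image path, the file
name under /boot, and the use of `make olddefconfig', which is one
non-interactive way of accepting the defaults for all new options.

```shell
# Hypothetical sketch (not from the thread) of one bisection round.
# With DRY_RUN set to a non-empty value, commands are printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bisect_step CONFIG
# CONFIG is the known-good config file to start from in every round.
bisect_step() {
    config=$1
    # Start from the known-good configuration each round.
    run cp "$config" .config || return 1
    # olddefconfig sets every new option to its default without prompting.
    run make olddefconfig || return 1
    run make || return 1
    # The two install steps need root.
    run make modules_install || return 1
    # Assumed image path (x86) and assumed target name under /boot; the
    # boot loader needs an entry pointing at this file.
    run cp arch/x86/boot/bzImage /boot/vmlinuz-bisect || return 1
}
```

After booting the new kernel and trying `shutdown -h now', the result is
recorded with `git bisect good' or `git bisect bad', which checks out the
next revision to build.  `git bisect reset' returns the tree to where it
started.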

Steve Berman

Hi Steve,


What happens if you do "shutdown -hP now"?

The standard "halt" option doesn't actually tell the kernel to shut the computer off, IIRC.

--
http://lists.linuxfromscratch.org/listinfo/lfs-support
FAQ: http://www.linuxfromscratch.org/blfs/faq.html
Unsubscribe: See the above information page

Do not top post on this list.

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

http://en.wikipedia.org/wiki/Posting_style
