Bug#583312: [pkg-nvidia-devel] Bug#583312: possible fix

2010-05-27 Thread Russ Allbery
Petter Reinholdtsen writes:
> [Russ Allbery]

>> If you're experiencing a variant of #521699, then the problem is
>> that the timeout in KDM is too short.  You need to tell KDM to wait
>> longer; the NVIDIA driver takes longer to initialize the card than
>> KDM is willing to wait.  I suspect that the only thing that
>> parallel booting is doing is starting kdm sooner and hence giving
>> the NVIDIA module even less time to initialize the hardware.

> What is loading the nvidia driver?  When is it done?

It's loaded dynamically by the X server when it starts.  These days, I
believe that's done via the device mappings provided in the
nvidia-kernel-common package, which alias char-major-195* to the nvidia
kernel module, although I'm not deeply familiar with the details of how
dynamic hardware initialization is handled.  But the kernel module is not
loaded until the X server is started, and it's loaded automatically at
that point.
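
One way to check this on a given machine is to look for the module before
and after the X server starts.  A minimal sketch, assuming the module is
named nvidia and that /proc/modules is readable; the helper name
module_loaded is made up for illustration:

```shell
# Succeeds if the named kernel module appears in the module list.
# $1: module name
# $2: file in /proc/modules format (defaults to /proc/modules)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}
```

Run from a console before kdm starts, "module_loaded nvidia" should fail
if the module really is loaded only when the X server comes up.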

> If it is done by some init.d script,

It's not, unless the mknod commands in the nvidia-kernel init script are
doing some sort of deep magic that I'm fairly sure they're not.  There's
definitely no explicit call to modprobe anywhere in an init script
provided by NVIDIA packages.

-- 
Russ Allbery (r...@debian.org)   



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#583312: possible fix

2010-05-27 Thread Petter Reinholdtsen
[Russ Allbery]
> If you're experiencing a variant of #521699, then the problem is
> that the timeout in KDM is too short.  You need to tell KDM to wait
> longer; the NVIDIA driver takes longer to initialize the card than
> KDM is willing to wait.  I suspect that the only thing that
> parallel booting is doing is starting kdm sooner and hence giving
> the NVIDIA module even less time to initialize the hardware.

What is loading the nvidia driver?  When is it done?  If it is done by
some init.d script, the init.d script should not exit until the
initialization is done to make sure those scripts depending on it will
work.
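
As a rough sketch of that idea, an init script could poll for some marker
of completed initialization before it exits.  The helper name
wait_for_path, the marker path and the timeout are all hypothetical here;
nothing in this thread confirms that such a marker exists for the NVIDIA
driver:

```shell
# Wait until a path exists, polling once per second.
# $1: path to wait for (hypothetical readiness marker)
# $2: maximum number of seconds to wait (defaults to 10)
# Returns 0 once the path exists, 1 if the timeout expires first.
wait_for_path() {
    path=$1
    remaining=${2:-10}
    while [ ! -e "$path" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

Called at the end of a start section, this would keep dependent scripts
from starting until the marker shows up or the timeout runs out.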

Happy hacking,
-- 
Petter Reinholdtsen






Bug#583312: possible fix

2010-05-27 Thread Russ Allbery
Romane writes:

>> I had a look at the open bugs against nvidia-glx, and came across
>> #521699 which seems similar to your problem.  Reassigning to the
>> nvidia-glx package to get input from the maintainers of that package,
>> and because I believe the problem is in that package.  Setting
>> severity to serious, based on the assumption that this problem will
>> affect all users with parallel booting now enabled by default.

I think parallel booting is a red herring.  The init script for nvidia-glx
has nothing to do with the operation of the X server (take a look at what
it does).

> You seem to have hit the nail on the head. Added that line to the script
> nvidia-glx, commented out the delay I had added in /etc/kde4/kdm/kdmrc,
> rebooted, and it came up to a normal logon screen without even
> displaying the green nVidia logo. I then went in and made sure
> everything was set back to what it was when this issue started for me -
> took out those two lines from the nvidia-glx script that were tried
> earlier, ensured that numbering was still S17nvidia-glx in
> rc2.d. Rebooted. 6 times. Each time without any errors, without seeing
> the nVidia logo, and boot was acceptably and perceptibly quicker than
> what was before even the update of the initscripts yesterday. Only
> things couldn't change was whatever changes running update-rc.d made
> earlier in the day (see history of problem).

If you're experiencing a variant of #521699, then the problem is that the
timeout in KDM is too short.  You need to tell KDM to wait longer; the
NVIDIA driver takes longer to initialize the card than KDM is willing to
wait.  I suspect that the only thing that parallel booting is doing is
starting kdm sooner and hence giving the NVIDIA module even less time to
initialize the hardware.

See #568969 for the timeout fix that worked for GDM.  It appears to no
longer be a problem with GDM 3 (or at least it's not reproducible for us).

However, it's possible that my understanding here is not complete.

I don't see any obvious way that we can fix this on the NVIDIA side if I'm
understanding the problem correctly.  It takes as long as it takes to
initialize the video card, and the nvidia-glx init script is superfluous
and is going away, so adding delays to it won't work (and I'm skeptical
that's a reliable solution anyway).

-- 
Russ Allbery (r...@debian.org)   






Bug#583312: [Pkg-sysvinit-devel] Bug#583312: possible fix

2010-05-27 Thread Romane

Good morning Petter

> reassign 583312 nvidia-glx
> severity 583312 serious
> thanks

Thanks for passing it on. Even if not all users, a sufficient portion 
probably. Have replied all in this email.

> I had a look at the open bugs against nvidia-glx, and came across
> #521699 which seems similar to your problem.  Reassigning to the
> nvidia-glx package to get input from the maintainers of that package,
> and because I believe the problem is in that package.  Setting
> severity to serious, based on the assumption that this problem will
> affect all users with parallel booting now enabled by default.
>
> Does it work to add for example 'sleep 5' at the end of the start
> section in /etc/init.d/nvidia-glx?  Perhaps something needs more time
> before X is started?
   
You seem to have hit the nail on the head. Added that line to the script 
nvidia-glx, commented out the delay I had added in /etc/kde4/kdm/kdmrc, 
rebooted, and it came up to a normal logon screen without even 
displaying the green nVidia logo. I then went in and made sure 
everything was set back to what it was when this issue started for me - 
took out those two lines from the nvidia-glx script that were tried 
earlier, ensured that numbering was still S17nvidia-glx in rc2.d. 
Rebooted. 6 times. Each time without any errors, without seeing the 
nVidia logo, and boot was acceptably and perceptibly quicker than what 
was before even the update of the initscripts yesterday. The only thing 
I couldn't change back was whatever running update-rc.d changed earlier 
in the day (see history of problem).


Checked the various logs, and was unable to see anything that may help.

Anything that I can do that can help to isolate this further?

Ran /usr/share/insserv/make-testsuite again, and have attached the 
output in case of any use.


After the to and fro'ing during the course of the day, am inclined to 
accept your view now that the problem is in the nvidia package.

> Happy hacking,


With greetings

Romane
set +C
cat <<'EOF' > $insconf
$local_fs       +mountall +mountoverflowtmp +umountfs
$network        +networking +ifupdown
$named          +named +dnsmasq +lwresd +bind9 $network
$remote_fs      $local_fs +mountnfs +mountnfs-bootclean +umountnfs +sendsigs
$syslog         +rsyslog +sysklogd +syslog-ng +dsyslog +inetutils-syslogd
$portmap        portmap
$time           +hwclock
<interactive>   glibc udev console-screen keymap keyboard-setup console-setup cryptdisks cryptdisks-early checkfs-loop
EOF
set -C

addscript acpid <<'EOF'
### BEGIN INIT INFO
# Provides:          acpid
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# X-Start-Before:    kdm gdm xdm hal
# X-Stop-After:      kdm gdm xdm hal
# Default-Start:     2 3 4 5
# Default-Stop:
# Short-Description: Start the Advanced Configuration and Power Interface daemon
# Description:       Provide a socket for X11, hald and others to multiplex
#                    kernel ACPI events.
### END INIT INFO
EOF

addscript atd <<'EOF'
### BEGIN INIT INFO
# Provides:          atd
# Required-Start:    $syslog $time $remote_fs
# Required-Stop:     $syslog $time $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Deferred execution scheduler
# Description:       Debian init script for the atd deferred executions
#                    scheduler
### END INIT INFO
EOF

addscript bootlogd <<'EOF'
### BEGIN INIT INFO
# Provides:          bootlogd
# Required-Start:    mountdevsubfs
# X-Start-Before:    hostname keymap keyboard-setup procps pcmcia hwclock hwclockfirst hdparm hibernate-cleanup lvm2
# Required-Stop:
# Default-Start:     S
# Default-Stop:
# Short-Description: Start or stop bootlogd.
# Description:       Starts or stops the bootlogd log program
#                    which logs boot messages.
### END INIT INFO
EOF

addscript bootlogs <<'EOF'
### BEGIN INIT INFO
# Provides:          bootlogs
# Required-Start:    hostname $local_fs
# Required-Stop:
# Should-Start:      $x-display-manager gdm kdm xdm ldm sdm wdm nodm
# Default-Start:     1 2 3 4 5
# Default-Stop:
# Short-Description: Log file handling to be done during bootup.
# Description:       Various things that don't need to be done particularly
#                    early in the boot, just before getty is run.
### END INIT INFO
EOF

addscript bootmisc.sh <<'EOF'
### BEGIN INIT INFO
# Provides:          bootmisc
# Required-Start:    $remote_fs
# Required-Stop:
# Should-Start:      udev
# Default-Start:     S
# Default-Stop:
# Short-Description: Miscellaneous things to be done during bootup.
# Description:       Some cleanup.  Note, it need to run after mountnfs-bootclean.sh.
### END INIT INFO
EOF

addscript checkfs.sh <<'EOF'
### BEGIN INIT INFO
# Provides:          checkfs
# Required-Start:    checkroot
# Required-Stop:
# Should-Start:      mtab
# Default-Start:     S
# Default-Stop:
# X-Interactive:     true
# Short-Description: Check all filesystems.
### END INIT INFO
EOF

addscript checkroot.sh <<'EOF'
### BEGIN INIT INF

Bug#583312: [Pkg-sysvinit-devel] Bug#583312: possible fix

2010-05-27 Thread Petter Reinholdtsen
reassign 583312 nvidia-glx
severity 583312 serious
thanks

[Romane]
> I have reached the end of my own options and limited knowledge
> (getting old and forgetful :)), but am most happy to use this
> machine to debug the issue with you if that will help improve
> further an already superb distribution. Crashing it is not an issue
> - worst comes to worst, can reformat, reinstall (grinning).

I had a look at the open bugs against nvidia-glx, and came across
#521699 which seems similar to your problem.  Reassigning to the
nvidia-glx package to get input from the maintainers of that package,
and because I believe the problem is in that package.  Setting
severity to serious, based on the assumption that this problem will
affect all users with parallel booting now enabled by default.

Does it work to add for example 'sleep 5' at the end of the start
section in /etc/init.d/nvidia-glx?  Perhaps something needs more time
before X is started?

Happy hacking,
-- 
Petter Reinholdtsen






Bug#583312: [Pkg-sysvinit-devel] Bug#583312: possible fix

2010-05-27 Thread Romane

Good morning Petter

I must thank you for your patience.

> As far as I know, changing the sequence number of a script will not
> affect parallel booting.  Did you use S7nvidia-glx or S07nvidia-glx?
> If the former, I suspect this caused the script to not run at all
> during boot.  I suspect there is some race issue causing some but not
> all boots to fail.
   
Time has proven you correct, and my approach a failure anyways (hangs 
head). I had used S7, but even changing it to S07 made no difference. 
Back now to what it started at - S17nvidia-glx. Basically, whatever went 
right earlier on is now not going right; back to being dropped to the 
console.

> Btw, what is the name of the package providing /etc/init.d/nvidia-glx
> on your machine (dpkg -S /etc/init.d/nvidia-glx).

$ dpkg -S /etc/init.d/nvidia-glx
nvidia-glx: /etc/init.d/nvidia-glx

Installed from the repositories, but drivers downloaded from nVidia are 
affected also.

> Please try the adjusted header for nvidia-glx I posted earlier, and
> let me know if it helps.

Made that change, and no change to the boot issue. Rebooted a number of 
times, to make sure not a one-off.


So far, only thing that seems to get me through is to make that change 
mentioned in my earlier email to /etc/kde4/kdm/kdmrc with a timeout 
value of 120. After my earlier overconfident assurance that had possibly 
found a solution, won't say that have it fixed, but over the test boots 
just now done, each boot was successful at getting to the login screen. 
The time from when the screen goes blank to when am presented with the 
logon screen varies from 15 to 18 seconds (give or take). Before, it was 
tossing me to the console after about 12 to 15 seconds (give or take). 
On those grounds, at least things seem to be working, even if not as 
they should :) I have read in my searches that making this change is not 
the preferred method, but ... - I can also make a coffee while I wait 
(laughing).


Have another machine which have been holding off making this update to. 
Took the plunge a little earlier, and it came up on the first boot. Also 
an nVidia card. My 3 machines are always set up identically - can hop 
from one to the other without having to remember which machine am 
sitting at - different hardware, but same system otherwise. So, not a 
consistent issue, as you suggested in your reply to me. The third 
machine is not affected - ATI, not using proprietary drivers.


I have reached the end of my own options and limited knowledge (getting 
old and forgetful :)), but am most happy to use this machine to debug 
the issue with you if that will help improve further an already superb 
distribution. Crashing it is not an issue - worst comes to worst, can 
reformat, reinstall (grinning).


Have babbled sufficient for now


> Happy hacking,


With greetings

Romane






Bug#583312: [Pkg-sysvinit-devel] Bug#583312: possible fix

2010-05-27 Thread Petter Reinholdtsen
[Romane]
> Changed the S17nvidia-glx to S7nvidia-glx, and the beastie booted up to 
> a login screen no problems, and at about the same speed (give or take) 
> as before the installation of the initscripts and sysv... stuff yesterday.

As far as I know, changing the sequence number of a script will not
affect parallel booting.  Did you use S7nvidia-glx or S07nvidia-glx?
If the former, I suspect this caused the script to not run at all
during boot.  I suspect there is some race issue causing some but not
all boots to fail.
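
A quick way to spot links that do not follow the two-digit naming (S07,
not S7) is sketched below.  It assumes, as suspected above, that the
two-digit form is what the boot system expects; both helper names are
hypothetical:

```shell
# Succeeds if the link name has the form S<NN>name or K<NN>name.
is_two_digit_rc_link() {
    case $1 in
        [SK][0-9][0-9]?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the S*/K* links in a runlevel directory that break the convention.
# $1: runlevel directory (defaults to /etc/rc2.d)
list_odd_rc_links() {
    for link in "${1:-/etc/rc2.d}"/[SK]*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}
        is_two_digit_rc_link "$name" || printf '%s\n' "$name"
    done
}
```

With no argument, list_odd_rc_links inspects /etc/rc2.d; a link named
S7nvidia-glx would be printed, while S07nvidia-glx and S17nvidia-glx
would not.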

Btw, what is the name of the package providing /etc/init.d/nvidia-glx
on your machine (dpkg -S /etc/init.d/nvidia-glx).

> Now, this is not optimal - some update at some time in the future
> will again reorder the scripts, and the problem is very likely to
> repeat itself.

Please try the adjusted header for nvidia-glx I posted earlier, and
let me know if it helps.

Happy hacking,
-- 
Petter Reinholdtsen






Bug#583312: possible fix

2010-05-27 Thread Romane

Good morning

Further hunting around, and may have found a solution. A little background:

Adding the line "ServerTimeout=120" to the [X-*-Core] section of 
/etc/kde4/kdm/kdmrc enabled the system to boot to a normal login screen. 
While I had the problem, I was counting 7 to 8 seconds, give or take, 
before being dropped from the green nVidia logo back to the console; 
with that line added, I was counting 12 to 15 seconds before being 
presented with the logon screen.
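
To confirm which value is actually set, the line can be read back from
the file. A minimal sketch, assuming the key is written exactly as 
ServerTimeout=N with no spaces around the equals sign; the function name 
kdm_server_timeout is made up for illustration:

```shell
# Print the ServerTimeout value from the [X-*-Core] section of a kdmrc file.
# $1: path to the kdmrc file (defaults to /etc/kde4/kdm/kdmrc)
kdm_server_timeout() {
    awk -F= '
        # Track whether the current section header is exactly [X-*-Core].
        /^\[/ { in_core = ($0 == "[X-*-Core]") }
        in_core && $1 == "ServerTimeout" { print $2 }
    ' "${1:-/etc/kde4/kdm/kdmrc}"
}
```

An empty result means the key is not set in that section, so KDM would 
be using its default.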


Wanting to test boot order, I deleted the line from kdmrc, and went to 
/etc/rc2.d. The order of the scripts was:


S01nvidia-kernel
S01qemu-kvm
S01speech-dispatcher
S14portmap
S15nfs-common
S17nvidia-glx

Changed the S17nvidia-glx to S7nvidia-glx, and the beastie booted up to 
a login screen no problems, and at about the same speed (give or take) 
as before the installation of the initscripts and sysv... stuff yesterday.


Now, this is not optimal - some update at some time in the future will 
again reorder the scripts, and the problem is very likely to repeat itself.


May I suggest that the scripts which order the entries in rc2.d be 
altered so that nvidia-glx is run earlier in the boot sequence? Though 
the problem may not affect everyone, doing so may reduce or even 
eliminate the incidence of this issue in the future.


With greetings

Romane


