Re: initrd race conditions

2006-02-07 Thread Mike Crowe
On Tue, Feb 07, 2006 at 05:36:51PM +, Steven Haslam wrote:
> There shouldn't be any kernel changes required. Just use mkinitrd.yaird
> to build a new initrd.img.

I installed the sarge-backports yaird and modified
/etc/kernel-img.conf to say "ramdisk=mkinitrd.yaird" and then ran
"dpkg-reconfigure linux-image-2.6.15-1-amd64-k8-smp" which seemed to
cause the ramdisk to be rebuilt using yaird. As you explained, this
meant that the network drivers were not loaded by the initrd.

I still couldn't get my local udev rules to rename the interfaces on
startup though. In the end I just disabled the e100 in the BIOS - I
hope that the two tg3s will always be discovered in the same
order... :-)

Thanks for all your help.

I think I'd like to log a bug report about this though. Is udev the
best package to do that against?

-- 
Mike Crowe


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: initrd race conditions

2006-02-07 Thread Mike Crowe
Mike Crowe wrote:
>> 1. Failure to boot at all and being dumped at a shell prompt inside
>> the ramdisk if the onboard SATA driver is loaded prior to the
>> megaraid driver. This is despite the fact that nothing is connected
>> to SATA. I've worked around this one by disabling the onboard SATA.

On Tue, Feb 07, 2006 at 02:26:54PM +, Steven Haslam wrote:
> I actually had this exact problem myself for the first time this morning
> with my X2 system, with an initrd built by initramfs-tools.
> 
> Using yaird may help because AIUI it will only load the modules required
> to mount the root filesystem during the initramfs stage, or at least
> will load those first.

Assuming I can persuade the stock 2.6.15 kernel to use yaird rather
than initramfs-tools, I'll give that a go. If I have to compile the
kernel myself I might as well just compile in the drivers I need which
will also solve the problem for me.
 
>> 2. Almost random ordering of ethernet devices between boots. The
>> machine has a single e100 and two tg3 ports. Although I can believe
>> that the two tg3s always appear in the same order I've had the e100
>> detected either first, last or in between the two tg3s! Statically
>> configuring IP addresses is very hard if you don't know which will be
>> eth0 next time. I've not fathomed a workaround for this one. This
>> makes bug #342498 entered against the installer even worse.

> You can fix the names of network interfaces using udev-- istr that's
> available in sarge, even though a kernel that supports it isn't (fun).
> 
> e.g. I have:
> 
> bash$ cat /etc/udev/rules.d/010_local.rules

[snip]

But surely that's too late? The ramdisk will load the drivers before
my root filesystem is mounted, and therefore before that rules file
can be seen.

-- 
Mike Crowe





initrd race conditions

2006-02-07 Thread Mike Crowe
I'm running the backports.org 2.6.15 kernel on an otherwise sarge
amd64 box with a couple of Opteron 275s.

It seems that modules are being loaded by the initrd at startup in
parallel. This seems to lead to serious race conditions with the
following symptoms:

1. Failure to boot at all and being dumped at a shell prompt inside
the ramdisk if the onboard SATA driver is loaded prior to the megaraid
driver. This is despite the fact that nothing is connected to
SATA. I've worked around this one by disabling the onboard SATA.

2. Almost random ordering of ethernet devices between boots. The
machine has a single e100 and two tg3 ports. Although I can believe
that the two tg3s always appear in the same order I've had the e100
detected either first, last or in between the two tg3s! Statically
configuring IP addresses is very hard if you don't know which will be
eth0 next time. I've not fathomed a workaround for this one. This
makes bug #342498 entered against the installer even worse.
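
To compare one boot against the next, it helps to record which MAC
address sits behind each ethN name. The following is only a minimal
sketch, assuming a 2.6 kernel with sysfs mounted at /sys; the function
name and the sys_root parameter are made up for illustration and are
not part of any existing tool.

```python
import os


def interface_macs(sys_root="/sys"):
    """Return {interface name: MAC address} read from <sys_root>/class/net.

    sys_root is a hypothetical parameter so the function can be pointed
    at a copy of the sysfs tree; on a live system the default is used.
    """
    net_dir = os.path.join(sys_root, "class", "net")
    try:
        names = sorted(os.listdir(net_dir))
    except OSError:
        # sysfs is not mounted here (or the path is wrong): nothing to report.
        return {}
    result = {}
    for name in names:
        try:
            with open(os.path.join(net_dir, name, "address")) as f:
                result[name] = f.read().strip().lower()
        except OSError:
            # Entry has no readable address file; skip it.
            continue
    return result


if __name__ == "__main__":
    for name, mac in interface_macs().items():
        print(name, mac)
```

Running this after each boot and comparing the output would show
whether the e100 really moves between eth0, eth1 and eth2.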

Does anyone else see this problem? Is it likely to be amd64 specific? 
I haven't really used initrds, udev or 2.6 kernels on the dual
processor x86 boxes we have so I don't know if they suffer similarly.

-- 
Mike Crowe





Re: Make, ccache and unreaped defunct shells

2006-02-03 Thread Mike Crowe
On Fri, Feb 03, 2006 at 12:47:10AM +0100, Frederik Schueler wrote:
> I suggest you pick the backported linux-image-2.6.15-1-amd64-k8-smp
> and yaird from www.backports.org.

That's what I ended up doing not long after posting. The problem
seemed to go away but it's been replaced by another network-related
problem that I'm currently discussing on the Linux netdev list.

Once I was convinced that the make problem had gone away I was going
to post a followup.
 
> The ccache version you are using is newer than the one in sarge 
> (2.3-1.1), is it a sid backport?

It's compiled straight from source.

Thanks for the advice.

-- 
Mike Crowe





Make, ccache and unreaped defunct shells

2006-01-31 Thread Mike Crowe
I've installed sarge/amd64 on a machine with a couple of Opteron
275s. We're using make (Debian version), ccache 2.4 and distcc 2.18.3
to build quite a large amount of code using -j12.

At some point during the build make will stop spawning new
jobs. Looking at the process listing there are a large number
(equivalent to the number passed to -j) of completed shell processes
sat unreaped with make sat asleep waiting for them.

 mac 32196  1.0  0.2 24876 19648 pts/5  S+  11:51   0:02 make
 mac 13234  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>
 mac 14254  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>
 mac 14296  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>
 mac 14308  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>
 mac 14470  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>
 mac 14491  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>
 mac 14518  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>
 mac 14530  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>
 mac 14545  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>
 mac 14571  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>
 mac 14589  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>
 mac 14610  0.0  0.0     0     0 pts/5  Z+  11:52   0:00 [sh] <defunct>

According to /proc/32196/wchan the make process is in pipe_wait.
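
The same information can be gathered straight from /proc rather than
from ps. This is a rough sketch only, assuming a Linux /proc in which
each stat file starts with "pid (comm) state ppid"; the function names
and the proc_root parameter are invented for illustration.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state, ppid) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')'.
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    pid = int(stat_text[:lpar])
    comm = stat_text[lpar + 1:rpar]
    fields = stat_text[rpar + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def zombie_children(parent_pid, proc_root="/proc"):
    """List (pid, comm) for every child of parent_pid in state Z."""
    zombies = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return zombies
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # Process exited while we were looking, or unexpected format.
            continue
        if ppid == parent_pid and state == "Z":
            zombies.append((pid, comm))
    return sorted(zombies)


def wait_channel(pid, proc_root="/proc"):
    """Return the kernel function the process sleeps in, or None if unreadable."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as f:
            return f.read().strip()
    except OSError:
        return None
```

For the listing above, zombie_children(32196) would be expected to
return the twelve sh entries and wait_channel(32196) the string
"pipe_wait".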

It seems like there is a fundamental problem with waiting for child
processes somewhere. Has anyone seen anything similar, or can anyone
recommend the best course of action?
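
For comparison, this is the pattern a parent normally follows to
collect its children. It is a minimal sketch in Python, not a
description of what make itself does internally; the function name is
made up for illustration.

```python
import os


def spawn_and_reap(exit_codes):
    """Fork one child per exit code, then reap each one with waitpid()."""
    expected = {}
    for code in exit_codes:
        pid = os.fork()
        if pid == 0:
            # Child: exit at once without running the parent's cleanup handlers.
            os._exit(code)
        expected[pid] = code
    # Until waitpid() is called for it, each exited child stays a zombie
    # (state Z in ps), which is what the listing above shows for sh.
    reaped = {}
    for pid in expected:
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            reaped[pid] = os.WEXITSTATUS(status)
    return expected, reaped
```

If the parent never reaches the waitpid() step, the children remain in
the process table exactly as the twelve sh entries do here.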

---8<---
ii  libc6                        2.3.2.ds1-22
ii  kernel-image-2.6.8-11-amd64  2.6.8-16sarge1
ii  make                         3.80-9
--->8---

-- 
Mike Crowe

