On Saturday 06 June 2009, Alexander Puchmayr wrote:
> Hi there!
>
> This week I've tried to set up a home server, but the system is highly
> unstable. The first symptoms were lots of page allocation errors, which
> disappeared after switching the kernel's memory allocator from SLUB to SLAB
> and increasing min_free_kbytes in /proc/sys/vm from 8MB to 20MB.
>
> The machine is an AMD Athlon64 X2 5050e on an ASUS M3A78-Pro board with 2x2GB
> RAM. I'm using kernel 2.6.29.4 (vanilla, but the result is the same as
> using 2.6.29-gentoo-r5), and I also upgraded the board's BIOS to the latest
> version (which is 0902).
>
> But still the system freezes after some hours. It just freezes. Console is
> dead, no entry in the logs, no network connectivity, even sysrq doesn't
> seem to do anything. The worst thing is I don't even have an idea what the
> error could be, and in the rare situations when it crashed and the console
> was not blanked, I only see the end of a stack trace, and the interesting
> parts are scrolled out (and I can't scroll back as the console is
> absolutely dead :-( ). The only button that is still working is the reset
> button, and after rebooting the log doesn't tell anything (just ends without
> any message).
>
> I inspected my dmesg-output right after booting more precisely, and I've
> found some strange entries which could indicate a problem. What do you
> think about them?
>
> [    0.000000] ACPI Warning (tbfadt-0568): 32/64X length mismatch in
> Gpe0Block: 64/32 [20081204]
> [    0.000000] FADT: X_PM1a_EVT_BLK.bit_width (16) does not match
> PM1_EVT_LEN (4)
> ...
> [    0.000000] 4 Processors exceeds NR_CPUS limit of 2
> [    0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
> ...
> [    0.000999] Aperture pointing to e820 RAM. Ignoring.
> [    0.000999] Your BIOS doesn't leave a aperture memory hole
> [    0.000999] Please enable the IOMMU option in the BIOS setup
> [    0.000999] This costs you 64 MB of RAM
> [    0.000999] Mapping aperture over 65536 KB of RAM @ 20000000
> [    0.000999] PM: Registered nosave memory: 0000000020000000 -
> 0000000024000000
> ...
> [    0.099055] mtrr: your CPUs had inconsistent fixed MTRR settings
> [    0.099059] mtrr: probably your BIOS does not setup all CPUs.
> [    0.099116] mtrr: corrected configuration.
> ...
> [    0.151260] PCI-DMA: Disabling AGP.
> [    0.151260] PCI-DMA: aperture base @ 20000000 size 65536 KB
> [    0.151260] PCI-DMA: using GART IOMMU.
> [    0.151260] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
> ...
> [    0.163241] system 00:09: iomem range 0xfec00000-0xfec00fff has been
> reserved
> [    0.163305] system 00:09: iomem range 0xfee00000-0xfee00fff has been
> reserved
> [    0.163365] system 00:0a: ioport range 0x4d0-0x4d1 has been reserved
> [    0.163422] system 00:0a: ioport range 0x40b-0x40b has been reserved
> [    0.163480] system 00:0a: ioport range 0x4d6-0x4d6 has been reserved
> [    0.163537] system 00:0a: ioport range 0xc00-0xc01 has been reserved
> [    0.163595] system 00:0a: ioport range 0xc14-0xc14 has been reserved
> [    0.163653] system 00:0a: ioport range 0xc50-0xc51 has been reserved
> [    0.163711] system 00:0a: ioport range 0xc52-0xc52 has been reserved
> [    0.163769] system 00:0a: ioport range 0xc6c-0xc6c has been reserved
> [    0.163827] system 00:0a: ioport range 0xc6f-0xc6f has been reserved
> [    0.163885] system 00:0a: ioport range 0xcd0-0xcd1 has been reserved
> [    0.163942] system 00:0a: ioport range 0xcd2-0xcd3 has been reserved
> [    0.163999] system 00:0a: ioport range 0xcd4-0xcd5 has been reserved
> [    0.164070] system 00:0a: ioport range 0xcd6-0xcd7 has been reserved
> [    0.164127] system 00:0a: ioport range 0xcd8-0xcdf has been reserved
> [    0.164184] system 00:0a: ioport range 0x800-0x89f has been reserved
> [    0.164241] system 00:0a: ioport range 0xb00-0xb3f has been reserved
> [    0.164305] system 00:0a: ioport range 0x900-0x90f has been reserved
> [    0.164363] system 00:0a: ioport range 0x910-0x91f has been reserved
> [    0.164421] system 00:0a: ioport range 0xfe00-0xfefe has been reserved
> [    0.164480] system 00:0a: iomem range 0xffb80000-0xffbfffff has been
> reserved
> [    0.164538] system 00:0a: iomem range 0xfec10000-0xfec1001f has been
> reserved
> [    0.164598] system 00:0c: ioport range 0xe00-0xe0f has been reserved
> [    0.164656] system 00:0c: ioport range 0xe80-0xe8f has been reserved
> [    0.164713] system 00:0c: ioport range 0xf40-0xf4f has been reserved
> [    0.164771] system 00:0c: ioport range 0xa30-0xa3f has been reserved
> [    0.164830] system 00:0d: iomem range 0xe0000000-0xefffffff has been
> reserved
> [    0.164890] system 00:0e: iomem range 0x0-0x9ffff could not be reserved
> [    0.164947] system 00:0e: iomem range 0xc0000-0xcffff has been reserved
> [    0.165018] system 00:0e: iomem range 0xe0000-0xfffff could not be
> reserved
> [    0.165076] system 00:0e: iomem range 0x100000-0xdfffffff could not be
> reserved
> [    0.165158] system 00:0e: iomem range 0xfec00000-0xffffffff could not be
> reserved
> ...
> [   21.298450] ACPI: I/O resource piix4_smbus [0xb00-0xb07] conflicts with
> ACPI region SOR1 [0xb00-0xb0f]
> [   21.298454] ACPI: Device needs an ACPI driver
> [   21.298461] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00,
> revision 0
> ...
> [   73.861479] ACPI: I/O resource it87 [0xe85-0xe86] conflicts with ACPI
> region HWRE [0xe85-0xe86]
> [   73.861483] ACPI: Device needs an ACPI driver
>
> What does this message "4 Processors exceeds NR_CPUS" mean? The system is a
> dual-core AMD Athlon64 5050e; AFAIK it has two cores and nothing more. The
> MTRR messages later also indicate that there could be more than 2 CPUs
> available. Wondering...
>
> The next thing which seems somewhat strange to me is the AGP aperture and
> the IOMMU. The mainboard does not have an AGP port, nor does the BIOS have
> any option to enable. The only thing I can set is the size of the memory
> reserved for the onboard video card, which I set to the smallest value of
> 32MB as the machine will usually not even have a display.
>
> The iomem-range reservation errors at the end? Harmful or not?
>
> The last messages come after loading the hw-sensors modules it87.ko and
> i2c_piix4.
>
> Thanks in advance for suggestions
>       Alex

*sigh* Ok, just for starters: all AMD CPUs of the Athlon64 architecture have
a built-in AGP GART, which also functions as an IOMMU. This is a great
hack for getting a hardware IOMMU. Intel does not have this, so they rely on
software. The solution came up while AMD devs and Linux kernel devs worked
together.
Please read the following links:

http://en.wikipedia.org/wiki/Iommu

http://marc.info/?l=linux-kernel&m=107759901509280&w=2

http://marc.info/?l=linux-kernel&m=107764033904042&w=2

The IOMMU is needed so that 32-bit PCI devices can still do DMA when memory
lies beyond 4 GB, and for other sweet things.

Sadly the IOMMU needs a minimum amount of memory for itself, and it uses the
AGP aperture for that. This is fine, but mobo vendors suck and make the
aperture too small or do not provide it at all. In that case the kernel is
forced to use real memory for the IOMMU.
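To see what that costs in your case, here is a quick bit of arithmetic on the addresses from your own dmesg output (the "Registered nosave memory" line). Nothing here is assumed beyond those two addresses:

```python
# Addresses copied from the quoted dmesg output:
#   PM: Registered nosave memory: 0000000020000000 - 0000000024000000
start = 0x20000000
end = 0x24000000

# Size of the RAM region the kernel mapped the aperture over
size_kb = (end - start) // 1024
size_mb = size_kb // 1024

print(size_kb, "KB =", size_mb, "MB")
```

That matches the "Mapping aperture over 65536 KB of RAM" and "This costs you 64 MB of RAM" lines.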

In short, that message has nothing to do with your problem.

The NR_CPUS message is confusing. I strongly suspect that your kernel config
is really fucked up.
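A minimal sketch for comparing the two numbers in that message against the CONFIG_NR_CPUS line of your kernel .config. The function names and the sample .config text are made up for illustration; only the dmesg line is from your mail:

```python
import re

def nr_cpus_limit(config_text):
    """Return the CONFIG_NR_CPUS value from kernel .config text, or None."""
    m = re.search(r"^CONFIG_NR_CPUS=(\d+)$", config_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def reported_processors(dmesg_text):
    """Return (processors reported, NR_CPUS limit) from the warning, or None."""
    m = re.search(r"(\d+) Processors exceeds NR_CPUS limit of (\d+)", dmesg_text)
    return (int(m.group(1)), int(m.group(2))) if m else None

# Hypothetical .config excerpt, not taken from the original mail
sample_config = "CONFIG_SMP=y\nCONFIG_NR_CPUS=2\n"
# Line quoted from the dmesg output above
sample_dmesg = "[    0.000000] 4 Processors exceeds NR_CPUS limit of 2"

print(nr_cpus_limit(sample_config))
print(reported_processors(sample_dmesg))
```

If the limit in the message matches CONFIG_NR_CPUS, the warning only says that more processor entries were reported than the kernel was built to handle.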

The iomem-range messages are harmless.
 
Please enable:

 [] Check for low memory corruption
 [] Reserve low 64K of RAM on AMI/Phoenix BIOSen

in the kernel config. Also clean it up and remove stuff like 'hyperthreading 
scheduler'.
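A rough sketch for checking a .config against the suggestions above. I am assuming the menu entries map to the symbols CONFIG_X86_CHECK_BIOS_CORRUPTION, CONFIG_X86_RESERVE_LOW_64K and CONFIG_SCHED_SMT in 2.6.29; verify that in menuconfig's help text before trusting it. The sample .config text and function names are made up:

```python
# Assumed symbol names for the menu entries mentioned above (verify in menuconfig).
# True = should be enabled, False = should be disabled.
WANTED = {
    "CONFIG_X86_CHECK_BIOS_CORRUPTION": True,
    "CONFIG_X86_RESERVE_LOW_64K": True,
    "CONFIG_SCHED_SMT": False,
}

def enabled_options(config_text):
    """Collect option names set to y or m in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as '# CONFIG_FOO is not set' and blank lines
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return enabled

def config_problems(config_text):
    """List the options whose state differs from WANTED."""
    enabled = enabled_options(config_text)
    problems = []
    for name, should_be_on in WANTED.items():
        if should_be_on and name not in enabled:
            problems.append(name + " should be enabled")
        elif not should_be_on and name in enabled:
            problems.append(name + " should be disabled")
    return problems

# Hypothetical .config excerpt, not taken from the original mail
sample = (
    "CONFIG_X86_CHECK_BIOS_CORRUPTION=y\n"
    "# CONFIG_X86_RESERVE_LOW_64K is not set\n"
    "CONFIG_SCHED_SMT=y\n"
)

for problem in config_problems(sample):
    print(problem)
```

To run it on a real tree, read the .config file into a string and pass it to config_problems().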

If the problem persists, start testing your hardware.

I would suspect the PSU.
