Re: Opteron box and 4Gb memory

2007-11-08 Thread Lennart Sorensen
On Tue, Nov 06, 2007 at 12:03:19AM +0100, J.A. Magallón wrote:
> (correction, this was not AMD website but SuperMicro's)
> 
> I just said the same... Board is a SuperMicro H8DCE. From the FAQ section
> at supermicro:
> 
> Question
> I have an H8DCE motherboard with 4 x 1GB DIMMS installed but the amount of 
> memory displayed in the BIOS is 3.865GB and in 64-bit XP is 3.76GB. I have 
> the hardware memory hole option enabled in BIOS (rev 1.1a). How I can get the 
> board to count the full 4GB of memory?
> 
> Answer
> The total available size depends on the PCI-e card you are using; some 
> high-end cards may occupy more memory. For example, with a Quadro FX4500 on 
> the H8DCE with the memory hole enabled, 4GB memory will show up as 3728MB in 
> BIOS and 3.64GB in Windows. For some low-end PCI-e VGA cards, it may show up 
> as 4048MB in BIOS.
> 
> Why? Who knows...
> Chipset is all nVidia. I have a GeForce 8800GTX with 768 MB. It eats up
> 400 MB.
> 
> These are my settings:
> 
> BIOS-provided physical RAM map:
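>  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
>  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
>  BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
>  BIOS-e820: 0000000000100000 - 000000009ffd0000 (usable)
>  BIOS-e820: 000000009ffd0000 - 000000009ffde000 (ACPI data)
>  BIOS-e820: 000000009ffde000 - 00000000a0000000 (ACPI NVS)
>  BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
>  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
>  BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
>  BIOS-e820: 0000000100000000 - 0000000147000000 (usable)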
> 
> cicely:~# cat /proc/mtrr
> reg00: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
> reg01: base=0x140000000 (5120MB), size=  64MB: write-back, count=1
> reg02: base=0x144000000 (5184MB), size=  32MB: write-back, count=1
> reg03: base=0x146000000 (5216MB), size=  16MB: write-back, count=1
> reg04: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> reg05: base=0x80000000 (2048MB), size= 512MB: write-back, count=1
> 
> This is with BIOS set for MTRR=Discrete.
> With MTRR=Continuous, the MTRRs are simpler: one full range and an unusable
> hole. Which is better for Linux? Many separate usable zones, or one big zone
> and an unusable hole?

Seems odd if they can't just map memory as:
2GB at 0
512MB at 2GB
1GB at 4GB
512MB at 5GB

Or for that matter:
2GB at 0
2GB at 4GB
Why can't they do things as simple as that?  If you have a 64bit OS that
would be perfectly fine.

It appears they have 
2GB at 0
512MB at 2GB
1GB at 4GB
64MB at 5GB
32MB at 5GB+64MB
16MB at 5GB+64MB+32MB

Where did they map the rest of the RAM?  Those six ranges add up to only
3696MB of the 4096MB installed.  All I can think is that they messed up.
Of course some think that giving 32-bit XP as much RAM as possible is more
important than a sane system, so they map more into the first 4GB where
XP32 can use it, and then, due to alignment and rounding of the mapping,
they can't get all the remaining RAM mapped above 4GB.
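
A quick way to see how much RAM the MTRR listing leaves unaccounted for is to add up the ranges. The sketch below hard-codes the (base, size) pairs in MB from the /proc/mtrr output quoted above. The `summarize` helper and the 4096MB installed figure (four 1GB DIMMs, as stated in the thread) are illustrative assumptions, and the split at 4GB assumes no single range straddles that boundary, which holds for this listing.

```python
# (base_MB, size_MB) pairs copied from the /proc/mtrr listing quoted above.
MTRR_RANGES_MB = [
    (4096, 1024),  # reg00
    (5120, 64),    # reg01
    (5184, 32),    # reg02
    (5216, 16),    # reg03
    (0, 2048),     # reg04
    (2048, 512),   # reg05
]

INSTALLED_MB = 4 * 1024  # assumption: four 1GB DIMMs, as stated in the thread
FOUR_GB_MB = 4096


def summarize(ranges, installed_mb):
    """Total the write-back ranges and split them at the 4GB boundary."""
    covered = sum(size for _base, size in ranges)
    below = sum(size for base, size in ranges if base < FOUR_GB_MB)
    return {
        "covered_mb": covered,
        "below_4g_mb": below,
        "above_4g_mb": covered - below,
        "unaccounted_mb": installed_mb - covered,
    }


if __name__ == "__main__":
    for name, value in summarize(MTRR_RANGES_MB, INSTALLED_MB).items():
        print(f"{name}: {value}")
```

With these inputs the ranges total 3696MB (2560MB below 4GB and 1136MB above), which leaves 400MB of the installed 4096MB outside any write-back range.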

--
Len Sorensen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Opteron box and 4Gb memory

2007-11-05 Thread J.A. Magallón
On Mon, 5 Nov 2007 13:50:40 -0500, [EMAIL PROTECTED] (Lennart Sorensen) wrote:

> On Mon, Nov 05, 2007 at 07:45:21PM +0100, J.A. Magallón wrote:
> > Well, problem solved...
> > 
> > I'm going to kill all PC assemblers in the world... Someone should teach them
> > to read manuals before assembling anything but a power cord.
> > 
> > The memory was not paired, so the motherboard was not interleaving the access.
> > With no inter-node but with inter-module interleaving, and a couple of 1GB sticks
> > for each processor, I now get something like:
> > 
> > cicely:~/bn> bn
> > name: cicely.cps.unizar.es
> > arch: x86-64
> > proc: 4 x x86_64 @ 2200 MHz
> > ram:  3555 Mb
> > os:   unx, Linux, 2.6.23.1-desktop-1mdv
> > cc:   gcc-4.3.0
> > vector size   : 8 x 1024 x 1024
> > allocation: 0.02 ms
> > int scl add: ..   60.56 ms,  138.52 Mips   |   62.96 Mips  /GHz
> > int scl mul: ..   59.34 ms,  141.36 Mips   |   64.26 Mips  /GHz
> > flt scl add: ..   59.01 ms,  142.16 Mflops |   64.62 Mflops/GHz
> > flt vec add: ..   14.79 ms,  567.06 Mflops |  257.75 Mflops/GHz
> > flt scl mul: ..   59.02 ms,  142.12 Mflops |   64.60 Mflops/GHz
> > flt vec mul: ..   14.82 ms,  566.19 Mflops |  257.36 Mflops/GHz
> > total:   5019.86 ms
> > 
> > Much better, but not like the other opteron box.
> > 
> > My processors are higher than Rev E0, because the BIOS does not let me choose
> > the 'software' hole. If I activate the 'hardware hole', I see all the memory
> > I can:
> > 
> > cicely:~/bn> free
> >              total       used       free     shared    buffers     cached
> > Mem:       3640628     214496    3426132          0      21240      84184
> > -/+ buffers/cache:     109072    3531556
> > Swap:      4200988          0    4200988
> > 
> > 3.64 Gb. The rest is eaten by the graphics card, as I could read in the
> > AMD site. Don't know if mem=4096 to boot the kernel would help, even if it
> > is possible (don't think so, as it looks like a BIOS mis-feature).
> > The ram is DDR 400.
> 
> The video card is stealing 300MB of ram?  What for?  What does the mtrr
> and e820 map look like with the hardware hole enabled?
> 

(correction, this was not AMD website but SuperMicro's)

I just said the same... Board is a SuperMicro H8DCE. From the FAQ section
at supermicro:

Question
I have an H8DCE motherboard with 4 x 1GB DIMMS installed but the amount of 
memory displayed in the BIOS is 3.865GB and in 64-bit XP is 3.76GB. I have the 
hardware memory hole option enabled in BIOS (rev 1.1a). How I can get the board 
to count the full 4GB of memory?

Answer
The total available size depends on the PCI-e card you are using; some high-end 
cards may occupy more memory. For example, with a Quadro FX4500 on the H8DCE 
with the memory hole enabled, 4GB memory will show up as 3728MB in BIOS and 
3.64GB in Windows. For some low-end PCI-e VGA cards, it may show up as 4048MB 
in BIOS.

Why? Who knows...
Chipset is all nVidia. I have a GeForce 8800GTX with 768 MB. It eats up
400 MB.

These are my settings:

BIOS-provided physical RAM map:
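 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000009ffd0000 (usable)
 BIOS-e820: 000000009ffd0000 - 000000009ffde000 (ACPI data)
 BIOS-e820: 000000009ffde000 - 00000000a0000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000147000000 (usable)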

cicely:~# cat /proc/mtrr
reg00: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg01: base=0x140000000 (5120MB), size=  64MB: write-back, count=1
reg02: base=0x144000000 (5184MB), size=  32MB: write-back, count=1
reg03: base=0x146000000 (5216MB), size=  16MB: write-back, count=1
reg04: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg05: base=0x80000000 (2048MB), size= 512MB: write-back, count=1

This is with BIOS set for MTRR=Discrete.
With MTRR=Continuous, the MTRRs are simpler: one full range and an unusable
hole. Which is better for Linux? Many separate usable zones, or one big zone
and an unusable hole?

BTW, the mtrr base format should be changed to 0x%013lx000, so the output
lines up for today's memory sizes and matches the e820 map: 16 hex digits.
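
To illustrate what the wider format would change, here is a minimal sketch that pads the page-shifted base to a chosen number of hex digits and appends the literal 000. It assumes the base is held in 4 KiB page units (which the trailing 000 in the proposed format suggests) and that the existing output pads to five digits. `format_base` and `PAGE_SHIFT` are names made up for this example, not kernel identifiers.

```python
PAGE_SHIFT = 12  # assumption: the base is stored in 4 KiB pages


def format_base(base_bytes, digits):
    """Render a base address as 0x<pages padded to `digits`>000."""
    pages = base_bytes >> PAGE_SHIFT
    return f"0x{pages:0{digits}x}000"


if __name__ == "__main__":
    for base in (0x00000000, 0x80000000, 0x100000000, 0x146000000):
        # Five digits: ragged once the base reaches 4GB.
        # Thirteen digits: always 16 hex digits, like the e820 map.
        print(format_base(base, 5).ljust(14), format_base(base, 13))
```

With five digits, bases at or above 4GB overflow the field and the columns stop lining up; with thirteen, every base prints as 16 hex digits.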

> > Anyways, can I trust what dmidecode says ? I installed the ram as the board
> > manual said in banks 1A+1B (not 2A+2B) for each processor, but this program
> > says this:
> > 
> > BANK0   64Mb    BANK4   64Mb
> > BANK1   64Mb    BANK5   64Mb
> > BANK2 1024Mb    BANK6 1024Mb
> > BANK3 1024Mb    BANK7 1024Mb
> > 
> > I would always have thought that BANK0 would be slot 1A in first processor,
> > but it looks like not...
> > And where do the 64 Mb 

Re: Opteron box and 4Gb memory

2007-11-05 Thread Lennart Sorensen
On Mon, Nov 05, 2007 at 07:45:21PM +0100, J.A. Magallón wrote:
> Well, problem solved...
> 
> I'm going to kill all PC assemblers in the world... Someone should teach them
> to read manuals before assembling anything but a power cord.
> 
> The memory was not paired, so the motherboard was not interleaving the access.
> With no inter-node but with inter-module interleaving, and a couple of 1GB sticks
> for each processor, I now get something like:
> 
> cicely:~/bn> bn
>   name: cicely.cps.unizar.es
>   arch: x86-64
>   proc: 4 x x86_64 @ 2200 MHz
>   ram:  3555 Mb
>   os:   unx, Linux, 2.6.23.1-desktop-1mdv
>   cc:   gcc-4.3.0
> vector size   : 8 x 1024 x 1024
> allocation: 0.02 ms
> int scl add: ..   60.56 ms,  138.52 Mips   |   62.96 Mips  /GHz
> int scl mul: ..   59.34 ms,  141.36 Mips   |   64.26 Mips  /GHz
> flt scl add: ..   59.01 ms,  142.16 Mflops |   64.62 Mflops/GHz
> flt vec add: ..   14.79 ms,  567.06 Mflops |  257.75 Mflops/GHz
> flt scl mul: ..   59.02 ms,  142.12 Mflops |   64.60 Mflops/GHz
> flt vec mul: ..   14.82 ms,  566.19 Mflops |  257.36 Mflops/GHz
> total:   5019.86 ms
> 
> Much better, but not like the other opteron box.
> 
> My processors are higher than Rev E0, because the BIOS does not let me choose
> the 'software' hole. If I activate the 'hardware hole', I see all the memory
> I can:
> 
> cicely:~/bn> free
>              total       used       free     shared    buffers     cached
> Mem:       3640628     214496    3426132          0      21240      84184
> -/+ buffers/cache:     109072    3531556
> Swap:      4200988          0    4200988
> 
> 3.64 Gb. The rest is eaten by the graphics card, as I could read in the
> AMD site. Don't know if mem=4096 to boot the kernel would help, even if it
> is possible (don't think so, as it looks like a BIOS mis-feature).
> The ram is DDR 400.

The video card is stealing 300MB of ram?  What for?  What does the mtrr
and e820 map look like with the hardware hole enabled?
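
To put a rough number on the gap: free reports sizes in kB (1024 bytes), so the shortfall can be worked out from the Mem: total quoted above. This is only a sketch. The 4 x 1GB installed figure is taken from the thread, the names are made up for the example, and part of the difference is memory the kernel reserves for itself rather than address space claimed by PCI devices, so the result is an upper bound on the hole.

```python
KB_PER_MB = 1024

INSTALLED_KB = 4 * 1024 * 1024  # assumption: four 1GB DIMMs
MEM_TOTAL_KB = 3640628          # "Mem: total" column from the free output


def shortfall_kb(installed_kb, reported_kb):
    """Memory that is installed but not reported as usable, in kB."""
    return installed_kb - reported_kb


if __name__ == "__main__":
    missing = shortfall_kb(INSTALLED_KB, MEM_TOTAL_KB)
    print(f"reported: {MEM_TOTAL_KB / KB_PER_MB:.1f} MB")
    print(f"missing:  {missing} kB = {missing / KB_PER_MB:.1f} MB")
```

Counted in units of 1024 kB, the reported total is about 3555MB (the same figure the bn output shows) and the shortfall is about 541MB, noticeably more than a reading of "3.64 Gb" against 4GB suggests.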

> Anyways, can I trust what dmidecode says ? I installed the ram as the board
> manual said in banks 1A+1B (not 2A+2B) for each processor, but this program
> says this:
> 
> BANK0   64Mb    BANK4   64Mb
> BANK1   64Mb    BANK5   64Mb
> BANK2 1024Mb    BANK6 1024Mb
> BANK3 1024Mb    BANK7 1024Mb
> 
> I would always have thought that BANK0 would be slot 1A in first processor,
> but it looks like not...
> And where do the 64 Mb blocks come from ?

Well, if you have 4 sticks of 1GB, then I would hope they are installed
as a pair for each CPU so that both CPUs can have dual channel RAM
directly connected.

I have no idea where the 64MB comes from.

--
Len Sorensen


Re: Opteron box and 4Gb memory

2007-11-05 Thread J.A. Magallón
On Mon, 5 Nov 2007 13:10:46 -0500, [EMAIL PROTECTED] (Lennart Sorensen) wrote:

> On Mon, Nov 05, 2007 at 12:18:47AM +0100, J.A. Magallón wrote:
> > Well, I was able to get about 3 Gb with MTRR=discrete in the BIOS,
> > but I'm still in the process to find the 'software hole' option to get
> > the rest of the 4Gb...
> > 
> > But now another (perhaps related) question has arisen...
> > I like all those 5-line programs to test system performance... ;).
> > I just wrote a simple program that sums/muls int/float vectors with
> > scalar/sse operations. And my opteron box looks terribly slow.
> > 
> > This is my MacPro, Xeon 5130:
> > 
> > belly:~/bn> bn  
> > proc: 4 x MacPro1,1 @ 2000 MHz
> > ram:  2048 Mb
> > os:   unx, Darwin, 9.0.0
> > cc:   gcc-4.0.1
> > vector size   : 8 x 1024 x 1024
> > allocation: 0.01 ms
> > int scl add: ..   36.78 ms,  228.07 Mips   |  114.03 Mips  /GHz
> > int scl mul: ..   34.30 ms,  244.60 Mips   |  122.30 Mips  /GHz
> > flt scl add: ..   34.28 ms,  244.73 Mflops |  122.37 Mflops/GHz
> > flt vec add: ..    7.89 ms, 1063.15 Mflops |  531.58 Mflops/GHz
> > flt scl mul: ..   34.20 ms,  245.28 Mflops |  122.64 Mflops/GHz
> > flt vec mul: ..    7.90 ms, 1061.77 Mflops |  530.89 Mflops/GHz
> > total:   3322.19 ms
> > 
> > This is a normal (I think) opteron box (Opteron 846):
> > 
> > selene:~/bn> g  
> > proc: 4 x x86_64 @ 2004 MHz
> > ram:  3496 Mb
> > os:   unx, Linux, 2.6.9-42.0.10.ELsmp
> > cc:   gcc-4.0.2
> > vector size   : 8 x 1024 x 1024
> > allocation: 0.05 ms
> > int scl add: ..   45.98 ms,  182.42 Mips   |   91.03 Mips  /GHz
> > int scl mul: ..   44.31 ms,  189.30 Mips   |   94.46 Mips  /GHz
> > flt scl add: ..   44.52 ms,  188.41 Mflops |   94.02 Mflops/GHz
> > flt vec add: ..   10.03 ms,  836.70 Mflops |  417.52 Mflops/GHz
> > flt scl mul: ..   43.32 ms,  193.63 Mflops |   96.62 Mflops/GHz
> > flt vec mul: ..   10.02 ms,  836.98 Mflops |  417.65 Mflops/GHz
> > total:   4705.07 ms
> > 
> > And this is my opteron (Opteron 275)
> > 
> > cicely:~/bn> g  
> > proc: 4 x x86_64 @ 2200 MHz
> > ram:  2914 Mb
> > os:   unx, Linux, 2.6.23.1-desktop-1mdv
> > cc:   gcc-4.0.2
> > vector size   : 8 x 1024 x 1024
> > allocation: 0.03 ms
> > int scl add: ..   87.67 ms,   95.68 Mips   |   43.49 Mips  /GHz
> > int scl mul: ..   85.48 ms,   98.13 Mips   |   44.61 Mips  /GHz
> > flt scl add: ..   85.90 ms,   97.66 Mflops |   44.39 Mflops/GHz
> > flt vec add: ..   19.51 ms,  429.96 Mflops |  195.44 Mflops/GHz
> > flt scl mul: ..   85.86 ms,   97.70 Mflops |   44.41 Mflops/GHz
> > flt vec mul: ..   19.50 ms,  430.11 Mflops |  195.50 Mflops/GHz
> > total:   6334.96 ms
> > 
> > As I read on the AMD site, the only difference that matters between models
> > is the xx5 vs. xx6 suffix, which relates to frequency; the processors should
> > otherwise be the same.
> > 
> > As this only does intensive memory/fp operations, I'm not going to blame
> > gcc nor kernel versions here (I have compared gcc 3.4, 4.0, 4.1, and 4.2
> > on one of the boxes and results are very similar, the code is really
> > stupid and not very suitable for compiler smartness...).
> > I suspect it is a memory problem. It can be hardware or caused by
> > incorrect BIOS/kernel-mtrr setup:
> > 
> > selene:~> cat /proc/mtrr
> > reg00: base=0x00000000 (   0MB), size=16384MB: write-back, count=1
> > reg01: base=0xf0000000 (3840MB), size= 256MB: uncachable, count=1
> > 
> > cicely:~> cat /proc/mtrr
> > reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> > reg01: base=0x80000000 (2048MB), size= 512MB: write-back, count=1
> > reg02: base=0xa0000000 (2560MB), size= 256MB: write-back, count=1
> > reg03: base=0xb0000000 (2816MB), size= 128MB: write-back, count=1
> > reg04: base=0xb8000000 (2944MB), size=  16MB: write-back, count=1
> > 
> > 
> > Any idea what can be going on here? I have asked the admin of the 'good
> > opteron' for info about the mobo and memory of that box.
> > 
> > Any help will be _very_ appreciated.
> 
> Well what revisions are the two opterons?  Is one running dual channel
> memory while the other isn't perhaps?  What speed and type is the ram on
> the two opterons?
> 

Well, problem solved...

I'm going to kill all PC assemblers in the world... Someone should teach them
to read manuals before assembling anything but a power cord.

The memory was not paired, so the motherboard was not interleaving the access.
With no inter-node but with inter-module interleaving, and a couple of 1GB sticks
for each processor, I now get something like:

cicely:~/bn> bn
name: cicely.cps.unizar.es
arch: x86-64
proc: 4 x x86_64 @ 2200 MHz
ram:  3555 Mb
os:   unx, Linux, 2.6.23.1-desktop-1mdv
cc:   gcc-4.3.0
vector size   : 8 x 1024 x 1024
allocation: 0.02 ms
int scl add: ..   60.56 ms,  138.52 

Re: Opteron box and 4Gb memory

2007-11-05 Thread Lennart Sorensen
On Mon, Nov 05, 2007 at 12:18:47AM +0100, J.A. Magallón wrote:
> Well, I was able to get about 3 Gb with MTRR=discrete in the BIOS,
> but I'm still in the process to find the 'software hole' option to get
> the rest of the 4Gb...
> 
> But now another (perhaps related) question has arisen...
> I like all those 5-line programs to test system performance... ;).
> I just wrote a simple program that sums/muls int/float vectors with
> scalar/sse operations. And my opteron box looks terribly slow.
> 
> This is my MacPro, Xeon 5130:
> 
> belly:~/bn> bn  
>   proc: 4 x MacPro1,1 @ 2000 MHz
>   ram:  2048 Mb
>   os:   unx, Darwin, 9.0.0
>   cc:   gcc-4.0.1
> vector size   : 8 x 1024 x 1024
> allocation: 0.01 ms
> int scl add: ..   36.78 ms,  228.07 Mips   |  114.03 Mips  /GHz
> int scl mul: ..   34.30 ms,  244.60 Mips   |  122.30 Mips  /GHz
> flt scl add: ..   34.28 ms,  244.73 Mflops |  122.37 Mflops/GHz
> flt vec add: ..    7.89 ms, 1063.15 Mflops |  531.58 Mflops/GHz
> flt scl mul: ..   34.20 ms,  245.28 Mflops |  122.64 Mflops/GHz
> flt vec mul: ..    7.90 ms, 1061.77 Mflops |  530.89 Mflops/GHz
> total:   3322.19 ms
> 
> This is a normal (I think) opteron box (Opteron 846):
> 
> selene:~/bn> g  
>   proc: 4 x x86_64 @ 2004 MHz
>   ram:  3496 Mb
>   os:   unx, Linux, 2.6.9-42.0.10.ELsmp
>   cc:   gcc-4.0.2
> vector size   : 8 x 1024 x 1024
> allocation: 0.05 ms
> int scl add: ..   45.98 ms,  182.42 Mips   |   91.03 Mips  /GHz
> int scl mul: ..   44.31 ms,  189.30 Mips   |   94.46 Mips  /GHz
> flt scl add: ..   44.52 ms,  188.41 Mflops |   94.02 Mflops/GHz
> flt vec add: ..   10.03 ms,  836.70 Mflops |  417.52 Mflops/GHz
> flt scl mul: ..   43.32 ms,  193.63 Mflops |   96.62 Mflops/GHz
> flt vec mul: ..   10.02 ms,  836.98 Mflops |  417.65 Mflops/GHz
> total:   4705.07 ms
> 
> And this is my opteron (Opteron 275)
> 
> cicely:~/bn> g  
>   proc: 4 x x86_64 @ 2200 MHz
>   ram:  2914 Mb
>   os:   unx, Linux, 2.6.23.1-desktop-1mdv
>   cc:   gcc-4.0.2
> vector size   : 8 x 1024 x 1024
> allocation: 0.03 ms
> int scl add: ..   87.67 ms,   95.68 Mips   |   43.49 Mips  /GHz
> int scl mul: ..   85.48 ms,   98.13 Mips   |   44.61 Mips  /GHz
> flt scl add: ..   85.90 ms,   97.66 Mflops |   44.39 Mflops/GHz
> flt vec add: ..   19.51 ms,  429.96 Mflops |  195.44 Mflops/GHz
> flt scl mul: ..   85.86 ms,   97.70 Mflops |   44.41 Mflops/GHz
> flt vec mul: ..   19.50 ms,  430.11 Mflops |  195.50 Mflops/GHz
> total:   6334.96 ms
> 
> As I read on the AMD site, the only difference that matters between models
> is the xx5 vs. xx6 suffix, which relates to frequency; the processors should
> otherwise be the same.
> 
> As this only does intensive memory/fp operations, I'm not going to blame
> gcc nor kernel versions here (I have compared gcc 3.4, 4.0, 4.1, and 4.2
> on one of the boxes and results are very similar, the code is really
> stupid and not very suitable for compiler smartness...).
> I suspect it is a memory problem. It can be hardware or caused by
> incorrect BIOS/kernel-mtrr setup:
> 
> selene:~> cat /proc/mtrr
> reg00: base=0x00000000 (   0MB), size=16384MB: write-back, count=1
> reg01: base=0xf0000000 (3840MB), size= 256MB: uncachable, count=1
> 
> cicely:~> cat /proc/mtrr
> reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> reg01: base=0x80000000 (2048MB), size= 512MB: write-back, count=1
> reg02: base=0xa0000000 (2560MB), size= 256MB: write-back, count=1
> reg03: base=0xb0000000 (2816MB), size= 128MB: write-back, count=1
> reg04: base=0xb8000000 (2944MB), size=  16MB: write-back, count=1
> 
> 
> Any idea what can be going on here? I have asked the admin of the 'good
> opteron' for info about the mobo and memory of that box.
> 
> Any help will be _very_ appreciated.

Well what revisions are the two opterons?  Is one running dual channel
memory while the other isn't perhaps?  What speed and type is the ram on
the two opterons?
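
For reference, the rates in the quoted output can be reproduced from the timings, which makes the gap between the two Opterons easier to compare. This sketch assumes each pass performs one operation per element of the 8 x 1024 x 1024 vector (that matches the printed numbers, but the benchmark source is not shown); `mops` and `per_ghz` are hypothetical helper names.

```python
ELEMENTS = 8 * 1024 * 1024  # "vector size: 8 x 1024 x 1024" from the output

# "int scl add" time in ms and clock in MHz, copied from the quoted runs.
INT_SCL_ADD = {
    "belly": (36.78, 2000),
    "selene": (45.98, 2004),
    "cicely": (87.67, 2200),
}


def mops(elapsed_ms, elements=ELEMENTS):
    """Millions of operations per second, assuming one op per element."""
    return elements / (elapsed_ms / 1000.0) / 1e6


def per_ghz(rate, clock_mhz):
    """Normalize a rate by clock frequency in GHz."""
    return rate / (clock_mhz / 1000.0)


if __name__ == "__main__":
    for host, (ms, mhz) in INT_SCL_ADD.items():
        rate = mops(ms)
        print(f"{host}: {rate:.2f} Mips, {per_ghz(rate, mhz):.2f} Mips/GHz")
    slowdown = INT_SCL_ADD["cicely"][0] / INT_SCL_ADD["selene"][0]
    print(f"cicely takes {slowdown:.2f}x as long as selene")
```

On these figures cicely takes roughly 1.9 times as long as selene for the same pass despite its higher clock, which is the kind of gap a memory configuration difference could plausibly produce.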

--
Len Sorensen


Re: Opteron box and 4Gb memory

2007-11-05 Thread Lennart Sorensen
On Mon, Nov 05, 2007 at 07:45:21PM +0100, J.A. Magallón wrote:
> Well, problem solved...
> 
> I'm going to kill all pc assemblers in the world... Someone should teach them
> to read manuals before assembling anything but a power cord.
> 
> The memory was not paired, so the motherboard was not interleaving the access.
> With no inter-node but with inter-module interleaving, and a couple of 1Gb sticks
> for each processor, now I get something like:
> 
> cicely:~/bn> bn
> name: cicely.cps.unizar.es
> arch: x86-64
> proc: 4 x x86_64 @ 2200 MHz
> ram:  3555 Mb
> os:   unx, Linux, 2.6.23.1-desktop-1mdv
> cc:   gcc-4.3.0
> vector size   : 8 x 1024 x 1024
> allocation: 0.02 ms
> int scl add: ..   60.56 ms,  138.52 Mips   |   62.96 Mips  /GHz
> int scl mul: ..   59.34 ms,  141.36 Mips   |   64.26 Mips  /GHz
> flt scl add: ..   59.01 ms,  142.16 Mflops |   64.62 Mflops/GHz
> flt vec add: ..   14.79 ms,  567.06 Mflops |  257.75 Mflops/GHz
> flt scl mul: ..   59.02 ms,  142.12 Mflops |   64.60 Mflops/GHz
> flt vec mul: ..   14.82 ms,  566.19 Mflops |  257.36 Mflops/GHz
> total:   5019.86 ms
> 
> Much better, but not like the other opteron box.
> 
> My processors are higher than Rev E0, because the BIOS does not let me choose
> the 'software' hole. If I activate the 'hardware hole', I see all the memory
> I can:
> 
> cicely:~/bn> free
>              total       used       free     shared    buffers     cached
> Mem:       3640628     214496    3426132          0      21240      84184
> -/+ buffers/cache:     109072    3531556
> Swap:      4200988          0    4200988
> 
> 3.64 Gb. The rest is eaten by the graphics card, as I could read in the
> AMD site. Don't know if mem=4096 to boot the kernel would help, even if it
> is possible (don't think so, as it looks like a BIOS mis-feature).
> The ram is DDR 400.

The video card is stealing 300MB of ram?  What for?  What does the mtrr
and e820 map look like with the hardware hole enabled?

> Anyways, can I trust what dmidecode says? I installed the ram as the board
> manual said in banks 1A+1B (not 2A+2B) for each processor, but this program
> says this:
> 
> BANK0   64Mb    BANK4   64Mb
> BANK1   64Mb    BANK5   64Mb
> BANK2 1024Mb    BANK6 1024Mb
> BANK3 1024Mb    BANK7 1024Mb
> 
> I would always have thought that BANK0 would be slot 1A in first processor,
> but it looks like not...
> And where do the 64 Mb blocks come from?

Well, if you have 4 sticks of 1GB, then I would hope they are installed
as a pair for each CPU so that both CPUs can have dual channel ram
directly connected.

I have no idea where the 64Mb comes from.

--
Len Sorensen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Opteron box and 4Gb memory

2007-11-05 Thread J.A. Magallón
On Mon, 5 Nov 2007 13:50:40 -0500, [EMAIL PROTECTED] (Lennart Sorensen) wrote:

> On Mon, Nov 05, 2007 at 07:45:21PM +0100, J.A. Magallón wrote:
> > Well, problem solved...
> > 
> > I'm going to kill all pc assemblers in the world... Someone should teach them
> > to read manuals before assembling anything but a power cord.
> > 
> > The memory was not paired, so the motherboard was not interleaving the access.
> > With no inter-node but with inter-module interleaving, and a couple of 1Gb sticks
> > for each processor, now I get something like:
> > 
> > cicely:~/bn> bn
> > name: cicely.cps.unizar.es
> > arch: x86-64
> > proc: 4 x x86_64 @ 2200 MHz
> > ram:  3555 Mb
> > os:   unx, Linux, 2.6.23.1-desktop-1mdv
> > cc:   gcc-4.3.0
> > vector size   : 8 x 1024 x 1024
> > allocation: 0.02 ms
> > int scl add: ..   60.56 ms,  138.52 Mips   |   62.96 Mips  /GHz
> > int scl mul: ..   59.34 ms,  141.36 Mips   |   64.26 Mips  /GHz
> > flt scl add: ..   59.01 ms,  142.16 Mflops |   64.62 Mflops/GHz
> > flt vec add: ..   14.79 ms,  567.06 Mflops |  257.75 Mflops/GHz
> > flt scl mul: ..   59.02 ms,  142.12 Mflops |   64.60 Mflops/GHz
> > flt vec mul: ..   14.82 ms,  566.19 Mflops |  257.36 Mflops/GHz
> > total:   5019.86 ms
> > 
> > Much better, but not like the other opteron box.
> > 
> > My processors are higher than Rev E0, because the BIOS does not let me choose
> > the 'software' hole. If I activate the 'hardware hole', I see all the memory
> > I can:
> > 
> > cicely:~/bn> free
> >              total       used       free     shared    buffers     cached
> > Mem:       3640628     214496    3426132          0      21240      84184
> > -/+ buffers/cache:     109072    3531556
> > Swap:      4200988          0    4200988
> > 
> > 3.64 Gb. The rest is eaten by the graphics card, as I could read in the
> > AMD site. Don't know if mem=4096 to boot the kernel would help, even if it
> > is possible (don't think so, as it looks like a BIOS mis-feature).
> > The ram is DDR 400.
> 
> The video card is stealing 300MB of ram?  What for?  What does the mtrr
> and e820 map look like with the hardware hole enabled?
> 

(correction, this was not AMD website but SuperMicro's)

I just said the same... Board is a SuperMicro H8DCE. From the FAQ section
at supermicro:

Question
I have an H8DCE motherboard with 4 x 1GB DIMMS installed but the amount of 
memory displayed in the BIOS is 3.865GB and in 64-bit XP is 3.76GB. I have the 
hardware memory hole option enabled in BIOS (rev 1.1a). How I can get the board 
to count the full 4GB of memory?

Answer
The total available size depends on the PCI-e card you are using; some high-end 
cards may occupy more memory. For example, with a Quadro FX4500 on the H8DCE 
with the memory hole enabled, 4GB memory will show up as 3728MB in BIOS and 
3.64GB in Windows. For some low-end PCI-e VGA cards, it may show up as 4048MB 
in BIOS.

Why ? Who knows...
Chipset is all nVidia. I have a GeForce 8800GTX with 768 Mb. It eats up
400Mb.

These are my settings:

BIOS-provided physical RAM map:
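 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000009ffd0000 (usable)
 BIOS-e820: 000000009ffd0000 - 000000009ffde000 (ACPI data)
 BIOS-e820: 000000009ffde000 - 00000000a0000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000147000000 (usable)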

cicely:~# cat /proc/mtrr
reg00: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg01: base=0x140000000 (5120MB), size=  64MB: write-back, count=1
reg02: base=0x144000000 (5184MB), size=  32MB: write-back, count=1
reg03: base=0x146000000 (5216MB), size=  16MB: write-back, count=1
reg04: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg05: base=0x80000000 (2048MB), size= 512MB: write-back, count=1

This is with BIOS set for MTRR=Discrete.
With MTRR=Continuous, the mtrr's are simpler, a full range and a non-usable
hole. Which is better for Linux ? Many separate usable zones or one big zone
and an un-usable hole ?
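
Summing the size= fields of the /proc/mtrr listing above is a quick way to
check the "eats up 400Mb" figure. This is only a rough sketch: the helper
names (parse_mtrr, writeback_mb) are made up for illustration, the hex bases
in the sample are written out in full, and only the MB figures are parsed.

```python
import re

# Matches "(4096MB), size=1024MB: write-back" and captures base MB, size MB, type.
LINE = re.compile(r"\(\s*(\d+)MB\), size=\s*(\d+)MB: ([\w-]+)")

def parse_mtrr(text):
    """Return (base_mb, size_mb, type) for every regNN line of /proc/mtrr text."""
    return [(int(b), int(s), t) for b, s, t in LINE.findall(text)]

def writeback_mb(regions, lo_mb=0, hi_mb=None):
    """Sum the sizes of write-back regions whose base lies in [lo_mb, hi_mb)."""
    total = 0
    for base, size, kind in regions:
        if kind != "write-back":
            continue
        if base >= lo_mb and (hi_mb is None or base < hi_mb):
            total += size
    return total

# The MTRR=Discrete listing posted above (hex bases are not used by the parser).
SAMPLE = """\
reg00: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg01: base=0x140000000 (5120MB), size=  64MB: write-back, count=1
reg02: base=0x144000000 (5184MB), size=  32MB: write-back, count=1
reg03: base=0x146000000 (5216MB), size=  16MB: write-back, count=1
reg04: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg05: base=0x80000000 (2048MB), size= 512MB: write-back, count=1
"""

regions = parse_mtrr(SAMPLE)
below_4g = writeback_mb(regions, 0, 4096)   # RAM mapped under the hole
above_4g = writeback_mb(regions, 4096)      # RAM remapped above 4 GB
missing = 4096 - (below_4g + above_4g)      # what the 4 x 1 GB sticks lose
print(below_4g, above_4g, missing)
```

With these figures the write-back ranges cover 2560 MB below the hole and
1136 MB above 4 GB, so 400 MB of the installed 4096 MB stays unmapped.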

BTW, the mtrr base format should be changed to 0x%013lx000, so that the values
line up for today's memory sizes and match the e820 map (16 hex digits)...
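
To see what that format change would look like, here is a small sketch. It
assumes the base is kept as a 4 KiB page number and printed with a literal
"000" appended, which is what the trailing zeros in the listings suggest, and
that the current width is 5 digits. format_mtrr_base is a made-up helper, not
kernel code.

```python
def format_mtrr_base(base_bytes, digits):
    """Format a base address the way the listing seems to: page number + '000'."""
    page = base_bytes >> 12            # assumed 4 KiB pages
    return "0x%0*x000" % (digits, page)

for base in (0x0, 0x80000000, 0x100000000, 0x146000000):
    # Left column: assumed current width; right column: the proposed 13 + 3 digits.
    print(format_mtrr_base(base, 5), format_mtrr_base(base, 13))
```

With 13 digits plus the fixed "000" every base comes out 16 hex digits wide,
the same width as the e820 addresses.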

> > Anyways, can I trust what dmidecode says? I installed the ram as the board
> > manual said in banks 1A+1B (not 2A+2B) for each processor, but this program
> > says this:
> > 
> > BANK0   64Mb    BANK4   64Mb
> > BANK1   64Mb    BANK5   64Mb
> > BANK2 1024Mb    BANK6 1024Mb
> > BANK3 1024Mb    BANK7 1024Mb
> > 
> > I would always have thought that BANK0 would be slot 1A in first processor,
> > but it looks like not...
> > And where do the 64 Mb blocks come from?
> 
> Well, if you have 4 sticks of 1GB, then I would hope they are installed
> as a pair for each CPU so 

Re: Opteron box and 4Gb memory

2007-11-04 Thread J.A. Magallón
On Thu, 25 Oct 2007 14:58:10 -0700, "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

> J.A. Magallon wrote:
> > Hi...
> > 
> > I have some Quad-Opteron boxes with 4Gb memory and two of them are
> > running two different Linux distros.
> > 
> > Box one sees 4Gb of memory, but box two just sees 3.
> > Their mtrr setups are different:
> > 
> > Why ? Is it a bios setup problem ? A kernel problem ?
> > grep HIGHMEN in configs for both kernels does not give anything, so
> > I still understand less this thing...
> > 
> 
> It would depend on how the BIOS programmed the memory controllers.  For 
> 32-bit (and lots of device) compatibility, a memory hole is required 
> below 4 GB.  Not all memory controllers can remap memory in the 3-4 GB 
> range above the 4 GB memory; I'm not sure if that varies with the 
> different Opteron processors.
> 

Well, I was able to get about 3 Gb with MTRR=discrete in the BIOS,
but I'm still trying to find the 'software hole' option to get
the rest of the 4Gb...

But now another (perhaps related) question has arisen...
I like all those 5-line programs to test system performance...;).
I just wrote a simple program that sums/muls int/float vectors with
scalar/sse operations. And my opteron box looks terribly slow.

This is my MacPro, Xeon 5130:

belly:~/bn> bn  
proc: 4 x MacPro1,1 @ 2000 MHz
ram:  2048 Mb
os:   unx, Darwin, 9.0.0
cc:   gcc-4.0.1
vector size   : 8 x 1024 x 1024
allocation: 0.01 ms
int scl add: ..   36.78 ms,  228.07 Mips   |  114.03 Mips  /GHz
int scl mul: ..   34.30 ms,  244.60 Mips   |  122.30 Mips  /GHz
flt scl add: ..   34.28 ms,  244.73 Mflops |  122.37 Mflops/GHz
flt vec add: ..    7.89 ms, 1063.15 Mflops |  531.58 Mflops/GHz
flt scl mul: ..   34.20 ms,  245.28 Mflops |  122.64 Mflops/GHz
flt vec mul: ..    7.90 ms, 1061.77 Mflops |  530.89 Mflops/GHz
total:   3322.19 ms

This is a normal (I think) opteron box (Opteron 846):

selene:~/bn> g  
proc: 4 x x86_64 @ 2004 MHz
ram:  3496 Mb
os:   unx, Linux, 2.6.9-42.0.10.ELsmp
cc:   gcc-4.0.2
vector size   : 8 x 1024 x 1024
allocation: 0.05 ms
int scl add: ..   45.98 ms,  182.42 Mips   |   91.03 Mips  /GHz
int scl mul: ..   44.31 ms,  189.30 Mips   |   94.46 Mips  /GHz
flt scl add: ..   44.52 ms,  188.41 Mflops |   94.02 Mflops/GHz
flt vec add: ..   10.03 ms,  836.70 Mflops |  417.52 Mflops/GHz
flt scl mul: ..   43.32 ms,  193.63 Mflops |   96.62 Mflops/GHz
flt vec mul: ..   10.02 ms,  836.98 Mflops |  417.65 Mflops/GHz
total:   4705.07 ms

And this is my opteron (Opteron 275)

cicely:~/bn> g  
proc: 4 x x86_64 @ 2200 MHz
ram:  2914 Mb
os:   unx, Linux, 2.6.23.1-desktop-1mdv
cc:   gcc-4.0.2
vector size   : 8 x 1024 x 1024
allocation: 0.03 ms
int scl add: ..   87.67 ms,   95.68 Mips   |   43.49 Mips  /GHz
int scl mul: ..   85.48 ms,   98.13 Mips   |   44.61 Mips  /GHz
flt scl add: ..   85.90 ms,   97.66 Mflops |   44.39 Mflops/GHz
flt vec add: ..   19.51 ms,  429.96 Mflops |  195.44 Mflops/GHz
flt scl mul: ..   85.86 ms,   97.70 Mflops |   44.41 Mflops/GHz
flt vec mul: ..   19.50 ms,  430.11 Mflops |  195.50 Mflops/GHz
total:   6334.96 ms
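
For anyone reading the three tables: the rate columns look consistent with
one operation per vector element per timed pass (8 x 1024 x 1024 elements),
divided by the clock for the per-GHz column. That reading is inferred from
the numbers, not from the bn source, so treat this as a sketch. The function
names are made up.

```python
VECTOR_ELEMENTS = 8 * 1024 * 1024   # "vector size" line of the output

def rate_millions(elapsed_ms, elements=VECTOR_ELEMENTS):
    """Millions of operations per second, assuming one op per element."""
    return elements / (elapsed_ms / 1000.0) / 1e6

def per_ghz(rate, clock_mhz):
    """Normalise a rate by the clock frequency reported on the proc: line."""
    return rate / (clock_mhz / 1000.0)

# (host, "int scl add" time in ms, clock in MHz) taken from the tables above.
RUNS = [("belly", 36.78, 2000), ("selene", 45.98, 2004), ("cicely", 87.67, 2200)]

for host, ms, mhz in RUNS:
    r = rate_millions(ms)
    print("%-7s %8.2f Mips  %8.2f Mips/GHz" % (host, r, per_ghz(r, mhz)))
```

The results agree with the posted columns to within the rounding of the
printed times, and they make the gap plain: cicely needs almost twice as long
as selene for the same pass despite the higher clock.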

As I read on AMD's site, the only difference that matters between models is
xx5 vs xx6, which relates to frequency; otherwise the processors should be
just the same.

As this only does intensive memory/fp operations, I'm not going to blame
gcc nor kernel versions here (I have compared gcc 3.4, 4.0, 4.1, and 4.2
on one of the boxes and results are very similar, the code is really
stupid and not very suitable for compiler smartness...).
I suspect it is a memory problem. It can be hardware or caused by
incorrect BIOS/kernel-mtrr setup:

selene:~> cat /proc/mtrr
reg00: base=0x00000000 (   0MB), size=16384MB: write-back, count=1
reg01: base=0xf0000000 (3840MB), size= 256MB: uncachable, count=1

cicely:~> cat /proc/mtrr
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size= 512MB: write-back, count=1
reg02: base=0xa0000000 (2560MB), size= 256MB: write-back, count=1
reg03: base=0xb0000000 (2816MB), size= 128MB: write-back, count=1
reg04: base=0xb8000000 (2944MB), size=  16MB: write-back, count=1


Any idea what could be going on here? I have asked the admin of the 'good
opteron' for info about the mobo and memory of the box.

Any help will be _very_ appreciated.

TIA

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2008.1 (Cooker) for i586
Linux 2.6.23-jam01 (gcc 4.2.2 20070909 (4.2.2-0.RC.1mdv2008.0)) SMP PREEMPT
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

Re: Opteron box and 4Gb memory

2007-10-26 Thread Arne Georg Gleditsch
"J.A. Magallón" <[EMAIL PROTECTED]> writes:
> Software Memory Hole
> When "Enabled", allows software memory remapping around the memory
> hole. Options are Enabled and Disabled.
>
> Hardware Memory Hole
> When "Enabled", allows software memory remapping around the memory
> hole. Options are Enabled and Disabled. Note: this is only supported by
> Rev E0 processors and above.
> ( I have two Opteron 275 processors, no idea about revision)

The configuration register used to reclaim DRAM lost to an MMIO
hole was introduced with revision E of the gen1 Opterons.  (This
feature is supposed to work both in interleaved and non-interleaved
mode.)  What does /proc/cpuinfo say?  (On both?)
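
A small sketch of pulling the relevant fields out of /proc/cpuinfo, in case
it helps compare the two boxes. It only extracts family, model and stepping;
translating those numbers into a silicon revision needs AMD's revision guide
and is not attempted here. The sample values are illustrative only and are
not taken from either machine. cpu_signatures is a made-up helper name.

```python
def cpu_signatures(cpuinfo_text):
    """Return one (family, model, stepping) tuple per CPU block in the text."""
    sigs = []
    current = {}

    def flush():
        if current:
            sigs.append(tuple(int(current.get(k, -1))
                              for k in ("cpu family", "model", "stepping")))
            current.clear()

    for line in cpuinfo_text.splitlines():
        if not line.strip():          # blank line ends one processor block
            flush()
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    flush()                           # last block may lack a trailing blank line
    return sigs

# Illustrative input only (hypothetical values).
SAMPLE = """\
processor\t: 0
cpu family\t: 15
model\t\t: 33
model name\t: (elided)
stepping\t: 2

processor\t: 1
cpu family\t: 15
model\t\t: 33
model name\t: (elided)
stepping\t: 2
"""

print(cpu_signatures(SAMPLE))
# On a real box: cpu_signatures(open("/proc/cpuinfo").read())
```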

(Still, even without this your BIOS should be able to map your memory
so that you are able to use all 4G.  Provided you disable
interleaving, I can't see that there's anything stopping the BIOS from
mapping the memory from node 1 to 0-2G and the memory from node 2 to
4-6G, leaving a 2G hole for MMIO and other junk.  Whether your BIOS
actually supports this is another matter.)
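
The layout described above can be sketched as follows. This is a toy model
under stated assumptions: no interleaving, each node's DRAM placed as one
contiguous block, and a node simply moved past the hole when it would
overlap it. place_nodes is a made-up name and has nothing to do with what a
real BIOS does internally.

```python
GIB = 1 << 30

def place_nodes(node_sizes, hole_start, hole_end):
    """Give each node a contiguous physical range, skipping the MMIO hole."""
    ranges = []
    addr = 0
    for size in node_sizes:
        # If this node would overlap the hole, start it right after the hole.
        if addr < hole_end and addr + size > hole_start:
            addr = hole_end
        ranges.append((addr, addr + size))
        addr += size
    return ranges

layout = place_nodes([2 * GIB, 2 * GIB], hole_start=2 * GIB, hole_end=4 * GIB)
for node, (lo, hi) in enumerate(layout, start=1):
    print("node %d: %d-%dG" % (node, lo // GIB, hi // GIB))
```

The model only gives a sensible answer when a node boundary lines up with
the start of the hole, as in the 2G + 2G case; anything else needs the
remapping register.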

-- 
Arne.


Re: Opteron box and 4Gb memory

2007-10-25 Thread J.A. Magallón
On Thu, 25 Oct 2007 14:58:10 -0700, "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

> J.A. Magallon wrote:
> > Hi...
> > 
> > I have some Quad-Opteron boxes with 4Gb memory and two of them are
> > running two different Linux distros.
> > 
> > Box one sees 4Gb of memory, but box two just sees 3.
> > Their mtrr setups are different:
> > 
> > Why ? Is it a bios setup problem ? A kernel problem ?
> > grep HIGHMEN in configs for both kernels does not give anything, so
> > I still understand less this thing...
> > 
> 
> It would depend on how the BIOS programmed the memory controllers.  For 
> 32-bit (and lots of device) compatibility, a memory hole is required 
> below 4 GB.  Not all memory controllers can remap memory in the 3-4 GB 
> range above the 4 GB memory; I'm not sure if that varies with the 
> different Opteron processors.

I have collected several pieces of info around the internet...

- Some people use these options in the BIOS:
Node interleave: off
Bank interleave: auto
SW memory hole: disable
HW memory hole: enable
MTRR: Continuous

- Node Memory Interleaving DISABLES NUMA and generally is a bad thing

- The MTRR setting should be set to "discrete" for Linux, and probably for
Windows too.

- This is what SuperMicro's tech support said about 2.96GB vs. 4GB.

"This is as expected, as soon as you set "software memory hole" to disabled,
you also disable option ROM remapping functionality, this option normally
remaps used option rom (option rom= raid bios, lan pxe ; usb legacy, bioses
on add-on cards, etc) in the 4GB region, so no basis memory is lost, while
this feature is now disabled the option rom space occupies the space between
3 and 4 GB which results in lower main memory availability.
There is no solution or work around for this phenomenon"

so software memory hole enabled might be needed to get all 4GB to show up

From the mobo manual:

Software Memory Hole
When "Enabled", allows software memory remapping around the memory
hole. Options are Enabled and Disabled.

Hardware Memory Hole
When "Enabled", allows software memory remapping around the memory
hole. Options are Enabled and Disabled. Note: this is only supported by
Rev E0 processors and above.
( I have two Opteron 275 processors, no idea about revision)

So _my_ conclusion is:

Node interleave: off       (numa mode)
Bank interleave: auto
SW memory hole:  disable   |
HW memory hole:  enable    | allow remapping
MTRR:            Discrete  |

But then, do I need to enable NUMA options in the kernel ?

> 
> Also, if you run a 32-bit distribution, you need to have HIGHMEM_64G 
> enabled in the kernel.
> 

I run a 64-bit one, so I don't need anything, do I? That's why I
don't see any _HIGHMEM in the kernel configs...

Some day I will understand this crappy BIOS thing (or burn a photo of its
inventor...).
Why can't we have OpenFirmware PC's, like my MacPro  and Sparcs ?

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2008.1 (Cooker) for i586
Linux 2.6.23-jam01 (gcc 4.2.2 20070909 (4.2.2-0.RC.1mdv2008.0)) SMP PREEMPT
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0


Re: Opteron box and 4Gb memory

2007-10-25 Thread Rafael J. Wysocki
On Thursday, 25 October 2007 23:58, H. Peter Anvin wrote:
> J.A. Magallon wrote:
> > Hi...
> > 
> > I have some Quad-Opteron boxes with 4Gb memory and two of them are
> > running two different Linux distros.
> > 
> > Box one sees 4Gb of memory, but box two just sees 3.
> > Their mtrr setups are different:
> > 
> > Why ? Is it a bios setup problem ? A kernel problem ?
> > grep HIGHMEN in configs for both kernels does not give anything, so
> > I still understand less this thing...
> > 
> 
> It would depend on how the BIOS programmed the memory controllers.  For 
> 32-bit (and lots of device) compatibility, a memory hole is required 
> below 4 GB.  Not all memory controllers can remap memory in the 3-4 GB 
> range above the 4 GB memory; I'm not sure if that varies with the 
> different Opteron processors.

It shouldn't, AFAICS.

Greetings,
Rafael


Re: Opteron box and 4Gb memory

2007-10-25 Thread H. Peter Anvin

J.A. Magallon wrote:

> Hi...
> 
> I have some Quad-Opteron boxes with 4Gb memory and two of them are
> running two different Linux distros.
> 
> Box one sees 4Gb of memory, but box two just sees 3.
> Their mtrr setups are different:
> 
> Why ? Is it a bios setup problem ? A kernel problem ?
> grep HIGHMEN in configs for both kernels does not give anything, so
> I still understand less this thing...



It would depend on how the BIOS programmed the memory controllers.  For 
32-bit (and lots of device) compatibility, a memory hole is required 
below 4 GB.  Not all memory controllers can remap memory in the 3-4 GB 
range above the 4 GB memory; I'm not sure if that varies with the 
different Opteron processors.


Also, if you run a 32-bit distribution, you need to have HIGHMEM_64G 
enabled in the kernel.
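
A quick way to check a kernel config for this, sketched in Python. It assumes
the option is spelled CONFIG_HIGHMEM64G in the .config of a 32-bit x86 kernel
(that is the spelling as far as I know; the exact name is not given above).
highmem_options is a made-up helper and both sample configs are invented for
illustration.

```python
def highmem_options(config_text):
    """Return the CONFIG_HIGHMEM* options that are set to y in a .config."""
    found = []
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines start with '#', so they are skipped.
        if line.startswith("CONFIG_HIGHMEM") and line.endswith("=y"):
            found.append(line.split("=", 1)[0])
    return found

# Invented samples: a 32-bit config with PAE highmem, and a 64-bit config.
CONFIG_32 = """\
# CONFIG_NOHIGHMEM is not set
# CONFIG_HIGHMEM4G is not set
CONFIG_HIGHMEM64G=y
CONFIG_HIGHMEM=y
"""
CONFIG_64 = """\
CONFIG_X86_64=y
CONFIG_64BIT=y
"""

print(highmem_options(CONFIG_32))
print(highmem_options(CONFIG_64))
```

A 64-bit kernel has no highmem options at all, so an empty result there is
expected rather than a sign of a missing setting.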


-hpa

