Re: 4G SGI quad Xeon - memory-related slowdowns

2001-01-17 Thread Paul Hubbard


> Sounds like a true-to-God bug. Possibly in the form of incorrect MTRR
> settings. Make sure you enable MTRR support.

MTRR is enabled - here's the dump from /proc/mtrr:

reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1

Note that the sizes sum to six gigabytes, and we only have four in the
box. 
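
For reference, the arithmetic behind that observation can be sketched in C.
This is only a model of the dump above, working in the megabyte figures that
/proc/mtrr prints in parentheses. It assumes the usual x86 rule that an
uncachable range wins wherever it overlaps a write-back range; the struct and
function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One variable-range MTRR, in megabytes, as printed by /proc/mtrr. */
struct mtrr_range {
    unsigned long base_mb;
    unsigned long size_mb;
    int uncachable;              /* 1 = uncachable, 0 = write-back */
};

/* Plain sum of the size= fields (the "six gigabytes" figure). */
unsigned long mtrr_size_sum_mb(const struct mtrr_range *r, size_t n)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += r[i].size_mb;
    return sum;
}

/* Is megabyte number mb covered by any range of the given kind? */
static int covered(const struct mtrr_range *r, size_t n,
                   unsigned long mb, int uncachable)
{
    for (size_t i = 0; i < n; i++)
        if (r[i].uncachable == uncachable &&
            mb >= r[i].base_mb && mb < r[i].base_mb + r[i].size_mb)
            return 1;
    return 0;
}

/*
 * Megabytes below limit_mb that end up write-back: covered by a
 * write-back range and by no uncachable range (assumed precedence).
 */
unsigned long mtrr_effective_wb_mb(const struct mtrr_range *r, size_t n,
                                   unsigned long limit_mb)
{
    unsigned long wb = 0;
    for (unsigned long mb = 0; mb < limit_mb; mb++)
        if (covered(r, n, mb, 0) && !covered(r, n, mb, 1))
            wb++;
    return wb;
}

/* The first dump above (PCI hotplug and resource remapping enabled). */
const struct mtrr_range first_dump[] = {
    { 3072, 1024, 1 },
    {    0, 4096, 0 },
    { 4096, 1024, 0 },
};
```

Under that assumption the size fields add up to 6144MB, but the write-back
area that is left after the uncachable overlap is 4096MB, so the raw sum by
itself may not show a wrong setup.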

We've discovered that if we turn off PCI hotplug support and resource
remapping in the BIOS, then /proc/mtrr looks more sensible:

reg00: base=0xfc000000 (4032MB), size=  64MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1

However, the 64G-compiled kernel still hangs. 

The complete task dump is at http://quirk.fnal.gov/xeon/ 
It looks like everything is blocked on lock_get_status.

> I do need more information on what seems to hang, and how it hangs. One
> of the pre-kernels will give you a nice stack backtrace for each process
> if you press control-scrolllock, and that might be useful.

SysRq dump, plus meminfo and MTRR are at the above URL. Please let me
know if you need any other information.

FYI, with the revised BIOS settings, even the 4G-compiled kernel sees
and uses the full 4G. So if this problem turns out to take time to fix,
we can still get full use from the machine in the interim. 

Many thanks for the help!

-Paul

-- 
Paul Hubbard  [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 4G SGI quad Xeon - memory-related slowdowns

2001-01-16 Thread Mark Hemment

Hi Paul,
 
> 2) Other block I/O output (eg dd if=/dev/zero of=/dev/sdi bs=4M) also
> runs very slowly

What do you notice when running "top" and doing the above?
Does the "buff" value grow high (+700MB), with high CPU usage?

  If so, I think this might be down to nr_free_buffer_pages().

  This function includes the pages in all zones (including the HIGHMEM
zone) in its calculations, while only DMA and NORMAL zone pages are used
for buffers.  This upsets the result from balance_dirty_state()
(fs/buffer.c), and as a result the required flushing of buffers is only
done when the DMA and NORMAL zones are running very low on pages.
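
The effect described above can be illustrated with a small C model. It is a
simplification and not the kernel's code: it assumes only that flushing starts
once dirty buffer pages exceed some percentage of a base page count, and the
names, the percentage and the page counts are all hypothetical.

```c
#include <assert.h>

/* Hypothetical page counts per zone; not taken from the reported machine. */
struct zone_counts {
    unsigned long dma;
    unsigned long normal;
    unsigned long highmem;
};

/* Base when every zone is counted, as described for the unpatched function. */
unsigned long base_all_zones(const struct zone_counts *z)
{
    return z->dma + z->normal + z->highmem;
}

/* Base restricted to the zones that can actually hold buffers. */
unsigned long base_buffer_zones(const struct zone_counts *z)
{
    return z->dma + z->normal;
}

/* Flushing starts once dirty pages exceed percent% of the base. */
int should_flush(unsigned long dirty, unsigned long base, unsigned int percent)
{
    assert(percent <= 100);
    return dirty * 100 > base * percent;
}
```

With, say, 4096 DMA pages, 225280 NORMAL pages and 819200 HIGHMEM pages, a
40% threshold and 100000 dirty pages, the all-zones base does not trigger a
flush while the DMA+NORMAL base does, which is the direction of the problem
described.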

  I've attached a "quick hack" I did for 2.4.0.  It doesn't completely
solve the problem, but moves it in the right direction.

  Please let me know if this helps.

Mark


diff -urN -X dontdiff linux-2.4.0/mm/page_alloc.c markhe-2.4.0/mm/page_alloc.c
--- linux-2.4.0/mm/page_alloc.c	Wed Jan  3 17:59:06 2001
+++ markhe-2.4.0/mm/page_alloc.c	Mon Jan 15 15:35:14 2001
@@ -583,6 +583,27 @@
 }
 
 /*
+ * Free pages in zone "type", and the zones below it.
+ */
+unsigned int nr_free_pages_zone (int type)
+{
+	unsigned int sum;
+	zone_t *zone;
+	pg_data_t *pgdat = pgdat_list;
+
+	if (type >= MAX_NR_ZONES)
+		BUG();
+
+	sum = 0;
+	while (pgdat) {
+		for (zone = pgdat->node_zones; zone < pgdat->node_zones + type; zone++)
+			sum += zone->free_pages;
+		pgdat = pgdat->node_next;
+	}
+	return sum;
+}
+
+/*
  * Total amount of inactive_clean (allocatable) RAM:
  */
 unsigned int nr_inactive_clean_pages (void)
@@ -600,6 +621,25 @@
 	return sum;
 }
 
+unsigned int nr_inactive_clean_pages_zone(int type)
+{
+	unsigned int sum;
+	zone_t *zone;
+	pg_data_t *pgdat = pgdat_list;
+
+	if (type >= MAX_NR_ZONES)
+		BUG();
+	type++;
+
+	sum = 0;
+	while (pgdat) {
+		for (zone = pgdat->node_zones; zone < pgdat->node_zones + type; zone++)
+			sum += zone->inactive_clean_pages;
+		pgdat = pgdat->node_next;
+	}
+	return sum;
+}
+
 /*
  * Amount of free RAM allocatable as buffer memory:
  */
@@ -607,9 +647,9 @@
 {
 	unsigned int sum;
 
-	sum = nr_free_pages();
-	sum += nr_inactive_clean_pages();
-	sum += nr_inactive_dirty_pages;
+	sum = nr_free_pages_zone(ZONE_NORMAL);
+	sum += nr_inactive_clean_pages_zone(ZONE_NORMAL);
+	sum += nr_inactive_dirty_pages; /* XXX */
 
 	/*
 	 * Keep our write behind queue filled, even if
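
One detail of the patch as posted may be worth a look: nr_free_pages_zone()
loops up to "node_zones + type", while nr_inactive_clean_pages_zone() does
type++ first. The userspace sketch below models both bounds on a single node.
It assumes the 2.4 zone numbering (DMA = 0, NORMAL = 1, HIGHMEM = 2); the
function names and counts are invented for the illustration, and the thread
does not say whether the difference is intended.

```c
#include <assert.h>

/* Zone indices as assumed for 2.4: DMA = 0, NORMAL = 1, HIGHMEM = 2. */
enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };

/*
 * Sum per-zone counts for zones [0, bound), mirroring a
 * "zone < pgdat->node_zones + bound" loop on a single node.
 */
unsigned int sum_zones_below(const unsigned int counts[MAX_NR_ZONES], int bound)
{
    unsigned int sum = 0;
    assert(bound >= 0 && bound <= MAX_NR_ZONES);
    for (int i = 0; i < bound; i++)
        sum += counts[i];
    return sum;
}

/* Bound as in nr_free_pages_zone(): zone "type" itself is not counted. */
unsigned int sum_exclusive(const unsigned int counts[MAX_NR_ZONES], int type)
{
    return sum_zones_below(counts, type);
}

/* Bound as in nr_inactive_clean_pages_zone(): type++ counts zone "type". */
unsigned int sum_inclusive(const unsigned int counts[MAX_NR_ZONES], int type)
{
    return sum_zones_below(counts, type + 1);
}
```

With counts of 10, 200 and 3000 pages for DMA, NORMAL and HIGHMEM, calling
with ZONE_NORMAL gives 10 for the first bound and 210 for the second.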



Re: 4G SGI quad Xeon - memory-related slowdowns

2001-01-15 Thread Ingo Molnar


On 15 Jan 2001, Linus Torvalds wrote:

> The performance problem is _probably_ due to the kernel having to
> double-buffer the IO requests, coupled with bad MTRR settings (ie
> memory above the 4GB range is probably marked as non-cacheable or
> something, which means that you'll get really bad performance).

the highmem related double-buffering alone on such a category of system is
minuscule, compared to other costs of IO, and considering the expected
bandwidth (20-30 MB/sec).

the MTRR part could be a problem.

> Not using the high memory will avoid the double-buffering, and will
> also avoid using memory that isn't cached. If I'm right.

> The hang still indicates that something is wrong in PAE-land, though.

it's working just fine on all 4GB+ systems tested (including 32GB
systems), Intel, Dell, Compaq boxes. So if it's a unique PAE bug, then it
must be some boundary condition.

Paul, here is the memory map of my 8GB system:

BIOS-provided physical RAM map:
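 BIOS-e820: 000000000009d400 @ 0000000000000000 (usable)
 BIOS-e820: 0000000000002c00 @ 000000000009d400 (reserved)
 BIOS-e820: 0000000000020000 @ 00000000000e0000 (reserved)
 BIOS-e820: 0000000003ef8000 @ 0000000000100000 (usable)
 BIOS-e820: 0000000000007c00 @ 0000000003ff8000 (ACPI data)
 BIOS-e820: 0000000000000400 @ 0000000003fffc00 (ACPI NVS)
 BIOS-e820: 00000000ec000000 @ 0000000004000000 (usable)
 BIOS-e820: 0000000001400000 @ 00000000fec00000 (reserved)
 BIOS-e820: 00000000f0000000 @ 0000000100000000 (usable)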

and here are the MTRR settings:

[root@m mingo]# cat /proc/mtrr
reg00: base=0xf0000000 (3840MB), size= 256MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=2048MB: write-back, count=1
reg03: base=0x180000000 (6144MB), size=1024MB: write-back, count=1
reg04: base=0x1c0000000 (7168MB), size= 512MB: write-back, count=1
reg05: base=0x1e0000000 (7680MB), size= 256MB: write-back, count=1

i'd suggest using the mem=exactmap feature to force different types of
memory maps. Eg. i'm using the following append= line to force an 800 MB
setup:

append="mem=exactmap mem=0x0009d800@0x00000000 mem=0x03ef8000@0x00100000
mem=0x2bffe000@0x04000000"

such mem=exactmap lines can be constructed based on the BIOS output.
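
As a rough illustration of that construction, the C sketch below turns a list
of usable regions into a mem=exactmap string in the mem=SIZE@ADDR form used
above. The struct and function names are hypothetical, and printing each value
as eight hex digits is an assumption about formatting, not something the
kernel is stated to require.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One usable region, in the "size @ address" order of the BIOS-e820 lines. */
struct mem_region {
    unsigned long long size;
    unsigned long long addr;
};

/*
 * Write "mem=exactmap mem=SIZE@ADDR ..." into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
int build_exactmap(char *buf, size_t len,
                   const struct mem_region *r, size_t n)
{
    int used = snprintf(buf, len, "mem=exactmap");
    if (used < 0 || (size_t)used >= len)
        return -1;

    for (size_t i = 0; i < n; i++) {
        size_t left = len - (size_t)used;
        int w = snprintf(buf + used, left, " mem=0x%08llx@0x%08llx",
                         r[i].size, r[i].addr);
        if (w < 0 || (size_t)w >= left)
            return -1;
        used += w;
    }
    return 0;
}
```

The resulting string would then go inside the append="..." line of the lilo
configuration; trimming the size of the last usable region is what limits the
amount of memory the kernel sees.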

Ingo




Re: 4G SGI quad Xeon - memory-related slowdowns

2001-01-15 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Paul Hubbard  <[EMAIL PROTECTED]> wrote:
>
>We're having some problems with the 2.4.0 kernel on our SGI 1450, and
>were hoping for some help.
> The box is a quad Xeon 700/2MB, with 4GB of memory, ServerSet III HE
>chipset, RH6.1 (slightly modified for local configuration) distribution.
>
>a) If we compile the kernel with no high memory support, /proc/meminfo
>shows 1G of memory and everything works fine.

Good.

>b) If we compile for 4G of memory, /proc/meminfo shows about 3G, and
>overriding the amount at the lilo prompt causes kernel panics at bootup.
>However, other than missing a quarter of the memory, it works just
>fine.

3GB is right - your last 1GB is above the 4GB mark, and it's mapped
there explicitly so that you'll have space in the low 32 bits to map PCI
devices etc (and things like the APIC, you get the idea). 

If you try to override it, you will very obviously crash, because if you
tell Linux that you have 4GB of memory, Linux will think that you have
4GB of _contiguous_ memory, which is not true.  The only way to use that
last gigabyte is to enable support for memory > 4GB, and get the proper
memory map _without_ any overrides that shows the proper holes for PCI
space. 
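
The split described above can be written down as a small C sketch. It assumes
a single hole that starts at some address below 4GB and runs up to the 4GB
mark, with the displaced RAM remapped above it; the 3072MB hole start used in
the example is inferred from the "about 3G" report, and the names are
hypothetical.

```c
#include <assert.h>

struct ram_split {
    unsigned long low_mb;        /* RAM visible below the hole */
    unsigned long high_mb;       /* RAM remapped above the 4GB mark */
};

/*
 * Split total_mb of RAM around a hole that starts at hole_start_mb
 * and extends to 4096MB.
 */
struct ram_split split_around_hole(unsigned long total_mb,
                                   unsigned long hole_start_mb)
{
    struct ram_split s;

    assert(hole_start_mb <= 4096);
    if (total_mb <= hole_start_mb) {
        /* Everything fits below the hole; nothing is remapped. */
        s.low_mb = total_mb;
        s.high_mb = 0;
    } else {
        s.low_mb = hole_start_mb;
        s.high_mb = total_mb - hole_start_mb;
    }
    return s;
}
```

For 4096MB of RAM and a hole starting at 3072MB this gives 3072MB below the
hole and 1024MB above 4GB, which is why a kernel limited to 4GB of physical
address space reports about 3G.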

Check your "dmesg" output under a working kernel for details - you'll
see how the memory is laid out and reported by the e820 call.

>c) If we compile the kernel for 64G high memory (PAE mode), we see all
>of the memory but have other problems:
>  i) mke2fs -m0 on a 72GB Seagate SCSI disk runs very slowly (about
>  5MB/sec instead of 22-25) and the machine hangs after the format
>  completes. To be exact, the command prompt returns, but ls or any
>  other command will never return, and you have to reset the box.
>  This is a showstopper for us!

Sounds like a true-to-God bug. Possibly in the form of incorrect MTRR
settings. Make sure you enable MTRR support.

I do need more information on what seems to hang, and how it hangs. One
of the pre-kernels will give you a nice stack backtrace for each process
if you press control-scrolllock, and that might be useful.

>  ii) If I override the amount of memory via lilo, we still get the
>   hang, but performance actually improves!

The performance problem is _probably_ due to the kernel having to
double-buffer the IO requests, coupled with bad MTRR settings (ie memory
above the 4GB range is probably marked as non-cacheable or something,
which means that you'll get really bad performance). 

Not using the high memory will avoid the double-buffering, and will also
avoid using memory that isn't cached. If I'm right.

The hang still indicates that something is wrong in PAE-land, though.

Linus



4G SGI quad Xeon - memory-related slowdowns

2001-01-15 Thread Paul Hubbard


We're having some problems with the 2.4.0 kernel on our SGI 1450, and
were hoping for some help.
 The box is a quad Xeon 700/2MB, with 4GB of memory, ServerSet III HE
chipset, RH6.1 (slightly modified for local configuration) distribution.

a) If we compile the kernel with no high memory support, /proc/meminfo
shows 1G of memory and everything works fine.

b) If we compile for 4G of memory, /proc/meminfo shows about 3G, and
overriding the amount at the lilo prompt causes kernel panics at bootup.
However, other than missing a quarter of the memory, it works just fine.

c) If we compile the kernel for 64G high memory (PAE mode), we see all
of the memory but have other problems:
  i) mke2fs -m0 on a 72GB Seagate SCSI disk runs very slowly (about
  5MB/sec instead of 22-25) and the machine hangs after the format
  completes. To be exact, the command prompt returns, but ls or any
  other command will never return, and you have to reset the box.
  This is a showstopper for us!

  ii) If I override the amount of memory via lilo, we still get the
  hang, but performance actually improves! At 1G, it's slow for a few
  seconds, and then runs fine. At 2G, it's slow, and when I tried to
  boot 3G I got an odd startup crash that I've not had time to
  replicate.

Other notes: 

1) SCSI is onboard Adaptec 39160 (aic7xxx driver, dual-channel) and
we've tried different drives, cables, terminators, etc. 

2) Other block I/O output (eg dd if=/dev/zero of=/dev/sdi bs=4M) also
runs very slowly
3) We are using vmstat 1 to monitor data rates
4) I tried the format with 2.4 prerelease, and the mkfs was very slow,
and I got a SCSI reset at the end of the format. Perhaps this is
related?
5) If necessary, we can easily load a different distribution on the
machine if that might be part of the problem.

If necessary, we can set up a login on the machine, or run whatever test
code is necessary. Other than this, it's a pretty nice box to work on.

Please reply to rjetton and phubbard at fnal.gov, thanks.

-Paul

-- 
Paul Hubbard  [EMAIL PROTECTED]


