More data on 2.4.5 VM issues

2001-06-01 Thread Michael Merhej

This is 2.4.5 with Andrea Arcangeli's aa1 patch, compiled with highmem support:

Why is kswapd using so much CPU?  If you reboot the machine and run the
same user process, kswapd CPU usage is almost 0% and none of the swap is
used.  This machine was upgraded from 2.2 and we did not have the luxury of
re-partitioning it to support the "new" 2.4 swap size requirements.

After running for a few days with relatively constant memory usage:
vmstat:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 2  0  1 136512   5408    504 209744   0   0     0     2   19    49  10  26  64



top:

  5:38pm  up 3 days, 19:44,  2 users,  load average: 2.08, 2.13, 2.15
34 processes: 32 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 16.0% user, 56.4% system, 16.2% nice, 26.3% idle
CPU1 states: 11.1% user, 57.0% system, 11.0% nice, 31.3% idle
Mem:  1028804K av, 1023744K used,    5060K free,       0K shrd,     504K buff
Swap:  136512K av,  136512K used,       0K free                  209876K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
28442 root      18  10  898M 812M 36188 R N  56.0 80.9 296:12 gateway.smart.5
28438 root      16  10  898M 811M 35084 S N  43.7 80.7 291:03 gateway.smart.5
    5 root       9   0     0    0     0 SW   37.6  0.0 164:58 kswapd
 2509 root      18   0   492  492   300 R     2.5  0.0   0:00 top
    1 root       9   0    68    0     0 SW    0.0  0.0   0:08 init
    2 root       9   0     0    0     0 SW    0.0  0.0   0:00 keventd
    3 root      19  19     0    0     0 SWN   0.0  0.0   1:11 ksoftirqd_CPU0
    4 root      19  19     0    0     0 SWN   0.0  0.0   1:04 ksoftirqd_CPU1
    6 root       9   0     0    0     0 SW    0.0  0.0   0:00 kreclaimd
    7 root       9   0     0    0     0 SW    0.0  0.0   0:00 bdflush
    8 root       9   0     0    0     0 SW    0.0  0.0   0:07 kupdated
   11 root       9   0     0    0     0 SW    0.0  0.0   0:00 scsi_eh_0
  315 root       9   0   100    0     0 SW    0.0  0.0   0:00 syslogd
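
For anyone who wants the memory figures from the top output above as ratios, here is a rough Python sketch. The three constants are copied from the Mem: and Swap: lines; the helper name swap_summary is made up for illustration and is not part of any tool.

```python
# Figures copied from the "Mem:" and "Swap:" lines of the top output (KiB).
MEM_TOTAL_K = 1028804
SWAP_TOTAL_K = 136512
SWAP_USED_K = 136512


def swap_summary(mem_total_k, swap_total_k, swap_used_k):
    """Return (swap-to-RAM ratio, fraction of swap currently in use)."""
    return swap_total_k / mem_total_k, swap_used_k / swap_total_k


ratio, used = swap_summary(MEM_TOTAL_K, SWAP_TOTAL_K, SWAP_USED_K)
print(f"swap is {ratio:.1%} of RAM, {used:.0%} of it in use")
```

In other words, the swap partition is only a small fraction of physical memory and it is completely full.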


Hope this helps


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



2.4.1ac3 vs 2.4.1ac8

2001-02-09 Thread Michael Merhej

Basic Machine configuration:

SMP Supermicro board
2 gigabytes of ECC Registered RAM
Adaptec AIC-7892
eepro100 onboard NIC

The machine has been running as a database server with no MySQL crashes
for several months and has run fine with kernels 2.2.18 and 2.4.1ac3.

We have seen a HUGE improvement in processing throughput and file
access speed going from kernel 2.4.1ac3 to 2.4.1ac8, but MySQL crashes
every few hours with the following error on 2.4.1ac8:

mysqld version: 3.23.32

mysqld got signal 11;
The manual section 'Debugging a MySQL server' tells you how to use a
stack trace and/or the core file to produce a readable backtrace that may
help in finding out why mysqld died
Attemping backtrace. You can use the following information to find out
where mysqld died.  If you see no messages after this, something went
terribly wrong
Bogus stack limit or frame pointer, aborting backtrace


With 2.4.1ac8, syslog has been spitting out the following errors:

Feb  8 23:12:38 db1 kernel: __alloc_pages: 0-order allocation failed.
Feb  8 23:34:54 db1 kernel: __alloc_pages: 2-order allocation failed.
Feb  8 23:34:54 db1 kernel: IP: queue_glue: no memory for gluing queue ef1c1de0
Feb  8 23:34:55 db1 kernel: __alloc_pages: 2-order allocation failed.
Feb  8 23:34:55 db1 kernel: IP: queue_glue: no memory for gluing queue ef1c1ee0
Feb  8 23:34:56 db1 kernel: __alloc_pages: 2-order allocation failed.
Feb  8 23:34:56 db1 kernel: IP: queue_glue: no memory for gluing queue ef1c1160
Feb  8 23:34:59 db1 kernel: __alloc_pages: 2-order allocation failed.
Feb  8 23:34:59 db1 kernel: IP: queue_glue: no memory for gluing queue ef1c11a0
Feb  8 23:35:05 db1 kernel: nfs: server toastem not responding, still trying
Feb  8 23:35:05 db1 kernel: __alloc_pages: 2-order allocation failed.
Feb  8 23:35:05 db1 kernel: IP: queue_glue: no memory for gluing queue c322e520
Feb  8 23:35:06 db1 kernel: __alloc_pages: 2-order allocation failed.
Feb  8 23:35:06 db1 kernel: IP: queue_glue: no memory for gluing queue ef1c11a0
Feb  8 23:36:04 db1 kernel: __alloc_pages: 2-order allocation failed.
Feb  8 23:36:04 db1 kernel: IP: queue_glue: no memory for gluing queue c322ea60
Feb  8 23:36:05 db1 kernel: __alloc_pages: 2-order allocation failed.
Feb  8 23:36:05 db1 kernel: IP: queue_glue: no memory for gluing queue c322ea60
Feb  8 23:36:06 db1 kernel: __alloc_pages: 2-order allocation failed.
Feb  8 23:36:06 db1 kernel: IP: queue_glue: no memory for gluing queue c322ea60

Feb  9 00:00:13 db1 kernel: __alloc_pages: 1-order allocation failed.
Feb  9 00:00:21 db1 last message repeated 269 times

Feb  9 00:15:13 db1 kernel: __alloc_pages: 1-order allocation failed.
Feb  9 00:15:19 db1 last message repeated 114 times

etc
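
In case it helps anyone reading these logs: an order-N allocation is a request for 2^N physically contiguous pages. The sketch below tallies failures per order from a few lines copied from the excerpt above, folding in the syslog "last message repeated" counts. It assumes 4 KiB pages (as on 32-bit x86); the function name count_failures and the shortened LOG sample are my own, for illustration only.

```python
import re
from collections import Counter

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on 32-bit x86

# A few lines copied from the syslog excerpt above (not the full log).
LOG = """\
Feb  8 23:12:38 db1 kernel: __alloc_pages: 0-order allocation failed.
Feb  8 23:34:54 db1 kernel: __alloc_pages: 2-order allocation failed.
Feb  9 00:00:13 db1 kernel: __alloc_pages: 1-order allocation failed.
Feb  9 00:00:21 db1 last message repeated 269 times
"""

FAILED = re.compile(r"__alloc_pages: (\d+)-order allocation failed")
REPEAT = re.compile(r"last message repeated (\d+) times")


def count_failures(log_text):
    """Count failed allocations per order, honouring syslog repeat lines."""
    counts = Counter()
    last_order = None  # order of the immediately preceding failure line
    for line in log_text.splitlines():
        m = FAILED.search(line)
        if m:
            last_order = int(m.group(1))
            counts[last_order] += 1
            continue
        m = REPEAT.search(line)
        if m and last_order is not None:
            counts[last_order] += int(m.group(1))
        else:
            last_order = None  # an unrelated line breaks the repeat chain
    return counts


for order, n in sorted(count_failures(LOG).items()):
    size_kib = (PAGE_SIZE << order) // 1024
    print(f"order {order}: {n} failures, {size_kib} KiB contiguous each")
```

Running the same function over the complete syslog would show which allocation sizes fail most often.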



We would love to stay with kernel 2.4.1ac8 because of the huge speed
increase.

This machine handles roughly 300 to 1700 queries per second.

If you need more information, please email me.



Thanks






dst cache overflow

2000-10-03 Thread Michael Merhej

Hello,
Recently we have been experiencing problems with the network dying
temporarily on a machine and then magically coming back to life.  This
appears to happen more frequently when the machines are heavily loaded
CPU-wise and are sustaining over 3 Mbit/s of network traffic.  It is
happening on several machines with similar configurations.  Each machine
has about 2000 active TCP connections, and CPU usage is typically over 75%.
They have AMD 800-950 MHz processors and 3Com 905C cards, and run Red Hat
6.2 with kernel builds ranging from 2.2.17p3 to 2.2.18p10.

When this happens, the following messages appear in the system log:


Oct  3 12:14:38 onion kernel: dst cache overflow 
Oct  3 12:14:38 onion last message repeated 9 times
Oct  3 12:14:43 onion kernel: NET: 486 messages suppressed. 
Oct  3 12:14:43 onion kernel: dst cache overflow 
Oct  3 12:14:48 onion kernel: RPC: sendmsg returned error 105 
Oct  3 12:14:49 onion kernel: NET: 367 messages suppressed. 
Oct  3 12:14:49 onion kernel: dst cache overflow 
Oct  3 12:14:51 onion kernel: RPC: sendmsg returned error 105 
Oct  3 12:14:53 onion kernel: NET: 192 messages suppressed. 
Oct  3 12:14:53 onion kernel: dst cache overflow 
Oct  3 12:14:55 onion kernel: RPC: sendmsg returned error 105 
Oct  3 12:14:59 onion kernel: NET: 122 messages suppressed. 
Oct  3 12:14:59 onion kernel: dst cache overflow 
Oct  3 12:15:01 onion kernel: nfs: server toastem not responding, still trying
Oct  3 12:15:01 onion kernel: RPC: sendmsg returned error 105 
Oct  3 12:15:03 onion kernel: RPC: sendmsg returned error 105 
Oct  3 12:15:04 onion kernel: NET: 52 messages suppressed. 
Oct  3 12:15:05 onion kernel: dst cache overflow 
Oct  3 12:15:08 onion kernel: RPC: sendmsg returned error 105 
Oct  3 12:15:11 onion kernel: nfs: server toastem OK 


Oct  3 09:23:58 mint kernel: NET: 271 messages suppressed. 
Oct  3 09:23:58 mint kernel: dst cache overflow 
Oct  3 09:23:58 mint last message repeated 9 times
Oct  3 09:24:07 mint kernel: NET: 384 messages suppressed. 
Oct  3 09:24:07 mint kernel: dst cache overflow 
Oct  3 09:24:07 mint kernel: NET: 255 messages suppressed. 
Oct  3 09:24:07 mint kernel: dst cache overflow 
Oct  3 09:24:12 mint kernel: NET: 149 messages suppressed. 
Oct  3 09:24:12 mint kernel: dst cache overflow 
Oct  3 09:24:18 mint kernel: NET: 64 messages suppressed. 
Oct  3 09:24:18 mint kernel: dst cache overflow 
Oct  3 09:24:23 mint kernel: NET: 35 messages suppressed. 
Oct  3 09:24:23 mint kernel: dst cache overflow 
Oct  3 09:24:27 mint kernel: NET: 23 messages suppressed. 
.
.
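
To get a feel for how many messages are hiding behind the rate limiting, something like the following Python sketch can tally the log. It is a rough illustration only: the LOG sample is a few lines copied from the "onion" excerpt above, the function name summarize is made up, and the "messages suppressed" count may include other rate-limited network messages, not just "dst cache overflow". For what it is worth, errno 105 on Linux is ENOBUFS ("No buffer space available").

```python
import re

# Lines copied from the "onion" excerpt above (not the full log).
LOG = """\
Oct  3 12:14:38 onion kernel: dst cache overflow
Oct  3 12:14:38 onion last message repeated 9 times
Oct  3 12:14:43 onion kernel: NET: 486 messages suppressed.
Oct  3 12:14:43 onion kernel: dst cache overflow
Oct  3 12:14:48 onion kernel: RPC: sendmsg returned error 105
"""

SUPPRESSED = re.compile(r"NET: (\d+) messages suppressed")
REPEATED = re.compile(r"last message repeated (\d+) times")
RPC_ERROR = re.compile(r"RPC: sendmsg returned error (\d+)")


def summarize(log_text):
    """Tally logged, repeated and suppressed messages, plus RPC errnos."""
    summary = {"overflow": 0, "repeated": 0, "suppressed": 0,
               "rpc_errnos": set()}
    for line in log_text.splitlines():
        if "dst cache overflow" in line:
            summary["overflow"] += 1
            continue
        m = REPEATED.search(line)
        if m:
            summary["repeated"] += int(m.group(1))
            continue
        m = SUPPRESSED.search(line)
        if m:
            summary["suppressed"] += int(m.group(1))
            continue
        m = RPC_ERROR.search(line)
        if m:
            summary["rpc_errnos"].add(int(m.group(1)))
    return summary


print(summarize(LOG))
```

Feeding the whole syslog file through summarize gives a lower bound on the number of overflow events per incident.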


Hope this helps

Thanks

--Michael




