Re: Processing time spent in IRQ handling and what to do about it
On Wed, 2007-12-19 at 09:58 +0200, Dotan Shavit wrote:
> On Tuesday 18 December 2007, Oded Arbel wrote:
> > I can see that a lot of time is spent in the hard-IRQ region -
> > sometimes more than all other regions together.
> Let's look for more hints...
> - Anything interesting in the logs (during boot and after)?
> - Let's unplug all the hardware you can: network, USB, disks...
> - rmmod all the modules you can.
> - Boot with a different kernel version.
> - Nothing yet? Let's play with the BIOS...

The logs do not show anything that I don't understand or that I can relate to this problem, and none of the other options are possible as this is a production machine. On a duplicate machine that runs MySQL replicated from the first and doesn't carry any load, stopping mysqld caused the load to fall to almost 0. There were very few hardware interrupts after that (as evident from /proc/interrupts), but since there isn't any load I can't conclude much from it.

-- 
Oded

=
To unsubscribe, send mail to [EMAIL PROTECTED] with the word unsubscribe in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]
Re: Processing time spent in IRQ handling and what to do about it
On Wed, 2007-12-19 at 10:34 +0200, Aviv Greenberg wrote:
> Can you send an output of cat /proc/interrupts? Is there any device
> sharing the IRQ line with the network interface?

On Tue, 2007-12-18 at 22:14 +0200, Oron Peled wrote:
> 6. Why guess?
>        watch -n10 -d cat /proc/interrupts

/proc/interrupts looks like this:

            CPU0        CPU1        CPU2        CPU3
   0:  2818676796  3045096095  2597715597  3039460137   IO-APIC-edge     timer
   1:           0           2           0           0   IO-APIC-edge     i8042
   9:           0           0           0           0   IO-APIC-fasteoi  acpi
  12:           0           1           1           2   IO-APIC-edge     i8042
  14:     6144547   861135937042   85048               IO-APIC-edge     libata
  15:           0           0           0           0   IO-APIC-edge     libata
  16:           1           0           0           1   IO-APIC-fasteoi  uhci_hcd:usb1, ehci_hcd:usb6
  17:         234       13197      11                  IO-APIC-fasteoi  uhci_hcd:usb2
  18:           0           0           0           0   IO-APIC-fasteoi  uhci_hcd:usb3
  19:           0           0           0           0   IO-APIC-fasteoi  uhci_hcd:usb4
  22:          24          24          25          23   IO-APIC-fasteoi  uhci_hcd:usb5
2289:   426764360          12   153890890    25567190   PCI-MSI-edge     eth1
2290:   184062475    14352363  1146094937    36605794   PCI-MSI-edge     eth0
2292:   253368176    26799612   221976501    20082294   PCI-MSI-edge     cciss0
 NMI:           0           0           0           0
 LOC:  2910906978  2910907454  2910906845  2910907935

I haven't calculated diffs exactly yet, but at first glance it looks like eth0 interrupts are happening at about 150 a second while cciss0 interrupts are happening at about 20 per second. Also, eth0 interrupts happen almost exclusively on one CPU (currently CPU 2), and cciss interrupts happen on two CPUs (0 and 2). I'm not sure what's up with CPUs 1 and 3 - is it possible that, because these are the second cores on each chip, they don't get as many interrupts? Isn't 'irqbalance' supposed to do something about it?

-- 
Oded
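The diff calculation mentioned above can be automated by sampling /proc/interrupts twice. A minimal Python sketch, assuming the usual layout (a header row of CPU labels, then one row per IRQ with per-CPU counters after the IRQ label):

```python
import time

def parse_interrupts(text):
    """Map each IRQ label to its total count, summed over CPUs."""
    counts = {}
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: one column per CPU
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(':'):
            continue
        label = fields[0].rstrip(':')
        # per-CPU counters follow the label; the device name comes after
        counts[label] = sum(int(f) for f in fields[1:1 + ncpu] if f.isdigit())
    return counts

def interrupt_rates(interval=10.0):
    """Sample /proc/interrupts twice; return interrupts/second per IRQ."""
    with open('/proc/interrupts') as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open('/proc/interrupts') as f:
        after = parse_interrupts(f.read())
    return {irq: (after[irq] - before[irq]) / interval
            for irq in after if irq in before}
```

Dividing the deltas by the sampling interval gives the per-device rates Oded estimates by eye from `watch -d`.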
Re: Processing time spent in IRQ handling and what to do about it
On Thursday, 20 December 2007, Oded Arbel wrote:
> I haven't calculated diffs exactly yet, but on first glance it looks
> like eth0 interrupts are happening at about 150 a second while cciss0
> interrupts are happening at about 20 per second.

Well, ~150 interrupts/second is a very low interrupt rate and should not cause a significant load *unless* they are doing heavy work in each interrupt. [As a reference, on a specific device family I work on, we use a *minimum* of 1000 interrupts/second even on very low-end hosts. When we connect several devices on slightly stronger hosts (single CPU) we normally get around ~4000 interrupts/second.]

I still tend to suspect the disk controller, although its interrupt rate is really low. Maybe you can test this: run some I/O-bound process like 'find /' and see if it affects the hardware interrupt load in top. If all else fails, then you may want to start using oprofile.

Hope it helps,

-- 
Oron Peled                             Voice/Fax: +972-4-8228492
[EMAIL PROTECTED]                      http://www.actcom.co.il/~oron
ICQ UIN: 16527398

"Free software: each person contributes a brick, but ultimately each
person receives a house in return."  -- Brendan Scott
Re: Processing time spent in IRQ handling and what to do about it
And my word on this never-ending story: you may use get_cycles (http://lxr.linux.no/linux/include/asm-i386/tsc.h#L19) to measure time (cycles) spent in interrupts.

Read more here:
http://www.linuxdriver.co.il/ldd3/linuxdrive3-CHP-7-SECT-1.html

-- 
Constantine Shulyupin
Freelance Embedded Linux Engineer
054-4234440
http://www.linuxdriver.co.il/
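get_cycles() itself is kernel-only, but the technique it serves - bracketing a section of code with counter reads and accumulating the deltas, exactly what one would do around an interrupt handler body - can be sketched in userspace. A hedged Python analogue, using time.perf_counter_ns() as a stand-in for the cycle counter:

```python
import time

def timed(fn):
    """Accumulate the time spent inside fn across calls, the way one
    would accumulate get_cycles() deltas around an interrupt handler."""
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter_ns()   # stand-in for get_cycles()
        try:
            return fn(*args, **kwargs)
        finally:
            # charge the elapsed counter delta to this handler
            wrapper.total_ns += time.perf_counter_ns() - t0
            wrapper.calls += 1
    wrapper.total_ns = 0
    wrapper.calls = 0
    return wrapper
```

Dividing total_ns by calls gives the average cost per invocation, which is the number that decides whether 150 interrupts/second can plausibly account for the hard-IRQ share htop reports.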
Re: Processing time spent in IRQ handling and what to do about it
On Tuesday 18 December 2007, Oded Arbel wrote:
> I can see that a lot of time is spent in the hard-IRQ region -
> sometimes more than all other regions together.

Let's look for more hints...
- Anything interesting in the logs (during boot and after)?
- Let's unplug all the hardware you can: network, USB, disks...
- rmmod all the modules you can.
- Boot with a different kernel version.
- Nothing yet? Let's play with the BIOS...

What stops the IRQs?
How far will you go to catch an IRQ?
#
Re: Processing time spent in IRQ handling and what to do about it
Can you send an output of cat /proc/interrupts? Is there any device sharing the IRQ line with the network interface?

bnx2 has NAPI support. The changes you saw recently are not related; they are improvements to the NAPI mechanism (to support multiple device queues, not specific to bnx2).

Aviv Greenberg

On 12/19/07, Dotan Shavit [EMAIL PROTECTED] wrote:
> On Tuesday 18 December 2007, Oded Arbel wrote:
> > I can see that a lot of time is spent in the hard-IRQ region -
> > sometimes more than all other regions together.
> Let's look for more hints...
> - Anything interesting in the logs (during boot and after)?
> - Let's unplug all the hardware you can: network, USB, disks...
> - rmmod all the modules you can.
> - Boot with a different kernel version.
> - Nothing yet? Let's play with the BIOS...
>
> What stops the IRQs?
> How far will you go to catch an IRQ?
> #
Re: Processing time spent in IRQ handling and what to do about it
Hi,

> You cannot turn it on/off. The driver may support this optional API
> or not. If it supports it, it's the driver's sole decision when it's
> better to use polling/interrupt-per-packet according to its hardware
> specifics.

I doubt whether this is exactly so for all NICs, as one might understand from your answer. For example, with e1000 NICs, you can choose to build the driver with or without polling support. See, while configuring the kernel: Device Drivers -> Network Device Support -> Ethernet (1000 Mbit) -> Intel(R) PRO/1000 Gigabit Ethernet Support -> Use RX polling (NAPI). Selecting it sets CONFIG_E1000_NAPI to y. In newer kernels, e1000 comes with NAPI support by default, but you can also build it without this support. And if you look at the code of the driver, you will find the following in the e1000_main.c module:

#ifdef CONFIG_E1000_NAPI
        netdev->poll = e1000_clean;
        netdev->weight = 64;
#endif

Which means that, when building without CONFIG_E1000_NAPI set, you will not have the poll method and therefore no polling/NAPI.

You also have the ability to choose NAPI for other NICs; for example, Tulip; see Device Drivers -> Network Device Support -> Ethernet (10 or 100 Mbit) -> Tulip family network device support -> Use NAPI RX polling. It could be that on other NICs you cannot turn it on/off. Broadcom was the first to release a driver (tg3) with NAPI support for Linux, so they probably have a lot of experience with it, and it could be that their NAPI support is built in and you cannot avoid it.

BTW, with OpenSolaris this is exactly the situation: the NAPI-like support is in the core automatically; the driver starts as an interrupt-driven driver and changes to polling when there is a high load of interrupts. The drivers need not be built with any special NAPI support; the driver binary is the same when working with or without it. There is a way, however, to configure kernel-wide parameters for it.
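Whether a given kernel was built with CONFIG_E1000_NAPI (or any other option) can be checked against the config file the distribution ships. A small sketch; the /boot/config-* path in the usage comment is the common Fedora layout, an assumption here:

```python
def config_option(config_text, option):
    """Return the value ('y', 'm', or a string) of a kernel config
    option, or None if it is unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        # set options look like CONFIG_FOO=y; unset ones are commented out
        if line.startswith(option + '='):
            return line.split('=', 1)[1]
    return None

# usage sketch (path assumed, typical Fedora location):
#   import os
#   with open('/boot/config-' + os.uname().release) as f:
#       print(config_option(f.read(), 'CONFIG_E1000_NAPI'))
```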
Regards,
Rami Rosen

On Dec 18, 2007 10:14 PM, Oron Peled [EMAIL PROTECTED] wrote:
> On Tuesday, 18 December 2007, Yedidyah Bar-David wrote:
> > I am not an expert on this, but what you want might be NAPI - a new
> > network driver infrastructure designed to solve just that. Google a
> > bit - I do not know exactly when it entered 2.6 (and you did not
> > state your kernel version) and which drivers use it already.
>
> 1. NAPI was new at kernel 2.3.x when it was developed towards 2.4.
> 2. It gives the *driver* the option to toggle between interrupt-driven
>    and polling mode at runtime. E.g:
>    - A GB Ethernet at full speed may better poll the hardware every
>      once in a while.
>    - The same card is better off using interrupt-driven mode if the
>      traffic is low.
> 3. You cannot turn it on/off. The driver may support this optional API
>    or not. If it supports it, it's the driver's sole decision when
>    it's better to use polling/interrupt-per-packet according to its
>    hardware specifics.
> 4. I don't think a single fast-Ethernet card can severely affect your
>    hardware interrupt load. So either:
>    - You have a GB (or maybe 2GB?) Ethernet with high load.
>    - You have several fast-Ethernet cards working at full speed.
> 5. A far better suspect would be the disk controller (e.g: working
>    without DMA etc.)
> 6. Why guess?
>        watch -n10 -d cat /proc/interrupts
>    And calculate how many interrupts per second occurred for various
>    devices. That would give you a rough idea who the possible suspects
>    are.
>
> -- 
> Oron Peled                             Voice/Fax: +972-4-8228492
> [EMAIL PROTECTED]                      http://www.actcom.co.il/~oron
> ICQ UIN: 16527398
>
> "Linux lasts longer!" -- Kim J. Brand [EMAIL PROTECTED]
Re: Processing time spent in IRQ handling and what to do about it
On Tue, 2007-12-18 at 15:21 +0200, Dotan Shavit wrote:
> > I don't think that swapping has anything to do with the IRQ behavior
> > I'm seeing,
> In that case, it probably is network related... Can you provide more
> details regarding this? Is the Apache server you mentioned located on
> the same machine?

Indeed.

> Are you connected to a private VLAN (or seeing non-relevant traffic)?

It's infrastructure I don't really have access to, so I wouldn't know, but I'm on a good switch (maybe with a VLAN) and I don't see traffic that isn't meant for me.

> Do you get this (a lot of time is spent in the hard-IRQ region) all
> the time, or just when the server is accessed by its clients?

I'm always seeing some traffic, so it's hard to say whether I would still see hard-IRQ time when there aren't any clients. But interestingly enough, a second identical machine which is currently doing nothing except maintaining a replica of the MySQL database on the first is also seeing high hard-IRQ counts. A third, completely different computer on a different network with different workloads, which also maintains a replica of the first MySQL database, is also seeing high IRQ usage.

> What is the difference between this machine and the other (I
> understand the other machine works OK)?

Hardware-wise and OS-wise - nothing. Software-wise there are many different things, but most prominently:
* It doesn't see the same kind of traffic (which I currently don't think is the issue, as the second server above doesn't see any traffic).
* It doesn't replicate its databases.

-- 
Oded
Re: Processing time spent in IRQ handling and what to do about it
On Tue, 2007-12-18 at 07:48 +0200, Yedidyah Bar-David wrote:
> On Tue, Dec 18, 2007 at 02:49:29AM +0200, Oded Arbel wrote:
> > Running some static benchmarks that should mimic the behavior on
> > real load, on identical hardware at the office, I see very little
> > hard-IRQ time if at all. The main difference between the static
> > benchmark and real usage is that the static benchmark only tests
> > the application logic and IO, while real usage also fetches some
> > files served by Apache over HTTP with each request - maybe
> > ~50Kbytes worth of responses are served by Apache for each request
> > to the application. I was thinking that the high IRQ usage is due
> > to high network traffic - could that be the case and could that be
> > affecting the server's performance?
> I am not an expert on this, but what you want might be NAPI - a new
> network driver infrastructure designed to solve just that. Google a
> bit - I do not know exactly when it entered 2.6 (and you did not
> state your kernel version) and which drivers use it already.

Searching for NAPI, I see some discussion of it entering 2.4 or 2.5, so I'm assuming 2.6 had it from the start. I also see some patches for the bnx2 NIC module which talk about NAPI-related fixes for 2.6 - but only quite recently: October this year. I'm using Fedora 7 with kernel 2.6.22.1, which is fairly recent, so I'm assuming I have this NAPI. Can it possibly be turned off at the moment, so that I need to turn it on?

-- 
Oded
Re: Processing time spent in IRQ handling and what to do about it
On Tuesday, 18 December 2007, Yedidyah Bar-David wrote:
> I am not an expert on this, but what you want might be NAPI - a new
> network driver infrastructure designed to solve just that. Google a
> bit - I do not know exactly when it entered 2.6 (and you did not
> state your kernel version) and which drivers use it already.

1. NAPI was new at kernel 2.3.x when it was developed towards 2.4.
2. It gives the *driver* the option to toggle between interrupt-driven
   and polling mode at runtime. E.g:
   - A GB Ethernet at full speed may better poll the hardware every
     once in a while.
   - The same card is better off using interrupt-driven mode if the
     traffic is low.
3. You cannot turn it on/off. The driver may support this optional API
   or not. If it supports it, it's the driver's sole decision when it's
   better to use polling/interrupt-per-packet according to its hardware
   specifics.
4. I don't think a single fast-Ethernet card can severely affect your
   hardware interrupt load. So either:
   - You have a GB (or maybe 2GB?) Ethernet with high load.
   - You have several fast-Ethernet cards working at full speed.
5. A far better suspect would be the disk controller (e.g: working
   without DMA etc.)
6. Why guess?
       watch -n10 -d cat /proc/interrupts
   And calculate how many interrupts per second occurred for various
   devices. That would give you a rough idea who the possible suspects
   are.

-- 
Oron Peled                             Voice/Fax: +972-4-8228492
[EMAIL PROTECTED]                      http://www.actcom.co.il/~oron
ICQ UIN: 16527398

"Linux lasts longer!" -- Kim J. Brand [EMAIL PROTECTED]
Processing time spent in IRQ handling and what to do about it
Hi List,

I have somewhat of a problem, but I don't know how serious it is or how to handle it. I manage several servers - quite nice beasts: HP ML360G5, with 2 x dual Xeons and 4GB RAM each. Now one of the production servers is not behaving all that well - it doesn't handle the load as well as I would like it to, and its responses are slower than what I would expect according to previous benchmarks (on identical hardware, not on the specific machine). After doing some application testing and optimization, I still do not rule out sub-optimal application behavior, but I noticed something disturbing and I would appreciate some input on it.

I use htop to monitor the server's load; the load average is quite low when the server suffers under load, and the CPU time bars rarely reach over 50%. Splitting the CPU time display in htop according to system/IO-wait/hard-IRQ/soft-IRQ, I can see that a lot of time is spent in the hard-IRQ region - sometimes more than all other regions together.

Running some static benchmarks that should mimic the behavior under real load, on identical hardware at the office, I see very little hard-IRQ time, if at all. The main difference between the static benchmark and real usage is that the static benchmark only tests the application logic and IO, while real usage also fetches some files served by Apache over HTTP with each request - maybe ~50Kbytes worth of responses are served by Apache for each request to the application. I was thinking that the high IRQ usage is due to high network traffic - could that be the case, and could that be affecting the server's performance?

I'd appreciate any references that you can provide - searching the web for "irq bnx2" (the NIC module used by the machine) yields nothing that I could decipher.

Thanks in advance,
-- 
Oded
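The hard-IRQ share that htop displays in its split bars is derived from the `irq` field of the aggregate `cpu` line in /proc/stat. A minimal sketch of computing that share over an interval from two samples (field order per the standard /proc/stat layout: user, nice, system, idle, iowait, irq, softirq, ...):

```python
def cpu_fields(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat as ints."""
    for line in stat_text.splitlines():
        if line.startswith('cpu '):   # aggregate line, not cpu0/cpu1/...
            return [int(x) for x in line.split()[1:]]
    raise ValueError("no aggregate cpu line found")

def hardirq_share(before, after):
    """Fraction of total CPU time spent in hard-IRQ context between two
    samples of the aggregate counters. Index 5 is the 'irq' field."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return deltas[5] / total if total else 0.0
```

Sampling /proc/stat twice, a few seconds apart, and feeding both snapshots through these helpers gives the same hard-IRQ percentage htop shows, without the GUI.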
Re: Processing time spent in IRQ handling and what to do about it
On Tue, Dec 18, 2007 at 02:49:29AM +0200, Oded Arbel wrote:
> Running some static benchmarks that should mimic the behavior on real
> load, on identical hardware at the office, I see very little hard-IRQ
> time if at all. The main difference between the static benchmark and
> real usage is that the static benchmark only tests the application
> logic and IO, while real usage also fetches some files served by
> Apache over HTTP with each request - maybe ~50Kbytes worth of
> responses are served by Apache for each request to the application.
> I was thinking that the high IRQ usage is due to high network traffic
> - could that be the case and could that be affecting the server's
> performance?

I am not an expert on this, but what you want might be NAPI - a new network driver infrastructure designed to solve just that. Google a bit - I do not know exactly when it entered 2.6 (and you did not state your kernel version) and which drivers use it already.

-- 
Didi