Re: [PATCH] scsi: megaraid_sas: Fix MEGASAS_IOC_FIRMWARE regression
On Tue, Jan 05, 2021 at 12:41:04AM +0100, Arnd Bergmann wrote:
> Phil Oester reported that a fix for a possible buffer overrun that I
> sent caused a regression that manifests in this output:
>
>  Event Message: A PCI parity error was detected on a component at bus 0
>  device 5 function 0.
>  Severity: Critical
>  Message ID: PCI1308
>
> The original code tried to handle the sense data pointer differently
> when using 32-bit and 64-bit DMA addressing, which would lead to a 32-bit
> dma_addr_t value of 0x11223344 to get stored as
>
>  32-bit kernel:    44 33 22 11 ?? ?? ?? ??
>  64-bit LE kernel: 44 33 22 11 00 00 00 00
>  64-bit BE kernel: 00 00 00 00 44 33 22 11
>
> or a 64-bit dma_addr_t value of 0x1122334455667788 to get stored as
>
>  32-bit kernel:    88 77 66 55 ?? ?? ?? ??
>  64-bit kernel:    88 77 66 55 44 33 22 11
>
> In my patch, I tried to ensure that the same value is used on both
> 32-bit and 64-bit kernels, and picked what seemed to be the most sensible
> combination, storing 32-bit addresses in the first four bytes (as 32-bit
> kernels already did), and 64-bit addresses in eight consecutive bytes
> (as 64-bit kernels already did), but evidently this was incorrect.
>
> Always storing the dma_addr_t pointer as 64-bit little-endian,
> i.e. initializing the second four bytes to zero in case of 32-bit
> addressing, apparently solved the problem for Phil, and is consistent
> with what all 64-bit little-endian machines did before.
>
> I also checked in the history that in previous versions of the code,
> the pointer was always in the first four bytes without padding, and that
> previous attempts to fix 64-bit user space, big-endian architectures
> and 64-bit DMA were clearly flawed and seem to have made this worse.
>
> Reported-by: Phil Oester
> Fixes: 381d34e376e3 ("scsi: megaraid_sas: Check user-provided offsets")
> Fixes: 107a60dd71b5 ("scsi: megaraid_sas: Add support for 64bit consistent DMA")
> Fixes: 94cd65ddf4d7 ("[SCSI] megaraid_sas: addded support for big endian architecture")
> Fixes: 7b2519afa1ab ("[SCSI] megaraid_sas: fix 64 bit sense pointer truncation")
> Signed-off-by: Arnd Bergmann

This solves the issue on our Dell servers, thanks Arnd.

Phil
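
For readers who want to check the byte layouts quoted above, here is a
minimal sketch in Python. It assumes only what the patch description
states: the address is always stored as a 64-bit little-endian value, so
a 32-bit address ends up with four zero bytes after it. The helper names
store_sense_addr and hexdump are hypothetical and are not taken from the
driver.

```python
import struct


def store_sense_addr(dma_addr: int) -> bytes:
    """Encode a DMA address as 8 bytes, 64-bit little-endian.

    Hypothetical helper for illustration; not the driver's code.
    """
    return struct.pack("<Q", dma_addr)


def hexdump(buf: bytes) -> str:
    """Render bytes in the same 'xx xx xx' style as the tables above."""
    return " ".join(f"{b:02x}" for b in buf)


print(hexdump(store_sense_addr(0x11223344)))          # 44 33 22 11 00 00 00 00
print(hexdump(store_sense_addr(0x1122334455667788)))  # 88 77 66 55 44 33 22 11
```

Both outputs match the "64-bit LE kernel" rows, which is the layout the
fix settles on for every kernel configuration.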
Re: [PATCH 2/3] scsi: megaraid_sas: check user-provided offsets
On Sun, Jan 03, 2021 at 05:26:29PM +0100, Arnd Bergmann wrote:
> Thank you for the report and bisecting the issue, and sorry this broke
> your system!
>
> Fortunately, the patch is fairly small, so there are only a limited number
> of things that could go wrong. I haven't tried to analyze that message,
> but I have two ideas:
>
> a) The added ioc->sense_off check gets triggered and the code relies
>    on the data being written outside of the structure
>
> b) the address actually needs to always be written as a 64-bit value
>    regardless of the instance->consistent_mask_64bit flag, as the
>    driver did before. This looked like it was done in error.
>
> Can you try the patch below instead of the revert and see if that
> resolves the regression, and if it triggers the warning message I
> add?

Thanks Arnd, I tried your patch and it resolves the regression. It does
not trigger the warning message you added.

Phil
Re: [PATCH 2/3] scsi: megaraid_sas: check user-provided offsets
On Tue, Sep 08, 2020 at 11:36:22PM +0200, Arnd Bergmann wrote:
> It sounds unwise to let user space pass an unchecked 32-bit
> offset into a kernel structure in an ioctl. This is an unsigned
> variable, so checking the upper bound for the size of the structure
> it points into is sufficient to avoid data corruption, but as
> the pointer might also be unaligned, it has to be written carefully
> as well.
>
> While I stumbled over this problem by reading the code, I did not
> continue checking the function for further problems like it.

Sorry for replying to an ancient thread, but this patch just recently
made it into 5.10.3 and has caused unintended consequences. On Dell
servers with PERC RAID controllers, booting 5.10.3+ with this patch
causes a PCI parity error. Specifically:

   Event Message: A PCI parity error was detected on a component at bus 0
   device 5 function 0.
   Severity: Critical
   Message ID: PCI1308

I reverted this single patch and the errors went away.

Thoughts?

Phil Oester
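
The kind of check the quoted patch description talks about can be
sketched as follows, in Python for brevity. This is only an illustration
under stated assumptions: FRAME_SIZE, ADDR_SIZE and write_sense_addr are
hypothetical stand-ins and do not reflect the driver's real structure or
code. The two points it shows are the upper-bound check on a
user-supplied offset and a write that does not depend on alignment.

```python
import struct

FRAME_SIZE = 64  # hypothetical size of the kernel structure, in bytes
ADDR_SIZE = 8    # the stored address occupies 8 bytes in this sketch


def write_sense_addr(frame: bytearray, sense_off: int, dma_addr: int) -> None:
    """Store dma_addr at a user-supplied offset after validating it."""
    # A C unsigned offset needs only the upper-bound check. Python ints
    # are signed, so the lower bound is checked here as well.
    if not 0 <= sense_off <= len(frame) - ADDR_SIZE:
        raise ValueError(f"sense_off {sense_off} out of range")
    # The "<" format has no alignment padding, so odd offsets are fine.
    struct.pack_into("<Q", frame, sense_off, dma_addr)


frame = bytearray(FRAME_SIZE)
write_sense_addr(frame, 3, 0x11223344)       # unaligned but in range
try:
    write_sense_addr(frame, 57, 0x11223344)  # would run past the end
except ValueError as err:
    print("rejected:", err)
```

The largest accepted offset is FRAME_SIZE - ADDR_SIZE, because the whole
8-byte value has to fit inside the structure.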
2.6.25-rc3: WARNING: at arch/x86/mm/ioremap.c:137
Got the below on -rc3. Tried applying the "more info" patch from Arjan
(http://marc.info/?l=linux-kernel&m=120336371506283&w=2), but that just
made the warning go away.

Phil

ACPI: EC: Look up EC in DSDT
[ cut here ]
WARNING: at arch/x86/mm/ioremap.c:137 __ioremap+0xb1/0x16b()
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.25-rc3 #7
 [c01112d3] warn_on_slowpath+0x40/0x4f
 [c01bb692] acpi_ut_update_object_reference+0xb0/0x109
 [c01b4215] acpi_ns_lookup+0x205/0x2ea
 [c01bc0f8] acpi_ut_release_mutex+0x50/0x55
 [c01b54fa] acpi_ns_get_node+0x79/0x83
 [c0213a91] raw_pci_read+0x3d/0x45
 [c01bc1ba] acpi_ut_acquire_mutex+0x2e/0x64
 [c01bc0f8] acpi_ut_release_mutex+0x50/0x55
 [c01b6113] acpi_get_parent+0x63/0x6c
 [c0213a91] raw_pci_read+0x3d/0x45
 [c010cdde] __ioremap+0xb1/0x16b
 [c01b2291] acpi_ex_system_memory_space_handler+0xdd/0x210
 [c01b21b4] acpi_ex_system_memory_space_handler+0x0/0x210
 [c01aba50] acpi_ev_address_space_dispatch+0x127/0x168
 [c01af5e2] acpi_ex_access_region+0x1a5/0x1b7
 [c01af6ff] acpi_ex_field_datum_io+0x10b/0x193
 [c01b45da] acpi_ns_search_one_scope+0x12/0x37
 [c01b46f7] acpi_ns_search_and_enter+0xf8/0x159
 [c01af81f] acpi_ex_extract_from_field+0x98/0x221
 [c01bbd9f] acpi_ut_allocate_object_desc_dbg+0x29/0x58
 [c01bbde3] acpi_ut_create_internal_object_dbg+0x15/0x67
 [c01ae268] acpi_ex_read_data_from_field+0x108/0x136
 [c01b2f11] acpi_ex_resolve_node_to_value+0x145/0x1c0
 [c01aedf2] acpi_ex_resolve_to_value+0x202/0x20c
 [c01b0ceb] acpi_ex_resolve_operands+0x1e1/0x4aa
 [c01a9471] acpi_ds_exec_end_op+0xa7/0x3ae
 [c01b72d2] acpi_ps_parse_loop+0x546/0x6f8
 [c01b67ab] acpi_ps_parse_aml+0x5f/0x222
 [c01b795b] acpi_ps_execute_method+0x10a/0x1a7
 [c01b4ed8] acpi_ns_evaluate+0x90/0xe4
 [c01b5a31] acpi_ns_init_one_device+0x72/0xad
 [c01b5f08] acpi_ns_walk_namespace+0x94/0x114
 [c01b598c] acpi_ns_initialize_devices+0x80/0xb3
 [c01b59bf] acpi_ns_init_one_device+0x0/0xad
 [c01bae06] acpi_initialize_objects+0x28/0x3a
 [c03182ba] acpi_init+0x77/0x21f
 [c030c5fd] kernel_init+0x97/0x1d1
 [c010ee27] schedule_tail+0xe/0x39
 [c01027b2] ret_from_fork+0x6/0x1c
 [c030c566] kernel_init+0x0/0x1d1
 [c030c566] kernel_init+0x0/0x1d1
 [c01033f7] kernel_thread_helper+0x7/0x10
 ===
---[ end trace ca143223eefdc828 ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [git pull] x86 arch updates for v2.6.25
On Mon, Feb 04, 2008 at 07:27:53PM -0800, Linus Torvalds wrote:
> kgdb? Not so interesting. We have many more hard problems happening at
> user sites, not in developer hands.

FWIW, I'm not a fulltime developer by any means, but on occasion I have
fixed a few bugs in the netfilter area of the kernel. And in almost all
cases, I used kgdb in my debugging and testing of fixes. In doing so, it
was a bit of a PITA to find/patch kgdb into the kernel, and having it as
a configurable option would have saved me some time and effort and made
the process much smoother.

So perhaps someone else out there would find it similarly useful, and
the extra time it takes to find/patch/compile kgdb in is precluding them
from participating? Why would we ever want to do that?

Phil
Re: TCP SACK issue, hung connection, tcpdump included
On Sun, Jul 29, 2007 at 06:59:26AM +0100, Darryl L. Miles wrote:
> The problems start around time index 09:21:39.860302 when the CLIENT issues
> a TCP packet with SACK option set (seemingly for a data segment which has
> already been seen) from that point on the connection hangs.

I'd say the most likely scenario is that the SERVER is behind a Cisco Pix
firewall, which has known bugs in handling packets with the sack option.
By default the Cisco has sequence number randomization enabled, but it's
a half-assed implementation which doesn't bother adjusting the sequence
numbers inside sack options.

This has been reported to Cisco, and they don't seem to care. As a
workaround, you can do this:

   echo 0 > /proc/sys/net/ipv4/tcp_sack

and it will probably fix it up.

It'd be really nice, however, to have a per-route option for sack,
similar to how we can clamp window scaling per route. Something like
the below

   ip r a host gw nosack

Phil
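
To make the failure mode concrete, here is a small Python sketch of
sequence number randomization that rewrites the ACK field but leaves the
SACK blocks alone. It is an illustration of the behaviour described
above, not a model of any actual Cisco implementation; the function
names, the offset and the sequence numbers are all made up.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def randomize(seq: int, offset: int) -> int:
    """Server sequence number as the client sees it behind the firewall."""
    return (seq + offset) % SEQ_MOD


def derandomize(seq: int, offset: int) -> int:
    """Client-side number mapped back into the server's sequence space."""
    return (seq - offset) % SEQ_MOD


def rewrite_ack(ack, sack_blocks, offset, fix_sack):
    """Rewrite a client ACK on its way to the server.

    With fix_sack=False the SACK blocks are forwarded untouched, which
    is the behaviour described in the message above.
    """
    new_ack = derandomize(ack, offset)
    if fix_sack:
        sack_blocks = [(derandomize(left, offset), derandomize(right, offset))
                       for left, right in sack_blocks]
    return new_ack, list(sack_blocks)


# Arbitrary illustration values.
offset = 5_000_000
ack = randomize(1000, offset)
sack = [(randomize(2000, offset), randomize(3000, offset))]

print(rewrite_ack(ack, sack, offset, fix_sack=False))  # (1000, [(5002000, 5003000)])
print(rewrite_ack(ack, sack, offset, fix_sack=True))   # (1000, [(2000, 3000)])
```

In the first case the server receives SACK blocks that lie far outside
anything it has sent, which would fit a connection that stops making
progress once SACK is used.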
Re: Linux 2.6.22 released
On Sun, Jul 08, 2007 at 04:52:52PM -0700, Linus Torvalds wrote:
> Anybody? Should I make just the shortlogs available instead (I don't save
> those, but I post those for the later -rc's - usually the -rc1 and -rc2's
> are too big for the mailing list, but they are still a lot smaller and
> more readable than the *full* logs are)?
>
> Or do people really want the full logs, and don't use git?

I don't use git, and sometimes find it useful to view the changelogs to
look for when a particular change occurred. Doing so via:

   http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.X

where X is decremented from current rev is handy. Please keep them
around.

Phil
Re: [2.6.21-git2] sk_buff changes break Cisco VPN client
On Sat, Apr 28, 2007 at 10:38:45PM +0200, Alessandro Suardi wrote:
> skb_set_timestamp I can figure out, but the rest is a bit
> too hard for me... if anyone has already an idea of how
> to fix this, I'd be most grateful.

Did you ask Cisco?

Phil
Re: 2.6.19: skb_over_panic, followed by a BUG at net/core/skbuff.c:93
On Sat, Dec 02, 2006 at 04:40:54PM +0100, Helge Deller wrote:
> > The bug happens when gentoo wants to bring up eth0 (starting the lo
> > device works fine), even a simple 'ifconfig eth0 192.168.0.11' will
> > crash the kernel.
> I just faced the same problem, but on a HPPA (PARISC) box with 32bit kernel.
> Just reported at the parisc-linux kernel mailing list as well:
> http://lists.parisc-linux.org/pipermail/parisc-linux/2006-December/030810.html

Please read the mailing list archives...this has been covered a number
of times in the past few days.

Phil
Re: Linux 2.6.19
Getting an oops on boot here, caused by commit
e81c73596704793e73e6dbb478f41686f15a4b34 titled "[NET]: Fix MAX_HEADER
setting". Reverting that patch fixes things up for me. Dave?

Phil

Bringing up interface eth0:
skb_over_panic: text:c02af809 len:56 put:16 head:d7e213c0 data:d7e213d0 tail:d7e21408 end:d7e21400 dev:eth0
[ cut here ]
kernel BUG at net/core/skbuff.c:93!
invalid opcode: [#1]
CPU:0
EIP:0060:[c0244659]Not tainted VLI
EFLAGS: 00010296 (2.6.19 #1)
EIP is at skb_over_panic+0x59/0x70
eax: 006f ebx: d7e213c0 ecx: edx: c03102c0
esi: d7e4f000 edi: d7e213f8 ebp: d7e4f000 esp: c037aec4
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, ti=c037a000 task=c03023e0 task.ti=c0347000)
Stack: c02fbb9c c02af809 0038 0010 d7e213c0 d7e213d0 d7e21408 d7e21400
       d7e4f000 0010 d6e84520 c02af80e d6c718a0 c037af6c 003a
       0010 c037af6c d6c718a0 d6c09920 d79749c0 0001 02ff
Call Trace:
 [c02af809] ndisc_send_rs+0x399/0x3e0
 [c02af80e] ndisc_send_rs+0x39e/0x3e0
 [c02a4132] addrconf_dad_completed+0x82/0xc0
 [c02a6595] addrconf_dad_timer+0xe5/0xf0
 [c0214799] e100_poll+0x259/0x420
 [c0117330] it_real_fn+0x0/0x60
 [c011bcdf] cascade+0x3f/0x60
 [c02a64b0] addrconf_dad_timer+0x0/0xf0
 [c011bdeb] run_timer_softirq+0xab/0x170
 [c01188c2] __do_softirq+0x42/0xa0
 [c01053c0] do_softirq+0x60/0xb0
 [c012eea0] handle_edge_irq+0x0/0x110
 [c0105495] do_IRQ+0x85/0xe0
 [c02c707e] schedule+0x29e/0x580
 [c0103586] common_interrupt+0x1a/0x20
 [c01018f2] default_idle+0x32/0x60
 [c0101962] cpu_idle+0x42/0x60
 [c0348733] start_kernel+0x283/0x330
 [c0348250] unknown_bootoption+0x0/0x260
 ===
Code: 00 00 89 5c 24 14 8b 98 8c 00 00 00 89 54 24 0c 89 5c 24 10 8b 40 60 89 4c 24 04 c7 04 24 9c bb 2f c0 89 44 24 08 e8 47 07 ed ff <0f> 0b 5d 00 a4 91 2f c0 83 c4 24 5b 5e c3 89 f6 8d bc 27 00 00
EIP: [c0244659] skb_over_panic+0x59/0x70 SS:ESP 0068:c037aec4
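
The numbers on the skb_over_panic line are enough to see how far the
buffer was overrun. Below is a small Python sketch of that arithmetic.
It assumes the usual reading of the fields (len is the data length after
the put, put is the number of bytes just requested, and head, data, tail
and end are buffer pointers); the function name is made up for this
example.

```python
def analyze_skb_over_panic(length, put, head, data, tail, end):
    """Derive overrun details from the fields of an skb_over_panic line."""
    tail_before = tail - put  # where tail pointed before the put
    return {
        "headroom": data - head,                # bytes reserved before data
        "tail_consistent": data + length == tail,
        "tail_before": tail_before,
        "room_before": end - tail_before,       # bytes that were still free
        "overrun": tail - end,                  # bytes written past end
    }


result = analyze_skb_over_panic(
    length=56, put=16,
    head=0xd7e213c0, data=0xd7e213d0, tail=0xd7e21408, end=0xd7e21400,
)
print(result["room_before"], result["overrun"], hex(result["tail_before"]))
# 8 8 0xd7e213f8
```

So the put asked for 16 bytes when only 8 were left before end, and tail
ended up 8 bytes past the end of the buffer. The derived tail_before,
0xd7e213f8, is the same value as edi in the register dump.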
e1000 dual-port nic failures
I am seeing some odd behaviour from a dual port Intel e1000 nic. When I
have only one port of the dual port nic plugged in, all is well. But
when I plug the 2nd port in, the box goes to pieces. The counters for
the interfaces on the nic start moving in reverse, and the interfaces
start spewing the logs with these errors:

kernel: NETDEV WATCHDOG: eth2: transmit timed out
kernel: e1000: eth2: e1000_reset: Hardware Error
kernel: e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex

I'm currently testing with 2.6.13-rc6, but tried with 2.6.10 and had the
same problem. I've got two identical boxes (Dell Optiplex GX280s)
exhibiting the same problem, so I really don't think it's a hardware
problem. I upgraded one to the latest BIOS just for kicks, but it didn't
help. I've tried removing ACPI, SMT, using pci=routeirq, all to no
avail.

Below is dmesg output with PCI debugging enabled. The problematic nic
is eth[23].

Any suggestions?

Phil

Linux version 2.6.13-rc6 (root@) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #7 Tue Aug 16 14:21:31 EDT 2005
BIOS-provided physical RAM map:
 BIOS-e820: - 000a (usable)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 3f686c00 (usable)
 BIOS-e820: 3f686c00 - 3f688c00 (ACPI NVS)
 BIOS-e820: 3f688c00 - 3f68ac00 (ACPI data)
 BIOS-e820: 3f68ac00 - 4000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fed00400 (reserved)
 BIOS-e820: fed2 - feda (reserved)
 BIOS-e820: fee0 - fef0 (reserved)
 BIOS-e820: ffb0 - 0001 (reserved)
118MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe710
On node 0 totalpages: 259718
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 225280 pages, LIFO batch:31
  HighMem zone: 30342 pages, LIFO batch:15
DMI 2.3 present.
ACPI: RSDP (v000 DELL ) @ 0x000fec00
ACPI: RSDT (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcbfd
ACPI: FADT (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcc3d
ACPI: SSDT (v001 DELLst_ex 0x1000 MSFT 0x010d) @ 0xfffd43fc
ACPI: MADT (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fccb1
ACPI: BOOT (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcd23
ACPI: ASF! (v016 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcd4b
ACPI: MCFG (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcdb2
ACPI: HPET (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcdf0
ACPI: DSDT (v001 DELLdt_ex 0x1000 MSFT 0x010d) @ 0x
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:4 APIC version 20
WARNING: NR_CPUS limit of 1 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] disabled)
ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode: Flat. Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 4000 (gap: 4000:a000)
Built 1 zonelists
Kernel command line: ro nofirewire root=/dev/sda9 console=tty0 console=ttyS0,9600 panic=1
mapped APIC to d000 (fee0)
mapped IOAPIC to c000 (fec0)
Initializing CPU#0
CPU 0 irqstacks, hard=c0336000 soft=c0335000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 2993.117 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1026500k/1038872k available (1591k kernel code, 11500k reserved, 491k data, 152k init, 121368k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5995.10 BogoMIPS (lpj=11990207)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 0010 441d
CPU: After vendor identify, caps: bfebfbff 0010 441d
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: After all inits, caps: bfebfbff 0010 0080 441d
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0:
Re: 2.6.12->2.6.13-rc6 SMT changes -- intentional?
On Tue, Aug 09, 2005 at 12:34:10AM +0200, Petr Vandrovec wrote:
> It looks like that ACPI is gone... Can you recheck your .config that
> you still have ACPI enabled?
> Petr

Hmmff...yup, you are correct. Which is interesting, since I just copied
the 2.6.12.4 .config, and did a make oldconfig on it. Looks like ACPI
is now dependent on CONFIG_PM, while it was not before.

Wonder how many others this will bite...

Thanks,
Phil
2.6.12->2.6.13-rc6 SMT changes -- intentional?
Just booted a box on 2.6.13-rc6, and noticed that it now only reports a
single processor, whereas on 2.6.12.4 it reports two. While there is only
one physical processor, I wonder if this change was intentional, since I
can't find anything in the changelog about SMT changes.

Below is dmesg output from each kernel.

Phil

Linux version 2.6.12.4 (root@) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #1 SMP Mon Aug 8 17:13:40 EDT 2005
BIOS-provided physical RAM map:
 BIOS-e820: - 000a (usable)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 3f686c00 (usable)
 BIOS-e820: 3f686c00 - 3f688c00 (ACPI NVS)
 BIOS-e820: 3f688c00 - 3f68ac00 (ACPI data)
 BIOS-e820: 3f68ac00 - 4000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fed00400 (reserved)
 BIOS-e820: fed2 - feda (reserved)
 BIOS-e820: fee0 - fef0 (reserved)
 BIOS-e820: ffb0 - 0001 (reserved)
118MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe710
On node 0 totalpages: 259718
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 225280 pages, LIFO batch:31
  HighMem zone: 30342 pages, LIFO batch:15
DMI 2.3 present.
ACPI: RSDP (v000 DELL ) @ 0x000fec00
ACPI: RSDT (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcc04
ACPI: FADT (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcc44
ACPI: SSDT (v001 DELLst_ex 0x1000 MSFT 0x010d) @ 0xfffd3468
ACPI: MADT (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fccb8
ACPI: BOOT (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcd2a
ACPI: ASF! (v016 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcd52
ACPI: MCFG (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcdb9
ACPI: HPET (v001 DELLGX280 0x0007 ASL 0x0061) @ 0x000fcdf7
ACPI: DSDT (v001 DELLdt_ex 0x1000 MSFT 0x010d) @ 0x
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] disabled)
ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode: Flat. Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 4000 (gap: 4000:a000)
Built 1 zonelists
Kernel command line: ro nofirewire root=/dev/sda9 console=tty0 console=ttyS0,9600 panic=1
mapped APIC to d000 (fee0)
mapped IOAPIC to c000 (fec0)
Initializing CPU#0
CPU 0 irqstacks, hard=c034a000 soft=c0348000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 2994.062 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1026420k/1038872k available (1635k kernel code, 11624k reserved, 486k data, 188k init, 121368k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 5898.24 BogoMIPS (lpj=2949120)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 0010 441d
CPU: After vendor identify, caps: bfebfbff 0010 441d
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 0010 0080 441d
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 01
Booting processor 1/1 eip 2000
CPU 1 irqstacks, hard=c034b000 soft=c0349000
Initializing CPU#1
Calibrating delay loop... 5980.16 BogoMIPS (lpj=2990080)
CPU: After generic identify, caps: bfebfbff 0010 441d
CPU: After vendor identify, caps: bfebfbff 0010 441d
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2
Re: 2.6.12.2 -- time passes faster; related to the acpi_register_gsi() call
On Fri, Jul 08, 2005 at 11:25:08PM +0200, Alexander Nyberg wrote:
> On Fri 2005-07-08 at 23:12 +0200, Rudo Thomas wrote:
> > Hello, guys.
> >
> > Time started to pass faster with 2.6.12.2 (actually, it was 2.6.12-ck3
> > which is based on it). I have isolated the cause of the problem:
>
> I bet you this fixes it (already in mainline)
>
> If ACPI doesn't find an irq listed, don't accept 0 as a valid PCI irq.

FYI, this did fix the time-passing-faster problem for me on a poweredge 750
a few days ago. I'd suggest this fix should go to -stable.

Phil
Re: Garbage on serial console after serial driver loads
On Wed, Apr 13, 2005 at 12:45:51PM +0000, Paul Slootman wrote:
> We have a variety of Dell rackmount systems, also on Cyclades, and see
> this mess everywhere.
>
> I had reported this problem a little while ago, see
> http://marc.theaimsgroup.com/?l=linux-kernel&m=111036598927105&w=2
> but unfortunately didn't get any response at that time.

Read the remainder of the thread for a workaround.

http://www.uwsg.iu.edu/hypermail/linux/kernel/0503.3/1061.html

Phil
Re: Garbage on serial console after serial driver loads
On Sat, Mar 26, 2005 at 03:10:05PM +0000, Russell King wrote:
> Doesn't matter. The problem is that dwmw2's NS16550A patch (from ages
> ago) changes the prescaler setting for this device so we can use the
> higher speed baud rates. This means any programmed divisor (programmed
> at early serial console initialisation time) suddenly becomes wrong as
> soon as we fiddle with the prescaler during normal UART initialisation
> time.

Seems like you are correct, given the below patch fixes the garbage output
for me.

Phil

--- linux-standard/drivers/serial/8250.c	2005-03-02 02:37:47.0 -0500
+++ linux-dellfw/drivers/serial/8250.c	2005-03-28 12:28:34.560032856 -0500
@@ -698,7 +698,7 @@
 		serial_outp(up, UART_MCR, status1);
 		if ((status2 ^ status1) & UART_MCR_LOOP) {
-#ifndef CONFIG_PPC
+#if 0
 			serial_outp(up, UART_LCR, 0xE0);
 			status1 = serial_in(up, 0x04); /* EXCR1 */
 			status1 &= ~0xB0; /* Disable LOCK, mask out PRESL[01] */
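Russell's explanation can be checked with a little arithmetic. The sketch below assumes the usual PC UART base rate of 115200 and, purely for illustration, a prescaler change that raises the base rate by a factor of 8; the thread itself does not state the ratio, and the function names are hypothetical.

```python
def divisor_for(base_baud, wanted_baud):
    """Divisor latch value that yields wanted_baud from base_baud."""
    return base_baud // wanted_baud


def actual_baud(base_baud, divisor):
    """Line rate the UART really produces for a given base rate and divisor."""
    return base_baud / divisor


STANDARD_BASE = 115200   # 1.8432 MHz clock / 16, the usual PC value
PRESCALER_FACTOR = 8     # ASSUMPTION: base rate after the prescaler change

# The early console programs the divisor for 9600 against the standard base.
early_divisor = divisor_for(STANDARD_BASE, 9600)

# Later the driver changes the prescaler but the old divisor is still latched.
rate_before = actual_baud(STANDARD_BASE, early_divisor)
rate_after = actual_baud(STANDARD_BASE * PRESCALER_FACTOR, early_divisor)

print(early_divisor, rate_before, rate_after)
```

Under these assumptions the port keeps divisor 12 but transmits at 76800 instead of 9600, which a terminal server still listening at 9600 would show as garbage until the divisor is reprogrammed.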
Re: Garbage on serial console after serial driver loads
On Sat, Mar 26, 2005 at 04:37:29PM +0000, Russell King wrote:
> > serio: i8042 AUX port at 0x60,0x64 irq 12
> > serio: i8042 KBD port at 0x60,0x64 irq 1
> > Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
>
> Garbage here

Sorry -- I was relying upon my (flawed) memory of the bootup sequence, but
sending you the contents from /var/log/dmesg.

Phil
Re: Garbage on serial console after serial driver loads
On Sat, Mar 26, 2005 at 04:37:29PM +0000, Russell King wrote:
> > But interestingly, on identical boxes, the garbage only appears on
> > those hooked up to a PortMaster device - those using a Cyclades never
> > display this problem. (???)
>
> Sorry, I don't understand your scenarios. Can you explain the
> circumstances under which you see corruption?
>
> From the kernel messages you've quoted above, I can only think that
> you're not using ttyS0 as the serial console - if you were, my
> understanding of this issue would indicate that you should get the
> garbage immediately after the line starting "Serial:"
>
> Either my understanding of the cause of this problem is wrong, or
> I'm not understanding your setup.

I have a number of PowerEdge 2550 servers. All are set up with serial
console on ttyS0 @ 9600. One group uses an (old) Portmaster device for
console access, the other group uses a Cyclades device.

Only those servers using the Portmaster device exhibit the garbage problem.
The Cyclades group never displays garbage on boot.

Phil
Re: Garbage on serial console after serial driver loads
On Sat, Mar 26, 2005 at 03:10:05PM +0000, Russell King wrote:
> Doesn't matter. The problem is that dwmw2's NS16550A patch (from ages
> ago) changes the prescaler setting for this device so we can use the
> higher speed baud rates. This means any programmed divisor (programmed
> at early serial console initialisation time) suddenly becomes wrong as
> soon as we fiddle with the prescaler during normal UART initialisation
> time.

FWIW, I see the same thing here on some Dell Poweredge boxes:

serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a NS16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
<garbage>

But interestingly, on identical boxes, the garbage only appears on
those hooked up to a PortMaster device - those using a Cyclades never
display this problem. (???)

Phil
Re: dst cache overflow
On Tue, Mar 22, 2005 at 10:39:43AM +0200, [EMAIL PROTECTED] wrote:
>
> computer's main job is to be router on small LAN with 10 users and some
> services like qmail, apache, proftpd, shoutcast, squid, and ices on slack
> 10.1. Iptables and tc are used to limit bandwidth and the two bandwidthd
> daemons are running on eth0 interface and all the time the cpu is used at
> about 0.4% and additional 12% by ices when encoding mp3 on demand, and
> the process ksoftirqd/0 randomly starts to use 100% of 0 cpu in normal
> situation and one time when the ksoftirqd/0 became crazy i noticed dst
> cache overflow messages in syslog but there are more of these lines in
> logs about 5 times in 10 days period

There was a problem fixed in the handling of fragments which caused dst
cache overflow in the 2.6.11-rc series. Are you still seeing dst cache
overflow on 2.6.11?

Phil
Re: dst cache overflow, again
On Mon, Feb 21, 2005 at 02:21:50PM +0100, Piotr Kowalczyk wrote:
> Hi all,
>
> I'm suffering from destination cache overflow on router running kernel
> 2.6.10. This wouldn't be anything special if not different numbers
> reported by slabinfo and the real state. It's worth to mention that
> there was no problems with old 2.4.x here.

Use 2.6.11-rc4 -- this problem has been fixed there.

Phil
Re: Memory leak in 2.6.11-rc1?
On Sun, Jan 30, 2005 at 06:01:46PM +0000, Russell King wrote:
> > OTOH, if conntrack isn't loaded forwarded packet are never defragmented,
> > so frag_list should be empty. So probably false alarm, sorry.
>
> I've just checked Phil's mails - both Phil and myself are using
> netfilter on the troublesome boxen.
>
> Also, since FragCreates is zero, and this does mean that the frag_list
> is not empty in all cases so far where ip_fragment() has been called.
> (Reading the code, if frag_list was empty, we'd have to create some
> fragments, which increments the FragCreates statistic.)

The below testcase seems to illustrate the problem nicely -- ip_dst_cache
grows but never shrinks:

On gateway:

iptables -I FORWARD -d 10.10.10.0/24 -j DROP

On client:

for i in `seq 1 254` ; do ping -s 1500 -c 5 -w 1 -f 10.10.10.$i ; done

Phil
Re: Memory leak in 2.6.11-rc1?
On Sun, Jan 30, 2005 at 03:34:49PM +0000, Russell King wrote:
> I think the case against the IPv4 fragmentation code is mounting.
> However, without knowing what the expected conditions for this code,
> (eg, are skbs on the fraglist supposed to have NULL skb->dst?) I'm
> unable to progress this any further. However, I think it's quite
> clear that there is something bad going on here.

Interesting...the gateway which exhibits the problem fastest in my area
does have a large number of fragmented UDP packets running through it,
as shown by tcpdump 'ip[6:2] & 0x1fff != 0'.

> Why many more people aren't seeing this I've no idea.

Perhaps you (and I) experience more fragments than the average user???

Nice detective work!

Phil
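For readers unfamiliar with the tcpdump expression: it reads the two bytes at offset 6 of the IPv4 header (flags plus fragment offset) and keeps packets whose 13-bit fragment offset is non-zero, that is, every fragment except the first. A minimal sketch of the same test follows; `build_header` is a hypothetical helper that fabricates an otherwise empty header for the demonstration.

```python
import struct


def fragment_offset_nonzero(ip_header):
    """Mirror of the filter 'ip[6:2] & 0x1fff != 0' on raw IPv4 header bytes."""
    # Bytes 6-7, network byte order: 3 flag bits followed by a 13-bit offset.
    (flags_and_offset,) = struct.unpack_from("!H", ip_header, 6)
    return (flags_and_offset & 0x1FFF) != 0


def build_header(flags_and_offset):
    """Hypothetical helper: 20-byte IPv4 header with only bytes 0 and 6-7 set."""
    header = bytearray(20)
    header[0] = 0x45  # version 4, header length 5 words
    struct.pack_into("!H", header, 6, flags_and_offset)
    return bytes(header)


first_fragment = build_header(0x2000)   # MF set, offset 0
later_fragment = build_header(0x00B9)   # offset 185 (in 8-byte units)
unfragmented = build_header(0x4000)     # DF set, offset 0

print(fragment_offset_nonzero(first_fragment),
      fragment_offset_nonzero(later_fragment),
      fragment_offset_nonzero(unfragmented))
```

Note that the first fragment of a datagram has offset 0 and so is not matched; the filter counts trailing fragments only.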
Re: Memory leak in 2.6.11-rc1?
On Fri, Jan 28, 2005 at 12:17:01AM +0000, Russell King wrote:
> On Thu, Jan 27, 2005 at 12:33:26PM -0800, David S. Miller wrote:
> > So they won't be listed in /proc/net/rt_cache (since they've been
> > removed from the lookup table) but they will be accounted for in
> > /proc/net/stat/rt_cache until the final release is done on the
> > routing cache object and it can be completely freed up.
> >
> > Do you happen to be using IPV6 in any way by chance?
>
> Yes. Someone suggested this evening that there may have been a recent
> change to do with some IPv6 refcounting which may have caused this
> problem. Is that something you can confirm?

FWIW, I do not use IPv6, and it is not compiled into the kernel.

Phil
Re: Memory leak in 2.6.11-rc1?
On Thu, Jan 27, 2005 at 07:25:04PM +0000, Russell King wrote:
> Can you provide some details, eg kernel configuration, loaded modules
> and a brief overview of any netfilter modules you may be using.
>
> Maybe we can work out what's common between our setups.

Vanilla 2.6.10, though I've been seeing these problems since 2.6.8 or
earlier. Netfilter running on all boxes, some utilizing SNAT, others
not -- none using MASQ. This is from a box running no NAT at all,
although has some other filter rules:

# wc -l /proc/net/rt_cache ; grep dst_cache /proc/slabinfo
50 /proc/net/rt_cache
ip_dst_cache 84285 84285

Also with uptime of 26 days.

These boxes are all running the quagga OSPF daemon, but those that
are lightly loaded are not exhibiting these problems.

Phil
Re: Memory leak in 2.6.11-rc1?
On Thu, Jan 27, 2005 at 04:49:18PM +0000, Russell King wrote:
> so obviously the GC does appear to be working - as can be seen from the
> number of entries in /proc/net/rt_cache. However, the number of objects
> in the slab cache does grow day on day. About 4 days ago, it was only
> about 600 active objects. Now it's more than twice that, and it'll
> continue increasing until it hits 8192, where upon it's game over.

I can confirm the behavior you are seeing -- does seem to be a leak
somewhere. Below from a heavily used gateway with 26 days uptime:

# wc -l /proc/net/rt_cache ; grep ip_dst /proc/slabinfo
12870 /proc/net/rt_cache
ip_dst_cache 53327 57855

Eventually I get the dst_cache overflow errors and have to reboot.

Phil
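The comparison made by hand above (routing cache entries versus ip_dst_cache slab objects) is easy to script. This is a rough sketch only: it assumes the slabinfo line starts with the cache name, active objects and total objects, and that /proc/net/rt_cache carries one header line; the function names are hypothetical. The sample numbers are the ones quoted in the message.

```python
from typing import Tuple


def slab_counts(slabinfo_text: str, cache_name: str) -> Tuple[int, int]:
    """Return (active_objs, num_objs) for one cache from slabinfo-style text."""
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == cache_name:
            return int(fields[1]), int(fields[2])
    raise KeyError(cache_name)


def unaccounted_entries(rt_cache_line_count: int, active_objs: int) -> int:
    """Slab objects still alive that are no longer visible in rt_cache."""
    # ASSUMPTION: the first line of /proc/net/rt_cache is a column header.
    visible = max(rt_cache_line_count - 1, 0)
    return active_objs - visible


# Figures from the gateway with 26 days of uptime.
SLABINFO_SAMPLE = "ip_dst_cache 53327 57855\n"
RT_CACHE_LINES = 12870

active, total = slab_counts(SLABINFO_SAMPLE, "ip_dst_cache")
print(active, total, unaccounted_entries(RT_CACHE_LINES, active))
```

A steadily growing third number across samples would point at entries that have left the lookup table but are never freed, which is the pattern described in this thread.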
NFS caching issues on 2.4
I've stumbled upon a weird NFS caching issue on 2.4 which does not seem to
exist in 2.2. Our www docroot is NFS mounted on a NetApp 760. We have a
cron job which makes changes to the index.html every 5 minutes. Recently,
we upgraded all the web servers to 2.4 and noticed that there were big
delays in seeing these 5 minute updates. Yet, an ls -l in the nfs directory
on each of the servers clearly shows the new timestamp. However, a cat of
the file shows that it is the old version (sometimes up to 1 hour old).

I was using NFSv3, so decided to try NFSv2, but got the same results.

I reverted to 2.2.19 on the boxes (which are RedHat 7.1 incidentally), and
these problems went away.

Any ideas why this is happening?

-Phil Oester
RE: 2.4.5aa1
Works for me. Running cerberus on a machine w/2gb RAM - deadlocks on 2.4.5 vanilla, keeps running on 2.4.5aa1.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Andrea Arcangeli
Sent: Saturday, May 26, 2001 10:33 AM
To: [EMAIL PROTECTED]
Subject: 2.4.5aa1

I merged Rik's three liner fix to alloc_pages for GFP_BUFFER, plus my other fix in create_buffers wait_event and a bit bigger reserved pool of async bh. I'd suggest testing whether this makes the highmem deadlock go away.

Detailed description of 2.4.5aa1 follows.

---

00_alpha-illegal-irq-1
	Be verbose for MAX_ILLEGAL_IRQS times if an invalid irq number is
	getting run.
	(debugging)

00_boot-serial-console-1
	Allows the serial console to work anytime during boot. It may have
	side effects but certainly nothing relevant and the current
	situation was annoying enough.
	(nice to have)

00_create_buffers-deadlock-1
	Fix tasks possibly deadlocking in wait_event because the unused_list
	isn't refilled at I/O completion anymore with the 2.4
	pagecache(/swapcache) design.
	(recommended)

00_eepro100-64bit-1
	Fixes a 64bit bug that was generating false positives and memory
	corruption.
	(recommended)

00_eepro100-alpha-1
	Possibly fix the eepro100 transmitter hang on alpha by doing atomic
	PIO updates to avoid losing the clear_suspend.
	(recommended)

00_gfp_buffer-alloc-pages-deadlock-1
	Fix from Rik that keeps GFP_BUFFER from deadlocking inside
	alloc_pages(). This is by no means a definitive fix for the VM
	deadlocks during oom; all other __GFP_IO allocations can still
	deadlock inside alloc_pages() like before. But it is a first step in
	the right direction I think. Please try to beat 2.4.5aa1 hard and
	see if you can reproduce deadlocks with highmem.
	(recommended)

00_cachelinealigned-in-smp-1
	Moves the pagecache_lock and the VM pagemap_lru_lock into two
	different L1 cachelines to avoid contention, mostly useful on the
	alpha where the spinlocks use load locked store conditional loops
	(and we don't want to loop).
	(nice to have)

00_copy-user-lat-2
	Puts reschedule points into the copy-user calls; with lots of cache,
	large reads/writes could otherwise _never_ reschedule until they
	return to userspace.
	(recommended)

00_cpus_allowed-1
	Fixes a bug in the cpu affinity in-kernel API; the bug was fatal for
	ksoftirqd.
	(recommended)

00_double-buffer-pass-1
	Avoids looping two times for no good reason over the lru lists of
	the buffer cache (the double loop was an unreliable hack from
	prehistory that survived till today).
	(nice to have)

00_exception-table-1
	Avoids a compilation warning when compiling without modules.
	(very minor thing)

00_highmem-deadlock-3
	Fixes a highmem deadlock using a reserved pool for the bounce
	buffers.
	(recommended)

00_highmem-debug-1
	Allows people with x86 machines with less than 1G of ram to test the
	highmem code.
	(debugging)

00_ia32-bootmem-corruption-1
	Fixes the x86 boot stage to finish initializing all the reserved
	memory before starting to allocate memory.
	(recommended)

00_ipv6-null-oops-1
	Fixes null pointer oops.
	(recommended)

00_jens-loop-noop-nobounce-1
	Skips the bounces with the null transfer function.
	(nice to have)

00_ksoftirqd-4
	Avoids 1/HZ latency for the softirq if the softirq is marked pending
	again when do_softirq() finishes and the machine is otherwise idle.
	It also fixes the case of a softirq re-marking itself runnable by
	delegating the balance of the softirq load to the scheduler, as if
	it were a normal task.
	(nice to have)

00_kupdate-large-interval-1
	Allows setting a large interval for the kupdate runs. This is useful
	on laptops: instead of sigstopping ksoftirqd it's nicer to set a
	large interval, for example on the order of one hour (do that at
	your own risk of course; it is not recommended unless you know what
	you're doing).
	(nice to have)

00_lvm-0.9.1_beta7-4
	Updates to the lvmbeta7 with fixes for the lv hardsectsize
	estimation based on the max hardsectsize of the underlying pv, plus
	it has tons of other fixes and it is a must have for the 64bit archs
	as the IOP silently changed for those platforms.
	(recommended)

00_max_readahead-1
	Increases the max_readahead to allow the blkdev to read with 512k
	scsi commands when possible.
	(nice to have)

00_msync-fb0-1
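The 00_cachelinealigned-in-smp-1 entry in the list above describes putting two hot locks into different L1 cachelines so that taking one does not bounce the line holding the other. A minimal userspace sketch of that layout idea follows. It assumes a 64-byte line (the real size is architecture-specific) and uses C11 alignas, which is not necessarily the annotation the actual patch uses; demo_lock_t, struct vm_locks and cacheline_of() are hypothetical names for illustration only.

```c
#include <stdalign.h>
#include <stddef.h>

/* Assumed L1 line size; the real value depends on the architecture. */
#define CACHELINE_BYTES 64

/* Stand-in for a kernel spinlock; only its placement matters here. */
typedef struct {
    volatile int locked;
} demo_lock_t;

/* Each lock starts on its own cacheline boundary. */
struct vm_locks {
    alignas(CACHELINE_BYTES) demo_lock_t pagecache_lock;
    alignas(CACHELINE_BYTES) demo_lock_t pagemap_lru_lock;
};

/* Index of the cacheline that a given byte offset falls into. */
size_t cacheline_of(size_t offset)
{
    return offset / CACHELINE_BYTES;
}
```

Comparing cacheline_of(offsetof(struct vm_locks, pagecache_lock)) with the same expression for pagemap_lru_lock shows the two members landing in different lines. Without the alignas specifiers, both int-sized locks would sit next to each other in the same line.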
[PATCH] Allow 'hidden' interfaces in 2.4.x
The attached patch (against 2.4.4-ac10) adds the /proc/sys/net/ipv4/conf/*/hidden option which is present in the 2.2.x series. This is somewhat similar to the arp-filter functionality which was added in ~2.4.4-ac10. The difference is that this does not depend on the routing table; it is configured simply through procfs.

This is particularly useful in load-balanced server farms where loopback addresses are configured for direct client-server traffic. Without this patch, Linux will respond to ARP requests for the virtual IPs, making effective load balancing difficult.

-Phil Oester

diff -r -u -x *~ -x *.rej linux-2.4.4-ac10/Documentation/filesystems/proc.txt linux-2.4.4-ac10-hidden/Documentation/filesystems/proc.txt
--- linux-2.4.4-ac10/Documentation/filesystems/proc.txt	Fri Apr 6 13:42:48 2001
+++ linux-2.4.4-ac10-hidden/Documentation/filesystems/proc.txt	Thu May 17 15:01:45 2001
@@ -1578,6 +1578,17 @@
 Determines whether to send ICMP redirects to other hosts.
+hidden
+--
+
+Hide addresses attached to this device from other devices.
+Such addresses will never be selected by source address autoselection
+mechanism. Also, host will not answer broadcast ARP requests for them and
+will not announce it as source address of ARP requests. The addresses are,
+however, still reachable via IP. This is primarily useful in load-balancing
+environments. This flag is activated only if it is enabled both in specific
+device section and in "all" section.
+
 Routing settings
diff -r -u -x *~ -x *.rej linux-2.4.4-ac10/Documentation/networking/ip-sysctl.txt linux-2.4.4-ac10-hidden/Documentation/networking/ip-sysctl.txt
--- linux-2.4.4-ac10/Documentation/networking/ip-sysctl.txt	Thu May 17 14:53:02 2001
+++ linux-2.4.4-ac10-hidden/Documentation/networking/ip-sysctl.txt	Thu May 17 15:02:47 2001
@@ -392,6 +392,15 @@
 	Default value is 0. Note that some distributions enable it
 	in startip scripts.
+hidden - BOOLEAN
+	Hide addresses attached to this device from other devices.
+	Such addresses will never be selected by source address autoselection
+	mechanism. Also, host will not answer broadcast ARP requests for them and
+	will not announce it as source address of ARP requests. The addresses are,
+	however, still reachable via IP. This is primarily useful in load-balancing
+	environments. This flag is activated only if it is enabled both in specific
+	device section and in "all" section.
+
 Alexey Kuznetsov.
 [EMAIL PROTECTED]
diff -r -u -x *~ -x *.rej linux-2.4.4-ac10/include/linux/inetdevice.h linux-2.4.4-ac10-hidden/include/linux/inetdevice.h
--- linux-2.4.4-ac10/include/linux/inetdevice.h	Thu May 17 14:53:07 2001
+++ linux-2.4.4-ac10-hidden/include/linux/inetdevice.h	Thu May 17 14:30:25 2001
@@ -18,6 +18,7 @@
 	int mc_forwarding;
 	int tag;
 	int arp_filter;
+	int hidden;
 	void *sysctl;
 };
@@ -44,6 +45,7 @@
 #define IN_DEV_LOG_MARTIANS(in_dev)	(ipv4_devconf.log_martians || (in_dev)->cnf.log_martians)
 #define IN_DEV_PROXY_ARP(in_dev)	(ipv4_devconf.proxy_arp || (in_dev)->cnf.proxy_arp)
+#define IN_DEV_HIDDEN(in_dev)	((in_dev)->cnf.hidden && ipv4_devconf.hidden)
 #define IN_DEV_SHARED_MEDIA(in_dev)	(ipv4_devconf.shared_media || (in_dev)->cnf.shared_media)
 #define IN_DEV_TX_REDIRECTS(in_dev)	(ipv4_devconf.send_redirects || (in_dev)->cnf.send_redirects)
 #define IN_DEV_SEC_REDIRECTS(in_dev)	(ipv4_devconf.secure_redirects || (in_dev)->cnf.secure_redirects)
diff -r -u -x *~ -x *.rej linux-2.4.4-ac10/include/linux/sysctl.h linux-2.4.4-ac10-hidden/include/linux/sysctl.h
--- linux-2.4.4-ac10/include/linux/sysctl.h	Thu May 17 14:53:07 2001
+++ linux-2.4.4-ac10-hidden/include/linux/sysctl.h	Thu May 17 14:32:40 2001
@@ -326,7 +326,8 @@
 	NET_IPV4_CONF_BOOTP_RELAY=10,
 	NET_IPV4_CONF_LOG_MARTIANS=11,
 	NET_IPV4_CONF_TAG=12,
-	NET_IPV4_CONF_ARPFILTER=13
+	NET_IPV4_CONF_ARPFILTER=13,
+	NET_IPV4_CONF_HIDDEN=14
 };
 /* /proc/sys/net/ipv6 */
diff -r -u -x *~ -x *.rej linux-2.4.4-ac10/net/ipv4/arp.c linux-2.4.4-ac10-hidden/net/ipv4/arp.c
--- linux-2.4.4-ac10/net/ipv4/arp.c	Thu May 17 14:53:07 2001
+++ linux-2.4.4-ac10-hidden/net/ipv4/arp.c	Thu May 17 14:47:44 2001
@@ -66,7 +66,9 @@
  *		Alexey Kuznetsov:	new arp state machine;
  *					now it is in net/core/neighbour.c.
  *		Krzysztof Halasa:	Added Frame Relay ARP support.
- */
+ *		Julian Anastasov:	"hidden" flag: hide the
+ *					interface and don't reply for it
+*/
 #include <linux/types.h>
 #include <linux/string.h>
@@ -317,12 +319,23 @@
 static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
 {
 	u32 saddr;
+	int from_skb;
+	struct in_device *in_dev2 = NULL;
+	struct net_device *dev2 = NULL;
 	u8 *dst_ha = NULL
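The IN_DEV_HIDDEN() macro in the inetdevice.h hunk differs from its neighbours in one respect: the existing macros OR the global setting with the per-device one, while the new one ANDs them, which matches the documentation text saying the flag takes effect only if it is enabled both for the device and in "all". A small standalone sketch of that difference, using a simplified struct demo_devconf and the function names demo_proxy_arp() and demo_hidden(), all of which are hypothetical stand-ins rather than kernel definitions:

```c
/* Simplified stand-in for the global ("all") and per-device IPv4 settings. */
struct demo_devconf {
    int proxy_arp;
    int hidden;
};

/* Existing style (compare IN_DEV_PROXY_ARP): either setting enables it. */
int demo_proxy_arp(const struct demo_devconf *all, const struct demo_devconf *dev)
{
    return all->proxy_arp || dev->proxy_arp;
}

/* New style (compare IN_DEV_HIDDEN): both settings must be enabled. */
int demo_hidden(const struct demo_devconf *all, const struct demo_devconf *dev)
{
    return dev->hidden && all->hidden;
}
```

In practice this means that writing 1 only to a device's own hidden entry under /proc/sys/net/ipv4/conf/ is not enough; the entry in the "all" section has to be set as well before addresses on that device are hidden.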
Complete support for Intel 815 chipset?
This may not matter in terms of performance, but many devices on Intel 815 chipset machines show up as unknown. Any ideas when (or if) full support for the 815 is planned?

-Phil

Output of lspci:

00:00.0 Host bridge: Intel Corporation 82815 815 Chipset Host Bridge and Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82815 CGC [Chipset Graphics Controller] (rev 02)
00:1e.0 PCI bridge: Intel Corporation: Unknown device 244e (rev 02)
00:1f.0 ISA bridge: Intel Corporation: Unknown device 2440 (rev 02)
00:1f.1 IDE interface: Intel Corporation: Unknown device 244b (rev 02)
00:1f.2 USB Controller: Intel Corporation: Unknown device 2442 (rev 02)
00:1f.3 SMBus: Intel Corporation: Unknown device 2443 (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation: Unknown device 2445 (rev 02)
01:08.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
Process start times moving in reverse on 2.4.x
I've been having continual unexplained lockup problems since converting one of my outgoing qmail servers to 2.4.x. This has been discussed before on this list, where the symptoms are that anything typed on console takes forever to actually come up, and after a few minutes the machine is so unresponsive it requires a powercycle.

Noticed that when this box is in its state of unresponsiveness, the process start times in ps gradually move backwards. The following listings were taken over about a 1.5 hour timespan.

First

root         1     0  0 11:04 ?        00:00:09 init
root         2     1  0 11:04 ?        00:00:00 [keventd]
root         3     1  0 11:04 ?        00:00:00 [kswapd]
root         4     1  0 11:04 ?        00:00:00 [kreclaimd]
root         5     1  0 11:04 ?        00:00:00 [bdflush]
root         6     1  0 11:04 ?        00:00:02 [kupdated]
root        96     1  0 11:06 ?        00:00:00 [kreiserfsd]
root       356     1  0 11:06 ?        00:00:02 syslogd -m 0
root       366     1  0 11:06 ?        00:00:00 klogd

Second

root         1     0  0 10:54 ?        00:00:09 init
root         2     1  0 10:54 ?        00:00:00 [keventd]
root         3     1  0 10:54 ?        00:00:00 [kswapd]
root         4     1  0 10:54 ?        00:00:00 [kreclaimd]
root         5     1  0 10:54 ?        00:00:00 [bdflush]
root         6     1  0 10:54 ?        00:00:02 [kupdated]
root        96     1  0 10:56 ?        00:00:00 [kreiserfsd]
root       356     1  0 10:56 ?        00:00:02 syslogd -m 0
root       366     1  0 10:56 ?        00:00:00 klogd

Third

root         1     0  0 10:03 ?        00:00:09 init
root         2     1  0 10:03 ?        00:00:00 [keventd]
root         3     1  0 10:03 ?        00:00:00 [kswapd]
root         4     1  0 10:03 ?        00:00:00 [kreclaimd]
root         5     1  0 10:03 ?        00:00:00 [bdflush]
root         6     1  0 10:03 ?        00:00:02 [kupdated]
root        96     1  0 10:06 ?        00:00:00 [kreiserfsd]
root       356     1  0 10:06 ?        00:00:02 syslogd -m 0
root       366     1  0 10:06 ?        00:00:00 klogd

Fourth

root         1     0  0 09:53 ?        00:00:09 init
root         2     1  0 09:53 ?        00:00:00 [keventd]
root         3     1  0 09:53 ?        00:00:00 [kswapd]
root         4     1  0 09:53 ?        00:00:00 [kreclaimd]
root         5     1  0 09:53 ?        00:00:00 [bdflush]
root         6     1  0 09:53 ?        00:00:02 [kupdated]
root        96     1  0 09:55 ?        00:00:00 [kreiserfsd]
root       356     1  0 09:55 ?        00:00:02 syslogd -m 0
root       366     1  0 09:55 ?        00:00:00 klogd

Thoughts?
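To put a number on the drift in the four listings above, the start-time column for init (11:04, 10:54, 10:03, 09:53) can be converted to minutes and differenced. The sketch below only does that arithmetic; stime_minutes() and backward_drift() are hypothetical helper names, and it assumes all four listings fall on the same day.

```c
#include <stdio.h>

/* Parse "HH:MM" into minutes since midnight; returns -1 on malformed input. */
int stime_minutes(const char *hhmm)
{
    int h, m;

    if (sscanf(hhmm, "%2d:%2d", &h, &m) != 2)
        return -1;
    if (h < 0 || h > 23 || m < 0 || m > 59)
        return -1;
    return h * 60 + m;
}

/*
 * Minutes by which the reported start time moved backwards between an
 * earlier listing and a later one (same day assumed, inputs well-formed).
 */
int backward_drift(const char *earlier_listing, const char *later_listing)
{
    return stime_minutes(earlier_listing) - stime_minutes(later_listing);
}
```

For init this gives steps of 10, 51 and 10 minutes between consecutive listings, or 71 minutes in total across the roughly 1.5 hours over which the listings were taken.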
RE: kswapd, kupdated, and bdflush at 99% under intense IO
I've seen similar 'unresponsiveness' running 2.4.3-ac2 on a Qmail server. The hardware is dual-processor PIII 650 w/1GB of RAM. SCSI is sym53c895 with dual Quantum 9gb drives.

Any time I start injecting lots of mail into the qmail queue, *one* of the two processors gets pegged at 99%, and it takes forever for anything typed at the console to actually appear (just as you describe). But I don't see any particular user process in top using a great deal of cpu - just the system itself.

In my case, however, I usually have to powercycle the box to get it back - it totally dies.

I've started the kernel with profile=2, and had a cron job running every minute to capture a readprofile -r; sleep 10; readprofile, but when the processor pegs, the cron jobs just stop without catching any useful information before the freeze. The interesting thing is, the box still responds to pings at this time, even though it goes hours without any profile captures.

Upon powercycling, the qmail partition is loaded with thousands of errors - which could be caused by the power cycling, or by something kernel related.

In the meantime, I've had to revert to 2.2.19 any time I do intense mailings.

-Phil Oester

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jeff Lessem
Sent: Tuesday, April 10, 2001 1:01 PM
To: [EMAIL PROTECTED]
Subject: kswapd, kupdated, and bdflush at 99% under intense IO

My machine is an 8 processor Dell P-III 700MHz with 8GB of memory. The disk system I am using is a 12 drawer JBOD with 5 disks in a RAID 5 arrangement attached to an AMI Megaraid 438/466/467/471/493 controller with a total of 145GB of space.

The machine has been in use for about 6 months doing primarily cpu and memory intensive scientific computing tasks. It has been very stable in this role and everybody involved has been pleased with its performance. Recently a decision was made to conglomerate people's home directories from around the network and put them all on this machine (hence the JBOD and RAID).

These tests are all being done with Linux 2.4.3 + the bigpatch fix for knfsd and quotas. The rest of the OS is Debian unstable.

Before moving the storage into production I am performing tests on it to gauge its stability. The first test I performed was a single bonnie++ -s 16096 instance, and the timing results are in line with what I would expect from fast SCSI disks.

However, multiple instances of bonnie++ completely kill the machine. Once two or three bonnies are running, kswapd, kupdated, and bdflush each jump to using 99% of a cpu and the machine becomes incredibly unresponsive. Even using a root shell at nice -20 it can take several minutes for "killall bonnie++" to appear after being typed and then run. After the bonnies are killed and kswapd, kupdated, and bdflush are given a minute or two to finish whatever they are doing, the machine becomes responsive again.

I don't think the machine should be behaving like this. I certainly expect some slowdowns with that much IO, but the computer should still be reasonably responsive, particularly because no system or user files that need to be accessed are on that channel of the SCSI controller.

Any advice on approaching this problem would be appreciated. I will try my best to provide any debugging information that would be useful, but the machine is on another continent from myself, so without a serial console I have a hard time getting any information that doesn't make it into a logfile.

--
Thanks,
Jeff Lessem.
RE: Error compiling aic7xxx driver on 2.4.2-ac13
I actually had the problem with lack-of-lex also, but worked through that...

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Alan Cox
Sent: Tuesday, March 06, 2001 4:51 PM
To: J . A . Magallon
Cc: Phil Oester; [EMAIL PROTECTED]
Subject: Re: Error compiling aic7xxx driver on 2.4.2-ac13

> Which distro is yours ? In my Mandrake 8.0beta there is no /usr/include/db.
> Mdk offers the 3 db libs (db1, db2, db3), so I had to create a symlink
> /usr/include/db3 -> /usr/include/db.
>
> Which is the standard path ? At least, Mdk and RH (Alan...) differ.

I'm not too worried about this right now since, as Al Viro pointed out, the libdb use is unneeded. The irony of all this was that the real concern Justin had and discussed with people was about lex/bison/yacc being available, and the problem has been db
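For anyone hitting the same "db/db_185.h: No such file or directory" error, it may help to first see where (if anywhere) the header actually lives on a given distro before deciding on a symlink. The following is only a rough sketch: the function name `find_db185_header` and the list of candidate subdirectories are assumptions made for illustration, based on the directory names mentioned in this thread (db, db1, db2, db3), not on any distro's documented layout.

```python
import os

# Hypothetical candidate subdirectories, taken from the directory names
# mentioned in the thread; "" means the include root itself.
CANDIDATE_SUBDIRS = ["db", "db1", "db2", "db3", ""]


def find_db185_header(include_root="/usr/include", subdirs=CANDIDATE_SUBDIRS):
    """Return every path under include_root where db_185.h exists."""
    found = []
    for sub in subdirs:
        # os.path.join skips the empty component, so "" checks the root.
        path = os.path.join(include_root, sub, "db_185.h")
        if os.path.isfile(path):
            found.append(path)
    return found


if __name__ == "__main__":
    hits = find_db185_header()
    if hits:
        for hit in hits:
            print("found:", hit)
    else:
        print("db_185.h not found under /usr/include")
```

If the header turns up only under one of the versioned directories, that would be consistent with the Mandrake layout described above, where the path the aicasm build expects (db/db_185.h) does not exist.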
Error compiling aic7xxx driver on 2.4.2-ac13
one more try... anyone else get the following:

make[5]: Entering directory `/usr/src/linux-2.4.2-ac13/drivers/scsi/aic7xxx/aicasm'
lex -t aicasm_scan.l > aicasm_scan.c
gcc -I/usr/include -ldb aicasm_gram.c aicasm_scan.c aicasm.c aicasm_symbol.c -o aicasm
aicasm_symbol.c:39: db/db_185.h: No such file or directory
make[5]: *** [aicasm] Error 1
make[5]: Leaving directory `/usr/src/linux-2.4.2-ac13/drivers/scsi/aic7xxx/aicasm'
Error compiling aic7xxx driver on 2.4.2-ac13
anyone else get the following: make[5]: Entering directory `