Re: CONFIG_HOLES_IN_ZONE and memory hot plug code on x86_64
On 27/08/15 22:20, "yhlu.ker...@gmail.com on behalf of Yinghai Lu" wrote:
> On Fri, Jun 26, 2015 at 4:31 PM, Steffen Persvold wrote:
>> We’ve encountered an issue in a special case where we have a sparse E820
>> map [1].
>>
>> Basically the memory hotplug code is causing a “kernel paging request”
>> BUG [2].
>
> the trace does not look like hotplug path.
>
>> By instrumenting the function register_mem_sect_under_node() in
>> drivers/base/node.c we see that it is called two times with the same
>> struct memory_block argument :
>>
>> [1.901463] register_mem_sect_under_node: start = 80, end = 8f, nid = 0
>> [1.908129] register_mem_sect_under_node: start = 80, end = 8f, nid = 1
>
> Can you post whole log with SRAT related info?

I can probably reproduce again and get full logs when I get run time on the
system again, but here’s some output that we saved in our internal Jira case:

[0.00] NUMA: Initialized distance table, cnt=6
[0.00] NUMA: Node 0 [mem 0x-0x0009] + [mem 0x0010-0xd7ff] -> [mem 0x-0xd7ff]
[0.00] NUMA: Node 0 [mem 0x-0xd7ff] + [mem 0x1-0x427ff] -> [mem 0x-0x427ff]
[0.00] NODE_DATA(0) allocated [mem 0x407fe3000-0x407ff]
[0.00] NODE_DATA(1) allocated [mem 0x807fe3000-0x807ff]
[0.00] NODE_DATA(2) allocated [mem 0xc07fe3000-0xc07ff]
[0.00] NODE_DATA(3) allocated [mem 0x1007fe3000-0x1007ff]
[0.00] NODE_DATA(4) allocated [mem 0x1407fe3000-0x1407ff]
[0.00] NODE_DATA(5) allocated [mem 0x1807fdd000-0x1807ff9fff]
[0.00] [ea00-ea00101f] PMD -> [8803f860-880407df] on node 0
[0.00] [ea0010a0-ea00201f] PMD -> [8807f860-880807df] on node 1
[0.00] [ea0020a0-ea00301f] PMD -> [880bf860-880c07df] on node 2
[0.00] [ea0030a0-ea00401f] PMD -> [880ff860-881007df] on node 3
[0.00] [ea0040a0-ea00501f] PMD -> [8813f860-881407df] on node 4
[0.00] [ea0050a0-ea00601f] PMD -> [8817f7e0-8818075f] on node 5

If I remember correctly there was a mix of 4GB and 8GB DIMMs populated on
this system. In addition the firmware reserved 512 MByte at the end of each
memory controller's physical range (hence the reserved ranges in the e820
map).

Note: this was with 4.1.0 vanilla so it could be obsolete now with 4.2-rc. I
have not yet tested with your latest patches that you and Tony discussed.

Cheers,
Steffen
CONFIG_HOLES_IN_ZONE and memory hot plug code on x86_64
Hi,

We’ve encountered an issue in a special case where we have a sparse E820 map
[1]. Basically the memory hotplug code is causing a “kernel paging request”
BUG [2].

By instrumenting the function register_mem_sect_under_node() in
drivers/base/node.c we see that it is called two times with the same struct
memory_block argument:

[1.901463] register_mem_sect_under_node: start = 80, end = 8f, nid = 0
[1.908129] register_mem_sect_under_node: start = 80, end = 8f, nid = 1

The second call is causing the paging request because the for loop in
register_mem_sect_under_node() is scanning pfns:

	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {

and can’t find one that matches the input “nid” argument (1), which is
natural enough because those sections do not belong to node1, but rather
node0. This results in the for loop entering a “hole” in the pfn range which
isn’t mapped.

Now, the code appears to have been designed to handle this by checking
whether the pfn really belongs to this node with the function
get_nid_for_pfn() in the same file:

	static int get_nid_for_pfn(unsigned long pfn)
	{
		struct page *page;

		if (!pfn_valid_within(pfn))
			return -1;
		page = pfn_to_page(pfn);
		if (!page_initialized(page))
			return -1;
		return pfn_to_nid(pfn);
	}

However, pfn_valid_within() (from include/linux/mmzone.h) never returns a
false value because:

	/*
	 * If it is possible to have holes within a MAX_ORDER_NR_PAGES, then we
	 * need to check pfn validility within that MAX_ORDER_NR_PAGES block.
	 * pfn_valid_within() should be used in this case; we optimise this away
	 * when we have no holes within a MAX_ORDER_NR_PAGES block.
	 */
	#ifdef CONFIG_HOLES_IN_ZONE
	#define pfn_valid_within(pfn)	pfn_valid(pfn)
	#else
	#define pfn_valid_within(pfn)	(1)
	#endif

CONFIG_HOLES_IN_ZONE is not possible to set on x86_64; it is present only on
ia64 and mips. Is there a specific reason why CONFIG_HOLES_IN_ZONE isn’t
activated on x86_64?
I’ve added a patch to arch/x86/Kconfig [3] which solves this issue; however,
I guess another approach would be to figure out why
register_mem_sect_under_node() is called with a wrong struct memory_block
for node1.

Any comments or suggestions are welcome.

PS: Even if we avoid the sparse e820 map, register_mem_sect_under_node() is
still invoked twice with the same struct memory_block, once for node0 (which
gets a match) and once for node1. However, when all the pfns are mapped, it
just goes through the range fine without a paging request.

Cheers,
--
Steffen Persvold
Chief Architect NumaChip, Numascale AS
Tel: +47 23 16 71 88 Fax: +47 23 16 71 80 Skype: spersvold

[1]
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00087fff] usable
[0.00] BIOS-e820: [mem 0x00088000-0x00089bff] reserved
[0.00] BIOS-e820: [mem 0x00089c00-0x0009ebff] usable
[0.00] BIOS-e820: [mem 0x0009ec00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e84e0-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xd7e5] usable
[0.00] BIOS-e820: [mem 0xd7e6e000-0xd7e6] type 9
[0.00] BIOS-e820: [mem 0xd7e7-0xd7e93fff] ACPI data
[0.00] BIOS-e820: [mem 0xd7e94000-0xd7eb] ACPI NVS
[0.00] BIOS-e820: [mem 0xd7ec-0xd7ed] reserved
[0.00] BIOS-e820: [mem 0xd7eed000-0xd7ff] reserved
[0.00] BIOS-e820: [mem 0xe000-0xefff] reserved
[0.00] BIOS-e820: [mem 0xffe0-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x000407ff] usable
[0.00] BIOS-e820: [mem 0x00040800-0x000427ff] reserved
[0.00] BIOS-e820: [mem 0x00042800-0x000807ff] usable
[0.00] BIOS-e820: [mem 0x00080800-0x000827ff] reserved
[0.00] BIOS-e820: [mem 0x00082800-0x000c07ff] usable
[0.00] BIOS-e820: [mem 0x000c0800-0x000c27ff] reserved
[0.00] BIOS-e820: [mem 0x000c2800-0x001007ff] usable
[0.00] BIOS-e820: [mem 0x00100800-0x001027ff] reserved
[0.00] BIOS-e820: [mem 0x00102800-0x001407ff] usable
[0.00] BIOS-e820: [mem 0x00140800-0x001427ff] reserved
[0.00] BIOS-e820: [mem 0x00142800-0x001807ff] usable
[0.00] BIOS-e820: [mem 0x00180800-0x001827ff] reserved
[0.00] BIOS-e820: [mem 0x00fd-0x00ff] reserved
[0.00] BIOS-e820: [mem 0x3f00-0x3fff] reserved

[2]
[1.915002] BUG: unable to handle kernel paging request at ea0010200020
[1.922
RFC: Additions to APIC driver
Hi,

We’re preparing our APIC driver (arch/x86/kernel/apic/apic_numachip.c) for
next-gen hardware support, and in that process I have a question on what the
cleanest approach would be. Both current generation and next generation
chips share a lot of similar code, but some of the core functionality is
slightly different (such as the address through which you communicate with
the APIC ICR to send IPIs, how to derive APIC IDs, etc.).

The way I see it, we have a few alternatives:

1) Create a new arch/x86/kernel/apic/apic_numachip2.c (and corresponding
entry in the Makefile) which has a new “struct apic” with function pointers
to the next-gen specific code. The new APIC driver would still only need
CONFIG_X86_NUMACHIP to be compiled.

2) Modify the existing apic_numachip.c to recognise the different HW
generations (trivial) and use function pointers to differentiate the IPI
send calls (among other things), but use the *same* “struct apic” for both
(the function pointers referenced in “struct apic” would need a new
indirection level to differentiate between hardware revs).

3) Have two different “struct apic” entries in the existing apic_numachip.c
source file, with separate oem_madt check functions etc. This would only be
marginally different from 1) as far as implementation and code duplication
goes, but it would be contained in one C source file and object file (silly
question, maybe: would the apic_driver enumeration even work if it’s all in
the same object file?).

Any insight into this from the great minds behind this would be highly
appreciated.

Kind regards,
--
Steffen Persvold
Chief Architect NumaChip, Numascale AS
Tel: +47 23 16 71 88 Fax: +47 23 16 71 80 Skype: spersvold
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities
On 29 Apr 2014, at 3:20, Borislav Petkov wrote:
> On Tue, Apr 29, 2014 at 09:33:09AM +0200, Andreas Herrmann wrote:
>> I am sure it's because some server systems had MMIO ECS access not
>> enabled in BIOS. I can't remember which systems were affected.
>
> Ok, now AMD people: what's the story with IO ECS, can we assume that on
> everything after F10h, BIOS has a sensible MCFG and we can limit this to
> F10h only? I like Bjorn's idea but we need to make sure a working MCFG
> is ubiquitous.
>
> Which begs the real question: Suravee, why are you even touching IO ECS
> provided F15h and later have a MCFG? Or, do they?

Our experience with this is that Fam10h and later have a very well working
MCFG setup, earlier generations not so much (hence IO ECS was needed).

Cheers,
Steffen
Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops
On 4/9/2013 12:24 PM, Borislav Petkov wrote:
> On Tue, Apr 09, 2013 at 11:45:44AM +0200, Steffen Persvold wrote:
>> Hmm, yes of course. This of course breaks on our slave servers when the
>> shared mechanism doesn't work properly (i.e. NB not visible). Then all
>> cores get individual kobjects and there can be discrepancies between what
>> the hardware is programmed to and what is reflected in /sys on some cores.
>
> Hold on, are you saying you have cores with an invisible NB? How does
> that even work? Or is it only invisible to sw?

Only invisible to the kernel, because multi-pci-domains isn't working
pre-3.9 on our architecture.

Cheers,
Steffen
Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops
On 4/9/2013 11:38 AM, Borislav Petkov wrote:
> On Tue, Apr 09, 2013 at 11:25:16AM +0200, Steffen Persvold wrote:
>> Why not let all cores just create their individual kobject and skip this
>> "shared" nb->bank4 concept? Any disadvantage to that (apart from the
>> obvious storage bloat?).
>
> Well, bank4 is shared across cores on the northbridge in *hardware*.

Well, yes I was aware of that :)

> So it is only logical to represent the hardware layout correctly in
> software. Also, if you want to configure any settings over one core's
> sysfs nodes, you want those to be visible across all cores automagically:

Hmm, yes of course. This of course breaks on our slave servers when the
shared mechanism doesn't work properly (i.e. NB not visible). Then all cores
get individual kobjects and there can be discrepancies between what the
hardware is programmed to and what is reflected in /sys on some cores.

Ok, we'll go with our first approach of not creating MC4 at all if the NB
isn't visible. We'll redo the patch against the tip:x86/ras branch.

Cheers,
Steffen
Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops
On 4/4/2013 9:07 PM, Borislav Petkov wrote:
> On Thu, Apr 04, 2013 at 08:05:46PM +0200, Steffen Persvold wrote:
>> It made more sense (to me) to skip the creation of MC4 altogether if you
>> can't find the matching northbridge since you can't reliably do the
>> dec_and_test() reference counting on the shared bank when you don't have
>> the common NB struct for all the shared cores. Or am I just smoking the
>> wrong stuff ?
>
> No, actually *this* explanation should've been in the commit message. You
> numascale people do crazy things with the hardware :) so explaining
> yourself more verbosely is an absolute must if anyone is to understand
> why you're changing the code.

Boris,

A question came up. Why have this "shared" bank concept for the kobjects at
all? What's the advantage?

Before our patch, when running on our architecture but without pci domains
for "slave" servers, everything was working fine except the de-allocation
oops due to the NULL pointer when offlining cores. Why not let all cores
just create their individual kobject and skip this "shared" nb->bank4
concept? Any disadvantage to that (apart from the obvious storage bloat?).

Cheers,
Steffen
Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops
On 4/4/2013 9:07 PM, Borislav Petkov wrote: On Thu, Apr 04, 2013 at 08:05:46PM +0200, Steffen Persvold wrote: It made more sense (to me) to skip the creation of MC4 all together if you can't find the matching northbridge since you can't reliably do the dec_and_test() reference counting on the shared bank when you don't have the common NB struct for all the shared cores. Or am I just smoking the wrong stuff ? No, actually *this* explanation should've been in the commit message. You numascale people do crazy things with the hardware :) so explaining yourself more verbosely is an absolute must if anyone is to understand why you're changing the code. Ok :) So please write a detailed commit message why you need this change, don't be afraid to talk about the big picture. Will do. Also, I'm guessing this is urgent stuff and it needs to go into 3.9? Yes, no? If yes, this patch should probably be tagged for stable. Yes. We found the issue on -stable at first (3.8.2 iirc) because it doesn't have the multi-domain support we needed (which is added in 3.9). Also, please redo this patch against tip:x86/ras which already has patches touching mce_amd.c. Ok. Oh, and lastly, needless to say, it needs to be tested on a "normal", i.e. !numascale AMD multinode box, in case you haven't done so yet. :-) It has been tested on "normal" platforms and NumaConnect platforms (Fam10h and Fam15h AMD processors, SCM and MCM versions). Cheers, Steffen -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops
On 4/4/2013 6:13 PM, Borislav Petkov wrote:
> On Thu, Apr 04, 2013 at 11:52:00PM +0800, Daniel J Blueman wrote:
>> On platforms where all Northbridges may not be visible (due to routing,
>> eg on NumaConnect systems), prevent oopsing due to stale pointer access
>> when offlining cores.
>>
>> Signed-off-by: Steffen Persvold
>> Signed-off-by: Daniel J Blueman
>
> Huh, what's up?
>
> This one is almost reverting 21c5e50e15b1a which you wrote in the first
> place. What's happening? What stale pointer access, where? We have the
> if (nb ..) guards there.
>
> This commit message needs a *lot* more explanation about what's going
> on and why we're reverting 21c5e50e15b1a. And why the special handling
> for shared banks? I presume you offline some of the cores and there's a
> dangling pointer but again, there are the nb validity guards...
>
> /me is genuinely confused.

You get oopses when offlining cores when there's no NB struct for the shared
MC4 bank. In threshold_remove_bank(), there's no "if (!nb)" guard:

	if (shared_bank[bank]) {
		if (!atomic_dec_and_test(&b->cpus)) {
			__threshold_remove_blocks(b);
			per_cpu(threshold_banks, cpu)[bank] = NULL;
			return;
		} else {
			/*
			 * the last CPU on this node using the shared bank is
			 * going away, remove that bank now.
			 */
			nb = node_to_amd_nb(amd_get_nb_id(cpu));
			nb->bank4 = NULL;
		}
	}

nb->bank4 = NULL will oops, since nb is NULL.

It made more sense (to me) to skip the creation of MC4 altogether if you
can't find the matching northbridge, since you can't reliably do the
dec_and_test() reference counting on the shared bank when you don't have the
common NB struct for all the shared cores. Or am I just smoking the wrong
stuff?

Cheers,
Steffen
Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops
On 4/4/2013 6:13 PM, Borislav Petkov wrote: On Thu, Apr 04, 2013 at 11:52:00PM +0800, Daniel J Blueman wrote: On platforms where all Northbridges may not be visible (due to routing, eg on NumaConnect systems), prevent oopsing due to stale pointer access when offlining cores. Signed-off-by: Steffen Persvold s...@numascale.com Signed-off-by: Daniel J Blueman dan...@numascale-asia.com Huh, what's up? This one is almost reverting 21c5e50e15b1a which you wrote in the first place. What's happening? What stale pointer access, where? We have the if (nb ..) guards there. This commit message needs a *lot* more explanation about what's going on and why we're reverting 21c5e50e15b1a. And why the special handling for shared banks? I presume you offline some of the cores and there's a dangling pointer but again, there are the nb validity guards... /me is genuinely confused. You get oopses when offlining cores when there's no NB struct for the shared MC4 bank. In threshold_remove_bank(), there's no if (!nb) guard : if (shared_bank[bank]) { if (!atomic_dec_and_test(b-cpus)) { __threshold_remove_blocks(b); per_cpu(threshold_banks, cpu)[bank] = NULL; return; } else { /* * the last CPU on this node using the shared bank is * going away, remove that bank now. */ nb = node_to_amd_nb(amd_get_nb_id(cpu)); nb-bank4 = NULL; } } nb-bank4 = NULL will oops, since nb is NULL. It made more sense (to me) to skip the creation of MC4 all together if you can't find the matching northbridge since you can't reliably do the dec_and_test() reference counting on the shared bank when you don't have the common NB struct for all the shared cores. Or am I just smoking the wrong stuff ? Cheers, Steffen -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops
On 4/4/2013 9:07 PM, Borislav Petkov wrote:
> On Thu, Apr 04, 2013 at 08:05:46PM +0200, Steffen Persvold wrote:
>> It made more sense (to me) to skip the creation of MC4 all together if
>> you can't find the matching northbridge since you can't reliably do
>> the dec_and_test() reference counting on the shared bank when you
>> don't have the common NB struct for all the shared cores.
>>
>> Or am I just smoking the wrong stuff ?
>
> No, actually *this* explanation should've been in the commit message.
> You numascale people do crazy things with the hardware :) so explaining
> yourself more verbosely is an absolute must if anyone is to understand
> why you're changing the code.

Ok :)

> So please write a detailed commit message why you need this change,
> don't be afraid to talk about the big picture.

Will do.

> Also, I'm guessing this is urgent stuff and it needs to go into 3.9?
> Yes, no? If yes, this patch should probably be tagged for stable.

Yes. We found the issue on -stable at first (3.8.2 iirc) because it
doesn't have the multi-domain support we needed (which is added in 3.9).

> Also, please redo this patch against tip:x86/ras which already has
> patches touching mce_amd.c.

Ok.

> Oh, and lastly, needless to say, it needs to be tested on a normal,
> i.e. !numascale AMD multinode box, in case you haven't done so yet. :-)

It has been tested on normal platforms and NumaConnect platforms (Fam10h
and Fam15h AMD processors, SCM and MCM versions).

Cheers,
Steffen
Re: [PATCH v2 RESEND] Add NumaChip remote PCI support
Hi Bjorn,

On 11/30/2012 17:45, Bjorn Helgaas wrote:
> On Thu, Nov 29, 2012 at 10:28 PM, Daniel J Blueman [] wrote:
>> We could expose pci_dev_base via struct x86_init_pci; the extra
>> complexity and performance tradeoff may not be worth it for a single
>> case perhaps?
>
> Oh, right, I forgot that you can't decide this at build-time. This is
> PCI config access, which is not a performance path, so I'm not really
> concerned about it from that angle, but you make a good point about the
> complexity.
>
> The reason I'm interested in this is because MMCONFIG is a generic PCIe
> feature but is currently done via several arch-specific implementations,
> so I'm starting to think about how we can make parts of it more generic.
> From that perspective, it's nicer to parameterize an existing
> implementation than to clone it because it makes refactoring
> opportunities more obvious.
>
> Backing up a bit, I'm curious about exactly why you need to check for
> the limit to begin with. The comment says "Ensure AMD Northbridges don't
> decode reads to other devices," but that doesn't seem strictly accurate.
> You're not changing anything in the hardware to prevent it from
> *decoding* a read, so it seems like you're actually just preventing the
> read in the first place. What happens without the limit check? Do you
> get a response timeout and a machine check? Read from the wrong device?

The latter. I'm not sure how familiar you are with how PCI config reads
are decoded and handled on coherent HyperTransport fabrics. The way it
works *within* one coherent HT fabric is that the CPU will redirect all
config space accesses above a configured max HT node (a setting in the
AMD northbridge) to a specific I/O link (non-coherent link), which
usually links up with a "southbridge" device that responds with a target
abort (non-existing device). However, this only works when a CPU core is
accessing local HT devices.

In our architecture, we "glue" together multiple HT fabrics, and when a
CPU core sends a PCI config space request (mmconfig) to a remote machine
(via our hardware) this redirection is not applied anymore. The result is
that when an mmconfig read comes in to a non-existent coherent HT device
on bus 0, one of the other HT nodes on that remote node will respond to
the read, leading to "phantom" devices (i.e. lspci will show more HT
northbridges than are really physically present), *or* worst case the
transaction hangs (alternatively times out, leading to MCE and other bad
things).

This is why we're checking accesses to bus 0, devices 24-31, and
returning a "fake" target abort if the access was to a non-existing HT
device. In other words, we're doing in software what a "normal" HT based
platform would do in hardware.

> As far as I can tell, you still describe your MMCONFIG area with an
> MCFG table (since you use pci_mmconfig_lookup() to find the region).
> That table only includes the starting and ending bus numbers, so the
> assumption is that the MMCONFIG space is valid for every possible
> device on those buses. So it seems like your system is not really
> compatible with the spec here.
>
> Because the MCFG table can't describe finer granularity than start/end
> bus numbers, we manage MMCONFIG regions as (segment, start_bus,
> end_bus, address) tuples. Maybe if we tracked it with slightly finer
> granularity, e.g., (segment, start_bus, end_bus, end_bus_device,
> address), you could have some sort of MCFG-parsing quirk that reduces
> the size of the MMCONFIG region you register for bus 0.
>
> Just brainstorming here; it's not obvious to me yet what the best
> solution is.
>
> Bjorn

Kind regards,
--
Steffen Persvold, Chief Architect NumaChip
Numascale AS - www.numascale.com
Tel: +47 92 49 25 54 Skype: spersvold
Re: [PATCH v2] Fix AMD Northbridge-ID contiguity assumptions
On 10/12/2012 11:33, Borislav Petkov wrote:
> On Thu, Oct 04, 2012 at 03:18:02PM +0200, Borislav Petkov wrote:
>> On Wed, Oct 03, 2012 at 09:21:14PM +0800, Daniel J Blueman wrote:
>>> The AMD Northbridge initialisation code and EDAC assume the
>>> Northbridge IDs are contiguous, which no longer holds on federated
>>> systems with multiple HyperTransport fabrics and multiple PCI
>>> domains. Address this assumption by searching the Northbridge ID
>>> array, rather than directly indexing it, using the upper bits for
>>> the PCI domain.
>>>
>>> v2: Fix Northbridge entry initialisation
>>>
>>> Tested on a single-socket system and 3-server federated system.
>>>
>>> Signed-off-by: Daniel J Blueman
>>> ---
>>>  arch/x86/include/asm/amd_nb.h | 23 +--
>>>  arch/x86/kernel/amd_nb.c      | 16 +---
>>>  drivers/edac/amd64_edac.c     | 18 +-
>>>  drivers/edac/amd64_edac.h     |  6 --
>>
>> Ok, I've been meaning to clean up that amd_nb.c code which iterates
>> over all PCI devices on the system just so it can count the NBs and
>> then do it again in order to do the ->misc and ->link assignment.
>
> So below is what I've come up with and it builds but it is completely
> untested and I might be completely off, for all I know.
>
> The basic idea, though, is to have the first 8 NB descriptors in an
> array of 8 and use that for a fast lookup on all those single-board
> machines where the number of the northbridges is the number of
> physical processors on the board (or x2, if a MCM).
>
> Then, there's a linked list of all further NB descriptors which should
> work in your case of confederated systems.
>
> Btw, I've also reused your get_node_id function and the edac changes
> are still pending but they should be trivial once this new approach
> pans out.

Hi Boris,

This patch looks very clean and should serve our purpose as well (I'll
double-check with Daniel).

Regarding the size of the "node" variable, you asked before. The
theoretical maximum number of AMD NBs we can have in a confederated
NumaConnect system _today_ is 8*4096 (8 NBs per system, 4096 systems),
so technically this could fit into a u16 instead of a u32 (you'll have
to shift left by 3 instead of 8). However, to allow some flexibility I
think a u32 is better, and I think we can live with those two extra
bytes per struct member, or ?

Cheers,
--
Steffen Persvold, Chief Architect NumaChip
Numascale AS - www.numascale.com
Tel: +47 92 49 25 54 Skype: spersvold
Re: DMA memory limitation?
> > GFP_DMA is ISA dma reachable. Forget the IA64, their setup is weird and
> > should best be ignored until 2.5 as and when they sort it out.

Really ? I don't think I can ignore IA64, there are people who ask for it.

> > > bounce buffers are needed. On Alpha GFP_DMA is not limited at all (I
> > > think). Correct me if I'm wrong, but I really think there should be a
> > > general way of allocating memory that is 32bit addressable (something
> > > like GFP_32BIT?) so you don't need a lot of #ifdef's in your code.
> >
> > Alpha has various IOMMU facilities.
> >
> > No ifdefs are needed:
> >
> > GFP_DMA - ISA dma reachable
> > pci_alloc_* and friends - PCI usable memory
>
> pci_alloc_* is designed to support ISA.
>
> Pass pci_dev==NULL to pci_alloc_* for ISA devices, and it allocs GFP_DMA
> for you.

Sure, but the IA64 platforms that are out now don't have an IOMMU, so
bounce buffers are used if you don't specify GFP_DMA in your
get_free_page.

Now let's say you have a driver with a page allocator. Eventually you
want to make some of the allocated pages available to a 32-bit PCI
device. These pages have to be consistent (i.e. the driver doesn't have
to wait for a PCI flush for the data to be valid, sort of like an
ethernet ring buffer). I could use the pci_alloc_consistent() function
(pci_alloc_consistent() allocates a buffer with GFP_DMA on IA64), but
since I already have the pages, I have to use pci_map_single() (or
pci_map_sg()). Inside pci_map_single() on IA64, something called swiotlb
buffers (bounce buffers) are used if the device can't support 64-bit
addressing and the address of the memory to map is above the 4G limit.
The swiotlb buffers are below the 4G limit and therefore reachable by
any PCI device. The problem with these buffers is that the contents are
not copied back to the original location before you do a pci_sync_* or a
pci_unmap_* (they are not consistent), and they are a limited resource
(allocated at boot time).

My solution for now was to use:

	#if defined(__ia64__)
	int flag = GFP_DMA;
	#else
	int flag = 0;
	#endif

Maybe IA64 could implement GFP_HIGHMEM (as on i386) so that if no flags
were used you were guaranteed to get 32bit memory ???

Regards,
--
Steffen Persvold, Systems Engineer
Email : mailto:[EMAIL PROTECTED]
Scali AS (http://www.scali.com)
Tlf : (+47) 22 62 89 50   Fax : (+47) 22 62 89 51
Olaf Helsets vei 6, N-0621 Oslo, Norway
Re: DMA memory limitation?
Helge Hafting wrote:
>
> Vasu Varma P V wrote:
> >
> > Hi,
> >
> > Is there any limitation on DMA memory we can allocate using
> > kmalloc(size, GFP_DMA)? I am not able to acquire more than
> > 14MB of the mem using this on my PCI SMP box with 256MB ram.
> > I think there is restriction on ISA boards of 16MB.
> > Can we increase it ?
>
> You can allocate a lot more memory for your pci activities.
> No problem there. Just drop the "GFP_DMA" and you'll get
> up to 1G or so.
>
> You shouldn't use GFP_DMA because PCI cards don't need that.
> Only ISA cards need GFP_DMA because they can't use more
> than 16M. So obviously GFP_DMA is limited to
> 16M because it is really ISA_DMA.
>
> PCI doesn't need such special tricks, so don't use GFP_DMA!
> Your PCI card is able to DMA into any memory, including
> the non-GFP_DMA memory.
>
> > but we have a macro in include/asm-i386/dma.h,
> > MAX_DMA_ADDRESS (PAGE_OFFSET+0x1000000).
> >
> > if i change it to a higher value, i am able to get more dma
> > memory. Is there any way i can change this without compiling
> > the kernel?
>
> No matter what you do, DON'T change that. Yeah, you'll get
> a bigger GFP_DMA pool, but that'll break each and every
> ISA card that tries to allocate GFP_DMA memory. You
> achieve exactly the same effect for your PCI card by ditching
> the GFP_DMA parameter, but then you achieve it without breaking
> ISA cards.

A problem arises on 64-bit platforms (such as IA64) if your PCI card is
only 32-bit (can address the first 4G) and you don't want to use bounce
buffers. If you use GFP_DMA on IA64 you are ensured that the memory you
get is below 4G, and not 16M as on i386, hence no bounce buffers are
needed. On Alpha GFP_DMA is not limited at all (I think). Correct me if
I'm wrong, but I really think there should be a general way of
allocating memory that is 32bit addressable (something like GFP_32BIT?)
so you don't need a lot of #ifdef's in your code.

Regards,
--
Steffen Persvold, Systems Engineer
Email : mailto:[EMAIL PROTECTED]
Scali AS (http://www.scali.com)
Tlf : (+47) 22 62 89 50   Fax : (+47) 22 62 89 51
Olaf Helsets vei 6, N-0621 Oslo, Norway
Router problems with transparent proxy
Hi,

I think I've triggered a bug in the ipchains/iptables part of the
kernel. Here is the story:

The server was an 866MHz PIII with 384 MByte of RAM running RH7.1 with a
2.4.5-ac21 kernel. It was used as a router/firewall with 2 netcards (not
sure which type, but I don't think that's important). Using this machine
as a plain router was no problem at all, and serving a class C net onto
a 3 MBit line was just a walk in the park; the machine was idle most of
the time.

Then we decided to set up a transparent proxy and used a pretty standard
setup redirecting all port 80 accesses with ipchains to squid. Things
worked fine for a while (about 2 hrs) until we noticed that the machine
got extremely unresponsive on the console. A 'top' session showed us
that the machine was at almost 100% system time. If we disconnected some
of the segments on the C net, system time went down a bit. We rebooted
the machine and noticed that the system time started at zero and went
slowly upwards until it reached 100% (after about 2 hrs), and we just
needed to reboot again.

We just disabled the ipchains stuff, and now the server is rock solid
with a 'normal' proxy setup (and 100% idle almost all the time). Just
for the record: we also tried standard RH7.1 kernels (2.4.2-2 and 2.4.3)
with the same results.

Any ideas ? Anybody experienced similar behaviour ? It looks like a
resource leak somewhere in the IP filter code to me.

Regards,
--
Steffen Persvold, Systems Engineer
Email : mailto:[EMAIL PROTECTED]
Scali AS (http://www.scali.com)
Tlf : (+47) 22 62 89 50   Fax : (+47) 22 62 89 51
Olaf Helsets vei 6, N-0621 Oslo, Norway
Functionality of mmap, nopage and remap_page_range
Hi kernel list readers,

I have a question about the functionality of mmap(), the vma->vm_ops
functions, and the different vma->vm_flags. Is there any documentation
that describes these methods and how they should work (i.e. when should
mmap() use remap_page_range(), and when is the vma->vm_ops->nopage
function called)?

Any help appreciated.
--
Steffen Persvold, Systems Engineer
Email : mailto:[EMAIL PROTECTED]
Scali AS (http://www.scali.com)
Tlf : (+47) 22 62 89 50   Fax : (+47) 22 62 89 51
Olaf Helsets vei 6, N-0621 Oslo, Norway
Re: temperature standard - global config option?
"L. K." wrote:
> I haven't encountered any CPU with builtin temperature sensors.

Eh, all Pentium-class CPUs have a built-in sensor for core temperature
(I believe Athlons too). It's just the logic which is outside, in the
form of an A/D converter connected to an I2C bus.

Regards,
--
Steffen Persvold, Systems Engineer
Email : mailto:[EMAIL PROTECTED]
Scali AS (http://www.scali.com)
Tlf : (+47) 22 62 89 50   Fax : (+47) 22 62 89 51
Olaf Helsets vei 6, N-0621 Oslo, Norway
Question regarding pci_alloc_consistent() and __get_free_pages
Hi,

I have a question regarding the pci_alloc_consistent() function. Will
this function allocate pages that are physically contiguous? I.e. if I
call this function with a size argument of 32 KByte, will that be 8
consecutive pages in memory on the i386 architecture (4 pages on alpha)?

In general, will __get_free_pages(GFP_ATOMIC, order) always return
physically contiguous memory?

All feedback appreciated,
--
Steffen Persvold, Systems Engineer
Email : mailto:[EMAIL PROTECTED]
Scali AS (http://www.scali.com)
Tlf : (+47) 22 62 89 50   Fax : (+47) 22 62 89 51
Olaf Helsets vei 6, N-0621 Oslo, Norway
Re: VIA/PDC/Athlon
Pavel Roskin wrote:
> Hello, Zilvinas!
>
> There are utilities that work with PnP BIOS. They are included with pcmcia-cs (which is weird - it should be a separate package) and called "lspci" and "setpci". They depend on PnP BIOS support in the kernel (CONFIG_PNPBIOS).
>
> Dumping your PnP BIOS configuration and checking whether it has changed after booting to Windows would be more reasonable than checking your PCI configuration (IMHO).

Ehm, "lspci" and "setpci" are part of the pci-utils package (at least on RedHat) and are used to dump/modify PCI configuration space (/proc/bus/pci). If you know how to use these tools to dump the PnP BIOS, please tell us.

Regards
-- Steffen Persvold, Systems Engineer Email : mailto:[EMAIL PROTECTED] Scali AS (http://www.scali.com)
Kernel crash using NFSv3 on 2.4.4
Hi all,

I have compiled a stock 2.4.4 kernel and applied SGI's kdb patch v1.8. Most of the time this runs just fine, but one time when I tried to copy a file from an NFS server I got a kernel fault. Luckily it jumped right into the debugger and I could do some backtracing (quite useful!):

Unable to handle kernel paging request at virtual address 414478b1
printing eip: c012c826
*pde =
Entering kdb (current=0xca07a000, pid 971) on processor 0
Oops: Oops due to oops @ 0xc012c826
eax = 0x2000 ebx = 0xc15e4800 ecx = 0x edx = 0xc1447899
esi = 0x edi = 0xc14477a0 esp = 0xca07ba98 eip = 0xc012c826
ebp = 0xca07baa4 xss = 0x0018 xcs = 0x0010 eflags = 0x00010046
xds = 0xc1440018 xes = 0x0018 origeax = 0x &regs = 0xca07ba64

[0]kdb> bt
EBP        EIP        Function(args)
0xca07baa4 0xc012c826 kmem_cache_alloc_batch+0x46 (0xc14477a0, 0x7, 0xcb965260) kernel .text 0xc010 0xc012c7e0 0xc012c864
0xca07bad0 0xc012ca8e kmalloc+0x82 (0x13c, 0x7, 0xca4ca040, 0x0) kernel .text 0xc010 0xc012ca0c 0xc012cb1c
0xca07baf4 0xc01fd254 alloc_skb+0x104 (0x100, 0x7) kernel .text 0xc010 0xc01fd150 0xc01fd31c
0xca07bb14 0xc01fc85c sock_alloc_send_skb+0x68 (0xca4ca040, 0xe7, 0x40, 0xca07bb44) kernel .text 0xc010 0xc01fc7f4 0xc01fc8f8
0xca07bb48 0xc020ff58 ip_build_xmit+0xe8 (0xca4ca040, 0xc022834c, 0xca07bbac, 0xc8, 0xca07bbc4) kernel .text 0xc010 0xc020fe70 0xc02101cc
0xca07bbd0 0xc022879c udp_sendmsg+0x344 (0xca4ca040, 0xca07bcc0, 0xac) kernel .text 0xc010 0xc0228458 0xc0228824
0xca07bbe8 0xc022e494 inet_sendmsg+0x40 (0xca1a2d08, 0xca07bcc0, 0xac, 0xca07bc18, 0xc000) kernel .text 0xc010 0xc022e454 0xc022e49c
0xca07bc2c 0xc01fa27e sock_sendmsg+0x7a (0xca1a2d08, 0xca07bcc0, 0xac) kernel .text 0xc010 0xc01fa204 0xc01fa2a0
0xca07bcdc 0xd089c9e4 [sunrpc]do_xprt_transmit+0x158 (0xca07bd70) sunrpc .text 0xd089a060 0xd089c88c 0xd089ccc0
0xca07bcf4 0xd089c87f [sunrpc]xprt_transmit+0xa3 (0xca07bd70) sunrpc .text 0xd089a060 0xd089c7dc 0xd089c88c
0xca07bd08 0xd089aa43 [sunrpc]call_transmit+0x43 (0xca07bd70) sunrpc .text 0xd089a060 0xd089aa00 0xd089aa6c
0xca07bd38 0xd089e1de [sunrpc]__rpc_execute+0xa6 (0xca07bd70, 0x0) sunrpc .text 0xd089a060 0xd089e138 0xd089e3ec
0xca07bd4c 0xd089e448 [sunrpc]rpc_execute_Rsmp_cbcaa361+0x5c (0xca07bd70, 0xca07be2c) sunrpc .text 0xd089a060 0xd089e3ec 0xd089e464
0xca07bdf8 0xd089a493 [sunrpc]rpc_call_sync_Rsmp_1a543287+0x73 (0xcb6f09a0, 0xca07be34, 0x0, 0xca07a000) sunrpc .text 0xd089a060 0xd089a420 0xd089a4c0
0xca07beb0 0xd09524e4 [nfs]nfs3_proc_access+0x108 (0xca1fc600, 0x1, 0x0) nfs .text 0xd0948060 0xd09523dc 0xd0952534
0xca07bed8 0xd094f739 [nfs]nfs_permission+0x8d (0xca1fc600, 0x1) nfs .text 0xd0948060 0xd094f6ac 0xd094f7b0
0xca07bef4 0xc01407db permission+0x4b (0xca1fc600, 0x1) kernel .text 0xc010 0xc0140790 0xc0140828
0xca07bf28 0xc0141488 path_walk+0x898 (0xcfdb401b, 0xca07bf7c) kernel .text 0xc010 0xc0140bf0 0xc01414b0
0xca07bf58 0xc0141ba6 open_namei+0x8a (0xcfdb4000, 0x8001, 0x0, 0xca07bf7c) kernel .text 0xc010 0xc0141b1c 0xc0142100
0xca07bf98 0xc0134fda filp_open+0x3a (0xcfdb4000, 0x8000, 0x0) kernel .text 0xc010 0xc0134fa0 0xc0134ffc
0xca07bfbc 0xc01352fe sys_open+0x42 (0xbc18, 0x8000, 0x0, 0x2, 0xbc55) kernel .text 0xc010 0xc01352bc 0xc01353bc
           0xc0106ea7 system_call+0x33 kernel .text 0xc010 0xc0106e74 0xc0106eac

-- Steffen Persvold, Systems Engineer Email : mailto:[EMAIL PROTECTED] Scali AS (http://www.scali.com) Norway : Tel : (+47) 2262 8950 Olaf Helsets vei 6 Fax : (+47) 2262 8951 N-0621 Oslo, Norway USA: Tel : (+1) 713 706 0544 10500 Richmond Avenue, Suite 190 Houston, Texas 77042, USA
Re: ServerWorks LE and MTRR
[EMAIL PROTECTED] wrote:
> On Sun, 29 Apr 2001, Steffen Persvold wrote:
>
> > I've learned it the hard way, I have two types: a Compaq DL360 (rev 5) and a Tyan S2510 (rev 6). On the Compaq machine I constantly get data corruption on the last double word (4 bytes) in a 64-byte PCI burst when I use write combining on the CPU. On the Tyan however the transfer is always ok.
>
> Are you sure that is not due to board design differences?

No, I can't be 100% certain that the layout of the board isn't the reason, since I haven't asked ServerWorks about this and their docs don't say anything about it (yes, my company has the NDA, so I shouldn't go too much into detail here), but if that were the case it would be totally wrong to disable write combining on every LE chipset.

The test case I have been using to trigger this is somewhat special, because we are using SCI shared memory adapters to write (with PIO) into remote nodes' memory, and the bandwidth tends to get quite high (approx. 170 MByte/sec on LE with write combining). I've been able to run this case on 5 different motherboards using the LE and HE-SL ServerWorks chipsets, but only two of them are LE (the DL360 and the S2510). Everything works fine with write-combining on every motherboard except the DL360 (which has rev 5).

One basic test case that I haven't tried could be to enable write-combining on your PCI graphics adapter memory and see if the X display gets screwed up.

I will try to get some information from ServerWorks about this problem, but I'm not sure ServerWorks would be happy if I told you the answer (because of the NDA).

Regards,
-- Steffen Persvold, Systems Engineer Email : mailto:[EMAIL PROTECTED] Scali AS (http://www.scali.com) Norway : Tel : (+47) 2262 8950 Olaf Helsets vei 6 Fax : (+47) 2262 8951 N-0621 Oslo, Norway USA: Tel : (+1) 713 706 0544 10500 Richmond Avenue, Suite 190 Houston, Texas 77042, USA
Re: ServerWorks LE and MTRR
Gérard Roudier wrote:
> On Sun, 29 Apr 2001, Steffen Persvold wrote:
>
> > Hi all,
> >
> > I just compiled 2.4.4 and am running it on a ServerWorks LE motherboard. Whenever I try to add a write-combining region, it gets rejected. I took a peek in arch/i386/kernel/mtrr.c and found that this is just as expected with v1.40 of the code. It is great that the mtrr code checks and prevents the user from doing something that could eventually lead to data corruption. Using write-combining on PCI accesses can lead to this on certain LE revisions but _not_ all (only rev < 5). Therefore please consider my small patch to allow the good ones to be able to use write-combining. I have several rev 06 and they are working fine with this patch.
>
> You wrote that 'only rev < 5' can lead to data corruption, but your patch seems to disallow use of write combining for rev 5 too.
>
> Could you clarify?

Oops, just a typo; it should be <= 5. The patch is correct.

> Gérard.
>
> PS: From what hat did you get this information? as it seems that ServerWorks require NDA for letting know technical information on their chipsets.

I've learned it the hard way: I have two types, a Compaq DL360 (rev 5) and a Tyan S2510 (rev 6). On the Compaq machine I constantly get data corruption on the last double word (4 bytes) in a 64-byte PCI burst when I use write combining on the CPU. On the Tyan, however, the transfer is always ok.

-- Steffen Persvold, Systems Engineer Email : mailto:[EMAIL PROTECTED] Scali AS (http://www.scali.com) Norway : Tel : (+47) 2262 8950 Olaf Helsets vei 6 Fax : (+47) 2262 8951 N-0621 Oslo, Norway USA: Tel : (+1) 713 706 0544 10500 Richmond Avenue, Suite 190 Houston, Texas 77042, USA
ServerWorks LE and MTRR
Hi all,

I just compiled 2.4.4 and am running it on a ServerWorks LE motherboard. Whenever I try to add a write-combining region, it gets rejected. I took a peek in arch/i386/kernel/mtrr.c and found that this is just as expected with v1.40 of the code. It is great that the mtrr code checks and prevents the user from doing something that could eventually lead to data corruption. Using write-combining on PCI accesses can lead to this on certain LE revisions but _not_ all (only rev < 5). Therefore please consider my small patch to allow the good ones to be able to use write-combining. I have several rev 06 and they are working fine with this patch.

Best regards,
-- Steffen Persvold, Systems Engineer Email : mailto:[EMAIL PROTECTED] Scali AS (http://www.scali.com) Norway : Tel : (+47) 2262 8950 Olaf Helsets vei 6 Fax : (+47) 2262 8951 N-0621 Oslo, Norway USA: Tel : (+1) 713 706 0544 10500 Richmond Avenue, Suite 190 Houston, Texas 77042, USA

diff -Nur linux/arch/i386/kernel/mtrr.c.~1~ linux/arch/i386/kernel/mtrr.c
--- linux/arch/i386/kernel/mtrr.c.~1~  Wed Apr 11 21:02:27 2001
+++ linux/arch/i386/kernel/mtrr.c  Sun Apr 29 10:18:06 2001
@@ -480,6 +480,7 @@
 {
     unsigned long config, dummy;
     struct pci_dev *dev = NULL;
+    u8 rev;
 
     /* ServerWorks LE chipsets have problems with write-combining
        Don't allow it and leave room for other chipsets to be tagged */
@@ -489,7 +490,9 @@
     case PCI_VENDOR_ID_SERVERWORKS:
         switch (dev->device) {
         case PCI_DEVICE_ID_SERVERWORKS_LE:
-            return 0;
+            pci_read_config_byte(dev, PCI_CLASS_REVISION, &rev);
+            if (rev <= 5)
+                return 0;
             break;
         default:
             break;
Kiobufs and userspace memory
Hi all,

I'm writing a device driver for a shared memory adapter for which I plan to support DMA directly from userspace memory (zero copy). I have already implemented a version which I think works, but I'm not sure if I get the IO addresses calculated correctly. The case is as follows: the userspace application has allocated some memory with malloc() and uses write() on the device's /dev entry. The driver's write() function looks something like this:

ssize_t my_write(struct file *file, char *userbuf, size_t len, loff_t *poff)
{
	struct kiobuf *iobuf;
	struct my_sglist *sglist = NULL;
	size_t totlen;
	int i, err;

	/* Pin user memory */
	err = alloc_kiovec(1, &iobuf);
	if (err)
		return err;
	err = map_user_kiobuf(WRITE, iobuf, (unsigned long) userbuf, len);
	if (err)
		goto out;

	/* Traverse the iobuf to get the IO address for each page,
	 * building up the SG table for my DMA machine */
	sglist = kmalloc(sizeof(struct my_sglist) * iobuf->nr_pages, GFP_ATOMIC);
	if (!sglist)
		goto out_unmap;
	totlen = 0;
	for (i = 0; i < iobuf->nr_pages; ++i) {
		struct page *page = iobuf->maplist[i];
		void *vaddr = page_address(page) + iobuf->offset;
		unsigned long ioaddr = virt_to_bus(vaddr);

		sglist[i].start = ioaddr;
		if ((totlen + PAGE_SIZE) > len)
			sglist[i].len = len - totlen;
		else
			sglist[i].len = PAGE_SIZE;
		totlen += PAGE_SIZE;
	}

	/* Start the synchronous DMA engine */
	my_start_dma(sglist);
	kfree(sglist);

out_unmap:
	/* Unpin user memory */
	unmap_kiobuf(iobuf);
out:
	free_kiovec(1, &iobuf);
	return err ? err : len;
}

Is this use of kiobufs sensible to you? If not, what should I really be doing in order to achieve zero-copy DMA?

I also have a question regarding a DMA read into userspace memory: if the application didn't initialize the buffer with memset() or anything, all the pages map to the same page (the zero page), right? So if a map_user_kiobuf() call is made on this buffer, will it sort this out and map the pages to real ones?

Any response greatly appreciated,
-- Steffen Persvold, Systems Engineer Email : mailto:[EMAIL PROTECTED] Scali AS (http://www.scali.com) Norway : Tel : (+47) 2262 8950 Olaf Helsets vei 6 Fax : (+47) 2262 8951 N-0621 Oslo, Norway USA: Tel : (+1) 713 706 0544 10500 Richmond Avenue, Suite 190 Houston, Texas 77042, USA
Question regarding kernel threads and userlevel
Hi linux-kernel,

I have a question regarding kernel threads: are kernel threads treated equally, in terms of scheduling, with normal userlevel processes?

In my test case I have a driver for a PCI card for which I want to control access to its memory (prefetchable PCI space). Userlevel processes can mmap this PCI memory and write directly to it (via the nopage technique). This is also possible from the kernel thread, but to avoid thrashing and short bursts on the PCI bus, I protect every access to the memory space with a spin lock (an mmapped kernel memory page which the driver initializes). That means that if you have an SMP system and two userlevel processes want to write to this memory, one will have to wait for the other before doing the memcpy (yep, I'm using what you could call PIO). This works great for two userlevel processes.

Now the reason for my question: if I also have a kernel thread wanting to write to this memory space, it will also have to wait for the same lock (though not mmapped, since we are already in kernel space and can access the lock page directly). What happens is that if a userlevel process holds this lock and the kernel thread gets scheduled and tries to take the same lock, it deadlocks, because the userlevel process never gets back control and releases the lock (kinda like when you spin at interrupt level on a lock which was taken without spin_lock_irq). Is this because the kernel thread has higher priority than the userlevel process (it has a nice level of -20)?
Best regards,
-- Steffen Persvold, Systems Engineer Email : mailto:[EMAIL PROTECTED] Scali AS (http://www.scali.com) Norway : Tel : (+47) 2262 8950 Olaf Helsets vei 6 Fax : (+47) 2262 8951 N-0621 Oslo, Norway USA: Tel : (+1) 713 706 0544 10500 Richmond Avenue, Suite 190 Houston, Texas 77042, USA
Re: APIC errors ...
Chris Wedgwood wrote:
> On Wed, Apr 18, 2001 at 09:27:12PM -0500, Rico Tudor wrote:
> > Another problem area is ECC monitoring. I'm still waiting for info from ServerWorks, and so is Dan Hollis. Alexander Stohr has even submitted code to Jim Foster for approval, without evident effect. I have 18GB of RAM divided among five ServerWorks boxes, so the matter is not academic.
>
> Add environment monitoring. One of my play machines is a Dell 2540, dual AC power, lots of fans and temperature sensing; I'd really like to be able to get this information from it (yeah, closed-source Dell drivers are worth almost zero).

This must be a Dell issue then, because I wrote an lm_sensors (http://www.netroedge.com/~lm78/) driver for the ServerWorks OSB4 (southbridge) some time ago, and they have merged it with the PIIX4 driver. lm_sensors 2.5.5 and above should have support for the ServerWorks System Management Bus. I have been running lm_sensors 2.5.5 on several mobos with ServerWorks chipsets of all kinds (LE, HE, HE-SL) and most of them work with the PIIX4 driver (with OSB4 support). The only one I've had problems with so far is the Compaq DL360, which seems to have disabled the SMBus on the OSB4 and instead uses another (proprietary) approach. This could be the problem with the Dell machines too (2450, 2550, 1550).

Best regards
-- Steffen Persvold, Systems Engineer Email : mailto:[EMAIL PROTECTED] Scali AS (http://www.scali.com) Norway : Pho : (+47) 2262 8950 Olaf Helsets vei 6 Fax : (+47) 2262 8951 N-0621 Oslo, Norway USA: Pho : (+1) 713 706 0544 10500 Richmond Ave, Suite 190 Houston, Texas 77042, USA
2.4.3 and Alpha
Hi,

Is there any particular reason why a stock 2.4.3 kernel doesn't have mm.h
and pgalloc.h in sync on Alpha? This is what I get:

# make boot
gcc -D__KERNEL__ -I/usr/src/redhat/linux/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe -mno-fp-regs -ffixed-8 -mcpu=ev5 -Wa,-mev6 -c -o init/main.o init/main.c
In file included from /usr/src/redhat/linux/include/linux/highmem.h:5,
                 from /usr/src/redhat/linux/include/linux/pagemap.h:16,
                 from /usr/src/redhat/linux/include/linux/locks.h:8,
                 from /usr/src/redhat/linux/include/linux/raid/md.h:36,
                 from init/main.c:24:
/usr/src/redhat/linux/include/asm/pgalloc.h:334: conflicting types for `pte_alloc'
/usr/src/redhat/linux/include/linux/mm.h:399: previous declaration of `pte_alloc'
/usr/src/redhat/linux/include/asm/pgalloc.h:352: conflicting types for `pmd_alloc'
/usr/src/redhat/linux/include/linux/mm.h:412: previous declaration of `pmd_alloc'
make: *** [init/main.o] Error 1

2.4.1 compiled fine, and as far as I can see some changes have been made to
mm.h since then. These changes appear to have been followed up in the i386,
ppc, s390 and sparc64 trees, but not in the others. Any plans for when this
will be done?

Best regards,
Steffen
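The failure mode above can be reproduced in miniature outside the kernel
tree: two headers that declare the same function with different types
trigger exactly this "conflicting types" error. The file and function names
below are illustrative stand-ins for linux/mm.h and asm/pgalloc.h, not the
real kernel declarations.

```shell
# Two hypothetical headers declaring pte_alloc with mismatched types,
# mimicking the mm.h vs pgalloc.h disagreement on Alpha.
cat > decl_a.h <<'EOF'
int pte_alloc(int mm, unsigned long addr);
EOF
cat > decl_b.h <<'EOF'
long pte_alloc(long mm, unsigned long addr);
EOF
cat > main.c <<'EOF'
#include "decl_a.h"
#include "decl_b.h"
int main(void) { return 0; }
EOF
# gcc rejects the second declaration, just as in the kernel build log.
if gcc -c main.c -o main.o 2>err.log; then
    echo "unexpectedly compiled"
else
    echo "conflicting types detected"
fi
```

The fix in the kernel case is the same as here: the two declarations must
be brought back into agreement, which is what the arch follow-up commits do.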