Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified
- Original Message -
> > [root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
> >
> > WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values
> >
> >       KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
> >     DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore  [PARTIAL DUMP]
> >         CPUS: 128
> >         DATE: Thu Sep 19 08:31:18 2019
> >       UPTIME: 00:01:21
> > LOAD AVERAGE: 0.16, 0.07, 0.02
> >        TASKS: 1343
> >     NODENAME: amd-ethanol
> >      RELEASE: 5.3.0-rc7+
> >      VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
> >      MACHINE: x86_64  (2195 Mhz)
> >       MEMORY: 127.9 GB
> >        PANIC: "Kernel panic - not syncing: sysrq triggered crash"
> >          PID: 9789
> >      COMMAND: "bash"
> >         TASK: "89711894ae80  [THREAD_INFO: 89711894ae80]"
> >          CPU: 83
> >        STATE: TASK_RUNNING (PANIC)
> >
> > crash> kmem -s|grep -i invalid
> > kmem: dma-kmalloc-512: slab:d77680001c00 invalid freepointer:a6086ac099f0c5a4
> > kmem: dma-kmalloc-512: slab:d77680001c00 invalid freepointer:a6086ac099f0c5a4
> > crash>
>
> I fail to see what that's trying to tell me? You have invalid pointers?

Correct, because the pointer values are encrypted.

The command is walking through the singly-linked list of free objects in a
slab from the dma-kmalloc-512 slab cache.  The slab memory had been allocated
from low memory, and because of the problem at hand, it was copied to the
vmcore in its encrypted state.

Dave
Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation
- Original Message - > On Mon, Jan 14, 2019 at 03:26:32PM -0500, Dave Anderson wrote: > > No. It needs *both* the vmlinux file and the vmcore file in order to read > > kernel > > virtual memory, so just having a kernel virtual address is insufficient. > > > > So it's a chicken-and-egg situation. This particular --osrelease option is > > used > > to determine *what* vmlinux file would be required for an actual crash > > analysis > > session. > > Ok, that makes sense. I could've used that explanation when reviewing > the documentation. Do you mind skimming through this: > > https://lkml.kernel.org/r/2019045650.gg4...@zn.tnic > > in case we've missed explaining relevant usage - like that above - of > some of the vmcoreinfo members? Yeah, I've been watching the thread, and the document looks fine to me. It's just that when I saw the discussion of this one being removed that I felt the need to respond... ;-) Dave
Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation
- Original Message - > On Mon, Jan 14, 2019 at 03:07:33PM -0500, Dave Anderson wrote: > > That's what it *does* utilize -- it takes a standalone vmcore dumpfile, and > > pulls out the OSRELEASE string from it, so that a user can determine what > > vmlinux file should be used with that vmcore for normal crash analysis. > > And the vmcoreinfo is part of the vmcore, right? Correct. > > So it can just as well read out the address of init_uts_ns and get the > kernel version from there. No. It needs *both* the vmlinux file and the vmcore file in order to read kernel virtual memory, so just having a kernel virtual address is insufficient. So it's a chicken-and-egg situation. This particular --osrelease option is used to determine *what* vmlinux file would be required for an actual crash analysis session. Dave
Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation
- Original Message - > On Mon, Jan 14, 2019 at 02:36:47PM -0500, Dave Anderson wrote: > > There's no reading of the dumpfile's memory involved, and that being the > > case, > > the vmlinux file is not utilized. That's the whole point of the crash > > option, i.e., > > taking a vmcore file, and trying to determine what kernel should be used > > with it: > > > > $ man crash > > ... > >--osrelease dumpfile > > Display the OSRELEASE vmcoreinfo string from a kdump dumpfile > > header. > > I don't understand - if you have the vmcoreinfo (which I assume is part > of the vmcore, yes, no?) you can go and dig out the kernel version from > it, no? > > Why should you not utilize the vmcore file? That's what it *does* utilize -- it takes a standalone vmcore dumpfile, and pulls out the OSRELEASE string from it, so that a user can determine what vmlinux file should be used with that vmcore for normal crash analysis. Dave > > (I'm most likely missing something.) > > > Well, I just don't agree that the OSRELEASE item is "frivolous". It's > > been in place, and depended upon, for many years. > > Yeah, no. The ABI argument is moot in this case as in the last couple > of months people have been persuading me that vmcoreinfo is not ABI. So > you guys need to make up your mind what is it. And if it is an ABI, it > wasn't documented anywhere. > > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply. >
Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation
- Original Message - > On Mon, Jan 14, 2019 at 01:58:32PM -0500, Dave Anderson wrote: > > Preferably it would be left as-is. The crash utility has a "crash > > --osrelease vmcore" > > option that only looks at the dumpfile header, and just dump the string. > > With respect > > to compressed kdumps, crash could alternatively look at the utsname data > > that is stored > > in the diskdump_header.utsname field, but with ELF vmcores, there is no > > such back-up. > > Well, there is: > > 4f 53 52 45 4c 45 41 53 45 3d 35 2e 30 2e 30 2d |OSRELEASE=5.0.0-| > 0010 72 63 32 2b 0a 50 41 47 45 53 49 5a 45 3d 34 30 |rc2+.PAGESIZE=40| > 0020 39 36 0a 53 59 4d 42 4f 4c 28 6d 65 6d 5f 73 65 |96.SYMBOL(mem_se| > 0030 63 74 69 6f 6e 29 3d 66 66 66 66 66 66 66 66 38 |ction)=8| > 0040 34 35 31 61 31 61 38 0a 53 59 4d 42 4f 4c 28 69 |451a1a8.SYMBOL(i| > 0050 6e 69 74 5f 75 74 73 5f 6e 73 29 3d 66 66 66 66 |nit_uts_ns)=| > > 0060 66 66 66 66 38 32 30 31 33 35 34 30 0a 53 59 4d |82013540 > > > This address has it. There's no reading of the dumpfile's memory involved, and that being the case, the vmlinux file is not utilized. That's the whole point of the crash option, i.e., taking a vmcore file, and trying to determine what kernel should be used with it: $ man crash ... --osrelease dumpfile Display the OSRELEASE vmcoreinfo string from a kdump dumpfile header. ... > > > What's the problem with leaving it alone? > > The problem is that I'd like to get all those vmcoreinfo exports under > control and to not have people frivolously export whatever they feel > like, for obvious reasons, and to get rid of the duplicate/unused pieces > being part of vmcoreinfo. Well, I just don't agree that the OSRELEASE item is "frivolous". It's been in place, and depended upon, for many years. Dave
Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation
- Original Message -
> On Mon, Jan 14, 2019 at 05:48:48PM +, Kazuhito Hagio wrote:
> > As for makedumpfile, it will not be impossible to remove the
> > init_uts_ns.name.release (OSRELEASE), but some fixes are needed.
> > Because historically OSRELEASE has been a kind of a mandatory entry
> > in vmcoreinfo from the beginning of vmcoreinfo, makedumpfile uses
> > its existence to check whether a vmcoreinfo is sane.
>
> Well, init_uts_ns is exported in vmcoreinfo anyway - makedumpfile
> can simply test init_uts_ns.name.release just as well.  And the
> "historically" argument doesn't matter because vmcoreinfo is not an ABI.
>
> So makedumpfile needs to be changed to check that new export.
>
> > Also, I think crash also will need to be fixed if it is removed.

Preferably it would be left as-is.  The crash utility has a
"crash --osrelease vmcore" option that only looks at the dumpfile header
and just dumps the string.  With respect to compressed kdumps, crash could
alternatively look at the utsname data that is stored in the
diskdump_header.utsname field, but with ELF vmcores, there is no such
back-up.

What's the problem with leaving it alone?

Dave

> Yes, I'm expecting user tools to be fixed and then exports removed.
>
> --
> Regards/Gruss,
>     Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
>
Re: BUG: /proc/kcore does not export direct-mapped memory on arm64 (and presumably some other architectures)
- Original Message - > > > - Original Message - > > On 04/26/2018 02:16 PM, Kees Cook wrote: > > > On Thu, Apr 26, 2018 at 12:31 PM, Dave Anderson > > > wrote: > > >> > > >> While testing /proc/kcore as the live memory source for the crash > > >> utility, > > >> it fails on arm64. The failure on arm64 occurs because only the > > >> vmalloc/module space segments are exported in PT_LOAD segments, > > >> and it's missing all of the PT_LOAD segments for the generic > > >> unity-mapped regions of physical memory, as well as their associated > > >> vmemmap sections. > > >> > > >> The mapping of unity-mapped RAM segments in fs/proc/kcore.c is > > >> architecture-neutral, and after debugging it, I found this as the > > >> problem. For each chunk of physical memory, kcore_update_ram() > > >> calls walk_system_ram_range(), passing kclist_add_private() as a > > >> callback function to add the chunk to the kclist, and eventually > > >> leading to the creation of a PT_LOAD segment. > > >> > > >> kclist_add_private() does some verification of the memory region, > > >> but this one below is bogus for arm64: > > >> > > >> static int > > >> kclist_add_private(unsigned long pfn, unsigned long nr_pages, void > > >> *arg) > > >> { > > >> ... [ cut ] ... > > >> ent->addr = (unsigned long)__va((pfn << PAGE_SHIFT)); > > >> ... [ cut ] ... > > >> > > >> /* Sanity check: Can happen in 32bit arch...maybe */ > > >> if (ent->addr < (unsigned long) __va(0)) > > >> goto free_out; > > >> > > >> And that's because __va(0) is a bogus check for arm64. It is checking > > >> whether the ent->addr value is less than the lowest possible > > >> unity-mapped > > >> address. 
But "0" should not be used as a physical address on arm64; the > > >> lowest legitimate physical address for this __va() check would be the > > >> arm64 > > >> PHYS_OFFSET, or memstart_addr: > > >> > > >> Here's the arm64 __va() and PHYS_OFFSET: > > >> > > >>#define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x))) > > >>#define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | > > >>PAGE_OFFSET) > > >> > > >>extern s64 memstart_addr; > > >>/* PHYS_OFFSET - the physical address of the start of memory. */ > > >>#define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); > > >>memstart_addr; }) > > >> > > >> If PHYS_OFFSET/memstart_addr is anything other than 0 (it is > > >> 0x40 > > >> on my > > >> test system), the __va(0) calculation goes negative and creates a bogus, > > >> very > > >> large, virtual address. And since the ent->addr virtual address is less > > >> than > > >> bogus __va(0) address, the test fails, and the memory chunk is rejected. > > >> > > >> Looking at the kernel sources, it seems that this would affect other > > >> architectures as well, i.e., the ones whose __va() is not a simple > > >> addition of the physical address with PAGE_OFFSET. > > >> > > >> Anyway, I don't know what the best approach for an architecture-neutral > > >> fix would be in this case. So I figured I'd throw it out to you guys > > >> for > > >> some ideas. > > > > > > I'm not as familiar with this code, but I've added Ard and Laura to CC > > > here, as this feels like something they'd be able to comment on. :) > > > > > > -Kees > > > > > > > It seems backwards that we're converting a physical address to > > a virtual address and then validating that. I think checking against > > pfn_valid (to ensure there is a valid memmap entry) > > and then checking page_to_virt against virt_addr_valid to catch > > other cases (e.g. highmem or holes in the space) seems cleaner. > > Hi Laura, > > Thanks a lot for looking into this -- I couldn't find a maintainer for kcore. 
> > The patch looks good to me, as long as virt_addr_valid() will fail on 32-bit
> > arches when page_to_virt() creates an invalid address when it gets passed a
> > highmem-physical address.
Re: BUG: /proc/kcore does not export direct-mapped memory on arm64 (and presumably some other architectures)
- Original Message - > On 04/26/2018 02:16 PM, Kees Cook wrote: > > On Thu, Apr 26, 2018 at 12:31 PM, Dave Anderson > > wrote: > >> > >> While testing /proc/kcore as the live memory source for the crash utility, > >> it fails on arm64. The failure on arm64 occurs because only the > >> vmalloc/module space segments are exported in PT_LOAD segments, > >> and it's missing all of the PT_LOAD segments for the generic > >> unity-mapped regions of physical memory, as well as their associated > >> vmemmap sections. > >> > >> The mapping of unity-mapped RAM segments in fs/proc/kcore.c is > >> architecture-neutral, and after debugging it, I found this as the > >> problem. For each chunk of physical memory, kcore_update_ram() > >> calls walk_system_ram_range(), passing kclist_add_private() as a > >> callback function to add the chunk to the kclist, and eventually > >> leading to the creation of a PT_LOAD segment. > >> > >> kclist_add_private() does some verification of the memory region, > >> but this one below is bogus for arm64: > >> > >> static int > >> kclist_add_private(unsigned long pfn, unsigned long nr_pages, void > >> *arg) > >> { > >> ... [ cut ] ... > >> ent->addr = (unsigned long)__va((pfn << PAGE_SHIFT)); > >> ... [ cut ] ... > >> > >> /* Sanity check: Can happen in 32bit arch...maybe */ > >> if (ent->addr < (unsigned long) __va(0)) > >> goto free_out; > >> > >> And that's because __va(0) is a bogus check for arm64. It is checking > >> whether the ent->addr value is less than the lowest possible unity-mapped > >> address. 
But "0" should not be used as a physical address on arm64; the > >> lowest legitimate physical address for this __va() check would be the > >> arm64 > >> PHYS_OFFSET, or memstart_addr: > >> > >> Here's the arm64 __va() and PHYS_OFFSET: > >> > >>#define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x))) > >>#define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | > >>PAGE_OFFSET) > >> > >>extern s64 memstart_addr; > >>/* PHYS_OFFSET - the physical address of the start of memory. */ > >>#define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); > >>memstart_addr; }) > >> > >> If PHYS_OFFSET/memstart_addr is anything other than 0 (it is 0x40 > >> on my > >> test system), the __va(0) calculation goes negative and creates a bogus, > >> very > >> large, virtual address. And since the ent->addr virtual address is less > >> than > >> bogus __va(0) address, the test fails, and the memory chunk is rejected. > >> > >> Looking at the kernel sources, it seems that this would affect other > >> architectures as well, i.e., the ones whose __va() is not a simple > >> addition of the physical address with PAGE_OFFSET. > >> > >> Anyway, I don't know what the best approach for an architecture-neutral > >> fix would be in this case. So I figured I'd throw it out to you guys for > >> some ideas. > > > > I'm not as familiar with this code, but I've added Ard and Laura to CC > > here, as this feels like something they'd be able to comment on. :) > > > > -Kees > > > > It seems backwards that we're converting a physical address to > a virtual address and then validating that. I think checking against > pfn_valid (to ensure there is a valid memmap entry) > and then checking page_to_virt against virt_addr_valid to catch > other cases (e.g. highmem or holes in the space) seems cleaner. Hi Laura, Thanks a lot for looking into this -- I couldn't find a maintainer for kcore. 
The patch looks good to me, as long as virt_addr_valid() will fail on 32-bit arches when page_to_virt() creates an invalid address when it gets passed a highmem-physical address. Thanks again, Dave > Maybe something like: > > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c > index d1e82761de81..e64ecb9f2720 100644 > --- a/fs/proc/kcore.c > +++ b/fs/proc/kcore.c > @@ -209,25 +209,34 @@ kclist_add_private(unsigned long pfn, unsigned long > nr_pages, void *arg) > { > struct list_head *head = (struct list_head *)arg; > struct kcore_list *ent; > + struct page *p; > + > + if (!pf
BUG: /proc/kcore does not export direct-mapped memory on arm64 (and presumably some other architectures)
While testing /proc/kcore as the live memory source for the crash utility,
it fails on arm64.  The failure on arm64 occurs because only the
vmalloc/module space segments are exported in PT_LOAD segments, and it's
missing all of the PT_LOAD segments for the generic unity-mapped regions
of physical memory, as well as their associated vmemmap sections.

The mapping of unity-mapped RAM segments in fs/proc/kcore.c is
architecture-neutral, and after debugging it, I found this as the problem.
For each chunk of physical memory, kcore_update_ram() calls
walk_system_ram_range(), passing kclist_add_private() as a callback
function to add the chunk to the kclist, and eventually leading to the
creation of a PT_LOAD segment.

kclist_add_private() does some verification of the memory region, but this
one below is bogus for arm64:

  static int
  kclist_add_private(unsigned long pfn, unsigned long nr_pages, void *arg)
  {
  ... [ cut ] ...
          ent->addr = (unsigned long)__va((pfn << PAGE_SHIFT));
  ... [ cut ] ...
          /* Sanity check: Can happen in 32bit arch...maybe */
          if (ent->addr < (unsigned long) __va(0))
                  goto free_out;

And that's because __va(0) is a bogus check for arm64.  It is checking
whether the ent->addr value is less than the lowest possible unity-mapped
address.  But "0" should not be used as a physical address on arm64; the
lowest legitimate physical address for this __va() check would be the
arm64 PHYS_OFFSET, or memstart_addr:

Here's the arm64 __va() and PHYS_OFFSET:

   #define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x)))
   #define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | PAGE_OFFSET)

   extern s64 memstart_addr;
   /* PHYS_OFFSET - the physical address of the start of memory. */
   #define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })

If PHYS_OFFSET/memstart_addr is anything other than 0 (it is 0x40 on my
test system), the __va(0) calculation goes negative and creates a bogus,
very large, virtual address.
And since the ent->addr virtual address is less than bogus __va(0) address, the test fails, and the memory chunk is rejected. Looking at the kernel sources, it seems that this would affect other architectures as well, i.e., the ones whose __va() is not a simple addition of the physical address with PAGE_OFFSET. Anyway, I don't know what the best approach for an architecture-neutral fix would be in this case. So I figured I'd throw it out to you guys for some ideas. Dave Anderson
Re: [PATCH 0/2] Determine kernel text mapping size at runtime for x86_64
- Original Message - > On Wed, Dec 7, 2016 at 11:56 PM, Baoquan He wrote: > > Dave Anderson ever told in Crash utility he makes judgement whether it's > > a kaslr kernel by size of KERNEL_IMAGE_SIZE. As long as it's 1G, it's > > recognized as kaslr. Then the current upstream kernel has a wrong behaviour, > > it sets KERNEL_IMAGE_SIZE as 1G as long as CONFIG_RANDOMIZE_BASE is enabled, > > though people specify "nokaslr" into cmdline to disable kaslr explicitly. > > I'm not sure that's the correct solution to the Crash utility -- the > kaslr-ness of a kernel should be already exposed in the dump with the > kaslr_enabled variable yes? The crash utility doesn't use KERNEL_IMAGE_SIZE to determine whether KASLR is in play, but rather to determine the base of the modules virtual address space (i.e, the same way the kernel does). And then it uses that value in a couple other places. Dave > > > So in this patchset, made changes to determine the size of kernel text > > mapping > > area at runtime. If "nokaslr" specified, kernel mapping size is 512M though > > CONFIG_RANDOMIZE_BASE is enabled. > > This seems to make the non-KASLR case more consistent, so I'm fine > with the idea. Once the build-bots are happy with everything, consider > the series: > > Acked-by: Kees Cook > > Thanks! > > -Kees > > > > > Baoquan He (2): > > x86/64: Make kernel text mapping always take one whole page table in > > early boot code > > x86/KASLR/64: Determine kernel text mapping size at runtime > > > > arch/x86/boot/compressed/kaslr.c| 15 ++- > > arch/x86/include/asm/kaslr.h| 1 + > > arch/x86/include/asm/page_64_types.h| 20 > > arch/x86/include/asm/pgtable_64_types.h | 2 +- > > arch/x86/kernel/head64.c| 11 ++- > > arch/x86/kernel/head_64.S | 16 +--- > > arch/x86/mm/dump_pagetables.c | 3 ++- > > arch/x86/mm/init_64.c | 2 +- > > arch/x86/mm/physaddr.c | 6 +++--- > > 9 files changed, 45 insertions(+), 31 deletions(-) > > > > -- > > 2.5.5 > > > > > > -- > Kees Cook > Nexus Security >
Re: [PATCH] kexec: Export memory sections virtual addresses to vmcoreinfo
- Original Message - > Hi Dave, > > On 11/01/16 at 10:13am, Dave Anderson wrote: > > > > > > > > But we have this in mainline which also introduced the VMCOREINFO > > > > numbers, can you send a patch to revert them? > > > > > > OK, will do. > > > > > > However for find_vmemmap_x86_64() in makedumpfile, vmemmap_start is > > > still needed. I checked code, seems no better way to avoid. I am not > > > sure how many people are really using "-e" option to exclude unused > > > vmemmap pages. > > > > > > Maybe just leave it as is, and fix it when people complain? > > > > Speaking of complaints, is there any chance you can make the > > x86_64 "phys_base" value available? The VMCOREINFO_SYMBOL(phys_base) > > is useless since its contents are needed in order to access the > > symbol address. > > Yeah, the current exporting of virt addr of phys_base is really useless > for x86_64. While I saw you have got phys_base from kdump-ed vmcore elf > program header since kexec-tools has created that pt_load for kernel text > region. > > machdep->machspec->phys_base = phdr->p_paddr - (phdr->p_vaddr & > ~(__START_KERNEL_map)); > > Do you still want the value of phys_base? If yes, I can change it to > export the real value of phys_base, not &phys_base. > > In fact, exporting &phys_base was done in 2008, makedumpfile has taken > the similar way you use in crash to get value of phys_base. Means it has > been ignored very earlier. You could be the first person to complain > about it. No, this is not the first time it's been brought up. Anyway, yes, it would be preferable if it were readily available in vmcoreinfo rather than depending upon the PT_LOAD semantics. Thanks, Dave
Re: [PATCH] kexec: Export memory sections virtual addresses to vmcoreinfo
- Original Message - > On 11/01/16 at 01:10pm, Dave Young wrote: > > On 10/06/16 at 04:46pm, Baoquan He wrote: > > > KASLR memory randomization can randomize the base of the physical memory > > > mapping (PAGE_OFFSET), vmalloc (VMALLOC_START) and vmemmap > > > (VMEMMAP_START). These need be exported to VMCOREINFO so that user space > > > utility, mainly makedumpfile can use them to identify the base of each > > > memory section. Here using VMCOREINFO_NUMBER we can reuse the existing > > > struct number_table in makedumpfile to import data easily. > > > > > > Since they are related to x86_64 only, put them into > > > arch_crash_save_vmcoreinfo. And move the exportion of KERNEL_IMAGE_SIZE > > > together since it's also for x86_64 only. > > > > > > Signed-off-by: Baoquan He > > > --- > > > arch/x86/kernel/machine_kexec_64.c | 4 > > > kernel/kexec_core.c| 3 --- > > > 2 files changed, 4 insertions(+), 3 deletions(-) > > > > > > diff --git a/arch/x86/kernel/machine_kexec_64.c > > > b/arch/x86/kernel/machine_kexec_64.c > > > index 5a294e4..e150dd7 100644 > > > --- a/arch/x86/kernel/machine_kexec_64.c > > > +++ b/arch/x86/kernel/machine_kexec_64.c > > > @@ -337,6 +337,10 @@ void arch_crash_save_vmcoreinfo(void) > > > #endif > > > vmcoreinfo_append_str("KERNELOFFSET=%lx\n", > > > kaslr_offset()); > > > + VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE); > > > + VMCOREINFO_NUMBER(PAGE_OFFSET); > > > + VMCOREINFO_NUMBER(VMALLOC_START); > > > + VMCOREINFO_NUMBER(VMEMMAP_START); > > > > Pratyush has posted makedumpfile patches below to avoid the VMCOREINFO: > > http://lists.infradead.org/pipermail/kexec/2016-October/017540.html > > > > But we have this in mainline which also introduced the VMCOREINFO > > numbers, can you send a patch to revert them? > > OK, will do. > > However for find_vmemmap_x86_64() in makedumpfile, vmemmap_start is > still needed. I checked code, seems no better way to avoid. 
I am not > sure how many people are really using "-e" option to exclude unused > vmemmap pages. > > Maybe just leave it as is, and fix it when people complain? Speaking of complaints, is there any chance you can make the x86_64 "phys_base" value available? The VMCOREINFO_SYMBOL(phys_base) is useless since its contents are needed in order to access the symbol address. Dave > > > commit 0549a3c02efb350776bc869685a361045efd3a29 > > Author: Thomas Garnier > > Date: Tue Oct 11 13:55:08 2016 -0700 > > > > kdump, vmcoreinfo: report memory sections virtual addresses > > [snip]] > > > > > } > > > > > > /* arch-dependent functionality related to kexec file-based syscall */ > > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > > index 5616755..8ad3a29e 100644 > > > --- a/kernel/kexec_core.c > > > +++ b/kernel/kexec_core.c > > > @@ -1467,9 +1467,6 @@ static int __init crash_save_vmcoreinfo_init(void) > > > #endif > > > VMCOREINFO_NUMBER(PG_head_mask); > > > VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE); > > > -#ifdef CONFIG_X86 > > > - VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE); > > > -#endif > > > > Moving KERNEL_IMAGE_SIZE to x86 should be a standalone patch. > > I remember Dave Anderson said he use it in crash utility, cced him. > > > > > #ifdef CONFIG_HUGETLB_PAGE > > > VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR); > > > #endif > > > -- > > > 2.5.5 > > > > > > > > > ___ > > > kexec mailing list > > > ke...@lists.infradead.org > > > http://lists.infradead.org/mailman/listinfo/kexec > > > > Thanks > > Dave >
Re: crash/gdb: DW_FORM_strp pointing outside of .debug_str section
> Hi, > I have been trying to debug a multi controller hang issue with crash. > > It seems that crash works fine on stock ubuntu 14.04 running > 3.13.0-39-generic. However when I recompile kernel and try to run > crash on it I get the following error: > > crash> mod -s nvme ./nvme.ko > MODULE NAMESIZE OBJECT FILE > a01d1240 nvme 58727 ./nvme.ko > crash> add-symbol-file ./nvme.ko 0xa01d1240 > add symbol table from file "./nvme.ko" at > .text_addr = 0xa01d1240 > Reading symbols from /root/mynvme.sriov/nvme.ko...DW_FORM_strp > pointing outside of .debug_str section [in module > /root/mynvme.sriov/nvme.ko] > gdb: gdb request failed: add-symbol-file ./nvme.ko 0xa01d1240 > crash> > Any input on what I might have done wrong? Do I need to rebuild the > crash utility package along with the kernel? > > Thanks, > Kallol This is the wrong mailing list for crash utility issues. Please send your queries to crash-util...@redhat.com. That being said, the "mod -s" command does the add-symbol-file command behind the scenes. Your discrete "add-symbol-file" command is both unnecessary and incorrect -- 0xffffa01d1240 is the address of the nvme module's module data structure and not the starting text address. Dave Anderson -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: mm: Export 'pageflag_names' array
- Original Message - > Hi > > On Tue, Oct 8, 2013 at 5:40 AM, Dave Anderson wrote: > > > > > > - Original Message - > >> Hi Anatol, > >> > >> On Mon, Oct 07, 2013 at 10:53:32AM -0700, Anatol Pomozov wrote: > >> > Hi Wu > >> > > >> > I have a request wrt your old commit 718a38211. > >> > > >> > I think it makes sense to export array pageflag_names so kernel dump > >> > debug tools (like 'crash') can use it as well. Currently the tool > >> > hard-codes flag names but it is suboptimal as flags are different for > >> > different configs. > >> > > >> > What do you think? (I can send a patch if you are ok). > >> > >> I wonder if the KPF_* defined in > >> > >> include/uapi/linux/kernel-page-flags.h > >> > >> fit your needs. These are kernel page flags exported to the user space > >> and will be maintained stable and immune to kconfig changes. You can > >> find use examples of them in > >> > >> fs/proc/page.c > >> tools/vm/page-types.c > >> > >> Thanks, > >> Fengguang > >> > > > > Nothing is required for the crash utility. The pageflag_names array (as > > well > > as the pageflags enumerator) are readily available in the kernel debuginfo > > data. > > D'oh. You are right Dave. Everyone, please ignore my original question > I mislooked this variable in symbols map (or maybe looked at an old > kernel map). > > Dave, do you think 'crash' should use pageflag_names to output flag names? It's a good addition -- I'm working on a patch as we speak... Thanks, Dave -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic
- Original Message - > (2013/10/07 22:21), Dave Anderson wrote: > > > - Original Message - > >> (2013/10/03 22:47), Dave Anderson wrote: > > >>> - Original Message - > >>>> (2013/10/02 18:13), HATAYAMA Daisuke wrote: > >>>>> (2013/10/02 16:48), Kees Cook wrote: > > >> > >> Thanks for detailed explanation. So, there's already a feature in crash > >> utility > >> to address relocation!, though it's better for me to try them to check if > >> it's > >> really applicable to this feature. My concern is whether --reloc works > >> well > >> on x86_64 too, because relocation has never done on x86_64 ever, right? > > > > Correct. > > > >> Another concern is that in case of relocation, users need to additional > >> information > >> regarding runtime symbol information to crash utility. I want to avoid > >> additional > >> process, automation is preferable if possible. > > > > Right. As I mentioned in the case of 32-bit x86 dumpfiles, there is no > > automation > > available when CONFIG_PHYSICAL_START is larger than CONFIG_PHYSICAL_ALIGN. > > The user > > either has to be aware of their values in order to calculate the --reloc > > argument, > > or has to capture a copy of the /proc/kallsyms file on the crashed system. > > Typically > > users/distros using kdump changed their x86 configurations to avoid having > > to deal > > with that. > > > > Sorry, I don't understand why relocation size cannot be calculated when > CONFIG_PHYSICALSTART > CONFIG_PHYSICAL_ALIGN. Could you explain that? I just meant that when CONFIG_PHYSICAL_START > CONFIG_PHYSICAL_ALIGN, the 32-bit x86 kernel gets relocated (like the secondary kdump kernel), but that information is not readily available from the vmlinux/vmcore pair. > > >> I guess it's enough if there's runtime symbol addresses because we can get > >> relocated > >> offset value by comparing it with the compile-time symbol address > >> contained in > >> a given debuginfo file. 
> >> Candidates for such symbols are the ones contained in the VMCOREINFO
> >> note, which contains some symbol values for makedumpfile to refer to
> >> mm-related objects in the kernel, and which is always contained in
> >> vmcores generated by current kdump and also in vmcores converted from
> >> them by makedumpfile.  How about this idea?
> >
> > But how would that differ from using an incorrect (non-matching) vmlinux
> > file?
>
> It seems to me almost similar to what crash currently does even if we do a
> relocation check.  The check crash currently does is trial-and-error, since
> there's no information indicating that a given vmcore and vmlinux certainly
> match well.
>
> For example, the process I imagine is:
>
>    1) try to match vmcore and vmlinux with no relocation.  If this fails,
>       go to 2).
>    2) try to match vmcore and vmlinux with relocation.
>
> The two steps include symbol table initialization, so it might actually be
> difficult to resume back from 2) to 1).
>
> Also, if a gap due to phys_base and a gap due to relocation can happen at
> the same time, calculating the two values automatically might be further
> complicated.  So, it would be better to add the relocation value to
> VMCOREINFO.  Then, what the crash utility should do becomes very simple.

Yes please...

And while you're at it, the kernel's VMCOREINFO_SYMBOL(phys_base); is pretty
much useless, at least w/respect to ELF vmcores, since we need to know its
value in order to translate the address.  And I don't think that makedumpfile
uses it when it calculates the phys_base that it stores in compressed kdump
headers.  Why not put its value instead of its address?

> BTW, can it really happen that gaps due to phys_base and due to relocation
> happen at the same time?  I feel relocation covers the phys_base mechanism.
> If there's relocation, phys_base is not necessary.
>
> --
> Thanks.
> HATAYAMA, Daisuke > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: mm: Export 'pageflag_names' array
- Original Message - > Hi Anatol, > > On Mon, Oct 07, 2013 at 10:53:32AM -0700, Anatol Pomozov wrote: > > Hi Wu > > > > I have a request wrt your old commit 718a38211. > > > > I think it makes sense to export array pageflag_names so kernel dump > > debug tools (like 'crash') can use it as well. Currently the tool > > hard-codes flag names but it is suboptimal as flags are different for > > different configs. > > > > What do you think? (I can send a patch if you are ok). > > I wonder if the KPF_* defined in > > include/uapi/linux/kernel-page-flags.h > > fit your needs. These are kernel page flags exported to the user space > and will be maintained stable and immune to kconfig changes. You can > find use examples of them in > > fs/proc/page.c > tools/vm/page-types.c > > Thanks, > Fengguang > Nothing is required for the crash utility. The pageflag_names array (as well as the pageflags enumerator) is readily available in the kernel debuginfo data. Dave
Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic
- Original Message - > (2013/10/03 22:47), Dave Anderson wrote: > > > > > > - Original Message - > >> (2013/10/02 18:13), HATAYAMA Daisuke wrote: > >>> (2013/10/02 16:48), Kees Cook wrote: > >> > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> +/* > >>>>>> * Determine if we were loaded by an EFI loader. If so, then we > >>>>>> have also been > >>>>>> * passed the efi memmap, systab, etc., so we should use these > >>>>>> data structures > >>>>>> * for initialization. Note, the efi init code path is determined > >>>>>> by the > >>>>>> @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void) > >>>>>> } > >>>>>> > >>>>>> #endif /* CONFIG_X86_32 */ > >>>>>> + > >>>>>> +static struct notifier_block kernel_offset_notifier = { > >>>>>> + .notifier_call = dump_kernel_offset > >>>>>> +}; > >>>>>> + > >>>>>> +static int __init register_kernel_offset_dumper(void) > >>>>>> +{ > >>>>>> + atomic_notifier_chain_register(&panic_notifier_list, > >>>>>> + &kernel_offset_notifier); > >>>>>> + return 0; > >>>>>> +} > >>>>>> +__initcall(register_kernel_offset_dumper); > >>>>>> > >>>>> > >>>>> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS > >>>>> doesn't use > >>>>> kdump? Anyway, kdump related tools now calculate phys_base from memory > >>>>> map > >>>>> information passed as ELF PT_LOAD entries like below. > >>>> > >>>> Correct, we are not currently using kdump. 
> >>>> > >>>>> $ LANG=C readelf -l vmcore-rhel6up4 > >>>>> > >>>>> Elf file type is CORE (Core file) > >>>>> Entry point 0x0 > >>>>> There are 5 program headers, starting at offset 64 > >>>>> > >>>>> Program Headers: > >>>>> Type Offset VirtAddr PhysAddr > >>>>>FileSizMemSiz Flags Align > >>>>> NOTE 0x0158 0x > >>>>> 0x > >>>>>0x0b08 0x0b08 0 > >>>>> LOAD 0x0c60 0x8100 > >>>>> 0x0100 > >>>>>0x0103b000 0x0103b000 RWE0 > >>>>> LOAD 0x0103bc60 0x88001000 > >>>>> 0x1000 > >>>>>0x0009cc00 0x0009cc00 RWE0 > >>>>> LOAD 0x010d8860 0x8810 > >>>>> 0x0010 > >>>>>0x02f0 0x02f0 RWE0 > >>>>> LOAD 0x03fd8860 0x88001300 > >>>>> 0x1300 > >>>>>0x2cffd000 0x2cffd000 RWE0 > >>>>> > >>>>> Each PT_LOAD entry is assigned to virtual and physical address. In this > >>>>> case, > >>>>> 1st PT_LOAD entry belongs to kernel text mapping region, from which we > >>>>> can > >>>>> calculate phys_base value. > >>>> > >>>> It seems like all the information you need would still be available? > >>>> The virtual address is there, so it should be trivial to see the > >>>> offset, IIUC. > >>>> > >>> > >>> Partially yes. I think OK to analyze crash dump by crash utility, a > >>> gdb-based > >>> symbolic debugger for kernel, since phys_base absorbs kernel offset > >>> caused by > >>> relocation and phys_base is available in the way I explained above. > >>> > >>> However, the gained phys_base is not correct one, exac
Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic
- Original Message - > (2013/10/02 18:13), HATAYAMA Daisuke wrote: > > (2013/10/02 16:48), Kees Cook wrote: > > + > + return 0; > +} > + > +/* > * Determine if we were loaded by an EFI loader. If so, then we have > also been > * passed the efi memmap, systab, etc., so we should use these data > structures > * for initialization. Note, the efi init code path is determined by > the > @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void) > } > > #endif /* CONFIG_X86_32 */ > + > +static struct notifier_block kernel_offset_notifier = { > + .notifier_call = dump_kernel_offset > +}; > + > +static int __init register_kernel_offset_dumper(void) > +{ > + atomic_notifier_chain_register(&panic_notifier_list, > + &kernel_offset_notifier); > + return 0; > +} > +__initcall(register_kernel_offset_dumper); > > >>> > >>> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS > >>> doesn't use > >>> kdump? Anyway, kdump related tools now calculate phys_base from memory map > >>> information passed as ELF PT_LOAD entries like below. > >> > >> Correct, we are not currently using kdump. > >> > >>> $ LANG=C readelf -l vmcore-rhel6up4 > >>> > >>> Elf file type is CORE (Core file) > >>> Entry point 0x0 > >>> There are 5 program headers, starting at offset 64 > >>> > >>> Program Headers: > >>>Type Offset VirtAddr PhysAddr > >>> FileSizMemSiz Flags Align > >>>NOTE 0x0158 0x > >>> 0x > >>> 0x0b08 0x0b08 0 > >>>LOAD 0x0c60 0x8100 > >>> 0x0100 > >>> 0x0103b000 0x0103b000 RWE0 > >>>LOAD 0x0103bc60 0x88001000 > >>> 0x1000 > >>> 0x0009cc00 0x0009cc00 RWE0 > >>>LOAD 0x010d8860 0x8810 > >>> 0x0010 > >>> 0x02f0 0x02f0 RWE0 > >>>LOAD 0x03fd8860 0x88001300 > >>> 0x1300 > >>> 0x2cffd000 0x2cffd000 RWE0 > >>> > >>> Each PT_LOAD entry is assigned to virtual and physical address. In this > >>> case, > >>> 1st PT_LOAD entry belongs to kernel text mapping region, from which we can > >>> calculate phys_base value. 
> >> > >> It seems like all the information you need would still be available? > >> The virtual address is there, so it should be trivial to see the > >> offset, IIUC. > >> > > > > Partially yes. I think OK to analyze crash dump by crash utility, a > > gdb-based > > symbolic debugger for kernel, since phys_base absorbs kernel offset caused > > by > > relocation and phys_base is available in the way I explained above. > > > > However, the gained phys_base is not correct one, exactly phys_base + > > offset_by_relocation. > > When analyzing crash dump by crash utility, we use debug information > > generated > > during kernel build, which we install as kernel-debuginfo on RHEL for > > example. > > Symbols in debuginfo have statically assigned addresses at build so we see > > the statically assigned addresses during debugging and we see > > phys_base + offset_by_relocation as phys_base. This would be problematic > > if failure on crash dump is relevant to the relocated addresses, though I > > don't > > immediately come up with crash scenario where relocated symbol is definitely > > necessary. > > > > Still we can get relocated addresses if kallsyms is enabled on the kernel, > > but kallsyms and relocatable kernels are orthogonal. I don't think it > > natural > > to rely on kallsyms. It seems natural to export relocation information newly > > as debugging information. > > > > I was confused yesterday. As I said above, kdump related tools now don't > support > relocation on x86_64, phys_base only. kdump related tools think of present > kernel > offset as phys_base. Then, they reflect kernel offset caused by relocation in > physical addresses only, not in virtual addresses. This obviously affects the > tools. > > BTW, relocation looks more sophisticated than phys_base one. Is it possible to > switch from phys_base one to relocation on x86_64? On x86, relocation is used > so > I guess x86_64 can work in the same way. Is there something missing?
> Is there what phys_base can but relocation cannot on x86_64? > > And, Dave, is there feature for crash utility to treat relocation now? Well sort of, there are a couple of guessing-game kludges that can be used. For 32-bit x86 systems configured with a CONFIG_PHYSICAL_START value that is larger than its CONFIG_PHYSICAL_ALIGN value.
Re: [RFC PATCH 0/8] remove vm_struct list management
- Original Message - > On Mon, Dec 10, 2012 at 11:40:47PM +0900, JoonSoo Kim wrote: > > [..] > > > So without knowing details of both the data structures, I think if vmlist > > > is going away, then user space tools should be able to traverse > > > vmap_area_root > > > rb tree. I am assuming it is sorted using ->addr field and we should be > > > able to get vmalloc area start from there. It will just be a matter of > > > exporting right fields to user space (instead of vmlist). > > > > There is address sorted list of vmap_area, vmap_area_list. > > So we can use it for traversing vmalloc areas if it is necessary. > > But, as I mentioned before, kexec write *just* address of vmlist and > > offset of vm_struct's address field. It imply that they don't traverse > > vmlist, > > because they didn't write vm_struct's next field which is needed for > > traversing. > > Without vm_struct's next field, they have no method for traversing. > > So, IMHO, assigning dummy vm_struct to vmlist which is implemented by [7/8] > > is > > a safe way to maintain a compatibility of userspace tool. :) > > Actually the design of "makedumpfile" and "crash" tool is that they know > about kernel data structures and they adopt to changes. So for major > changes they keep track of kernel version numbers and if access the > data structures accordingly. > > Currently we access first element of vmlist to determine start of vmalloc > address. True we don't have to traverse the list. > > But as you mentioned we should be able to get same information by > traversing to left most element of vmap_area_list rb tree. So I think > instead of trying to retain vmlist first element just for backward > compatibility, I will rather prefer get rid of that code completely > from kernel and let user space tool traverse rbtree. Just export > minimum needed info for traversal in user space. There's no need to traverse the rbtree. 
There is a vmap_area_list linked list of vmap_area structures that is also sorted by virtual address. All that makedumpfile would have to do is to access the first vmap_area in the vmap_area_list -- as opposed to the way that it does now, which is by accessing the first vm_struct in the to-be-obsoleted vmlist list. So it seems silly to keep the dummy "vmlist" around. Dave
Re: [RFC PATCH 0/8] remove vm_struct list management
- Original Message - > > Can we get the same information from this rb-tree of vmap_area? Is > > ->va_start field communication same information as vmlist was > > communicating? What's the difference between vmap_area_root and vmlist. > > Thanks for comment. > > Yes. vmap_area's va_start field represent same information as vm_struct's > addr. > vmap_area_root is data structure for fast searching an area. > vmap_area_list is address sorted list, so we can use it like as vmlist. > > There is a little difference vmap_area_list and vmlist. > vmlist is lack of information about some areas in vmalloc address space. > For example, vm_map_ram() allocate area in vmalloc address space, > but it doesn't make a link with vmlist. To provide full information > about vmalloc address space, using vmap_area_list is more adequate. > > > So without knowing details of both the data structures, I think if vmlist > > is going away, then user space tools should be able to traverse > > vmap_area_root > > rb tree. I am assuming it is sorted using ->addr field and we should be > > able to get vmalloc area start from there. It will just be a matter of > > exporting right fields to user space (instead of vmlist). > > There is address sorted list of vmap_area, vmap_area_list. > So we can use it for traversing vmalloc areas if it is necessary. > But, as I mentioned before, kexec write *just* address of vmlist and > offset of vm_struct's address field. It imply that they don't traverse > vmlist, > because they didn't write vm_struct's next field which is needed for > traversing. > Without vm_struct's next field, they have no method for traversing. > So, IMHO, assigning dummy vm_struct to vmlist which is implemented by [7/8] is > a safe way to maintain a compatibility of userspace tool. :) Why bother keeping vmlist around? kdump's makedumpfile command would not even need to traverse the vmap_area rbtree, because it could simply look at the first vmap_area in the sorted vmap_area_list, correct? 
Dave Anderson
[PATCH] ptrace: fix 2.6.25 ptrace_bts_config structure declaration
The 2.6.25 ptrace_bts_config structure in asm-x86/ptrace-abi.h is defined with u32 types: #include /* configuration/status structure used in PTRACE_BTS_CONFIG and PTRACE_BTS_STATUS commands. */ struct ptrace_bts_config { /* requested or actual size of BTS buffer in bytes */ u32 size; /* bitmask of below flags */ u32 flags; /* buffer overflow signal */ u32 signal; /* actual size of bts_struct in bytes */ u32 bts_size; }; #endif But u32 is only accessible in asm-x86/types.h if __KERNEL__, leading to compile errors when ptrace.h is included from user-space. Perhaps the double-underscore versions that are exported to user-space in asm-x86/types.h should be used instead? Signed-off-by: [EMAIL PROTECTED] --- linux-2.6.25-rc2/include/asm-x86/ptrace-abi.h.orig +++ linux-2.6.25-rc2/include/asm-x86/ptrace-abi.h @@ -89,13 +89,13 @@ */ struct ptrace_bts_config { /* requested or actual size of BTS buffer in bytes */ - u32 size; + __u32 size; /* bitmask of below flags */ - u32 flags; + __u32 flags; /* buffer overflow signal */ - u32 signal; + __u32 signal; /* actual size of bts_struct in bytes */ - u32 bts_size; + __u32 bts_size; }; #endif
Re: [NEW-PATCH] exec: allow > 2GB executables to run on 64-bit systems
Andi Kleen wrote: Since Dave didn't post an updated patch. This is how I think what the patch should be. I also changed sys_uselib just to be complete. Thanks Andi -- I just tested open_exec() w/O_LARGEFILE on an i386 with a 2.5GB+ binary (mostly debuginfo), and it works as expected. Interesting to note that the test binary couldn't be compiled with i386 gcc, but it could be built with x86_64 gcc -m32. Dave Always use O_LARGEFILE for opening executables This allows to use executables >2GB. Based on a patch by Dave Anderson Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> Index: linux-2.6.24-rc3/fs/exec.c === --- linux-2.6.24-rc3.orig/fs/exec.c +++ linux-2.6.24-rc3/fs/exec.c @@ -119,7 +119,7 @@ asmlinkage long sys_uselib(const char __ if (error) goto exit; - file = nameidata_to_filp(&nd, O_RDONLY); + file = nameidata_to_filp(&nd, O_RDONLY|O_LARGEFILE); error = PTR_ERR(file); if (IS_ERR(file)) goto out; @@ -658,7 +658,8 @@ struct file *open_exec(const char *name) int err = vfs_permission(&nd, MAY_EXEC); file = ERR_PTR(err); if (!err) { - file = nameidata_to_filp(&nd, O_RDONLY); + file = nameidata_to_filp(&nd, + O_RDONLY|O_LARGEFILE); if (!IS_ERR(file)) { err = deny_write_access(file); if (err) {
Re: [PATCH] exec: allow > 2GB executables to run on 64-bit systems
Andi Kleen wrote: I agree in theory. We've only seen instances on 64-bitters... I think that's because gcc does not support the medium/large code models for i386. Although in theory someone could create an executable with a large enough .data. -Andi Or perhaps huge debuginfo section(s)? The x86_64 instance had a 2.5GB .debug_macinfo DWARF section. Dave
Re: [PATCH] exec: allow > 2GB executables to run on 64-bit systems
Andi Kleen wrote: Dave Anderson <[EMAIL PROTECTED]> writes: When an executable that is greater than 2GB in size is attempted on a 64-bit system on a file system that calls or uses generic_file_open() as its open handler, it fails with an EOVERFLOW error. This patch adds a force_o_largefile() call in open_exec(), as done in sys_open() and sys_openat(). Wouldn't it be better to just always pass O_LARGEFILE unconditionally there? e.g. in theory a 2.5GB executable should work on i386 and binfmt_* shouldn't have any problems with a large file. That would simplify your patch. -Andi I agree in theory. We've only seen instances on 64-bitters... Dave
[PATCH] exec: allow > 2GB executables to run on 64-bit systems
When an executable that is greater than 2GB in size is attempted on a 64-bit system on a file system that calls or uses generic_file_open() as its open handler, it fails with an EOVERFLOW error. This patch adds a force_o_largefile() call in open_exec(), as done in sys_open() and sys_openat(). Signed-off-by: [EMAIL PROTECTED] --- --- linux-2.6.24-rc3/fs/exec.c.orig +++ linux-2.6.24-rc3/fs/exec.c @@ -658,7 +658,8 @@ struct file *open_exec(const char *name) int err = vfs_permission(&nd, MAY_EXEC); file = ERR_PTR(err); if (!err) { - file = nameidata_to_filp(&nd, O_RDONLY); + file = nameidata_to_filp(&nd, force_o_largefile() ? + O_RDONLY|O_LARGEFILE : O_RDONLY); if (!IS_ERR(file)) { err = deny_write_access(file); if (err) {
Re: Query: Kdump: Core Image ELF Format
"Eric W. Biederman" wrote: > Dave Anderson <[EMAIL PROTECTED]> writes: > > > vivek goyal wrote: > > > > > Hi, > > > > > > Kdump (A kexec based crash dumping mechanism) is going to export the > > > kernel core image in ELF format. ELF was chosen as a format, keeping in > > > mind that gdb can be used for limited debugging and "Crash" can be used > > > for advanced debugging. > > > > > > Core image ELF headers are prepared before crash and stored at a safe > > > place in memory. These headers are retrieved over a kexec boot and final > > > elf core image is prepared for analysis. > > > > > > Given the fact physical memory can be dis-contiguous, One program header > > > of type PT_LOAD is created for every contiguous memory chunk present in > > > the system. Other information like register states etc. is captured in > > > notes section. > > > > > > Now the issue is, on i386, whether to prepare core headers in ELF32 or > > > ELF64 format. gdb can not analyze ELF64 core image for i386 system. I > > > don't know about "crash". Can "crash" support ELF64 core image file for > > > i386 system? > > > > > > > Not in its current state, but it can certainly be modified to do so. > > The embedded gdb module never is even aware of the vmcore file. > > (It is essentially executed as "gdb vmlinux"). > > > > And currently crash only expects a single PT_LOAD section, but > > that's due for a change. It's been OK for its current set of supported > > processors to use sparse file space for non-existent memory, > > but it's kind of a pain with ia64's 256GB holes. > > Weird. A sparse file. I can almost see that but that looks like > a really bad format to transport from one system to another. > Exactly. Although the -S tar flag helps, and when uncompressed it actually makes a smaller file on the receiving end because of "real" zero-filled pages. > > > The point is that adapting crash to handle whatever format > > you come up with is the easy part of the whole equation. > > Good. 
Then the concentration should be a simple well understood > format that we don't need to change all of the time. > > > > Given the limitation of analysis tools, if core headers are prepared in > > > ELF32 format then how to handle PAE systems? > > > > > > > Are you asking about what would be the p_vaddr values for the higher > > memory segments? FWIW, with the single-PT_LOAD segment currently > > supported by crash, there's only one p_vaddr, but in any case, crash doesn't > > use it, so PAE is not a problem. > > PAE (physical address extension) are 64bit addresses on a 32bit box. So > vivek real question was where do we put the bytes. > Do I understand it correctly that crash currently just gets raw > memory data and the register state from the core file? Then it > figures out the virtual to physical mapping by looking at vmlinux? > > Eric Pretty much... The register set is there, but it's been primarily used to figure out the panic task from the kernel stack address there, although in later dump versions, the NT_TASKSTRUCT notes section gives us the same thing. Because of the various dump formats it's supported over time, yes, it tries to glean the vast majority of the information it needs by using the vmlinux file and raw data, and not rely on stuff in the various different header formats. In any case, I totally agree with your aims: > What I aimed at was a simple picture of memory decorated with the > register state. We should be able to derive everything beyond that. > And the fact that it is all in user space should make it straight > forward to change if needed. > Dave Anderson
Re: Query: Kdump: Core Image ELF Format
vivek goyal wrote: > Hi, > > Kdump (A kexec based crash dumping mechanism) is going to export the > kernel core image in ELF format. ELF was chosen as a format, keeping in > mind that gdb can be used for limited debugging and "Crash" can be used > for advanced debugging. > > Core image ELF headers are prepared before crash and stored at a safe > place in memory. These headers are retrieved over a kexec boot and final > elf core image is prepared for analysis. > > Given the fact physical memory can be dis-contiguous, One program header > of type PT_LOAD is created for every contiguous memory chunk present in > the system. Other information like register states etc. is captured in > notes section. > > Now the issue is, on i386, whether to prepare core headers in ELF32 or > ELF64 format. gdb can not analyze ELF64 core image for i386 system. I > don't know about "crash". Can "crash" support ELF64 core image file for > i386 system? > Not in its current state, but it can certainly be modified to do so. The embedded gdb module never is even aware of the vmcore file. (It is essentially executed as "gdb vmlinux"). And currently crash only expects a single PT_LOAD section, but that's due for a change. It's been OK for its current set of supported processors to use sparse file space for non-existent memory, but it's kind of a pain with ia64's 256GB holes. The point is that adapting crash to handle whatever format you come up with is the easy part of the whole equation. > > Given the limitation of analysis tools, if core headers are prepared in > ELF32 format then how to handle PAE systems? > Are you asking about what would be the p_vaddr values for the higher memory segments? FWIW, with the single-PT_LOAD segment currently supported by crash, there's only one p_vaddr, but in any case, crash doesn't use it, so PAE is not a problem. Dave Anderson > > Any thoughts or suggestions on this? 
> > Thanks > Vivek