Re: [PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

2019-10-22 Thread Dave Anderson


- Original Message -

> > 
> > [root linux]$ crash vmlinux
> > /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
> > WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values
> > 
> >   KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
> > DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore  [PARTIAL 
> > DUMP]
> > CPUS: 128
> > DATE: Thu Sep 19 08:31:18 2019
> >   UPTIME: 00:01:21
> > LOAD AVERAGE: 0.16, 0.07, 0.02
> >TASKS: 1343
> > NODENAME: amd-ethanol
> >  RELEASE: 5.3.0-rc7+
> >  VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
> >  MACHINE: x86_64  (2195 Mhz)
> >   MEMORY: 127.9 GB
> >PANIC: "Kernel panic - not syncing: sysrq triggered crash"
> >  PID: 9789
> >  COMMAND: "bash"
> > TASK: "89711894ae80  [THREAD_INFO: 89711894ae80]"
> >  CPU: 83
> >STATE: TASK_RUNNING (PANIC)
> > 
> > crash> kmem -s|grep -i invalid
> > kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
> > freepointer:a6086ac099f0c5a4
> > kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
> > freepointer:a6086ac099f0c5a4
> > crash>
> 
> I fail to see what that's trying to tell me? You have invalid pointers?

Correct, because the pointer values are encrypted.  The command is walking through
the singly-linked list of free objects in a slab from the dma-kmalloc-512 slab cache.
The slab memory had been allocated from low memory, and because of the problem at
hand, it was copied to the vmcore in its encrypted state.
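To make the failure mode concrete: the check is essentially a bounds test on each
freepointer as the free list is walked, and a pointer that was copied out in its
encrypted state lands outside the slab and fails it.  A minimal sketch of that kind
of test (illustrative only -- not the crash utility's actual code):

    #include <stdbool.h>

    /*
     * Illustrative only.  A freepointer is plausible if it is NULL (end of
     * the list) or points somewhere inside the slab's object area; an
     * encrypted value such as a6086ac099f0c5a4 fails this immediately.
     */
    static bool freepointer_plausible(unsigned long fp,
                                      unsigned long slab_base,
                                      unsigned long slab_size)
    {
            return fp == 0 || (fp >= slab_base && fp < slab_base + slab_size);
    }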

Dave 



Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Dave Anderson



- Original Message -
> On Mon, Jan 14, 2019 at 03:26:32PM -0500, Dave Anderson wrote:
> > No.  It needs *both* the vmlinux file and the vmcore file in order to read
> > kernel
> > virtual memory, so just having a kernel virtual address is insufficient.
> > 
> > So it's a chicken-and-egg situation.  This particular --osrelease option is 
> > used
> > to determine *what* vmlinux file would be required for an actual crash 
> > analysis
> > session.
> 
> Ok, that makes sense. I could've used that explanation when reviewing
> the documentation. Do you mind skimming through this:
> 
> https://lkml.kernel.org/r/2019045650.gg4...@zn.tnic
> 
> in case we've missed explaining relevant usage - like that above - of
> some of the vmcoreinfo members?

Yeah, I've been watching the thread, and the document looks fine to me.
It's just that when I saw the discussion of this one being removed that
I felt the need to respond...  ;-)

Dave


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Dave Anderson



- Original Message -
> On Mon, Jan 14, 2019 at 03:07:33PM -0500, Dave Anderson wrote:
> > That's what it *does* utilize -- it takes a standalone vmcore dumpfile, and
> > pulls out the OSRELEASE string from it, so that a user can determine what
> > vmlinux file should be used with that vmcore for normal crash analysis.
> 
> And the vmcoreinfo is part of the vmcore, right?

Correct.

> 
> So it can just as well read out the address of init_uts_ns and get the
> kernel version from there.

No.  It needs *both* the vmlinux file and the vmcore file in order to read kernel
virtual memory, so just having a kernel virtual address is insufficient.

So it's a chicken-and-egg situation.  This particular --osrelease option is used to
determine *what* vmlinux file would be required for an actual crash analysis session.

Dave

 


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Dave Anderson



- Original Message -
> On Mon, Jan 14, 2019 at 02:36:47PM -0500, Dave Anderson wrote:
> > There's no reading of the dumpfile's memory involved, and that being the 
> > case,
> > the vmlinux file is not utilized.  That's the whole point of the crash 
> > option, i.e.,
> > taking a vmcore file, and trying to determine what kernel should be used
> > with it:
> > 
> >   $ man crash
> >   ...
> >--osrelease dumpfile
> >   Display the OSRELEASE vmcoreinfo string from a kdump dumpfile 
> > header.
> 
> I don't understand - if you have the vmcoreinfo (which I assume is part
> of the vmcore, yes, no?) you can go and dig out the kernel version from
> it, no?
> 
> Why should you not utilize the vmcore file?

That's what it *does* utilize -- it takes a standalone vmcore dumpfile, and 
pulls out the OSRELEASE string from it, so that a user can determine what
vmlinux file should be used with that vmcore for normal crash analysis.

Dave

> 
> (I'm most likely missing something.)
> 
> > Well, I just don't agree that the OSRELEASE item is "frivolous". It's
> > been in place, and depended upon, for many years.
> 
> Yeah, no. The ABI argument is moot in this case as in the last couple
> of months people have been persuading me that vmcoreinfo is not ABI. So
> you guys need to make up your mind what is it. And if it is an ABI, it
> wasn't documented anywhere.
> 
> --
> Regards/Gruss,
> Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.
> 


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Dave Anderson




- Original Message -
> On Mon, Jan 14, 2019 at 01:58:32PM -0500, Dave Anderson wrote:
> > Preferably it would be left as-is.  The crash utility has a "crash
> > --osrelease vmcore"
> > option that only looks at the dumpfile header, and just dump the string.
> > With respect
> > to compressed kdumps, crash could alternatively look at the utsname data
> > that is stored
> > in the diskdump_header.utsname field, but with ELF vmcores, there is no
> > such back-up.
> 
> Well, there is:
> 
> 00000000  4f 53 52 45 4c 45 41 53  45 3d 35 2e 30 2e 30 2d  |OSRELEASE=5.0.0-|
> 00000010  72 63 32 2b 0a 50 41 47  45 53 49 5a 45 3d 34 30  |rc2+.PAGESIZE=40|
> 00000020  39 36 0a 53 59 4d 42 4f  4c 28 6d 65 6d 5f 73 65  |96.SYMBOL(mem_se|
> 00000030  63 74 69 6f 6e 29 3d 66  66 66 66 66 66 66 66 38  |ction)=ffffffff8|
> 00000040  34 35 31 61 31 61 38 0a  53 59 4d 42 4f 4c 28 69  |451a1a8.SYMBOL(i|
> 00000050  6e 69 74 5f 75 74 73 5f  6e 73 29 3d 66 66 66 66  |nit_uts_ns)=ffff|
> 00000060  66 66 66 66 38 32 30 31  33 35 34 30 0a 53 59 4d  |ffff82013540.SYM|
> This address has it.

There's no reading of the dumpfile's memory involved, and that being the case, the
vmlinux file is not utilized.  That's the whole point of the crash option, i.e.,
taking a vmcore file, and trying to determine what kernel should be used with it:

  $ man crash
  ...
   --osrelease dumpfile
          Display the OSRELEASE vmcoreinfo string from a kdump dumpfile header.
  ...
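For an ELF vmcore the string lives in the VMCOREINFO ELF note, so it can be pulled
out by reading only the ELF header and the note segment -- the memory image is never
touched.  A rough standalone sketch of that idea (simplified, little-endian 64-bit
ELF assumed, minimal error handling; this is not what crash does internally):

  /* osrelease.c -- print OSRELEASE= from a vmcore's VMCOREINFO ELF note (sketch) */
  #include <elf.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(int argc, char **argv)
  {
          Elf64_Ehdr ehdr;
          Elf64_Phdr phdr;
          FILE *fp;
          int i;

          if (argc < 2 || !(fp = fopen(argv[1], "r")))
                  return 1;
          if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1)
                  return 1;

          for (i = 0; i < ehdr.e_phnum; i++) {
                  fseek(fp, ehdr.e_phoff + i * sizeof(phdr), SEEK_SET);
                  if (fread(&phdr, sizeof(phdr), 1, fp) != 1)
                          return 1;
                  if (phdr.p_type != PT_NOTE)
                          continue;

                  char *notes = malloc(phdr.p_filesz);
                  fseek(fp, phdr.p_offset, SEEK_SET);
                  if (!notes || fread(notes, phdr.p_filesz, 1, fp) != 1)
                          return 1;

                  size_t off = 0;
                  while (off + sizeof(Elf64_Nhdr) <= phdr.p_filesz) {
                          Elf64_Nhdr *nh = (Elf64_Nhdr *)(notes + off);
                          char *name = notes + off + sizeof(*nh);
                          char *desc = name + ((nh->n_namesz + 3) & ~3UL);

                          if (nh->n_namesz >= 10 && !strncmp(name, "VMCOREINFO", 10)) {
                                  /* copy the note data so it is NUL-terminated */
                                  char *buf = malloc(nh->n_descsz + 1);
                                  if (!buf)
                                          return 1;
                                  memcpy(buf, desc, nh->n_descsz);
                                  buf[nh->n_descsz] = '\0';
                                  char *p = strstr(buf, "OSRELEASE=");
                                  if (p) {
                                          char *nl = strchr(p, '\n');
                                          if (nl)
                                                  *nl = '\0';
                                          printf("%s\n", p + 10);
                                  }
                                  free(buf);
                          }
                          off += sizeof(*nh) + ((nh->n_namesz + 3) & ~3UL) +
                                 ((nh->n_descsz + 3) & ~3UL);
                  }
                  free(notes);
          }
          fclose(fp);
          return 0;
  }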


> 
> > What's the problem with leaving it alone?
> 
> The problem is that I'd like to get all those vmcoreinfo exports under
> control and to not have people frivolously export whatever they feel
> like, for obvious reasons, and to get rid of the duplicate/unused pieces
> being part of vmcoreinfo.

Well, I just don't agree that the OSRELEASE item is "frivolous".  It's been in place,
and depended upon, for many years.

Dave



Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Dave Anderson



- Original Message -
> On Mon, Jan 14, 2019 at 05:48:48PM +, Kazuhito Hagio wrote:
> > As for makedumpfile, it will be not impossible to remove the
> > init_uts_ns.name.release (OSRELEASE), but some fixes are needed.
> > Because historically OSRELEASE has been a kind of a mandatory entry
> > in vmcoreinfo from the beginning of vmcoreinfo, so makedumpfile uses
> > its existence to check whether a vmcoreinfo is sane.
> 
> Well, init_uts_ns is exported in vmcoreinfo anyway - makedumpfile
> can simply test init_uts_ns.name.release just as well. And the
> "historically" argument doesn't matter because vmcoreinfo is not an ABI.
> 
> So makedumpfile needs to be changed to check that new export.
> 
> > Also, I think crash also will need to be fixed if it is removed.

Preferably it would be left as-is.  The crash utility has a "crash --osrelease vmcore"
option that only looks at the dumpfile header and just dumps the string.  With respect
to compressed kdumps, crash could alternatively look at the utsname data that is stored
in the diskdump_header.utsname field, but with ELF vmcores, there is no such back-up.
What's the problem with leaving it alone?

Dave


> 
> Yes, I'm expecting user tools to be fixed and then exports removed.
> 
> --
> Regards/Gruss,
> Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.
> 


Re: BUG: /proc/kcore does not export direct-mapped memory on arm64 (and presumably some other architectures)

2018-05-01 Thread Dave Anderson


- Original Message -
> 
> 
> - Original Message -
> > On 04/26/2018 02:16 PM, Kees Cook wrote:
> > > On Thu, Apr 26, 2018 at 12:31 PM, Dave Anderson 
> > > wrote:
> > >>
> > >> While testing /proc/kcore as the live memory source for the crash
> > >> utility,
> > >> it fails on arm64.  The failure on arm64 occurs because only the
> > >> vmalloc/module space segments are exported in PT_LOAD segments,
> > >> and it's missing all of the PT_LOAD segments for the generic
> > >> unity-mapped regions of physical memory, as well as their associated
> > >> vmemmap sections.
> > >>
> > >> The mapping of unity-mapped RAM segments in fs/proc/kcore.c is
> > >> architecture-neutral, and after debugging it, I found this as the
> > >> problem.  For each chunk of physical memory, kcore_update_ram()
> > >> calls walk_system_ram_range(), passing kclist_add_private() as a
> > >> callback function to add the chunk to the kclist, and eventually
> > >> leading to the creation of a PT_LOAD segment.
> > >>
> > >> kclist_add_private() does some verification of the memory region,
> > >> but this one below is bogus for arm64:
> > >>
> > >>  static int
> > >>  kclist_add_private(unsigned long pfn, unsigned long nr_pages, void
> > >>  *arg)
> > >>  {
> > >>  ... [ cut ] ...
> > >>  ent->addr = (unsigned long)__va((pfn << PAGE_SHIFT));
> > >>  ... [ cut ] ...
> > >>
> > >>  /* Sanity check: Can happen in 32bit arch...maybe */
> > >>  if (ent->addr < (unsigned long) __va(0))
> > >>  goto free_out;
> > >>
> > >> And that's because __va(0) is a bogus check for arm64.  It is checking
> > >> whether the ent->addr value is less than the lowest possible
> > >> unity-mapped
> > >> address.  But "0" should not be used as a physical address on arm64; the
> > >> lowest legitimate physical address for this __va() check would be the
> > >> arm64
> > >> PHYS_OFFSET, or memstart_addr:
> > >>
> > >> Here's the arm64 __va() and PHYS_OFFSET:
> > >>
> > >>#define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x)))
> > >>#define __phys_to_virt(x)   ((unsigned long)((x) - PHYS_OFFSET) |
> > >>PAGE_OFFSET)
> > >>
> > >>extern s64  memstart_addr;
> > >>/* PHYS_OFFSET - the physical address of the start of memory. */
> > >>#define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1);
> > >>memstart_addr; })
> > >>
> > >> If PHYS_OFFSET/memstart_addr is anything other than 0 (it is
> > >> 0x40
> > >> on my
> > >> test system), the __va(0) calculation goes negative and creates a bogus,
> > >> very
> > >> large, virtual address.  And since the ent->addr virtual address is less
> > >> than
> > >> bogus __va(0) address, the test fails, and the memory chunk is rejected.
> > >>
> > >> Looking at the kernel sources, it seems that this would affect other
> > >> architectures as well, i.e., the ones whose __va() is not a simple
> > >> addition of the physical address with PAGE_OFFSET.
> > >>
> > >> Anyway, I don't know what the best approach for an architecture-neutral
> > >> fix would be in this case.  So I figured I'd throw it out to you guys
> > >> for
> > >> some ideas.
> > > 
> > > I'm not as familiar with this code, but I've added Ard and Laura to CC
> > > here, as this feels like something they'd be able to comment on. :)
> > > 
> > > -Kees
> > > 
> > 
> > It seems backwards that we're converting a physical address to
> > a virtual address and then validating that. I think checking against
> > pfn_valid (to ensure there is a valid memmap entry)
> > and then checking page_to_virt against virt_addr_valid to catch
> > other cases (e.g. highmem or holes in the space) seems cleaner.
> 
> Hi Laura,
> 
> Thanks a lot for looking into this -- I couldn't find a maintainer for kcore.
> 
> The patch looks good to me, as long as virt_addr_valid() will fail on 32-bit
> arches when page_to_virt() creates an invalid address when it gets passed a
> highmem-physical address.

Re: BUG: /proc/kcore does not export direct-mapped memory on arm64 (and presumably some other architectures)

2018-04-30 Thread Dave Anderson


- Original Message -
> On 04/26/2018 02:16 PM, Kees Cook wrote:
> > On Thu, Apr 26, 2018 at 12:31 PM, Dave Anderson 
> > wrote:
> >>
> >> While testing /proc/kcore as the live memory source for the crash utility,
> >> it fails on arm64.  The failure on arm64 occurs because only the
> >> vmalloc/module space segments are exported in PT_LOAD segments,
> >> and it's missing all of the PT_LOAD segments for the generic
> >> unity-mapped regions of physical memory, as well as their associated
> >> vmemmap sections.
> >>
> >> The mapping of unity-mapped RAM segments in fs/proc/kcore.c is
> >> architecture-neutral, and after debugging it, I found this as the
> >> problem.  For each chunk of physical memory, kcore_update_ram()
> >> calls walk_system_ram_range(), passing kclist_add_private() as a
> >> callback function to add the chunk to the kclist, and eventually
> >> leading to the creation of a PT_LOAD segment.
> >>
> >> kclist_add_private() does some verification of the memory region,
> >> but this one below is bogus for arm64:
> >>
> >>  static int
> >>  kclist_add_private(unsigned long pfn, unsigned long nr_pages, void
> >>  *arg)
> >>  {
> >>  ... [ cut ] ...
> >>  ent->addr = (unsigned long)__va((pfn << PAGE_SHIFT));
> >>  ... [ cut ] ...
> >>
> >>  /* Sanity check: Can happen in 32bit arch...maybe */
> >>  if (ent->addr < (unsigned long) __va(0))
> >>  goto free_out;
> >>
> >> And that's because __va(0) is a bogus check for arm64.  It is checking
> >> whether the ent->addr value is less than the lowest possible unity-mapped
> >> address.  But "0" should not be used as a physical address on arm64; the
> >> lowest legitimate physical address for this __va() check would be the
> >> arm64
> >> PHYS_OFFSET, or memstart_addr:
> >>
> >> Here's the arm64 __va() and PHYS_OFFSET:
> >>
> >>#define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x)))
> >>#define __phys_to_virt(x)   ((unsigned long)((x) - PHYS_OFFSET) |
> >>PAGE_OFFSET)
> >>
> >>extern s64  memstart_addr;
> >>/* PHYS_OFFSET - the physical address of the start of memory. */
> >>#define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1);
> >>memstart_addr; })
> >>
> >> If PHYS_OFFSET/memstart_addr is anything other than 0 (it is 0x40
> >> on my
> >> test system), the __va(0) calculation goes negative and creates a bogus,
> >> very
> >> large, virtual address.  And since the ent->addr virtual address is less
> >> than
> >> bogus __va(0) address, the test fails, and the memory chunk is rejected.
> >>
> >> Looking at the kernel sources, it seems that this would affect other
> >> architectures as well, i.e., the ones whose __va() is not a simple
> >> addition of the physical address with PAGE_OFFSET.
> >>
> >> Anyway, I don't know what the best approach for an architecture-neutral
> >> fix would be in this case.  So I figured I'd throw it out to you guys for
> >> some ideas.
> > 
> > I'm not as familiar with this code, but I've added Ard and Laura to CC
> > here, as this feels like something they'd be able to comment on. :)
> > 
> > -Kees
> > 
> 
> It seems backwards that we're converting a physical address to
> a virtual address and then validating that. I think checking against
> pfn_valid (to ensure there is a valid memmap entry)
> and then checking page_to_virt against virt_addr_valid to catch
> other cases (e.g. highmem or holes in the space) seems cleaner.

Hi Laura,

Thanks a lot for looking into this -- I couldn't find a maintainer for kcore.  

The patch looks good to me, as long as virt_addr_valid() will fail on 32-bit
arches when page_to_virt() creates an invalid address when it gets passed a
highmem-physical address.  

Thanks again,
  Dave


> Maybe something like:
> 
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index d1e82761de81..e64ecb9f2720 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -209,25 +209,34 @@ kclist_add_private(unsigned long pfn, unsigned long
> nr_pages, void *arg)
>   {
>   struct list_head *head = (struct list_head *)arg;
>   struct kcore_list *ent;
> + struct page *p;
> +
> + if (!pf
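(The remainder of the quoted patch was cut off by the archive.  As a rough idea of
the shape of the check Laura describes -- pfn_valid() on the pfn, then
virt_addr_valid() on page_to_virt() -- the added lines might look something like the
sketch below.  This is a guess at the intent, not her actual patch:)

  /* sketch of the suggested validation, not the original patch */
  if (!pfn_valid(pfn))
          return 1;

  p = pfn_to_page(pfn);
  if (!virt_addr_valid(page_to_virt(p)))
          return 1;       /* e.g. highmem, or a hole in the direct map */

  ent->addr = (unsigned long)page_to_virt(p);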

BUG: /proc/kcore does not export direct-mapped memory on arm64 (and presumably some other architectures)

2018-04-26 Thread Dave Anderson

While testing /proc/kcore as the live memory source for the crash utility,
it fails on arm64.  The failure on arm64 occurs because only the 
vmalloc/module space segments are exported in PT_LOAD segments, 
and it's missing all of the PT_LOAD segments for the generic 
unity-mapped regions of physical memory, as well as their associated
vmemmap sections.
  
The mapping of unity-mapped RAM segments in fs/proc/kcore.c is 
architecture-neutral, and after debugging it, I found this as the
problem.  For each chunk of physical memory, kcore_update_ram()
calls walk_system_ram_range(), passing kclist_add_private() as a 
callback function to add the chunk to the kclist, and eventually 
leading to the creation of a PT_LOAD segment.
  
kclist_add_private() does some verification of the memory region,
but this one below is bogus for arm64:
  
static int
kclist_add_private(unsigned long pfn, unsigned long nr_pages, void *arg)
{
... [ cut ] ...
ent->addr = (unsigned long)__va((pfn << PAGE_SHIFT));
... [ cut ] ...
  
/* Sanity check: Can happen in 32bit arch...maybe */
if (ent->addr < (unsigned long) __va(0))
goto free_out;
  
And that's because __va(0) is a bogus check for arm64.  It is checking
whether the ent->addr value is less than the lowest possible unity-mapped
address.  But "0" should not be used as a physical address on arm64; the 
lowest legitimate physical address for this __va() check would be the arm64 
PHYS_OFFSET, or memstart_addr:
  
Here's the arm64 __va() and PHYS_OFFSET:
  
  #define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x)))
  #define __phys_to_virt(x)   ((unsigned long)((x) - PHYS_OFFSET) | 
PAGE_OFFSET)
  
  extern s64  memstart_addr;
  /* PHYS_OFFSET - the physical address of the start of memory. */
  #define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); 
memstart_addr; })
  
If PHYS_OFFSET/memstart_addr is anything other than 0 (it is 0x40 on my 
test system), the __va(0) calculation goes negative and creates a bogus, very 
large, virtual address.  And since the ent->addr virtual address is less than 
bogus __va(0) address, the test fails, and the memory chunk is rejected. 
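As a concrete illustration of that underflow, here is a small standalone sketch using
made-up (but representative) constants rather than the real kernel macros:

  /* Standalone illustration of the __va(0) underflow; constants are examples only. */
  #include <stdio.h>

  #define PAGE_OFFSET  0xffff800000000000UL   /* example arm64 PAGE_OFFSET         */
  #define PHYS_OFFSET  0x4000000000UL         /* example nonzero memstart_addr     */

  #define __phys_to_virt(x) (((unsigned long)(x) - (PHYS_OFFSET)) | PAGE_OFFSET)

  int main(void)
  {
          /* __va(0) wraps around to a huge value... */
          printf("__va(0)          = 0x%lx\n", __phys_to_virt(0));
          /* ...so every legitimate direct-mapped address compares below it,
           * and kclist_add_private() rejects the region. */
          printf("__va(lowest RAM) = 0x%lx\n", __phys_to_virt(PHYS_OFFSET));
          return 0;
  }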
  
Looking at the kernel sources, it seems that this would affect other
architectures as well, i.e., the ones whose __va() is not a simple
addition of the physical address with PAGE_OFFSET.
  
Anyway, I don't know what the best approach for an architecture-neutral
fix would be in this case.  So I figured I'd throw it out to you guys for
some ideas.

Dave Anderson




Re: [PATCH 0/2] Determine kernel text mapping size at runtime for x86_64

2016-12-08 Thread Dave Anderson


- Original Message -
> On Wed, Dec 7, 2016 at 11:56 PM, Baoquan He  wrote:
> > Dave Anderson ever told in Crash utility he makes judgement whether it's
> > a kaslr kernel by size of KERNEL_IMAGE_SIZE. As long as it's 1G, it's
> > recognized as kaslr. Then the current upstream kernel has a wrong behaviour,
> > it sets KERNEL_IMAGE_SIZE as 1G as long as CONFIG_RANDOMIZE_BASE is enabled,
> > though people specify "nokaslr" into cmdline to disable kaslr explicitly.
> 
> I'm not sure that's the correct solution to the Crash utility -- the
> kaslr-ness of a kernel should be already exposed in the dump with the
> kaslr_enabled variable yes?

The crash utility doesn't use KERNEL_IMAGE_SIZE to determine whether
KASLR is in play, but rather to determine the base of the modules virtual
address space (i.e., the same way the kernel does).  And then it uses that
value in a couple other places.
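In other words, crash mirrors the x86_64 layout in which the module area begins
immediately after the kernel text mapping.  Roughly (an illustrative sketch of that
arithmetic, not crash source):

  /* Illustrative x86_64 layout arithmetic -- a sketch, not crash source. */
  #define __START_KERNEL_map  0xffffffff80000000UL

  /* The module area starts where the kernel image mapping ends, so its base
   * depends directly on KERNEL_IMAGE_SIZE (512MB vs. 1GB). */
  static unsigned long modules_vaddr(unsigned long kernel_image_size)
  {
          return __START_KERNEL_map + kernel_image_size;
  }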

Dave


> 
> > So in this patchset, made changes to determine the size of kernel text
> > mapping
> > area at runtime. If "nokaslr" specified, kernel mapping size is 512M though
> > CONFIG_RANDOMIZE_BASE is enabled.
> 
> This seems to make the non-KASLR case more consistent, so I'm fine
> with the idea. Once the build-bots are happy with everything, consider
> the series:
> 
> Acked-by: Kees Cook 
> 
> Thanks!
> 
> -Kees
> 
> >
> > Baoquan He (2):
> >   x86/64: Make kernel text mapping always take one whole page table in
> > early boot code
> >   x86/KASLR/64: Determine kernel text mapping size at runtime
> >
> >  arch/x86/boot/compressed/kaslr.c| 15 ++-
> >  arch/x86/include/asm/kaslr.h|  1 +
> >  arch/x86/include/asm/page_64_types.h| 20 
> >  arch/x86/include/asm/pgtable_64_types.h |  2 +-
> >  arch/x86/kernel/head64.c| 11 ++-
> >  arch/x86/kernel/head_64.S   | 16 +---
> >  arch/x86/mm/dump_pagetables.c   |  3 ++-
> >  arch/x86/mm/init_64.c   |  2 +-
> >  arch/x86/mm/physaddr.c  |  6 +++---
> >  9 files changed, 45 insertions(+), 31 deletions(-)
> >
> > --
> > 2.5.5
> >
> 
> 
> 
> --
> Kees Cook
> Nexus Security
> 


Re: [PATCH] kexec: Export memory sections virtual addresses to vmcoreinfo

2016-11-02 Thread Dave Anderson


- Original Message -
> Hi Dave,
> 
> On 11/01/16 at 10:13am, Dave Anderson wrote:
> > 
> > 
> > > > But we have this in mainline which also introduced the VMCOREINFO
> > > > numbers, can you send a patch to revert them?
> > > 
> > > OK, will do.
> > > 
> > > However for find_vmemmap_x86_64() in makedumpfile, vmemmap_start is
> > > still needed. I checked code, seems no better way to avoid. I am not
> > > sure how many people are really using "-e" option to exclude unused
> > > vmemmap pages.
> > > 
> > > Maybe just leave it as is, and fix it when people complain?
> > 
> > Speaking of complaints, is there any chance you can make the
> > x86_64 "phys_base" value available?  The VMCOREINFO_SYMBOL(phys_base)
> > is useless since its contents are needed in order to access the
> > symbol address.
> 
> Yeah, the current exporting of virt addr of phys_base is really useless
> for x86_64. While I saw you have got phys_base from kdump-ed vmcore elf
> program header since kexec-tools has created that pt_load for kernel text
> region.
> 
> machdep->machspec->phys_base = phdr->p_paddr - (phdr->p_vaddr &
> ~(__START_KERNEL_map));
> 
> Do you still want the value of phys_base? If yes, I can change it to
> export the real value of phys_base, not &phys_base.
> 
> In fact, exporting &phys_base was done in 2008, makedumpfile has taken
> the similar way you use in crash to get value of phys_base. Means it has
> been ignored very earlier. You could be the first person to complain
> about it.

No, this is not the first time it's been brought up.  Anyway, yes,
it would be preferable if it were readily available in vmcoreinfo
rather than depending upon the PT_LOAD semantics. 

Thanks,
  Dave
  


Re: [PATCH] kexec: Export memory sections virtual addresses to vmcoreinfo

2016-11-01 Thread Dave Anderson


- Original Message -
> On 11/01/16 at 01:10pm, Dave Young wrote:
> > On 10/06/16 at 04:46pm, Baoquan He wrote:
> > > KASLR memory randomization can randomize the base of the physical memory
> > > mapping (PAGE_OFFSET), vmalloc (VMALLOC_START) and vmemmap
> > > (VMEMMAP_START). These need be exported to VMCOREINFO so that user space
> > > utility, mainly makedumpfile can use them to identify the base of each
> > > memory section. Here using VMCOREINFO_NUMBER we can reuse the existing
> > > struct number_table in makedumpfile to import data easily.
> > > 
> > > Since they are related to x86_64 only, put them into
> > > arch_crash_save_vmcoreinfo. And move the exportion of KERNEL_IMAGE_SIZE
> > > together since it's also for x86_64 only.
> > > 
> > > Signed-off-by: Baoquan He 
> > > ---
> > >  arch/x86/kernel/machine_kexec_64.c | 4 
> > >  kernel/kexec_core.c| 3 ---
> > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/machine_kexec_64.c
> > > b/arch/x86/kernel/machine_kexec_64.c
> > > index 5a294e4..e150dd7 100644
> > > --- a/arch/x86/kernel/machine_kexec_64.c
> > > +++ b/arch/x86/kernel/machine_kexec_64.c
> > > @@ -337,6 +337,10 @@ void arch_crash_save_vmcoreinfo(void)
> > >  #endif
> > >   vmcoreinfo_append_str("KERNELOFFSET=%lx\n",
> > > kaslr_offset());
> > > + VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE);
> > > + VMCOREINFO_NUMBER(PAGE_OFFSET);
> > > + VMCOREINFO_NUMBER(VMALLOC_START);
> > > + VMCOREINFO_NUMBER(VMEMMAP_START);
> > 
> > Pratyush has posted makedumpfile patches below to avoid the VMCOREINFO:
> > http://lists.infradead.org/pipermail/kexec/2016-October/017540.html
> > 
> > But we have this in mainline which also introduced the VMCOREINFO
> > numbers, can you send a patch to revert them?
> 
> OK, will do.
> 
> However for find_vmemmap_x86_64() in makedumpfile, vmemmap_start is
> still needed. I checked code, seems no better way to avoid. I am not
> sure how many people are really using "-e" option to exclude unused
> vmemmap pages.
> 
> Maybe just leave it as is, and fix it when people complain?

Speaking of complaints, is there any chance you can make the
x86_64 "phys_base" value available?  The VMCOREINFO_SYMBOL(phys_base)
is useless since its contents are needed in order to access the
symbol address.
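(A value export along the lines of the existing VMCOREINFO numbers would be enough;
something like the hypothetical one-liner below in arch_crash_save_vmcoreinfo() --
a sketch of the request, not an actual kernel patch:

  /* hypothetical sketch: export the runtime value, not the symbol's address */
  vmcoreinfo_append_str("NUMBER(phys_base)=%ld\n", (long)phys_base);

With that, a tool could read phys_base straight out of the note instead of needing
phys_base in order to read phys_base.)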

Dave



> 
> > commit 0549a3c02efb350776bc869685a361045efd3a29
> > Author: Thomas Garnier 
> > Date:   Tue Oct 11 13:55:08 2016 -0700
> > 
> > kdump, vmcoreinfo: report memory sections virtual addresses
> > [snip]]
> > 
> > >  }
> > >  
> > >  /* arch-dependent functionality related to kexec file-based syscall */
> > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > > index 5616755..8ad3a29e 100644
> > > --- a/kernel/kexec_core.c
> > > +++ b/kernel/kexec_core.c
> > > @@ -1467,9 +1467,6 @@ static int __init crash_save_vmcoreinfo_init(void)
> > >  #endif
> > >   VMCOREINFO_NUMBER(PG_head_mask);
> > >   VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
> > > -#ifdef CONFIG_X86
> > > - VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE);
> > > -#endif
> > 
> > Moving KERNEL_IMAGE_SIZE to x86 should be a standalone patch.
> > I remember Dave Anderson said he use it in crash utility, cced him.
> > 
> > >  #ifdef CONFIG_HUGETLB_PAGE
> > >   VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
> > >  #endif
> > > --
> > > 2.5.5
> > > 
> > > 
> > > ___
> > > kexec mailing list
> > > ke...@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/kexec
> > 
> > Thanks
> > Dave
> 


Re: crash/gdb: DW_FORM_strp pointing outside of .debug_str section

2014-11-19 Thread Dave Anderson

> Hi,
> I have been trying to debug a multi controller hang issue with crash.
> 
> It seems that crash works fine on stock ubuntu 14.04 running
> 3.13.0-39-generic. However when I recompile kernel and try to run
> crash on it I get the following error:
> 
> crash> mod -s nvme ./nvme.ko
>  MODULE   NAMESIZE  OBJECT FILE
> a01d1240  nvme   58727  ./nvme.ko
> crash> add-symbol-file ./nvme.ko 0xa01d1240
> add symbol table from file "./nvme.ko" at
> .text_addr = 0xa01d1240
> Reading symbols from /root/mynvme.sriov/nvme.ko...DW_FORM_strp
> pointing outside of .debug_str section [in module
> /root/mynvme.sriov/nvme.ko]
> gdb: gdb request failed: add-symbol-file ./nvme.ko 0xa01d1240
> crash>
> Any input on what I might have done wrong? Do I need to rebuild the
> crash utility package along with the kernel?
> 
> Thanks,
> Kallol
 
This is the wrong mailing list for crash utility issues.  Please send
your queries to crash-util...@redhat.com.

That being said, the "mod -s" command does the add-symbol-file command
behind the scenes.  Your discrete "add-symbol-file" command is both
unnecessary and incorrect -- 0xffffffffa01d1240 is the address of the nvme
module's module data structure and not the starting text address.

Dave Anderson


Re: mm: Export 'pageflag_names' array

2013-10-08 Thread Dave Anderson


- Original Message -
> Hi
> 
> On Tue, Oct 8, 2013 at 5:40 AM, Dave Anderson  wrote:
> >
> >
> > - Original Message -
> >> Hi Anatol,
> >>
> >> On Mon, Oct 07, 2013 at 10:53:32AM -0700, Anatol Pomozov wrote:
> >> > Hi Wu
> >> >
> >> > I have a request wrt your old commit 718a38211.
> >> >
> >> > I think it makes sense to export array pageflag_names so kernel dump
> >> > debug tools (like 'crash') can use it as well. Currently the tool
> >> > hard-codes flag names but it is suboptimal as flags are different for
> >> > different configs.
> >> >
> >> > What do you think? (I can send a patch if you are ok).
> >>
> >> I wonder if the KPF_* defined in
> >>
> >> include/uapi/linux/kernel-page-flags.h
> >>
> >> fit your needs. These are kernel page flags exported to the user space
> >> and will be maintained stable and immune to kconfig changes. You can
> >> find use examples of them in
> >>
> >> fs/proc/page.c
> >> tools/vm/page-types.c
> >>
> >> Thanks,
> >> Fengguang
> >>
> >
> > Nothing is required for the crash utility.  The pageflag_names array (as 
> > well
> > as the pageflags enumerator) are readily available in the kernel debuginfo 
> > data.
> 
> D'oh. You are right Dave. Everyone, please ignore my original question
> I mislooked this variable in symbols map (or maybe looked at an old
> kernel map).
> 
> Dave, do you think 'crash' should use pageflag_names to output flag names?

It's a good addition -- I'm working on a patch as we speak...

Thanks,
  Dave



Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

2013-10-08 Thread Dave Anderson


- Original Message -
> (2013/10/07 22:21), Dave Anderson wrote:
> 
> > - Original Message -
> >> (2013/10/03 22:47), Dave Anderson wrote:
> 
> >>> - Original Message -
> >>>> (2013/10/02 18:13), HATAYAMA Daisuke wrote:
> >>>>> (2013/10/02 16:48), Kees Cook wrote:
> 
> >>
> >> Thanks for detailed explanation. So, there's already a feature in crash
> >> utility
> >> to address relocation!, though it's better for me to try them to check if
> >> it's
> >> really applicable to this feature. My concern is whether --reloc works
> >> well
> >> on x86_64 too, because relocation has never done on x86_64 ever, right?
> >
> > Correct.
> >
> >> Another concern is that in case of relocation, users need to additional 
> >> information
> >> regarding runtime symbol information to crash utility. I want to avoid 
> >> additional
> >> process, automation is preferable if possible.
> >
> > Right.  As I mentioned in the case of 32-bit x86 dumpfiles, there is no 
> > automation
> > available when CONFIG_PHYSICAL_START is larger than CONFIG_PHYSICAL_ALIGN.  
> > The user
> > either has to be aware of their values in order to calculate the --reloc 
> > argument,
> > or has to capture a copy of the /proc/kallsyms file on the crashed system.  
> > Typically
> > users/distros using kdump changed their x86 configurations to avoid having 
> > to deal
> > with that.
> >
> 
> Sorry, I don't understand why relocation size cannot be calculated when
> CONFIG_PHYSICALSTART > CONFIG_PHYSICAL_ALIGN. Could you explain that?

I just meant that when CONFIG_PHYSICAL_START > CONFIG_PHYSICAL_ALIGN, the 32-bit x86
kernel gets relocated (like the secondary kdump kernel), but that information is not
readily available from the vmlinux/vmcore pair.

> 
> >> I guess it's enough if there's runtime symbol addresses because we can get 
> >> relocated
> >> offset value by comparing it with the compile-time symbol address 
> >> contained in
> >> a given debuginfo file. Candidates for such symbols are the ones contained 
> >> in
> >> VMCOREINFO note containing some symbol values for makedumpfile to refer to 
> >> mm-related
> >> objects in kernel, which is always contained in vmcore generated by 
> >> current kdump and
> >> also vmcores converted by makedumpfile from it. How about this idea?
> >
> > But how would that differ from using an incorrect (non-matching) vmlinux 
> > file?
> >
> 
> It seems to me almost similar to what crash currently does even if we do 
> relocation check.
> The current check crash currently does is trial-and-error since there's no 
> information
> indicating given vmcore and vmlinuxcertainly match well.
> 
> For example, the process I imagine is:
> 
>1) try to match vmcore and vmlinux with no relocation. If fails, go to 2).
>2) try to match vmcore and vmlinux with relocation.
> 
> The two steps include symbol table initialization so it might actually be 
> difficult to
> resume back from 2) to 1).
> 
> Also, if gap due to phys_base and gap due to relocation can happen at the 
> same time,
> calculating two values automatically might be futher complicated. So, it 
> would be better
> to add relocation value in VMCOREINFO. Then, what crash utility sholud do 
> becomes very simple.

Yes please...

And while you're at it, the kernel's

  VMCOREINFO_SYMBOL(phys_base);

is pretty much useless, at least w/respect to ELF vmcores, since we need to know its
value in order to translate the address.  And I don't think that makedumpfile uses it
when it calculates the phys_base that it stores in compressed kdump headers.  Why not
put its value instead of its address?
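The circularity is visible in the unity-mapping arithmetic itself.  For the x86_64
kernel text region the translation is roughly (a sketch of the relationship, not any
particular tool's code):

  /* x86_64 kernel-text virtual-to-physical translation (sketch):
   * reading the phys_base symbol already requires knowing phys_base. */
  #define __START_KERNEL_map  0xffffffff80000000UL

  static unsigned long kvaddr_to_paddr(unsigned long vaddr, unsigned long phys_base)
  {
          return vaddr - __START_KERNEL_map + phys_base;
  }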

> BTW, can it really happen that gaps due to phys_base and due to relocation 
> happen at the
> same time? I feel relocation covers phys_base mechanism. If there's 
> relocation, phys_base
> is not necessary.
> 
> --
> Thanks.
> HATAYAMA, Daisuke
> 
> 


Re: mm: Export 'pageflag_names' array

2013-10-08 Thread Dave Anderson


- Original Message -
> Hi Anatol,
> 
> On Mon, Oct 07, 2013 at 10:53:32AM -0700, Anatol Pomozov wrote:
> > Hi Wu
> > 
> > I have a request wrt your old commit 718a38211.
> > 
> > I think it makes sense to export array pageflag_names so kernel dump
> > debug tools (like 'crash') can use it as well. Currently the tool
> > hard-codes flag names but it is suboptimal as flags are different for
> > different configs.
> > 
> > What do you think? (I can send a patch if you are ok).
> 
> I wonder if the KPF_* defined in
> 
> include/uapi/linux/kernel-page-flags.h
> 
> fit your needs. These are kernel page flags exported to the user space
> and will be maintained stable and immune to kconfig changes. You can
> find use examples of them in
> 
> fs/proc/page.c
> tools/vm/page-types.c
> 
> Thanks,
> Fengguang
> 

Nothing is required for the crash utility.  The pageflag_names array (as well
as the pageflags enumerator) are readily available in the kernel debuginfo
data.

Dave
  


Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

2013-10-07 Thread Dave Anderson


- Original Message -
> (2013/10/03 22:47), Dave Anderson wrote:
> >
> >
> > - Original Message -
> >> (2013/10/02 18:13), HATAYAMA Daisuke wrote:
> >>> (2013/10/02 16:48), Kees Cook wrote:
> >> 
> >>>>>> +
> >>>>>> + return 0;
> >>>>>> +}
> >>>>>> +
> >>>>>> +/*
> >>>>>>  * Determine if we were loaded by an EFI loader.  If so, then we
> >>>>>>  have also been
> >>>>>>  * passed the efi memmap, systab, etc., so we should use these
> >>>>>>  data structures
> >>>>>>  * for initialization.  Note, the efi init code path is determined
> >>>>>>  by the
> >>>>>> @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
> >>>>>> }
> >>>>>>
> >>>>>> #endif /* CONFIG_X86_32 */
> >>>>>> +
> >>>>>> +static struct notifier_block kernel_offset_notifier = {
> >>>>>> + .notifier_call = dump_kernel_offset
> >>>>>> +};
> >>>>>> +
> >>>>>> +static int __init register_kernel_offset_dumper(void)
> >>>>>> +{
> >>>>>> + atomic_notifier_chain_register(&panic_notifier_list,
> >>>>>> + &kernel_offset_notifier);
> >>>>>> + return 0;
> >>>>>> +}
> >>>>>> +__initcall(register_kernel_offset_dumper);
> >>>>>>
> >>>>>
> >>>>> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS
> >>>>> doesn't use
> >>>>> kdump? Anyway, kdump related tools now calculate phys_base from memory
> >>>>> map
> >>>>> information passed as ELF PT_LOAD entries like below.
> >>>>
> >>>> Correct, we are not currently using kdump.
> >>>>
> >>>>> $ LANG=C readelf -l vmcore-rhel6up4
> >>>>>
> >>>>> Elf file type is CORE (Core file)
> >>>>> Entry point 0x0
> >>>>> There are 5 program headers, starting at offset 64
> >>>>>
> >>>>> Program Headers:
> >>>>> Type   Offset VirtAddr   PhysAddr
> >>>>>FileSizMemSiz  Flags  Align
> >>>>> NOTE   0x0158 0x
> >>>>> 0x
> >>>>>0x0b08 0x0b08 0
> >>>>> LOAD   0x0c60 0x8100
> >>>>> 0x0100
> >>>>>0x0103b000 0x0103b000  RWE0
> >>>>> LOAD   0x0103bc60 0x88001000
> >>>>> 0x1000
> >>>>>0x0009cc00 0x0009cc00  RWE0
> >>>>> LOAD   0x010d8860 0x8810
> >>>>> 0x0010
> >>>>>0x02f0 0x02f0  RWE0
> >>>>> LOAD   0x03fd8860 0x88001300
> >>>>> 0x1300
> >>>>>0x2cffd000 0x2cffd000  RWE0
> >>>>>
> >>>>> Each PT_LOAD entry is assigned to virtual and physical address. In this
> >>>>> case,
> >>>>> 1st PT_LOAD entry belongs to kernel text mapping region, from which we
> >>>>> can
> >>>>> calculate phys_base value.
> >>>>
> >>>> It seems like all the information you need would still be available?
> >>>> The virtual address is there, so it should be trivial to see the
> >>>> offset, IIUC.
> >>>>
> >>>
> >>> Partially yes. I think OK to analyze crash dump by crash utility, a
> >>> gdb-based
> >>> symbolic debugger for kernel, since phys_base absorbs kernel offset
> >>> caused by
> >>> relocation and phys_base is available in the way I explained above.
> >>>
> >>> However, the gained phys_base is not correct one, exac

Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

2013-10-03 Thread Dave Anderson


- Original Message -
> (2013/10/02 18:13), HATAYAMA Daisuke wrote:
> > (2013/10/02 16:48), Kees Cook wrote:
> 
>  +
>  + return 0;
>  +}
>  +
>  +/*
>  * Determine if we were loaded by an EFI loader.  If so, then we have 
>  also been
>  * passed the efi memmap, systab, etc., so we should use these data 
>  structures
>  * for initialization.  Note, the efi init code path is determined by 
>  the
>  @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
> }
> 
> #endif /* CONFIG_X86_32 */
>  +
>  +static struct notifier_block kernel_offset_notifier = {
>  + .notifier_call = dump_kernel_offset
>  +};
>  +
>  +static int __init register_kernel_offset_dumper(void)
>  +{
>  + atomic_notifier_chain_register(&panic_notifier_list,
>  + &kernel_offset_notifier);
>  + return 0;
>  +}
>  +__initcall(register_kernel_offset_dumper);
> 
> >>>
> >>> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS 
> >>> doesn't use
> >>> kdump? Anyway, kdump related tools now calculate phys_base from memory map
> >>> information passed as ELF PT_LOAD entries like below.
> >>
> >> Correct, we are not currently using kdump.
> >>
> >>> $ LANG=C readelf -l vmcore-rhel6up4
> >>>
> >>> Elf file type is CORE (Core file)
> >>> Entry point 0x0
> >>> There are 5 program headers, starting at offset 64
> >>>
> >>> Program Headers:
> >>>Type   Offset VirtAddr   PhysAddr
> >>>   FileSizMemSiz  Flags  Align
> >>>NOTE   0x0158 0x  
> >>> 0x
> >>>   0x0b08 0x0b08 0
> >>>LOAD   0x0c60 0x8100  
> >>> 0x0100
> >>>   0x0103b000 0x0103b000  RWE0
> >>>LOAD   0x0103bc60 0x88001000  
> >>> 0x1000
> >>>   0x0009cc00 0x0009cc00  RWE0
> >>>LOAD   0x010d8860 0x8810  
> >>> 0x0010
> >>>   0x02f0 0x02f0  RWE0
> >>>LOAD   0x03fd8860 0x88001300  
> >>> 0x1300
> >>>   0x2cffd000 0x2cffd000  RWE0
> >>>
> >>> Each PT_LOAD entry is assigned to virtual and physical address. In this 
> >>> case,
> >>> 1st PT_LOAD entry belongs to kernel text mapping region, from which we can
> >>> calculate phys_base value.
> >>
> >> It seems like all the information you need would still be available?
> >> The virtual address is there, so it should be trivial to see the
> >> offset, IIUC.
> >>
> >
> > Partially yes. I think OK to analyze crash dump by crash utility, a 
> > gdb-based
> > symbolic debugger for kernel, since phys_base absorbs kernel offset caused 
> > by
> > relocation and phys_base is available in the way I explained above.
> >
> > However, the gained phys_base is not correct one, exactly phys_base + 
> > offset_by_relocation.
> > When analyzing crash dump by crash utility, we use debug information 
> > generated
> > during kernel build, which we install as kernel-debuginfo on RHEL for 
> > example.
> > Symbols in debuginfo have statically assigned addresses at build so we see
> > the statically assigned addresses during debugging and we see
> > phys_base + offset_by_relocation as phys_base. This would be problematic
> > if failure on crash dump is relevant to the relocated addresses, though I 
> > don't
> > immediately come up with crash senario where relocated symbol is defitely 
> > necessary.
> >
> > Still we can get relocated addresses if kallsyms is enabled on the kernel,
> > but kallsyms and relocatable kernels are authogonal. I don't think it 
> > natural
> > to rely on kallsyms. It seems natural to export relocation information newly
> > as debugging information.
> >
> 
> I was confused yesterday. As I said above, kdump related tools now don't 
> support
> relocation on x86_64, phys_base only. kdump related tools think of present 
> kernel
> offset as phys_base. Then, they reflect kernel offset caused by relocation in
> physical addresses only, not in virtual addresses. This obviously affects the
> tools.
> 
> BTW, relocation looks more sophisticated than phys_base one. Is it possible to
> switch from phys_base one to relocation on x86_64? On x86, relocation is used 
> so
> I guess x86_64 can work in the same way. Is there something missing?
> Is there what phys_base can but relocation cannot on x86_64?
> 
> And, Dave, is there feature for crash utility to treat relocation now?

Well sort of -- there are a couple of guessing-game kludges that can be used.

For 32-bit x86 systems configured with a CONFIG_PHYSICAL_START value
that is larger than its CONFIG_PHYSICAL_ALIGN value, there is a "--reloc"
command line option that can be used to account for the relocation.

Re: [RFC PATCH 0/8] remove vm_struct list management

2012-12-11 Thread Dave Anderson


- Original Message -
> On Mon, Dec 10, 2012 at 11:40:47PM +0900, JoonSoo Kim wrote:
> 
> [..]
> > > So without knowing details of both the data structures, I think if vmlist
> > > is going away, then user space tools should be able to traverse 
> > > vmap_area_root
> > > rb tree. I am assuming it is sorted using ->addr field and we should be
> > > able to get vmalloc area start from there. It will just be a matter of
> > > exporting right fields to user space (instead of vmlist).
> > 
> > There is address sorted list of vmap_area, vmap_area_list.
> > So we can use it for traversing vmalloc areas if it is necessary.
> > But, as I mentioned before, kexec write *just* address of vmlist and
> > offset of vm_struct's address field.  It imply that they don't traverse 
> > vmlist,
> > because they didn't write vm_struct's next field which is needed for 
> > traversing.
> > Without vm_struct's next field, they have no method for traversing.
> > So, IMHO, assigning dummy vm_struct to vmlist which is implemented by [7/8] 
> > is
> > a safe way to maintain a compatibility of userspace tool. :)
> 
> Actually the design of "makedumpfile" and "crash" tool is that they know
> about kernel data structures and they adopt to changes. So for major
> changes they keep track of kernel version numbers and if access the
> data structures accordingly.
> 
> Currently we access first element of vmlist to determine start of vmalloc
> address. True we don't have to traverse the list.
> 
> But as you mentioned we should be able to get same information by
> traversing to left most element of vmap_area_list rb tree. So I think
> instead of trying to retain vmlist first element just for backward
> compatibility, I will rather prefer get rid of that code completely
> from kernel and let user space tool traverse rbtree. Just export
> minimum needed info for traversal in user space.

There's no need to traverse the rbtree.  There is a vmap_area_list
linked list of vmap_area structures that is also sorted by virtual
address.

All that makedumpfile would have to do is to access the first vmap_area
in the vmap_area_list -- as opposed to the way that it does now, which is
by accessing the first vm_struct in the to-be-obsoleted vmlist list.
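In other words, the tool only needs the list head symbol and a couple of structure
offsets.  A rough sketch of that lookup, with readmem() and friends standing in for
whatever primitives the dump tool already has (all of the helper names below are
hypothetical -- this is not makedumpfile or crash source):

  /* hypothetical primitives a dump tool would provide */
  extern unsigned long symbol_value(const char *name);
  extern void readmem(unsigned long addr, void *buf, unsigned long size);
  extern unsigned long offset_of_vmap_area_list;      /* offsetof(struct vmap_area, list) */
  extern unsigned long offset_of_vmap_area_va_start;  /* offsetof(struct vmap_area, va_start) */

  /* sketch: derive the vmalloc start from the first entry on vmap_area_list */
  static unsigned long vmalloc_start(void)
  {
          unsigned long head, first, va_start;

          head = symbol_value("vmap_area_list");        /* struct list_head */
          readmem(head, &first, sizeof(first));         /* ->next: first entry's list member */

          first -= offset_of_vmap_area_list;            /* back up to the vmap_area itself */
          readmem(first + offset_of_vmap_area_va_start, &va_start, sizeof(va_start));

          return va_start;
  }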

So it seems silly to keep the dummy "vmlist" around.

Dave


Re: [RFC PATCH 0/8] remove vm_struct list management

2012-12-11 Thread Dave Anderson


- Original Message -

> > Can we get the same information from this rb-tree of vmap_area? Is
> > ->va_start field communication same information as vmlist was
> > communicating? What's the difference between vmap_area_root and vmlist.
> 
> Thanks for comment.
> 
> Yes. vmap_area's va_start field represent same information as vm_struct's 
> addr.
> vmap_area_root is data structure for fast searching an area.
> vmap_area_list is address sorted list, so we can use it like as vmlist.
> 
> There is a little difference vmap_area_list and vmlist.
> vmlist is lack of information about some areas in vmalloc address space.
> For example, vm_map_ram() allocate area in vmalloc address space,
> but it doesn't make a link with vmlist. To provide full information
> about vmalloc address space, using vmap_area_list is more adequate.
> 
> > So without knowing details of both the data structures, I think if vmlist
> > is going away, then user space tools should be able to traverse 
> > vmap_area_root
> > rb tree. I am assuming it is sorted using ->addr field and we should be
> > able to get vmalloc area start from there. It will just be a matter of
> > exporting right fields to user space (instead of vmlist).
> 
> There is address sorted list of vmap_area, vmap_area_list.
> So we can use it for traversing vmalloc areas if it is necessary.
> But, as I mentioned before, kexec write *just* address of vmlist and
> offset of vm_struct's address field.  It imply that they don't traverse 
> vmlist,
> because they didn't write vm_struct's next field which is needed for 
> traversing.
> Without vm_struct's next field, they have no method for traversing.
> So, IMHO, assigning dummy vm_struct to vmlist which is implemented by [7/8] is
> a safe way to maintain a compatibility of userspace tool. :)

Why bother keeping vmlist around?  kdump's makedumpfile command would not
even need to traverse the vmap_area rbtree, because it could simply look
at the first vmap_area in the sorted vmap_area_list, correct?

Dave Anderson




[PATCH] ptrace: fix 2.6.25 ptrace_bts_config structure declaration

2008-02-21 Thread Dave Anderson


The 2.6.25 ptrace_bts_config structure in asm-x86/ptrace-abi.h
is defined with u32 types:

  #include 

  /* configuration/status structure used in PTRACE_BTS_CONFIG and
 PTRACE_BTS_STATUS commands.
  */
  struct ptrace_bts_config {
  /* requested or actual size of BTS buffer in bytes */
  u32 size;
  /* bitmask of below flags */
  u32 flags;
  /* buffer overflow signal */
  u32 signal;
  /* actual size of bts_struct in bytes */
  u32 bts_size;
  };
  #endif

But u32 is only accessible in asm-x86/types.h if __KERNEL__,
leading to compile errors when ptrace.h is included from
user-space.  Perhaps the double-underscore versions that
are exported to user-space in asm-x86/types.h should be
used instead?

Signed-off-by: [EMAIL PROTECTED]






--- linux-2.6.25-rc2/include/asm-x86/ptrace-abi.h.orig
+++ linux-2.6.25-rc2/include/asm-x86/ptrace-abi.h
@@ -89,13 +89,13 @@
 */
 struct ptrace_bts_config {
 	/* requested or actual size of BTS buffer in bytes */
-	u32 size;
+	__u32 size;
 	/* bitmask of below flags */
-	u32 flags;
+	__u32 flags;
 	/* buffer overflow signal */
-	u32 signal;
+	__u32 signal;
 	/* actual size of bts_struct in bytes */
-	u32 bts_size;
+	__u32 bts_size;
 };
 #endif
 


Re: [NEW-PATCH] exec: allow > 2GB executables to run on 64-bit systems

2007-12-06 Thread Dave Anderson

Andi Kleen wrote:

Since Dave didn't post an updated patch. This is how I think what
the patch should be. I also changed sys_uselib just to be complete.



Thanks Andi -- I just tested open_exec() w/O_LARGEFILE on an
i386 with a 2.5GB+ binary (mostly debuginfo), and it works as
expected.  Interesting to note that the test binary couldn't
be compiled with i386 gcc, but it could be built with x86_64
gcc -m32.

Dave






Always use O_LARGEFILE for opening executables

This allows to use executables >2GB.

Based on a patch by Dave Anderson

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux-2.6.24-rc3/fs/exec.c
===
--- linux-2.6.24-rc3.orig/fs/exec.c
+++ linux-2.6.24-rc3/fs/exec.c
@@ -119,7 +119,7 @@ asmlinkage long sys_uselib(const char __
if (error)
goto exit;
 
-	file = nameidata_to_filp(&nd, O_RDONLY);

+   file = nameidata_to_filp(&nd, O_RDONLY|O_LARGEFILE);
error = PTR_ERR(file);
if (IS_ERR(file))
goto out;
@@ -658,7 +658,8 @@ struct file *open_exec(const char *name)
int err = vfs_permission(&nd, MAY_EXEC);
file = ERR_PTR(err);
if (!err) {
-   file = nameidata_to_filp(&nd, O_RDONLY);
+   file = nameidata_to_filp(&nd,
+   O_RDONLY|O_LARGEFILE);
if (!IS_ERR(file)) {
err = deny_write_access(file);
if (err) {





Re: [PATCH] exec: allow > 2GB executables to run on 64-bit systems

2007-12-05 Thread Dave Anderson

Andi Kleen wrote:

I agree in theory.  We've only seen instances on 64-bitters...



I think that's because gcc does not support the medium/large code models
for i386. Although in theory someone could create an executable with
a large enough .data.

-Andi


Or perhaps huge debuginfo section(s)?  The x86_64 instance
had 2.5GB .debug_macinfo DWARF section.

Dave




Re: [PATCH] exec: allow > 2GB executables to run on 64-bit systems

2007-12-05 Thread Dave Anderson

Andi Kleen wrote:

Dave Anderson <[EMAIL PROTECTED]> writes:



When a executable that is greater than 2GB in size is attempted on a 64-bit
system on a file system that calls, or uses generic_file_open() as its
open handler, it fails with an EOVERFLOW erro.  This patch adds a call
to force_o_largefile() call in open_exec(), as done in sys_open() and
sys_openat().



Wouldn't it be better to just always pass O_LARGEFILE unconditionally
there? e.g. in theory a 2.5GB executable should work on i386 and binfmt_*
shouldn't have any problems with a large file.
That would simplify your patch.

-Andi



I agree in theory.  We've only seen instances on 64-bitters...

Dave



[PATCH] exec: allow > 2GB executables to run on 64-bit systems

2007-12-05 Thread Dave Anderson

When an executable that is greater than 2GB in size is attempted on a 64-bit
system on a file system that calls, or uses, generic_file_open() as its
open handler, it fails with an EOVERFLOW error.  This patch adds a
force_o_largefile() call in open_exec(), as is done in sys_open() and
sys_openat().

Signed-off-by: [EMAIL PROTECTED]
---
--- linux-2.6.24-rc3/fs/exec.c.orig
+++ linux-2.6.24-rc3/fs/exec.c
@@ -658,7 +658,8 @@ struct file *open_exec(const char *name)
int err = vfs_permission(&nd, MAY_EXEC);
file = ERR_PTR(err);
if (!err) {
-   file = nameidata_to_filp(&nd, O_RDONLY);
+   file = nameidata_to_filp(&nd, 
force_o_largefile() ?
+   O_RDONLY|O_LARGEFILE : O_RDONLY);
if (!IS_ERR(file)) {
err = deny_write_access(file);
if (err) {


Re: Query: Kdump: Core Image ELF Format

2005-03-08 Thread Dave Anderson
"Eric W. Biederman" wrote:

> Dave Anderson <[EMAIL PROTECTED]> writes:
>
> > vivek goyal wrote:
> >
> > > Hi,
> > >
> > > Kdump (A kexec based crash dumping mechanism) is going to export the
> > > kernel core image in ELF format. ELF was chosen as a format, keeping in
> > > mind that gdb can be used for limited debugging and "Crash" can be used
> > > for advanced debugging.
> > >
> > > Core image ELF headers are prepared before crash and stored at a safe
> > > place in memory. These headers are retrieved over a kexec boot and final
> > > elf core image is prepared for analysis.
> > >
> > > Given the fact physical memory can be dis-contiguous, One program header
> > > of type PT_LOAD is created for every contiguous memory chunk present in
> > > the system. Other information like register states etc. is captured in
> > > notes section.
> > >
> > > Now the issue is, on i386, whether to prepare core headers in ELF32 or
> > > ELF64 format. gdb can not analyze ELF64 core image for i386 system. I
> > > don't know about "crash". Can "crash" support ELF64 core image file for
> > > i386 system?
> > >
> >
> > Not in its current state, but it can certainly be modified to do so.
> > The embedded gdb module never is even aware of the vmcore file.
> > (It is essentially executed as "gdb vmlinux").
> >
> > And currently crash only expects a single PT_LOAD section, but
> > that's due for a change.  It's been OK for its current set of supported
> > processors to use sparse file space for non-existent memory,
> > but it's kind of a pain with ia64's 256GB holes.
>
> Weird.  A sparse file.  I can almost see that but that looks like
> a really bad format to transport from one system to another.
>

Exactly.  Although the -S tar flag helps, and when uncompressed
it actually makes a smaller file on the receiving end because of
"real" zero-filled pages.

>
> > The point is that adapting crash to handle whatever format
> > you come up with is the easy part of the whole equation.
>
> Good.  Then the concentration should be a simple well understood
> format that we don't need to change all of the time.
>
> > > Given the limitation of analysis tools, if core headers are prepared in
> > > ELF32 format then how to handle PAE systems?
> > >
> >
> > Are you asking about what would be the p_vaddr values for the higher
> > memory segments?   FWIW, with the single-PT_LOAD segment currently
> > supported by crash, there's only one p_vaddr, but in any case, crash doesn't
> > use it, so PAE is not a problem.
>
> PAE (physical address extension) are 64bit addresses on a 32bit box.  So
> vivek real question was where do we put the bytes.

> Do I understand it correctly that crash currently just gets raw
> memory data and the register state from the core file?  Then it
> figures out the virtual to physical mapping by looking at vmlinux?
>
> Eric

Pretty much...

The register set is there, but it's been primarily used to figure out
the panic task from the kernel stack address there, although in later
dump versions, the NT_TASKSTRUCT notes section gives us the same thing.
Because of the various dump formats it's supported over time, yes, it
tries to glean the vast majority of the information it needs by using
the vmlinux file and raw data, and not rely on stuff in the various
different header formats.

In any case, I totally agree with your aims:

> What I aimed at was a simple picture of memory decorated with the
> register state.  We should be able to derive everything beyond that.
> And the fact that it is all in user space should make it straight
> forward to change if needed.
>

Dave Anderson




Re: Query: Kdump: Core Image ELF Format

2005-03-08 Thread Dave Anderson
vivek goyal wrote:

> Hi,
>
> Kdump (A kexec based crash dumping mechanism) is going to export the
> kernel core image in ELF format. ELF was chosen as a format, keeping in
> mind that gdb can be used for limited debugging and "Crash" can be used
> for advanced debugging.
>
> Core image ELF headers are prepared before crash and stored at a safe
> place in memory. These headers are retrieved over a kexec boot and final
> elf core image is prepared for analysis.
>
> Given the fact physical memory can be dis-contiguous, One program header
> of type PT_LOAD is created for every contiguous memory chunk present in
> the system. Other information like register states etc. is captured in
> notes section.
>
> Now the issue is, on i386, whether to prepare core headers in ELF32 or
> ELF64 format. gdb can not analyze ELF64 core image for i386 system. I
> don't know about "crash". Can "crash" support ELF64 core image file for
> i386 system?
>

Not in its current state, but it can certainly be modified to do so.
The embedded gdb module never is even aware of the vmcore file.
(It is essentially executed as "gdb vmlinux").

And currently crash only expects a single PT_LOAD section, but
that's due for a change.  It's been OK for its current set of supported
processors to use sparse file space for non-existent memory,
but it's kind of a pain with ia64's 256GB holes.

The point is that adapting crash to handle whatever format
you come up with is the easy part of the whole equation.

>
> Given the limitation of analysis tools, if core headers are prepared in
> ELF32 format then how to handle PAE systems?
>

Are you asking about what would be the p_vaddr values for the higher
memory segments?   FWIW, with the single-PT_LOAD segment currently
supported by crash, there's only one p_vaddr, but in any case, crash doesn't
use it, so PAE is not a problem.

Dave Anderson


>
> Any thoughts or suggestions on this?
>
> Thanks
> Vivek
