Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Fri, Mar 1, 2019 at 11:04 AM Pingfan Liu wrote: > > Hi Borislav, > > Do you think the following patch is good at present? > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 81f9d23..9213073 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -460,7 +460,7 @@ static void __init > memblock_x86_reserve_range_setup_data(void) > # define CRASH_ADDR_LOW_MAX(512 << 20) > # define CRASH_ADDR_HIGH_MAX (512 << 20) > #else > -# define CRASH_ADDR_LOW_MAX(896UL << 20) > +# define CRASH_ADDR_LOW_MAX(1 << 32) > # define CRASH_ADDR_HIGH_MAX MAXMEM > #endif > Or patch lools like: diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 3d872a5..ed0def5 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -459,7 +459,7 @@ static void __init memblock_x86_reserve_range_setup_data(void) # define CRASH_ADDR_LOW_MAX(512 << 20) # define CRASH_ADDR_HIGH_MAX (512 << 20) #else -# define CRASH_ADDR_LOW_MAX(896UL << 20) +# define CRASH_ADDR_LOW_MAX(1 << 32) # define CRASH_ADDR_HIGH_MAX MAXMEM #endif @@ -551,6 +551,15 @@ static void __init reserve_crashkernel(void) high ? CRASH_ADDR_HIGH_MAX : CRASH_ADDR_LOW_MAX, crash_size, CRASH_ALIGN); +#ifdef CONFIG_X86_64 + /* +* crashkernel=X reserve below 4G fails? Try MAXMEM +*/ + if (!high && !crash_base) + crash_base = memblock_find_in_range(CRASH_ALIGN, + CRASH_ADDR_HIGH_MAX, + crash_size, CRASH_ALIGN); +#endif which tries 0-4G, the fall back to 4G above > For documentation, I will send another patch to improve the description. > > Thanks, > Pingfan > > On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov wrote: > > > > On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote: > > > If we move to high as default, it will allocate 160M high + 256M low. It > > > > We won't move to high by default - we will *fall* back to high if the > > default allocation fails. > > > > > To make the process less fragile maybe we can remove the 896M limitation > > > and only try <4G then go to high. > > > > Sure, the more robust for the user, the better. > > > > -- > > Regards/Gruss, > > Boris. > > > > Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
Hi Borislav, Do you think the following patch is good at present? diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 81f9d23..9213073 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -460,7 +460,7 @@ static void __init memblock_x86_reserve_range_setup_data(void) # define CRASH_ADDR_LOW_MAX(512 << 20) # define CRASH_ADDR_HIGH_MAX (512 << 20) #else -# define CRASH_ADDR_LOW_MAX(896UL << 20) +# define CRASH_ADDR_LOW_MAX(1 << 32) # define CRASH_ADDR_HIGH_MAX MAXMEM #endif For documentation, I will send another patch to improve the description. Thanks, Pingfan On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov wrote: > > On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote: > > If we move to high as default, it will allocate 160M high + 256M low. It > > We won't move to high by default - we will *fall* back to high if the > default allocation fails. > > > To make the process less fragile maybe we can remove the 896M limitation > > and only try <4G then go to high. > > Sure, the more robust for the user, the better. > > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote: > If we move to high as default, it will allocate 160M high + 256M low. It We won't move to high by default - we will *fall* back to high if the default allocation fails. > To make the process less fragile maybe we can remove the 896M limitation > and only try <4G then go to high. Sure, the more robust for the user, the better. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On 02/25/19 at 12:00pm, Joerg Roedel wrote: > On Fri, Feb 22, 2019 at 02:00:26PM +0100, Borislav Petkov wrote: > > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote: > > > The current default of 256MB was found by experiments on a bigger > > > number of machines, to create a reasonable default that is at least > > > likely to be sufficient of an average machine. > > > > Exactly, and this is what makes sense. > > > > The code should try the requested reservation and if it fails, it should > > try high allocation with default swiotlb size because we need to reserve > > *some* range. > > Right, makes sense. While at it, maybe it is time to move the default > allocation policy to 'high' again. The change was reverted six years ago > because it broke old kexec tools, but those are probably out-of-service > now. I think this change would make the whole crashdump allocation > process less fragile. One concern about this is for average cases, one do not need so much memory for kdump. For example in RHEL we use crashkernel=auto to automatically reserve kdump kernel memory, and for x86 the reserved size is like below now: 1G-64G:160M,64G-1T:256M,1T-:512M That means for a machine with less than 64G memory we only allocate 160M, it works for most machines in our lab. If we move to high as default, it will allocate 160M high + 256M low. It is too much for people who is good with the default 160M. Especially for virtual machine with less memory (but > 4G) To make the process less fragile maybe we can remove the 896M limitation and only try <4G then go to high. Thanks Dave
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Fri, Feb 22, 2019 at 02:00:26PM +0100, Borislav Petkov wrote: > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote: > > The current default of 256MB was found by experiments on a bigger > > number of machines, to create a reasonable default that is at least > > likely to be sufficient of an average machine. > > Exactly, and this is what makes sense. > > The code should try the requested reservation and if it fails, it should > try high allocation with default swiotlb size because we need to reserve > *some* range. Right, makes sense. While at it, maybe it is time to move the default allocation policy to 'high' again. The change was reverted six years ago because it broke old kexec tools, but those are probably out-of-service now. I think this change would make the whole crashdump allocation process less fragile. Regards, Joerg
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Sun, Feb 24, 2019 at 09:25:18PM +0800, Pingfan Liu wrote: > Maybe I misunderstood you, but does "requested range failed" mean that > user specify the range? If yes, then it should be the duty of user as > you said later, not the duty of kernel" No, it should say that it selected a different range only when the user didn't specify it. Which would mean that the user didn't care about the range - she/he only wanted to have *any* crashkernel range reserved. I.e., crashkernel=X invocation. > We do not know the memory layout of a system, maybe a system with > memory less than 4GB. So it is better to try all the range of system > memory. Ok. If 4G fails, you set high and then try again. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On 02/24/19 at 09:25pm, Pingfan Liu wrote: > On Fri, Feb 22, 2019 at 9:00 PM Borislav Petkov wrote: > > > > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote: > > > The current default of 256MB was found by experiments on a bigger > > > number of machines, to create a reasonable default that is at least > > > likely to be sufficient of an average machine. > > > > Exactly, and this is what makes sense. > > > > The code should try the requested reservation and if it fails, it should > > try high allocation with default swiotlb size because we need to reserve > > *some* range. > > > > If that reservation succeeds, we should say something along the lines of > > > > "... requested range failed, reserved range instead." > > > Maybe I misunderstood you, but does "requested range failed" mean that > user specify the range? If yes, then it should be the duty of user as > you said later, not the duty of kernel" If you go with the changes in your current patch it is needed to say something like: "crashkernel: can not find free memory under 4G, reserve XM@.. instead" Also need to print the reserved low memory area in case ,high being used. But for 896M -> 4G, the 896M faulure is not necessary to show in dmesg, it is some in kernel logic. Thanks Dave
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Fri, Feb 22, 2019 at 9:00 PM Borislav Petkov wrote: > > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote: > > The current default of 256MB was found by experiments on a bigger > > number of machines, to create a reasonable default that is at least > > likely to be sufficient of an average machine. > > Exactly, and this is what makes sense. > > The code should try the requested reservation and if it fails, it should > try high allocation with default swiotlb size because we need to reserve > *some* range. > > If that reservation succeeds, we should say something along the lines of > > "... requested range failed, reserved range instead." > Maybe I misunderstood you, but does "requested range failed" mean that user specify the range? If yes, then it should be the duty of user as you said later, not the duty of kernel" > And then in Documentation/admin-guide/kernel-parameters.txt above the > crashkernel= explanations, the allocation strategy of best effort should > be explained in short. That the kernel will try to allocate high if the > requested allocation didn't succeed and that the user can tweak the > allocation with the below options. > Yes, it should be improved. > Bottom line is: the kernel should assist the user and try harder to > allocate *some* range for a crash kernel when there's no detailed > specification what that range should be. > > *If* the user adds ,low, high, then the kernel should try only that > specified range because the assumption is that the user knows what she's > doing. > > But if the user simply wants a range for a crash kernel without stating > where that range should be in particular and it's placement is a don't > care - as long as there is a range - then the kernel should simply try > high, etc. > We do not know the memory layout of a system, maybe a system with memory less than 4GB. So it is better to try all the range of system memory Thanks, Pingfan > Makes sense? > > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote: > The current default of 256MB was found by experiments on a bigger > number of machines, to create a reasonable default that is at least > likely to be sufficient of an average machine. Exactly, and this is what makes sense. The code should try the requested reservation and if it fails, it should try high allocation with default swiotlb size because we need to reserve *some* range. If that reservation succeeds, we should say something along the lines of "... requested range failed, reserved range instead." And then in Documentation/admin-guide/kernel-parameters.txt above the crashkernel= explanations, the allocation strategy of best effort should be explained in short. That the kernel will try to allocate high if the requested allocation didn't succeed and that the user can tweak the allocation with the below options. Bottom line is: the kernel should assist the user and try harder to allocate *some* range for a crash kernel when there's no detailed specification what that range should be. *If* the user adds ,low, high, then the kernel should try only that specified range because the assumption is that the user knows what she's doing. But if the user simply wants a range for a crash kernel without stating where that range should be in particular and it's placement is a don't care - as long as there is a range - then the kernel should simply try high, etc. Makes sense? -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Fri, Feb 22, 2019 at 10:11:01AM +0800, Dave Young wrote: > In case people have a lot of devices need more swiotlb, then he manually > set the ,high with ,low together. The option to specify the high and low values for the crashkernel are important for certain machines. The point is that swiotlb already allocates 64MB of low memory by default. But that memory is only used for 32bit DMA-mask devices that want to DMA into high memory. There are drivers just allocating GFP_DMA32 memory, which also ends up in the low region (but not swiotlb), that is why the previous default of 72MB low memory was not enough, it only left 8MB of GFP_DMA32 memory. The current default of 256MB was found by experiments on a bigger number of machines, to create a reasonable default that is at least likely to be sufficient of an average machine. There is no way today for the kernel to find an optimum value for the amount of low memory required to successfully create a crash dump. It depends on the amount of devices in the system and how the drivers for them are written. The drivers have no way to report back their requirements, and even if they had, at the time the allocation happens no driver is loaded yet. So it is up to the system administrator to find workable values for the high and low memory requirements, even using experiments as a last resort. Regards, Joerg
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On 02/21/19 at 06:13pm, Borislav Petkov wrote: > On Wed, Feb 20, 2019 at 05:41:46PM +0800, Dave Young wrote: > > Previously Joerg posted below patch, maybe he has some idea. Joerg? > > Isn't it clear from the commit message? Then, does it answered your question? 256M is set as a default value in the patch, but it is not a predict to satisfy all use cases, from the description it is also possible that some people run out of the 256M and the ,low and ,high format is still necessary to exist even if we make crashkernel=X do the allocation automatically in high in case failed in low area. crashkernel=X: allocate in low first, if not possible, then allocate in high In case people have a lot of devices need more swiotlb, then he manually set the ,high with ,low together. What's your suggestion then? remove ,low and ,high and increase default 256M in case we get failure bug report?
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Wed, Feb 20, 2019 at 05:41:46PM +0800, Dave Young wrote: > Previously Joerg posted below patch, maybe he has some idea. Joerg? Isn't it clear from the commit message? -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Wed, Feb 20, 2019 at 5:41 PM Dave Young wrote: > > On 02/20/19 at 09:32am, Borislav Petkov wrote: > > On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote: > > > It is ideal if kernel can do it automatically, but I'm not sure if > > > kernel can predict the swiotlb reserved size automatically. > > > > Do you see how even more absurd this gets? > > > > If the kernel cannot know the swiotlb reserved size automatically, how > > is the normal user even supposed to know?! > > I think swiotlb is bounce-buffer, if we enlarge it, we can get better performance. Default size should be enough for platform to work. But in case of reserving low memory for crashkernel, things are different. The reserve low memory = swiotlb_size_or_default() + DMA32 memory for devices. And the 2nd item in the right of the equation varies, based on machine type and dynamic payload > > I see swiotlb_size_or_default() so we have a sane default which we fall > > back to. Now where's the problem with that? > > Good question, I expect some answer from people who know more about the > background. It would be good to have some actual test results, Pingfan > is trying to do some tests. > Not following the idea, I do not think the following test result can tell much. (We need various type of machine to get a final result.) I do a quick test on "HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10", command line "crashkernel=180M,high crashkernel=64M,low" can work for the 2nd kernel. Although it complained some memory shortage issue: [7.655591] fbcon: mgadrmfb (fb0) is primary device [7.655639] Console: switching to colour frame buffer device 128x48 [7.660609] systemd-udevd: page allocation failure: order:0, mode:0x280d4 [7.660611] CPU: 0 PID: 180 Comm: systemd-udevd Not tainted 3.10.0-957.el7.x86_64 #1 [7.660612] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/20/2018 [7.660612] Call Trace: [7.660621] [] dump_stack+0x19/0x1b [7.660625] [] warn_alloc_failed+0x110/0x180 [7.660628] [] __alloc_pages_slowpath+0x6b6/0x724 [7.660631] [] __alloc_pages_nodemask+0x405/0x420 [7.660633] [] alloc_pages_current+0x98/0x110 [7.660638] [] ttm_pool_populate+0x3d2/0x4b0 [ttm] [7.660641] [] ttm_tt_populate+0x7d/0x90 [ttm] [7.660644] [] ttm_bo_kmap+0x124/0x240 [ttm] [7.660648] [] ? __wake_up_sync_key+0x4f/0x60 [7.660650] [] mga_dirty_update+0x25e/0x310 [mgag200] [7.660653] [] mga_imageblit+0x2f/0x40 [mgag200] [7.660657] [] soft_cursor+0x1ba/0x260 [7.660659] [] bit_cursor+0x663/0x6a0 [7.660662] [] ? console_trylock+0x19/0x70 [7.660664] [] fbcon_cursor+0x13d/0x1c0 [7.660665] [] ? bit_clear+0x120/0x120 [7.660668] [] hide_cursor+0x2e/0xa0 [7.660669] [] redraw_screen+0x188/0x270 [7.660671] [] do_bind_con_driver+0x316/0x340 [7.660672] [] do_take_over_console+0x49/0x60 [7.660674] [] do_fbcon_takeover+0x63/0xd0 [7.660675] [] fbcon_event_notify+0x61d/0x730 [7.660678] [] notifier_call_chain+0x4f/0x70 [7.660681] [] __blocking_notifier_call_chain+0x4d/0x70 [7.660683] [] blocking_notifier_call_chain+0x16/0x20 [7.660684] [] fb_notifier_call_chain+0x1b/0x20 [7.660686] [] register_framebuffer+0x1f6/0x340 [7.660690] [] __drm_fb_helper_initial_config_and_unlock+0x252/0x3e0 [drm_kms_helper] [7.660694] [] drm_fb_helper_initial_config+0x3e/0x50 [drm_kms_helper] [7.660697] [] mgag200_fbdev_init+0xe3/0x100 [mgag200] [7.660699] [] mgag200_modeset_init+0x154/0x1d0 [mgag200] [7.660701] [] mgag200_driver_load+0x41d/0x5b0 [mgag200] [7.660708] [] drm_dev_register+0x15f/0x1f0 [drm] [7.660711] [] ? pci_enable_device_flags+0xe8/0x140 [7.660718] [] drm_get_pci_dev+0x8a/0x1a0 [drm] [7.660720] [] mga_pci_probe+0x9b/0xc0 [mgag200] [7.660722] [] local_pci_probe+0x4a/0xb0 [7.660723] [] pci_device_probe+0x109/0x160 [7.660726] [] driver_probe_device+0xc5/0x3e0 [7.660727] [] __driver_attach+0x93/0xa0 [7.660728] [] ? __device_attach+0x50/0x50 [7.660730] [] bus_for_each_dev+0x75/0xc0 [7.660731] [] driver_attach+0x1e/0x20 [7.660733] [] bus_add_driver+0x200/0x2d0 [7.660734] [] driver_register+0x64/0xf0 [7.660735] [] __pci_register_driver+0xa5/0xc0 [7.660737] [] ? 0xc012cfff [7.660739] [] mgag200_init+0x39/0x1000 [mgag200] [7.660742] [] do_one_initcall+0xba/0x240 [7.660745] [] load_module+0x272c/0x2bc0 [7.660748] [] ? ddebug_proc_write+0x100/0x100 [7.660750] [] SyS_init_module+0xef/0x140 [7.660752] [] system_call_fastpath+0x22/0x27 [7.660753] Mem-Info: [7.660756] active_anon:3364 inactive_anon:6661 isolated_anon:0 [7.660756] active_file:0 inactive_file:0 isolated_file:0 [7.660756] unevictable:0 dirty:0 writeback:0 unstable:0 [7.660756] slab_reclaimable:1492 slab_unreclaimable:3116 [7.660756] mapped:1223 shmem:8449 pagetables:179 bounce:0 [7.660756] free:20626
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On 02/20/19 at 09:32am, Borislav Petkov wrote: > On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote: > > It is ideal if kernel can do it automatically, but I'm not sure if > > kernel can predict the swiotlb reserved size automatically. > > Do you see how even more absurd this gets? > > If the kernel cannot know the swiotlb reserved size automatically, how > is the normal user even supposed to know?! > > I see swiotlb_size_or_default() so we have a sane default which we fall > back to. Now where's the problem with that? Good question, I expect some answer from people who know more about the background. It would be good to have some actual test results, Pingfan is trying to do some tests. Previously Joerg posted below patch, maybe he has some idea. Joerg? commit 94fb9334182284e8e7e4bcb9125c25dc33af19d4 Author: Joerg Roedel Date: Wed Jun 10 17:49:42 2015 +0200 x86/crash: Allocate enough low memory when crashkernel=high Thanks Dave
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote: > It is ideal if kernel can do it automatically, but I'm not sure if > kernel can predict the swiotlb reserved size automatically. Do you see how even more absurd this gets? If the kernel cannot know the swiotlb reserved size automatically, how is the normal user even supposed to know?! I see swiotlb_size_or_default() so we have a sane default which we fall back to. Now where's the problem with that? -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Mon, Feb 18, 2019 at 9:48 AM Dave Young wrote: > > On 02/15/19 at 11:24am, Borislav Petkov wrote: > > On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote: > > > Even we make it automatic in kernel, but we have to have some default > > > value for swiotlb in case crashkernel can not find a free region under 4G. > > > So this default value can not work for every use cases, people need > > > manually use crashkernel=,low and crashkernel=,high in case > > > crashkernel=X does not work. > > > > Why would the user need to find swiotlb range? The kernel has all the > > information it requires at its finger tips in order to decide properly. > > > > The user wants a crashkernel range, the kernel tries the low range => > > no workie, then it tries the next range => workie but needs to allocate > > swiotlb range so that DMA can happen too. Doh, then the kernel does > > allocate that too. > > It is ideal if kernel can do it automatically, but I'm not sure if > kernel can predict the swiotlb reserved size automatically. > Agreed, I think it is hard to decide the reserved size automatically. We do not know the requirement for memory of ZONE_DMA32 at boot time. The requirement depends on how many DMA32 devices, and the dynamic payload of them. > Let's add more people to seek for comments. > > > > > Why would the user need to do anything here?! > > > > -- > > Regards/Gruss, > > Boris. > > > > Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On 02/15/19 at 11:24am, Borislav Petkov wrote: > On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote: > > Even we make it automatic in kernel, but we have to have some default > > value for swiotlb in case crashkernel can not find a free region under 4G. > > So this default value can not work for every use cases, people need > > manually use crashkernel=,low and crashkernel=,high in case > > crashkernel=X does not work. > > Why would the user need to find swiotlb range? The kernel has all the > information it requires at its finger tips in order to decide properly. > > The user wants a crashkernel range, the kernel tries the low range => > no workie, then it tries the next range => workie but needs to allocate > swiotlb range so that DMA can happen too. Doh, then the kernel does > allocate that too. It is ideal if kernel can do it automatically, but I'm not sure if kernel can predict the swiotlb reserved size automatically. Let's add more people to seek for comments. > > Why would the user need to do anything here?! > > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote: > Even we make it automatic in kernel, but we have to have some default > value for swiotlb in case crashkernel can not find a free region under 4G. > So this default value can not work for every use cases, people need > manually use crashkernel=,low and crashkernel=,high in case > crashkernel=X does not work. Why would the user need to find swiotlb range? The kernel has all the information it requires at its finger tips in order to decide properly. The user wants a crashkernel range, the kernel tries the low range => no workie, then it tries the next range => workie but needs to allocate swiotlb range so that DMA can happen too. Doh, then the kernel does allocate that too. Why would the user need to do anything here?! -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Tue, Feb 12, 2019 at 4:48 AM Dave Young wrote: > > On 02/06/19 at 08:08pm, Dave Young wrote: > > On 02/05/19 at 09:15am, Borislav Petkov wrote: > > > On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote: > > > > Is your objection only to the second fallback of allocating > > > > memory above >= 4GB? Or are you objecting to allocating from > > > > (896 .. 4GB) as well? > > > > > > My problem is why should the user need to specify high or low allocation > > > explicitly when we can handle all that in the kernel automatically. > > > > > > The presence of crashkernel= on the cmdline sure means that the user > > > wants to allocate memory for a second kernel. > > > > > > Now, if the requested allocation fails, we say: > > > > > > Error reserving crashkernel > > > > > > So, instead of saying that, we can *try* *again* and say > > > > > > Error reserving requested crashkernel at @..., attempting a high range. > > > > > > and run memblock_find_in_range() on the other regions which we deemed > > > are ok to allocate from. > > > > > > Why aren't we doing that by default instead of placing all those > > > different options in front of the user and expecting her/him to know > > > something about all those magic ranges? > > > > As we talked in another reply, for the >4G allocation we can not avoid > > the swiotlb issue, but if one request for 256M in high region and we > > allocate the low part automatically, it will eat more memory eg. 512M. > > > > But probably in case allacation failed in low region ,high is a must for > > kdump > > reservation, since no other choices perhaps we can make that as you said > > That is exactly what Pingfan is doing in this patch. > > Even we make it automatic in kernel, but we have to have some default > value for swiotlb in case crashkernel can not find a free region under 4G. > So this default value can not work for every use cases, people need > manually use crashkernel=,low and crashkernel=,high in case > crashkernel=X does not work. One can tune it for their use: > > 1) crashkernel=X reservation fails, likely the ,low default value is > still too big, one can shrink the value and manually try other value > 2) crashkernel=X reserve successfully on high memory and along with some > default low memory region. But the low region is not enough. In this > case one can increase the > > This should answer the question why ,high and ,low is still needed. > > But for above consumption 1), KASLR can still cause default ,low memory > failed to reserve. So I wonder if KASLR can skip the 0-896M if the > system memory is big enough. > A little fix about the comment. Refer to reserve_crashkernel_low(), low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN); So it should try 0~4G for ",low". And the default size for low is 256M. Given the limited memory region reserved by other component before crashkernel, we always can find a continuous chunk of 256M inside the fragmented [0,4G], which is split by initrd, KASLR. Thanks, Pingfan
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On 02/06/19 at 08:08pm, Dave Young wrote: > On 02/05/19 at 09:15am, Borislav Petkov wrote: > > On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote: > > > Is your objection only to the second fallback of allocating > > > memory above >= 4GB? Or are you objecting to allocating from > > > (896 .. 4GB) as well? > > > > My problem is why should the user need to specify high or low allocation > > explicitly when we can handle all that in the kernel automatically. > > > > The presence of crashkernel= on the cmdline sure means that the user > > wants to allocate memory for a second kernel. > > > > Now, if the requested allocation fails, we say: > > > > Error reserving crashkernel > > > > So, instead of saying that, we can *try* *again* and say > > > > Error reserving requested crashkernel at @..., attempting a high range. > > > > and run memblock_find_in_range() on the other regions which we deemed > > are ok to allocate from. > > > > Why aren't we doing that by default instead of placing all those > > different options in front of the user and expecting her/him to know > > something about all those magic ranges? > > As we talked in another reply, for the >4G allocation we can not avoid > the swiotlb issue, but if one request for 256M in high region and we > allocate the low part automatically, it will eat more memory eg. 512M. > > But probably in case allacation failed in low region ,high is a must for kdump > reservation, since no other choices perhaps we can make that as you said That is exactly what Pingfan is doing in this patch. Even we make it automatic in kernel, but we have to have some default value for swiotlb in case crashkernel can not find a free region under 4G. So this default value can not work for every use cases, people need manually use crashkernel=,low and crashkernel=,high in case crashkernel=X does not work. One can tune it for their use: 1) crashkernel=X reservation fails, likely the ,low default value is still too big, one can shrink the value and manually try other value 2) crashkernel=X reserve successfully on high memory and along with some default low memory region. But the low region is not enough. In this case one can increase the This should answer the question why ,high and ,low is still needed. But for above consumption 1), KASLR can still cause default ,low memory failed to reserve. So I wonder if KASLR can skip the 0-896M if the system memory is big enough. Thanks Dave
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On 02/05/19 at 09:15am, Borislav Petkov wrote: > On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote: > > Is your objection only to the second fallback of allocating > > memory above >= 4GB? Or are you objecting to allocating from > > (896 .. 4GB) as well? > > My problem is why should the user need to specify high or low allocation > explicitly when we can handle all that in the kernel automatically. > > The presence of crashkernel= on the cmdline sure means that the user > wants to allocate memory for a second kernel. > > Now, if the requested allocation fails, we say: > > Error reserving crashkernel > > So, instead of saying that, we can *try* *again* and say > > Error reserving requested crashkernel at @..., attempting a high range. > > and run memblock_find_in_range() on the other regions which we deemed > are ok to allocate from. > > Why aren't we doing that by default instead of placing all those > different options in front of the user and expecting her/him to know > something about all those magic ranges? As we talked in another reply, for the >4G allocation we can not avoid the swiotlb issue, but if one request for 256M in high region and we allocate the low part automatically, it will eat more memory eg. 512M. But probably in case allacation failed in low region ,high is a must for kdump reservation, since no other choices perhaps we can make that as you said > > I don't think most of the users care about where the kernel gets > allocated - all they want is a working kdump setup. > > > Falling back to allocating < 4GB probably satisfes most of the cases > > where the original allocation fails. > > Yes. Now make that automatic. For the time being, this should be good enough. > > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply. Thanks Dave
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote: > Is your objection only to the second fallback of allocating > memory above >= 4GB? Or are you objecting to allocating from > (896 .. 4GB) as well? My problem is why should the user need to specify high or low allocation explicitly when we can handle all that in the kernel automatically. The presence of crashkernel= on the cmdline sure means that the user wants to allocate memory for a second kernel. Now, if the requested allocation fails, we say: Error reserving crashkernel So, instead of saying that, we can *try* *again* and say Error reserving requested crashkernel at @..., attempting a high range. and run memblock_find_in_range() on the other regions which we deemed are ok to allocate from. Why aren't we doing that by default instead of placing all those different options in front of the user and expecting her/him to know something about all those magic ranges? I don't think most of the users care about where the kernel gets allocated - all they want is a working kdump setup. > Falling back to allocating < 4GB probably satisfes most of the cases > where the original allocation fails. Yes. Now make that automatic. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Fri, Feb 01, 2019 at 12:47:40AM +0100, Borislav Petkov wrote: > On Thu, Jan 31, 2019 at 03:27:32PM -0700, Jerry Hoemann wrote: > > So even if a system administrator is diligent and tests > > that a chosen kdump configuration works, that configuration > > might not work on some random reboot 7 months in the future. > > Jerry, did you read the rest of the thread where I'm *actually* > suggesting to make the allocation code more robust against such > failures? Boris, I may have misunderstood your earlier comment: So we don't really need this - we simply need to tell people to use high if it fails with KASLR, AFAICT To imply an iterative approach to crashkernel size discovery. Whereas you may simply have ment: Always use ,high as the old way is broken. > Now let's look at the code: > > The "high" allocation does: > > crash_base = memblock_find_in_range(CRASH_ALIGN, > high ? CRASH_ADDR_HIGH_MAX > : CRASH_ADDR_LOW_MAX, > crash_size, CRASH_ALIGN); > > where high=true and CRASH_ADDR_HIGH_MAX on 64-bit is MAXMEM: > > # define CRASH_ADDR_HIGH_MAXMAXMEM > > The second fallback in the suggested patch does the same: > > + /* > +* crashkernel=X reserve below 4G fails? Try MAXMEM > +*/ > + if (!high && !crash_base) > + crash_base = memblock_find_in_range(CRASH_ALIGN, > + CRASH_ADDR_HIGH_MAX, > + crash_size, CRASH_ALIGN); > > and yet I get back that falling back to "high" if the first allocation > doesn't succeed is not something we should do by default because of > reasons. But this patch *practically* *does* it. Is your objection only to the second fallback of allocating memory above >= 4GB? Or are you objecting to allocating from (896 .. 4GB) as well? Falling back to allocating < 4GB probably satisfes most of the cases where the original allocation fails. thanks -- - Jerry Hoemann Software Engineer Hewlett Packard Enterprise -
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Thu, Jan 31, 2019 at 03:27:32PM -0700, Jerry Hoemann wrote: > So even if a system administrator is diligent and tests > that a chosen kdump configuration works, that configuration > might not work on some random reboot 7 months in the future. Jerry, did you read the rest of the thread where I'm *actually* suggesting to make the allocation code more robust against such failures? Doesn't look like it... Now let's look at the code: The "high" allocation does: crash_base = memblock_find_in_range(CRASH_ALIGN, high ? CRASH_ADDR_HIGH_MAX : CRASH_ADDR_LOW_MAX, crash_size, CRASH_ALIGN); where high=true and CRASH_ADDR_HIGH_MAX on 64-bit is MAXMEM: # define CRASH_ADDR_HIGH_MAXMAXMEM The second fallback in the suggested patch does the same: + /* +* crashkernel=X reserve below 4G fails? Try MAXMEM +*/ + if (!high && !crash_base) + crash_base = memblock_find_in_range(CRASH_ALIGN, + CRASH_ADDR_HIGH_MAX, + crash_size, CRASH_ALIGN); and yet I get back that falling back to "high" if the first allocation doesn't succeed is not something we should do by default because of reasons. But this patch *practically* *does* it. So no, until this hasn't been done cleanly and properly explained, we'll be in a holding pattern. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Thu, Jan 31, 2019 at 11:57:32AM +0100, Borislav Petkov wrote: > On Thu, Jan 31, 2019 at 03:59:07PM +0800, Dave Young wrote: > > As Pingfan/me mentioned in another reply, there are two reasons: > > 1. old kexec-tools can not load kernel to high memory > > 2. ,high will not work on some systems without some amounts of low > > memory so it nees reserve extra low memory, it is bad for people who do > > not want it. > > Let's see: we don't enable high by default because of old kexec-tools > and some systems which do some funky reservations. > > But this patch kinda enables high by trying a couple more regions. > > So we don't really need this - we simply need to tell people to use high > if it fails with KASLR, AFAICT. Borris, The testing I've done shows that the allocation failure caused by KASLR is intermittent. On one SUT that I've seen this issue on, the crash kernel allocation fails less than 10% of the time. So even if a system administrator is diligent and tests that a chosen kdump configuration works, that configuration might not work on some random reboot 7 months in the future. Unfortunately, that will be the time that the system crashes and we won't be able to collect the crash dump due to the crashkernel allocation failure. Jerry -- - Jerry Hoemann Software Engineer Hewlett Packard Enterprise -
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Thu, Jan 31, 2019 at 03:59:07PM +0800, Dave Young wrote: > As Pingfan/me mentioned in another reply, there are two reasons: > 1. old kexec-tools can not load kernel to high memory > 2. ,high will not work on some systems without some amounts of low > memory so it nees reserve extra low memory, it is bad for people who do > not want it. Let's see: we don't enable high by default because of old kexec-tools and some systems which do some funky reservations. But this patch kinda enables high by trying a couple more regions. So we don't really need this - we simply need to tell people to use high if it fails with KASLR, AFAICT. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Tue, Jan 29, 2019 at 01:51:45PM +0800, Pingfan Liu wrote: > Sorry, can not catch up with you. Do you suggestion > memblock_find_in_range(0, kernel_start) and > memblock_find_in_range(kernel_end, mem_end)? But the memory is > truncated into fraction by many component which call > memblock_reserve(), besides kernel. extract_kernel() selects the physical address where the place the kernel. You can use that physical address to figure out where to allocate the crash kernel range. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On 01/25/19 at 03:08pm, Borislav Petkov wrote: > On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote: > > AFAIK, some people prefer to explictly reserve crash memory at high > > region even if it is possible to reserve at low area. May because > > <4G memory is limited on large server, they want to leave this for other > > use. > > > > Yinghai or Vivek should know more about the history, probably they can > > recall some initial reason. > > Yes, just "prefer" is not good enough. There should be a technical > reason why that's there. Got more about this, actually the thing is for large server with very huge memory and also possible a lot of io devices. The UEFI IO drivers could allocate a lot memory under 896M, so it will leave small room for crashkernel=X Also if the memory is huge, then copying the vmcore will be very slow, people want to use big makedumpfile bitmap buffer, and multithread, then kdump kernel needs more memory, they can choose ,high to get more memory. But for common use cases, if one does not need so much kdump memory reserved he/she can just use crashkernel=X. > > Also, if the user doesn't care, then the code should be free to force > "high" and thus probe a different range for allocation. As Pingfan/me mentioned in another reply, there are two reasons: 1. old kexec-tools can not load kernel to high memory 2. ,high will not work on some systems without some amounts of low memory so it nees reserve extra low memory, it is bad for people who do not want it. > > > Good question, still it may be some historical reason, but it is good to > > make them clear and rethink about it after long time. > > > > I also want to understand, need dig the log more. > > Good idea. That would be a very nice cleanup. :-) Let's review these different parameters: > crashkernel=size[KMG][@offset[KMG]] Did not get the initial commit for this, it should be the very old format, and kernel did not find the base address automatically for the size Other arches still use this, for example mips code seems needs explict @offset, see the function mips_parse_crashkernel in arch/mips/kernel/setup.c Probably there are other arches as well which only support this format. > crashkernel=range1:size1[,range2:size2,...][@offset] This is nice param which can help to dynamically reserve different size depends on system memory. > crashkernel=size[KMG],high > crashkernel=size[KMG],low We have talked about these two. It is not graceful, but seems no way to improve it due to the swiotlb issue. Thanks Dave
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On 01/29/19 at 01:25pm, Pingfan Liu wrote: > On Fri, Jan 25, 2019 at 10:08 PM Borislav Petkov wrote: > > > > On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote: > > > AFAIK, some people prefer to explictly reserve crash memory at high > > > region even if it is possible to reserve at low area. May because > > > <4G memory is limited on large server, they want to leave this for other > > > use. > > > > > > Yinghai or Vivek should know more about the history, probably they can > > > recall some initial reason. > > > Go through the git log, and I found the initial introduction of > crashkernel_high option. Refer to > commit 55a20ee7804ab64ac90bcdd4e2868a42829e2784 > Author: Yinghai Lu > Date: Mon Apr 15 22:23:47 2013 -0700 > > x86, kdump: Retore crashkernel= to allocate under 896M > > Vivek found old kexec-tools does not work new kernel anymore. > > So change back crashkernel= back to old behavoir, and add > crashkernel_high= > to let user decide if buffer could be above 4G, and also new > kexec-tools will > be needed. > > But kexec-tools-2.0.3, released at 2012, can run 4.20 kernel with > crashkernel=256M@5G, so I think only very old kexec-tools requires > memory under 896M. Due to -1.few people running latest kernel with > very old kexec-tools to date, -2. crashkernel=X is more popular than > crashkernel=X.high, it should be time to eliminate this limit of > crashkernel=X parameter, otherwise we will run into this bug. > As for crashkernel=,high, I think it is a more professional option for > who cares about the DMA32. On high-end machine, big reserved region is > used for crashkernel(e.g. in this case 384M), which make the crowed > situation under under 4GB memory worse. This seems answers the question why ,high is needed and why crashkernel=X can not allocate from high by default. Another reason for this question is for crashkernel=,high, a separate region under 4G is still needed. I searched some history bug/emails, previously the initial commit does not reserve low memory by default if ,high is used, but later Yinghai saw a regression bug in case iommu disabled. see below commit: commit c729de8fcea37a1c444e81857eace12494c804a9 Author: Yinghai Lu Date: Mon Apr 15 22:23:45 2013 -0700 x86, kdump: Set crashkernel_low automatically Chao said that kdump does does work well on his system on 3.8 without extra parameter, even iommu does not work with kdump. And now have to append crashkernel_low=Y in first kernel to make kdump work. We have now modified crashkernel=X to allocate memory beyong 4G (if available) and do not allocate low range for crashkernel if the user does not specify that with crashkernel_low=Y. This causes regression if iommu is not enabled. Without iommu, swiotlb needs to be setup in first 4G and there is no low memory available to second kernel. [snip] > > > Yes, just "prefer" is not good enough. There should be a technical > > reason why that's there. > > > > Also, if the user doesn't care, then the code should be free to force > > "high" and thus probe a different range for allocation. > > > Do you suggest to remove crashkernel=X,high parameter? I do not think it can be removed, because for ,high we also need extra low memory, it will cause confusion, most people are just fine without this. People can choose to use ,high if they really need it. Thanks Dave
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Fri, Jan 25, 2019 at 6:39 PM Borislav Petkov wrote: > > > > Subject: Re: [PATCHv7] x86/kdump: bugfix, make the behavior of > > crashkernel=X > > s/bugfix, // > OK. > On Mon, Jan 21, 2019 at 01:16:08PM +0800, Pingfan Liu wrote: > > People reported crashkernel=384M reservation failed on a high end server > > with KASLR enabled. In that case there is enough free memory under 896M > > but crashkernel reservation still fails intermittently. > > > > The situation is crashkernel reservation code only finds free region under > > 896 MB with 128M aligned in case no ',high' being used. And KASLR could > > break the first 896M into several parts randomly thus the failure happens. > > This reads very strange. > What about " It turns out that crashkernel reservation code only tries to find a region under 896 MB, aligned on 128M. But KASLR randomly breaks big region inside [0,896M] into smaller pieces, not big enough as demanded in the "crashkernel=X" parameter." > > User has no way to predict and make sure crashkernel=xM working unless > > he/she use 'crashkernel=xM,high'. Since 'crashkernel=xM' is the most > > common use case this issue is a serious bug. > > > > And we can't answer questions raised from customer: > > 1) why it doesn't succeed to reserve 896 MB; > > 2) what's wrong with memory region under 4G; > > 3) why I have to add ',high', I only require 384 MB, not 3840 MB. > > Errr, this looks like communication issue. Sounds to me like the text > around crashkernel= in > What about dropping this section in commit log and another patch to fix the document? > Documentation/admin-guide/kernel-parameters.txt > > needs improving? > > > This patch tries to get memory region from 896 MB firstly, then [896MB,4G], > > Avoid having "This patch" or "This commit" in the commit message. It is > tautologically useless. > OK > Also, do > > $ git grep 'This patch' Documentation/process > > for more details. > > > finally above 4G. > > > > Dave Young sent the original post, and I just re-post it with commit log > > If he sent it, he should be the author I guess. > > > improvement as his requirement. > > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html > > There was an old discussion below (previously posted by Chao Wang): > > https://lkml.org/lkml/2013/10/15/601 > > All that changelog info doesn't belong in the commit message ... > > > Signed-off-by: Pingfan Liu > > Cc: Dave Young > > Cc: Baoquan He > > Cc: Andrew Morton > > Cc: Mike Rapoport > > Cc: ying...@kernel.org, > > Cc: vgo...@redhat.com > > Cc: Randy Dunlap > > Cc: Borislav Petkov > > Cc: x...@kernel.org > > Cc: linux-kernel@vger.kernel.org > > --- > > but here. > > > v6 -> v7: commit log improvement > > arch/x86/kernel/setup.c | 16 > > 1 file changed, 16 insertions(+) > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > > index 3d872a5..fa62c81 100644 > > --- a/arch/x86/kernel/setup.c > > +++ b/arch/x86/kernel/setup.c > > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void) > > high ? CRASH_ADDR_HIGH_MAX > >: CRASH_ADDR_LOW_MAX, > > crash_size, CRASH_ALIGN); > > +#ifdef CONFIG_X86_64 > > + /* > > + * crashkernel=X reserve below 896M fails? Try below 4G > > + */ > > + if (!high && !crash_base) > > + crash_base = memblock_find_in_range(CRASH_ALIGN, > > + (1ULL << 32), > > + crash_size, CRASH_ALIGN); > > + /* > > + * crashkernel=X reserve below 4G fails? Try MAXMEM > > + */ > > + if (!high && !crash_base) > > + crash_base = memblock_find_in_range(CRASH_ALIGN, > > + CRASH_ADDR_HIGH_MAX, > > + crash_size, CRASH_ALIGN); > > +#endif > > Ok, so this is silly: we know at which physical address KASLR allocated > the kernel so why aren't we querying that and seeing if there's enough > room before it or after it to call memblock_find_in_range() on the > bigger range? > Sorry, can not catch up with you. Do you suggestion memblock_find_in_range(0, kernel_start) and memblock_find_in_range(kernel_end, mem_end)? But the memory is truncated into fraction by many component which call memblock_reserve(), besides kernel. For the left question, Dave has follow the discussion in another email, will follow there. Thanks and regards, Pingfan > Also, why is "high" dealt with separately and why isn't the code > enforcing "high" if the normal reservation fails? > > The presence of high is requiring from our users to pay attention what > to use when the kernel can do all that automatically. Looks like a UI > fail to me. > > And look a
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Fri, Jan 25, 2019 at 10:08 PM Borislav Petkov wrote: > > On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote: > > AFAIK, some people prefer to explictly reserve crash memory at high > > region even if it is possible to reserve at low area. May because > > <4G memory is limited on large server, they want to leave this for other > > use. > > > > Yinghai or Vivek should know more about the history, probably they can > > recall some initial reason. > Go through the git log, and I found the initial introduction of crashkernel_high option. Refer to commit 55a20ee7804ab64ac90bcdd4e2868a42829e2784 Author: Yinghai Lu Date: Mon Apr 15 22:23:47 2013 -0700 x86, kdump: Retore crashkernel= to allocate under 896M Vivek found old kexec-tools does not work new kernel anymore. So change back crashkernel= back to old behavoir, and add crashkernel_high= to let user decide if buffer could be above 4G, and also new kexec-tools will be needed. But kexec-tools-2.0.3, released at 2012, can run 4.20 kernel with crashkernel=256M@5G, so I think only very old kexec-tools requires memory under 896M. Due to -1.few people running latest kernel with very old kexec-tools to date, -2. crashkernel=X is more popular than crashkernel=X.high, it should be time to eliminate this limit of crashkernel=X parameter, otherwise we will run into this bug. As for crashkernel=,high, I think it is a more professional option for who cares about the DMA32. On high-end machine, big reserved region is used for crashkernel(e.g. in this case 384M), which make the crowed situation under under 4GB memory worse. > Yes, just "prefer" is not good enough. There should be a technical > reason why that's there. > > Also, if the user doesn't care, then the code should be free to force > "high" and thus probe a different range for allocation. > Do you suggest to remove crashkernel=X,high parameter? Thanks, Pingfan > > Good question, still it may be some historical reason, but it is good to > > make them clear and rethink about it after long time. > > > > I also want to understand, need dig the log more. > > Good idea. That would be a very nice cleanup. :-) > > Thx. > > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Mon, Jan 28, 2019 at 05:58:09PM +0800, Dave Young wrote: > Another reason is in case ,high we will need automatically reserve a > region in low area for swiotlb. So for example one use > crashkernel=256M,high, actual reserved memory is 256M above 4G and > another 256M under 4G for swiotlb. Normally it is not necessary for > most people. Thus we can not make ,high as default. And how is the poor user to figure out that we decided for her/him that swiotlb reservation is something not necessary for most people and thus we fail the crashkernel= reservation? IOW, that "logic" above doesn't make a whole lot of sense to me from user friendliness perspective. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On 01/25/19 at 03:08pm, Borislav Petkov wrote: > On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote: > > AFAIK, some people prefer to explictly reserve crash memory at high > > region even if it is possible to reserve at low area. May because > > <4G memory is limited on large server, they want to leave this for other > > use. > > > > Yinghai or Vivek should know more about the history, probably they can > > recall some initial reason. > > Yes, just "prefer" is not good enough. There should be a technical > reason why that's there. > > Also, if the user doesn't care, then the code should be free to force > "high" and thus probe a different range for allocation. Another reason is in case ,high we will need automatically reserve a region in low area for swiotlb. So for example one use crashkernel=256M,high, actual reserved memory is 256M above 4G and another 256M under 4G for swiotlb. Normally it is not necessary for most people. Thus we can not make ,high as default. Thanks Dave
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote: > AFAIK, some people prefer to explictly reserve crash memory at high > region even if it is possible to reserve at low area. May because > <4G memory is limited on large server, they want to leave this for other > use. > > Yinghai or Vivek should know more about the history, probably they can > recall some initial reason. Yes, just "prefer" is not good enough. There should be a technical reason why that's there. Also, if the user doesn't care, then the code should be free to force "high" and thus probe a different range for allocation. > Good question, still it may be some historical reason, but it is good to > make them clear and rethink about it after long time. > > I also want to understand, need dig the log more. Good idea. That would be a very nice cleanup. :-) Thx. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
> > > > Dave Young sent the original post, and I just re-post it with commit log > > If he sent it, he should be the author I guess. Just drop the line, but can change the credit to Chao Wang since he did it initially. But Chao does not work on kexec/kdump any more, and the email address is outdated as well. > > > improvement as his requirement. > > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html > > There was an old discussion below (previously posted by Chao Wang): > > https://lkml.org/lkml/2013/10/15/601 > > All that changelog info doesn't belong in the commit message ... > > > Signed-off-by: Pingfan Liu > > Cc: Dave Young > > Cc: Baoquan He > > Cc: Andrew Morton > > Cc: Mike Rapoport > > Cc: ying...@kernel.org, > > Cc: vgo...@redhat.com > > Cc: Randy Dunlap > > Cc: Borislav Petkov > > Cc: x...@kernel.org > > Cc: linux-kernel@vger.kernel.org > > --- > > but here. > > > v6 -> v7: commit log improvement > > arch/x86/kernel/setup.c | 16 > > 1 file changed, 16 insertions(+) > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > > index 3d872a5..fa62c81 100644 > > --- a/arch/x86/kernel/setup.c > > +++ b/arch/x86/kernel/setup.c > > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void) > > high ? CRASH_ADDR_HIGH_MAX > > : CRASH_ADDR_LOW_MAX, > > crash_size, CRASH_ALIGN); > > +#ifdef CONFIG_X86_64 > > + /* > > +* crashkernel=X reserve below 896M fails? Try below 4G > > +*/ > > + if (!high && !crash_base) > > + crash_base = memblock_find_in_range(CRASH_ALIGN, > > + (1ULL << 32), > > + crash_size, CRASH_ALIGN); > > + /* > > +* crashkernel=X reserve below 4G fails? Try MAXMEM > > +*/ > > + if (!high && !crash_base) > > + crash_base = memblock_find_in_range(CRASH_ALIGN, > > + CRASH_ADDR_HIGH_MAX, > > + crash_size, CRASH_ALIGN); > > +#endif > > Ok, so this is silly: we know at which physical address KASLR allocated > the kernel so why aren't we querying that and seeing if there's enough > room before it or after it to call memblock_find_in_range() on the > bigger range? Baoquan may have comments? > > Also, why is "high" dealt with separately and why isn't the code > enforcing "high" if the normal reservation fails? > AFAIK, some people prefer to explictly reserve crash memory at high region even if it is possible to reserve at low area. May because <4G memory is limited on large server, they want to leave this for other use. Yinghai or Vivek should know more about the history, probably they can recall some initial reason. > The presence of high is requiring from our users to pay attention what > to use when the kernel can do all that automatically. Looks like a UI > fail to me. > > And look at all the flavors of crashkernel= : > > crashkernel=size[KMG][@offset[KMG]] > crashkernel=range1:size1[,range2:size2,...][@offset] > crashkernel=size[KMG],high > crashkernel=size[KMG],low > > We couldn't do one so we made 4?!?! > > What for? > > Nowhere in that help text does it explain why a user would care about > high or low or range or offset or the planets alignment... > > So what's up? Good question, still it may be some historical reason, but it is good to make them clear and rethink about it after long time. I also want to understand, need dig the log more. > > -- > Regards/Gruss, > Boris. > > Good mailing practices for 400: avoid top-posting and trim the reply. Thanks Dave
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
> Subject: Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X s/bugfix, // On Mon, Jan 21, 2019 at 01:16:08PM +0800, Pingfan Liu wrote: > People reported crashkernel=384M reservation failed on a high end server > with KASLR enabled. In that case there is enough free memory under 896M > but crashkernel reservation still fails intermittently. > > The situation is crashkernel reservation code only finds free region under > 896 MB with 128M aligned in case no ',high' being used. And KASLR could > break the first 896M into several parts randomly thus the failure happens. This reads very strange. > User has no way to predict and make sure crashkernel=xM working unless > he/she use 'crashkernel=xM,high'. Since 'crashkernel=xM' is the most > common use case this issue is a serious bug. > > And we can't answer questions raised from customer: > 1) why it doesn't succeed to reserve 896 MB; > 2) what's wrong with memory region under 4G; > 3) why I have to add ',high', I only require 384 MB, not 3840 MB. Errr, this looks like communication issue. Sounds to me like the text around crashkernel= in Documentation/admin-guide/kernel-parameters.txt needs improving? > This patch tries to get memory region from 896 MB firstly, then [896MB,4G], Avoid having "This patch" or "This commit" in the commit message. It is tautologically useless. Also, do $ git grep 'This patch' Documentation/process for more details. > finally above 4G. > > Dave Young sent the original post, and I just re-post it with commit log If he sent it, he should be the author I guess. > improvement as his requirement. > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html > There was an old discussion below (previously posted by Chao Wang): > https://lkml.org/lkml/2013/10/15/601 All that changelog info doesn't belong in the commit message ... > Signed-off-by: Pingfan Liu > Cc: Dave Young > Cc: Baoquan He > Cc: Andrew Morton > Cc: Mike Rapoport > Cc: ying...@kernel.org, > Cc: vgo...@redhat.com > Cc: Randy Dunlap > Cc: Borislav Petkov > Cc: x...@kernel.org > Cc: linux-kernel@vger.kernel.org > --- but here. > v6 -> v7: commit log improvement > arch/x86/kernel/setup.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 3d872a5..fa62c81 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void) > high ? CRASH_ADDR_HIGH_MAX >: CRASH_ADDR_LOW_MAX, > crash_size, CRASH_ALIGN); > +#ifdef CONFIG_X86_64 > + /* > + * crashkernel=X reserve below 896M fails? Try below 4G > + */ > + if (!high && !crash_base) > + crash_base = memblock_find_in_range(CRASH_ALIGN, > + (1ULL << 32), > + crash_size, CRASH_ALIGN); > + /* > + * crashkernel=X reserve below 4G fails? Try MAXMEM > + */ > + if (!high && !crash_base) > + crash_base = memblock_find_in_range(CRASH_ALIGN, > + CRASH_ADDR_HIGH_MAX, > + crash_size, CRASH_ALIGN); > +#endif Ok, so this is silly: we know at which physical address KASLR allocated the kernel so why aren't we querying that and seeing if there's enough room before it or after it to call memblock_find_in_range() on the bigger range? Also, why is "high" dealt with separately and why isn't the code enforcing "high" if the normal reservation fails? The presence of high is requiring from our users to pay attention what to use when the kernel can do all that automatically. Looks like a UI fail to me. And look at all the flavors of crashkernel= : crashkernel=size[KMG][@offset[KMG]] crashkernel=range1:size1[,range2:size2,...][@offset] crashkernel=size[KMG],high crashkernel=size[KMG],low We couldn't do one so we made 4?!?! What for? Nowhere in that help text does it explain why a user would care about high or low or range or offset or the planets alignment... So what's up? -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Sat, Jan 19, 2019 at 9:25 AM Jerry Hoemann wrote: > > On Tue, Jan 15, 2019 at 04:07:03PM +0800, Pingfan Liu wrote: > > People reported a bug on a high end server with many pcie devices, where > > kernel bootup with crashkernel=384M, and kaslr is enabled. Even > > though we still see much memory under 896 MB, the finding still failed > > intermittently. Because currently we can only find region under 896 MB, > > if without ',high' specified. Then KASLR breaks 896 MB into several parts > > randomly, and crashkernel reservation need be aligned to 128 MB, that's > > why failure is found. It raises confusion to the end user that sometimes > > crashkernel=X works while sometimes fails. > > If want to make it succeed, customer can change kernel option to > > "crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very > > limited space to behave even though its grammar looks more generic. > > And we can't answer questions raised from customer that confidently: > > 1) why it doesn't succeed to reserve 896 MB; > > 2) what's wrong with memory region under 4G; > > 3) why I have to add ',high', I only require 384 MB, not 3840 MB. > > This patch tries to get memory region from 896 MB firstly, then [896MB,4G], > > finally above 4G. > > While allocating crashkernel from below 4G seems fine, won't we have > problems if the crash kernel gets allocated above 4G because of the SWIOTLB? > It will reserve extra memory below 4G for the swiotlb purpose. You can find the logic in reserve_crashkernel_low() And testing with crashkernel=512M@4G, we will get: cat /proc/iomem | grep Crash aa00-b9ff : Crash kernel 1-11fff : Crash kernel Thanks, Pingfan > thanks > > > > Dave Young sent the original post, and I just re-post it with commit log > > improvement as his requirement. > > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html > > There was an old discussion below (previously posted by Chao Wang): > > https://lkml.org/lkml/2013/10/15/601 > > > > Signed-off-by: Pingfan Liu > > Cc: Dave Young > > Cc: Baoquan He > > Cc: Andrew Morton > > Cc: Mike Rapoport > > Cc: ying...@kernel.org, > > Cc: vgo...@redhat.com > > Cc: Randy Dunlap > > --- > > v6 -> v7: fix spelling mistake pointed out by Randy > > arch/x86/kernel/setup.c | 16 > > 1 file changed, 16 insertions(+) > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > > index 3d872a5..fa62c81 100644 > > --- a/arch/x86/kernel/setup.c > > +++ b/arch/x86/kernel/setup.c > > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void) > > high ? CRASH_ADDR_HIGH_MAX > >: CRASH_ADDR_LOW_MAX, > > crash_size, CRASH_ALIGN); > > +#ifdef CONFIG_X86_64 > > + /* > > + * crashkernel=X reserve below 896M fails? Try below 4G > > + */ > > + if (!high && !crash_base) > > + crash_base = memblock_find_in_range(CRASH_ALIGN, > > + (1ULL << 32), > > + crash_size, CRASH_ALIGN); > > + /* > > + * crashkernel=X reserve below 4G fails? Try MAXMEM > > + */ > > + if (!high && !crash_base) > > + crash_base = memblock_find_in_range(CRASH_ALIGN, > > + CRASH_ADDR_HIGH_MAX, > > + crash_size, CRASH_ALIGN); > > +#endif > > if (!crash_base) { > > pr_info("crashkernel reservation failed - No suitable > > area found.\n"); > > return; > > -- > > 2.7.4 > > > > > > ___ > > kexec mailing list > > ke...@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/kexec > > -- > > - > Jerry Hoemann Software Engineer Hewlett Packard Enterprise > -
[PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
People reported crashkernel=384M reservation failed on a high end server with KASLR enabled. In that case there is enough free memory under 896M but crashkernel reservation still fails intermittently. The situation is crashkernel reservation code only finds free region under 896 MB with 128M aligned in case no ',high' being used. And KASLR could break the first 896M into several parts randomly thus the failure happens. User has no way to predict and make sure crashkernel=xM working unless he/she use 'crashkernel=xM,high'. Since 'crashkernel=xM' is the most common use case this issue is a serious bug. And we can't answer questions raised from customer: 1) why it doesn't succeed to reserve 896 MB; 2) what's wrong with memory region under 4G; 3) why I have to add ',high', I only require 384 MB, not 3840 MB. This patch tries to get memory region from 896 MB firstly, then [896MB,4G], finally above 4G. Dave Young sent the original post, and I just re-post it with commit log improvement as his requirement. http://lists.infradead.org/pipermail/kexec/2017-October/019571.html There was an old discussion below (previously posted by Chao Wang): https://lkml.org/lkml/2013/10/15/601 Signed-off-by: Pingfan Liu Cc: Dave Young Cc: Baoquan He Cc: Andrew Morton Cc: Mike Rapoport Cc: ying...@kernel.org, Cc: vgo...@redhat.com Cc: Randy Dunlap Cc: Borislav Petkov Cc: x...@kernel.org Cc: linux-kernel@vger.kernel.org --- v6 -> v7: commit log improvement arch/x86/kernel/setup.c | 16 1 file changed, 16 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 3d872a5..fa62c81 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void) high ? CRASH_ADDR_HIGH_MAX : CRASH_ADDR_LOW_MAX, crash_size, CRASH_ALIGN); +#ifdef CONFIG_X86_64 + /* +* crashkernel=X reserve below 896M fails? Try below 4G +*/ + if (!high && !crash_base) + crash_base = memblock_find_in_range(CRASH_ALIGN, + (1ULL << 32), + crash_size, CRASH_ALIGN); + /* +* crashkernel=X reserve below 4G fails? Try MAXMEM +*/ + if (!high && !crash_base) + crash_base = memblock_find_in_range(CRASH_ALIGN, + CRASH_ADDR_HIGH_MAX, + crash_size, CRASH_ALIGN); +#endif if (!crash_base) { pr_info("crashkernel reservation failed - No suitable area found.\n"); return; -- 2.7.4
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On 01/21/19 at 01:16pm, Pingfan Liu wrote: > People reported crashkernel=384M reservation failed on a high end server > with KASLR enabled. In that case there is enough free memory under 896M > but crashkernel reservation still fails intermittently. > > The situation is crashkernel reservation code only finds free region under > 896 MB with 128M aligned in case no ',high' being used. And KASLR could > break the first 896M into several parts randomly thus the failure happens. > User has no way to predict and make sure crashkernel=xM working unless > he/she use 'crashkernel=xM,high'. Since 'crashkernel=xM' is the most > common use case this issue is a serious bug. > > And we can't answer questions raised from customer: > 1) why it doesn't succeed to reserve 896 MB; > 2) what's wrong with memory region under 4G; > 3) why I have to add ',high', I only require 384 MB, not 3840 MB. > > This patch tries to get memory region from 896 MB firstly, then [896MB,4G], > finally above 4G. > > Dave Young sent the original post, and I just re-post it with commit log > improvement as his requirement. > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html > There was an old discussion below (previously posted by Chao Wang): > https://lkml.org/lkml/2013/10/15/601 > > Signed-off-by: Pingfan Liu > Cc: Dave Young > Cc: Baoquan He > Cc: Andrew Morton > Cc: Mike Rapoport > Cc: ying...@kernel.org, > Cc: vgo...@redhat.com > Cc: Randy Dunlap > Cc: Borislav Petkov > Cc: x...@kernel.org > Cc: linux-kernel@vger.kernel.org Looks good, ack. Acked-by: Baoquan He Thanks Baoquan > --- > v6 -> v7: commit log improvement > arch/x86/kernel/setup.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 3d872a5..fa62c81 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void) > high ? CRASH_ADDR_HIGH_MAX >: CRASH_ADDR_LOW_MAX, > crash_size, CRASH_ALIGN); > +#ifdef CONFIG_X86_64 > + /* > + * crashkernel=X reserve below 896M fails? Try below 4G > + */ > + if (!high && !crash_base) > + crash_base = memblock_find_in_range(CRASH_ALIGN, > + (1ULL << 32), > + crash_size, CRASH_ALIGN); > + /* > + * crashkernel=X reserve below 4G fails? Try MAXMEM > + */ > + if (!high && !crash_base) > + crash_base = memblock_find_in_range(CRASH_ALIGN, > + CRASH_ADDR_HIGH_MAX, > + crash_size, CRASH_ALIGN); > +#endif > if (!crash_base) { > pr_info("crashkernel reservation failed - No suitable > area found.\n"); > return; > -- > 2.7.4 >
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
On Tue, Jan 15, 2019 at 04:07:03PM +0800, Pingfan Liu wrote: > People reported a bug on a high end server with many pcie devices, where > kernel bootup with crashkernel=384M, and kaslr is enabled. Even > though we still see much memory under 896 MB, the finding still failed > intermittently. Because currently we can only find region under 896 MB, > if without ',high' specified. Then KASLR breaks 896 MB into several parts > randomly, and crashkernel reservation need be aligned to 128 MB, that's > why failure is found. It raises confusion to the end user that sometimes > crashkernel=X works while sometimes fails. > If want to make it succeed, customer can change kernel option to > "crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very > limited space to behave even though its grammar looks more generic. > And we can't answer questions raised from customer that confidently: > 1) why it doesn't succeed to reserve 896 MB; > 2) what's wrong with memory region under 4G; > 3) why I have to add ',high', I only require 384 MB, not 3840 MB. > This patch tries to get memory region from 896 MB firstly, then [896MB,4G], > finally above 4G. While allocating crashkernel from below 4G seems fine, won't we have problems if the crash kernel gets allocated above 4G because of the SWIOTLB? thanks > Dave Young sent the original post, and I just re-post it with commit log > improvement as his requirement. > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html > There was an old discussion below (previously posted by Chao Wang): > https://lkml.org/lkml/2013/10/15/601 > > Signed-off-by: Pingfan Liu > Cc: Dave Young > Cc: Baoquan He > Cc: Andrew Morton > Cc: Mike Rapoport > Cc: ying...@kernel.org, > Cc: vgo...@redhat.com > Cc: Randy Dunlap > --- > v6 -> v7: fix spelling mistake pointed out by Randy > arch/x86/kernel/setup.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 3d872a5..fa62c81 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void) > high ? CRASH_ADDR_HIGH_MAX >: CRASH_ADDR_LOW_MAX, > crash_size, CRASH_ALIGN); > +#ifdef CONFIG_X86_64 > + /* > + * crashkernel=X reserve below 896M fails? Try below 4G > + */ > + if (!high && !crash_base) > + crash_base = memblock_find_in_range(CRASH_ALIGN, > + (1ULL << 32), > + crash_size, CRASH_ALIGN); > + /* > + * crashkernel=X reserve below 4G fails? Try MAXMEM > + */ > + if (!high && !crash_base) > + crash_base = memblock_find_in_range(CRASH_ALIGN, > + CRASH_ADDR_HIGH_MAX, > + crash_size, CRASH_ALIGN); > +#endif > if (!crash_base) { > pr_info("crashkernel reservation failed - No suitable > area found.\n"); > return; > -- > 2.7.4 > > > ___ > kexec mailing list > ke...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec -- - Jerry Hoemann Software Engineer Hewlett Packard Enterprise -
Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
Pingfan, thanks for the post. On 01/15/19 at 04:07pm, Pingfan Liu wrote: > People reported a bug on a high end server with many pcie devices, where > kernel bootup with crashkernel=384M, and kaslr is enabled. Even > though we still see much memory under 896 MB, the finding still failed > intermittently. Because currently we can only find region under 896 MB, > if without ',high' specified. Then KASLR breaks 896 MB into several parts > randomly, and crashkernel reservation need be aligned to 128 MB, that's > why failure is found. It raises confusion to the end user that sometimes > crashkernel=X works while sometimes fails. > If want to make it succeed, customer can change kernel option to > "crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very > limited space to behave even though its grammar looks more generic. > And we can't answer questions raised from customer that confidently: > 1) why it doesn't succeed to reserve 896 MB; > 2) what's wrong with memory region under 4G; > 3) why I have to add ',high', I only require 384 MB, not 3840 MB. > This patch tries to get memory region from 896 MB firstly, then [896MB,4G], > finally above 4G. The patch log still looks not very good. It needs some cleanup like paragraph line breaks to make it more readable. For example you can take like below: -- People reported crashkernel=384M reservation failed on a high end server with KASLR enabled. In that case there is enough free memory under 896M but crashkernel reservation still fails intermittently. The situation is crashkernel reservation code only finds free region under 896 MB with 128M aligned in case no ',high' being used. And KASLR could break the first 896M into several parts randomly thus the failure happens. User has no way to predict and make sure crashkernel=xM working unless he/she use 'crashkernel=xM,high'. Since 'crashkernel=xM' is the most common use case this issue is a serious bug. And we can't answer questions raised from customer: 1) why it doesn't succeed to reserve 896 MB; 2) what's wrong with memory region under 4G; 3) why I have to add ',high', I only require 384 MB, not 3840 MB. This patch tries to get memory region from 896 MB firstly, then [896MB,4G], finally above 4G. > Dave Young sent the original post, and I just re-post it with commit log > improvement as his requirement. > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html > There was an old discussion below (previously posted by Chao Wang): > https://lkml.org/lkml/2013/10/15/601 I hope someone else can provide review because I posted it previously. But I think previously when I posted it is a good to have improvement, but now it is a real serious bug which need to be fixed. I can review and ack if you can repost with a better log. > > Signed-off-by: Pingfan Liu > Cc: Dave Young > Cc: Baoquan He > Cc: Andrew Morton > Cc: Mike Rapoport > Cc: ying...@kernel.org, > Cc: vgo...@redhat.com > Cc: Randy Dunlap > --- > v6 -> v7: fix spelling mistake pointed out by Randy > arch/x86/kernel/setup.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 3d872a5..fa62c81 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void) > high ? CRASH_ADDR_HIGH_MAX >: CRASH_ADDR_LOW_MAX, > crash_size, CRASH_ALIGN); > +#ifdef CONFIG_X86_64 > + /* > + * crashkernel=X reserve below 896M fails? Try below 4G > + */ > + if (!high && !crash_base) > + crash_base = memblock_find_in_range(CRASH_ALIGN, > + (1ULL << 32), > + crash_size, CRASH_ALIGN); > + /* > + * crashkernel=X reserve below 4G fails? Try MAXMEM > + */ > + if (!high && !crash_base) > + crash_base = memblock_find_in_range(CRASH_ALIGN, > + CRASH_ADDR_HIGH_MAX, > + crash_size, CRASH_ALIGN); > +#endif > if (!crash_base) { > pr_info("crashkernel reservation failed - No suitable > area found.\n"); > return; > -- > 2.7.4 > Thanks Dave
[PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
People reported a bug on a high end server with many pcie devices, where kernel bootup with crashkernel=384M, and kaslr is enabled. Even though we still see much memory under 896 MB, the finding still failed intermittently. Because currently we can only find region under 896 MB, if without ',high' specified. Then KASLR breaks 896 MB into several parts randomly, and crashkernel reservation need be aligned to 128 MB, that's why failure is found. It raises confusion to the end user that sometimes crashkernel=X works while sometimes fails. If want to make it succeed, customer can change kernel option to "crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very limited space to behave even though its grammar looks more generic. And we can't answer questions raised from customer that confidently: 1) why it doesn't succeed to reserve 896 MB; 2) what's wrong with memory region under 4G; 3) why I have to add ',high', I only require 384 MB, not 3840 MB. This patch tries to get memory region from 896 MB firstly, then [896MB,4G], finally above 4G. Dave Young sent the original post, and I just re-post it with commit log improvement as his requirement. http://lists.infradead.org/pipermail/kexec/2017-October/019571.html There was an old discussion below (previously posted by Chao Wang): https://lkml.org/lkml/2013/10/15/601 Signed-off-by: Pingfan Liu Cc: Dave Young Cc: Baoquan He Cc: Andrew Morton Cc: Mike Rapoport Cc: ying...@kernel.org, Cc: vgo...@redhat.com Cc: Randy Dunlap --- v6 -> v7: fix spelling mistake pointed out by Randy arch/x86/kernel/setup.c | 16 1 file changed, 16 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 3d872a5..fa62c81 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void) high ? CRASH_ADDR_HIGH_MAX : CRASH_ADDR_LOW_MAX, crash_size, CRASH_ALIGN); +#ifdef CONFIG_X86_64 + /* +* crashkernel=X reserve below 896M fails? Try below 4G +*/ + if (!high && !crash_base) + crash_base = memblock_find_in_range(CRASH_ALIGN, + (1ULL << 32), + crash_size, CRASH_ALIGN); + /* +* crashkernel=X reserve below 4G fails? Try MAXMEM +*/ + if (!high && !crash_base) + crash_base = memblock_find_in_range(CRASH_ALIGN, + CRASH_ADDR_HIGH_MAX, + crash_size, CRASH_ALIGN); +#endif if (!crash_base) { pr_info("crashkernel reservation failed - No suitable area found.\n"); return; -- 2.7.4