Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-28 Thread Pingfan Liu
On Fri, Mar 1, 2019 at 11:04 AM Pingfan Liu  wrote:
>
> Hi Borislav,
>
> Do you think the following patch is good at present?
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 81f9d23..9213073 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -460,7 +460,7 @@ static void __init
> memblock_x86_reserve_range_setup_data(void)
>  # define CRASH_ADDR_LOW_MAX(512 << 20)
>  # define CRASH_ADDR_HIGH_MAX   (512 << 20)
>  #else
> -# define CRASH_ADDR_LOW_MAX(896UL << 20)
> +# define CRASH_ADDR_LOW_MAX(1 << 32)
>  # define CRASH_ADDR_HIGH_MAX   MAXMEM
>  #endif
>
Or patch lools like:
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a5..ed0def5 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -459,7 +459,7 @@ static void __init
memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_LOW_MAX(512 << 20)
 # define CRASH_ADDR_HIGH_MAX   (512 << 20)
 #else
-# define CRASH_ADDR_LOW_MAX(896UL << 20)
+# define CRASH_ADDR_LOW_MAX(1 << 32)
 # define CRASH_ADDR_HIGH_MAX   MAXMEM
 #endif

@@ -551,6 +551,15 @@ static void __init reserve_crashkernel(void)
high ? CRASH_ADDR_HIGH_MAX
 : CRASH_ADDR_LOW_MAX,
crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+   /*
+* crashkernel=X reserve below 4G fails? Try MAXMEM
+*/
+   if (!high && !crash_base)
+   crash_base = memblock_find_in_range(CRASH_ALIGN,
+   CRASH_ADDR_HIGH_MAX,
+   crash_size, CRASH_ALIGN);
+#endif

which tries 0-4G, the fall back to 4G above

> For documentation, I will send another patch to improve the description.
>
> Thanks,
> Pingfan
>
> On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov  wrote:
> >
> > On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > > If we move to high as default, it will allocate 160M high + 256M low. It
> >
> > We won't move to high by default - we will *fall* back to high if the
> > default allocation fails.
> >
> > > To make the process less fragile maybe we can remove the 896M limitation
> > > and only try <4G then go to high.
> >
> > Sure, the more robust for the user, the better.
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > Good mailing practices for 400: avoid top-posting and trim the reply.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-28 Thread Pingfan Liu
Hi Borislav,

Do you think the following patch is good at present?
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 81f9d23..9213073 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -460,7 +460,7 @@ static void __init
memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_LOW_MAX(512 << 20)
 # define CRASH_ADDR_HIGH_MAX   (512 << 20)
 #else
-# define CRASH_ADDR_LOW_MAX(896UL << 20)
+# define CRASH_ADDR_LOW_MAX(1 << 32)
 # define CRASH_ADDR_HIGH_MAX   MAXMEM
 #endif

For documentation, I will send another patch to improve the description.

Thanks,
Pingfan

On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov  wrote:
>
> On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > If we move to high as default, it will allocate 160M high + 256M low. It
>
> We won't move to high by default - we will *fall* back to high if the
> default allocation fails.
>
> > To make the process less fragile maybe we can remove the 896M limitation
> > and only try <4G then go to high.
>
> Sure, the more robust for the user, the better.
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-24 Thread Pingfan Liu
On Fri, Feb 22, 2019 at 9:00 PM Borislav Petkov  wrote:
>
> On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > The current default of 256MB was found by experiments on a bigger
> > number of machines, to create a reasonable default that is at least
> > likely to be sufficient of an average machine.
>
> Exactly, and this is what makes sense.
>
> The code should try the requested reservation and if it fails, it should
> try high allocation with default swiotlb size because we need to reserve
> *some* range.
>
> If that reservation succeeds, we should say something along the lines of
>
> "... requested range failed, reserved  range instead."
>
Maybe I misunderstood you, but does "requested range failed" mean that
user specify the range? If yes, then it should be the duty of user as
you said later, not the duty of kernel"

> And then in Documentation/admin-guide/kernel-parameters.txt above the
> crashkernel= explanations, the allocation strategy of best effort should
> be explained in short. That the kernel will try to allocate high if the
> requested allocation didn't succeed and that the user can tweak the
> allocation with the below options.
>
Yes, it should be improved.

> Bottom line is: the kernel should assist the user and try harder to
> allocate *some* range for a crash kernel when there's no detailed
> specification what that range should be.
>
> *If* the user adds ,low, high, then the kernel should try only that
> specified range because the assumption is that the user knows what she's
> doing.
>
> But if the user simply wants a range for a crash kernel without stating
> where that range should be in particular and it's placement is a don't
> care - as long as there is a range - then the kernel should simply try
> high, etc.
>
We do not know the memory layout of a system, maybe a system with
memory less than 4GB. So it is better to try all the range of system
memory

Thanks,
Pingfan

> Makes sense?
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-20 Thread Pingfan Liu
On Mon, Feb 18, 2019 at 9:48 AM Dave Young  wrote:
>
> On 02/15/19 at 11:24am, Borislav Petkov wrote:
> > On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote:
> > > Even we make it automatic in kernel, but we have to have some default
> > > value for swiotlb in case crashkernel can not find a free region under 4G.
> > > So this default value can not work for every use cases, people need
> > > manually use crashkernel=,low and crashkernel=,high in case
> > > crashkernel=X does not work.
> >
> > Why would the user need to find swiotlb range? The kernel has all the
> > information it requires at its finger tips in order to decide properly.
> >
> > The user wants a crashkernel range, the kernel tries the low range =>
> > no workie, then it tries the next range => workie but needs to allocate
> > swiotlb range so that DMA can happen too. Doh, then the kernel does
> > allocate that too.
>
> It is ideal if kernel can do it automatically, but I'm not sure if
> kernel can predict the swiotlb reserved size automatically.
>
Agreed, I think it is hard to decide the reserved size automatically.
We do not know the requirement for memory of ZONE_DMA32 at boot time.
The requirement depends on how many DMA32 devices, and the dynamic
payload of them.

> Let's add more people to seek for comments.
>
> >
> > Why would the user need to do anything here?!
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > Good mailing practices for 400: avoid top-posting and trim the reply.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-20 Thread Pingfan Liu
On Wed, Feb 20, 2019 at 5:41 PM Dave Young  wrote:
>
> On 02/20/19 at 09:32am, Borislav Petkov wrote:
> > On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote:
> > > It is ideal if kernel can do it automatically, but I'm not sure if
> > > kernel can predict the swiotlb reserved size automatically.
> >
> > Do you see how even more absurd this gets?
> >
> > If the kernel cannot know the swiotlb reserved size automatically, how
> > is the normal user even supposed to know?!
> >
I think swiotlb is bounce-buffer, if we enlarge it, we can get better
performance. Default size should be enough for platform to work. But
in case of reserving low memory for crashkernel, things are different.
The reserve low memory = swiotlb_size_or_default() + DMA32 memory for
devices. And the 2nd item in the right of the equation varies, based
on machine type and dynamic payload

> > I see swiotlb_size_or_default() so we have a sane default which we fall
> > back to. Now where's the problem with that?
>
> Good question, I expect some answer from people who know more about the
> background.  It would be good to have some actual test results, Pingfan
> is trying to do some tests.
>
Not following the idea, I do not think the following test result can
tell much. (We need various type of machine to get a final result.)
I do a quick test on "HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10",
command line "crashkernel=180M,high crashkernel=64M,low" can work for
the 2nd kernel. Although it complained some memory shortage issue:
[7.655591] fbcon: mgadrmfb (fb0) is primary device
[7.655639] Console: switching to colour frame buffer device 128x48
[7.660609] systemd-udevd: page allocation failure: order:0, mode:0x280d4
[7.660611] CPU: 0 PID: 180 Comm: systemd-udevd Not tainted
3.10.0-957.el7.x86_64 #1
[7.660612] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
Gen10, BIOS U30 06/20/2018
[7.660612] Call Trace:
[7.660621]  [] dump_stack+0x19/0x1b
[7.660625]  [] warn_alloc_failed+0x110/0x180
[7.660628]  [] __alloc_pages_slowpath+0x6b6/0x724
[7.660631]  [] __alloc_pages_nodemask+0x405/0x420
[7.660633]  [] alloc_pages_current+0x98/0x110
[7.660638]  [] ttm_pool_populate+0x3d2/0x4b0 [ttm]
[7.660641]  [] ttm_tt_populate+0x7d/0x90 [ttm]
[7.660644]  [] ttm_bo_kmap+0x124/0x240 [ttm]
[7.660648]  [] ? __wake_up_sync_key+0x4f/0x60
[7.660650]  [] mga_dirty_update+0x25e/0x310 [mgag200]
[7.660653]  [] mga_imageblit+0x2f/0x40 [mgag200]
[7.660657]  [] soft_cursor+0x1ba/0x260
[7.660659]  [] bit_cursor+0x663/0x6a0
[7.660662]  [] ? console_trylock+0x19/0x70
[7.660664]  [] fbcon_cursor+0x13d/0x1c0
[7.660665]  [] ? bit_clear+0x120/0x120
[7.660668]  [] hide_cursor+0x2e/0xa0
[7.660669]  [] redraw_screen+0x188/0x270
[7.660671]  [] do_bind_con_driver+0x316/0x340
[7.660672]  [] do_take_over_console+0x49/0x60
[7.660674]  [] do_fbcon_takeover+0x63/0xd0
[7.660675]  [] fbcon_event_notify+0x61d/0x730
[7.660678]  [] notifier_call_chain+0x4f/0x70
[7.660681]  [] __blocking_notifier_call_chain+0x4d/0x70
[7.660683]  [] blocking_notifier_call_chain+0x16/0x20
[7.660684]  [] fb_notifier_call_chain+0x1b/0x20
[7.660686]  [] register_framebuffer+0x1f6/0x340
[7.660690]  []
__drm_fb_helper_initial_config_and_unlock+0x252/0x3e0 [drm_kms_helper]
[7.660694]  []
drm_fb_helper_initial_config+0x3e/0x50 [drm_kms_helper]
[7.660697]  [] mgag200_fbdev_init+0xe3/0x100 [mgag200]
[7.660699]  [] mgag200_modeset_init+0x154/0x1d0 [mgag200]
[7.660701]  [] mgag200_driver_load+0x41d/0x5b0 [mgag200]
[7.660708]  [] drm_dev_register+0x15f/0x1f0 [drm]
[7.660711]  [] ? pci_enable_device_flags+0xe8/0x140
[7.660718]  [] drm_get_pci_dev+0x8a/0x1a0 [drm]
[7.660720]  [] mga_pci_probe+0x9b/0xc0 [mgag200]
[7.660722]  [] local_pci_probe+0x4a/0xb0
[7.660723]  [] pci_device_probe+0x109/0x160
[7.660726]  [] driver_probe_device+0xc5/0x3e0
[7.660727]  [] __driver_attach+0x93/0xa0
[7.660728]  [] ? __device_attach+0x50/0x50
[7.660730]  [] bus_for_each_dev+0x75/0xc0
[7.660731]  [] driver_attach+0x1e/0x20
[7.660733]  [] bus_add_driver+0x200/0x2d0
[7.660734]  [] driver_register+0x64/0xf0
[7.660735]  [] __pci_register_driver+0xa5/0xc0
[7.660737]  [] ? 0xc012cfff
[7.660739]  [] mgag200_init+0x39/0x1000 [mgag200]
[7.660742]  [] do_one_initcall+0xba/0x240
[7.660745]  [] load_module+0x272c/0x2bc0
[7.660748]  [] ? ddebug_proc_write+0x100/0x100
[7.660750]  [] SyS_init_module+0xef/0x140
[7.660752]  [] system_call_fastpath+0x22/0x27
[7.660753] Mem-Info:
[7.660756] active_anon:3364 inactive_anon:6661 isolated_anon:0
[7.660756]  active_file:0 inactive_file:0 isolated_file:0
[7.660756]  unevictable:0 dirty:0 writeback:0 unstable:0
[7.660756]  slab_reclaimable:1492 slab_unreclaimable:3116
[7.660756]  mapped:1223 shmem:8449 pagetables:179 bounce:0
[7.660756]