Re: [PATCH v6 0/2] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-06-11 Thread Catalin Marinas
On Sun, Jun 11, 2023 at 08:15:07PM +0800, Baoquan He wrote:
> On 06/09/23 at 08:30pm, Catalin Marinas wrote:
> > On Mon, 15 May 2023 14:02:57 +0800, Baoquan He wrote:
> > > In v5 patch, Catalin helped review and acked the patch. However, an
> > > uninitialized local varilable is warned out by static checker when Will
> > > tried to merge the patch. And Will complained the code flow in
> > > reserve_crashkernel() is hard to follow, required to refactor. While
> > > when I tried to do the refactory, I feel it's not easy, the existing
> > > several cases causes that.
> > > 
> > > [...]
> > 
> > Applied to arm64 (for-next/kdump).
> > 
> > I reworte some of the paragraphs in the documentation patch, removed
> > some sentences to make it easier to read (some details were pretty
> > obvious). Please have a look, if you think I missed something important,
> > just send a patch on top. Thanks.
> > 
> > [1/2] arm64: kdump: simplify the reservation behaviour of crashkernel=,high
> >   https://git.kernel.org/arm64/c/6c4dcaddbd36
> > [2/2] Documentation: add kdump.rst to present crashkernel reservation on 
> > arm64
> >   https://git.kernel.org/arm64/c/03dc0e05407f
> 
> Could you help add below code change into the document patch commit? I
> forgot adding it and got warning report from lkp test robot.
> 
> https://lore.kernel.org/oe-kbuild-all/202306110549.ynh2juok-...@intel.com/
> 
> diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
> index ae21f8118830..dcfebddb6088 100644
> --- a/Documentation/arm64/index.rst
> +++ b/Documentation/arm64/index.rst
> @@ -25,6 +25,7 @@ ARM64 Architecture
>  sve
>  tagged-address-abi
>  tagged-pointers
> +kdump

I've seen the warning as well. Please send a patch fixing this as I try
to avoid rebasing. Also we keep this part in alphabetical order.

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v6 0/2] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-06-09 Thread Catalin Marinas
On Mon, 15 May 2023 14:02:57 +0800, Baoquan He wrote:
> In v5 patch, Catalin helped review and acked the patch. However, an
> uninitialized local varilable is warned out by static checker when Will
> tried to merge the patch. And Will complained the code flow in
> reserve_crashkernel() is hard to follow, required to refactor. While
> when I tried to do the refactory, I feel it's not easy, the existing
> several cases causes that.
> 
> [...]

Applied to arm64 (for-next/kdump).

I reworte some of the paragraphs in the documentation patch, removed
some sentences to make it easier to read (some details were pretty
obvious). Please have a look, if you think I missed something important,
just send a patch on top. Thanks.

[1/2] arm64: kdump: simplify the reservation behaviour of crashkernel=,high
  https://git.kernel.org/arm64/c/6c4dcaddbd36
[2/2] Documentation: add kdump.rst to present crashkernel reservation on arm64
  https://git.kernel.org/arm64/c/03dc0e05407f

-- 
Catalin


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-04-13 Thread Catalin Marinas
On Thu, Apr 13, 2023 at 03:45:50PM +0800, Baoquan He wrote:
> I am OK with this version, or the version with min(SZ_4G,
> arm64_dma_phys_limit), or v4. Please help point out if I got your idea
> correctly. Thanks a lot.

I think we should stick to this patch. The disabling of the ZONE_DMA(32)
is fairly specialised and you are right that we should not introduce an
artificial 4GB crashkernel boundary on such systems. The slight
confusion may be that ,high triggers a search above 4GB where there's
not such boundary but this would match the documentation anyway.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-04-12 Thread Catalin Marinas
On Fri, Apr 07, 2023 at 10:32:38AM +0800, Baoquan He wrote:
> On 04/07/23 at 10:24am, Baoquan He wrote:
> ..
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 66e70ca47680..307263c01292 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -69,6 +69,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit;
> >  
> >  #define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit
> >  #define CRASH_ADDR_HIGH_MAX(PHYS_MASK + 1)
> > +#define CRASH_HIGH_SEARCH_BASE SZ_4G
> >  
> >  #define DEFAULT_CRASH_KERNEL_LOW_SIZE  (128UL << 20)
> >  
> > @@ -101,12 +102,13 @@ static int __init reserve_crashkernel_low(unsigned 
> > long long low_size)
> >   */
> >  static void __init reserve_crashkernel(void)
> >  {
> > -   unsigned long long crash_base, crash_size;
> > -   unsigned long long crash_low_size = 0;
> > +   unsigned long long crash_base, crash_size, search_base;
> > unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> > +   unsigned long long crash_low_size = 0;
> > char *cmdline = boot_command_line;
> > -   int ret;
> > bool fixed_base = false;
> > +   bool high = false;
> > +   int ret;
> >  
> > if (!IS_ENABLED(CONFIG_KEXEC_CORE))
> > return;
> > @@ -129,7 +131,9 @@ static void __init reserve_crashkernel(void)
> > else if (ret)
> > return;
> >  
> > +   search_base = CRASH_HIGH_SEARCH_BASE;
> 
> Here, I am hesitant if a conditional check is needed as below. On
> special system where both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 
> are disabled, there's only low memory, means its arm64_dma_phys_limit
> equals to (PHYS_MASK + 1). In this case, whatever the crashkernel= is,
> it can search the whole system memory for available crashkernel region.
> Maybe it's fine since it's not big deal, the memory regoin can be found
> anyway.
> 
>   crash_max = CRASH_ADDR_HIGH_MAX;
>   if (crash_max != CRASH_ADDR_LOW_MAX)
>   search_base = CRASH_HIGH_SEARCH_BASE;

Does x86 do anything different here or they just can't disable
ZONE_DMA32? I'd be tempted to instead define CRASH_ADDR_LOW_MAX as
min(SZ_4G, arm64_dma_phys_limit) so that the crashkernel=,high semantics
are still preserved irrespective of how the kernel was built.

There's also the difference between what the current kernel vs the kdump
kernel. I don't think there's a strong requirement that they have the
same config options, in which case it may be safer to just honour the 4G
boundary.

Otherwise the patch looks fine. Whether you want to add the min limit
above:

Reviewed-by: Catalin Marinas 

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/3] arm64: kdump : take off the protection on crashkernel memory region

2023-03-24 Thread Catalin Marinas
On Fri, Mar 24, 2023 at 09:18:35PM +0800, Baoquan He wrote:
> Baoquan He (3):
>   arm64: kdump : take off the protection on crashkernel memory region
>   arm64: kdump: do not map crashkernel region specifically
>   arm64: kdump: defer the crashkernel reservation for platforms with no
> DMA memory zones
> 
>  arch/arm64/include/asm/kexec.h|  6 -
>  arch/arm64/include/asm/memory.h   |  5 
>  arch/arm64/kernel/machine_kexec.c | 20 --
>  arch/arm64/mm/init.c  |  6 +
>  arch/arm64/mm/mmu.c   | 43 ---
>  5 files changed, 1 insertion(+), 79 deletions(-)

This series works for me and it has a negative diffstat as well (though
I'm sure people will try to bring it back ;)).

Acked-by: Catalin Marinas 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-03-24 Thread Catalin Marinas
On Fri, Mar 24, 2023 at 10:08:17PM +0800, Baoquan He wrote:
> On 03/23/23 at 05:25pm, Catalin Marinas wrote:
> > On Mon, Mar 20, 2023 at 09:12:08PM +0800, Baoquan He wrote:
> > > crashkernel=size
> > > 1)first attempt:  low memory under arm64_dma_phys_limit
> > > 2)fallback:   finding memory above 4G
> > 
> > (2) should be 'finding memory above arm64_dma_phys_limit' to keep the
> > current behaviour for RPi4.
> 
> Then for RPi4, with case 2), it will find memory above
> arm64_dma_phys_limit, namely 1G. Then it will get two memory regions,
> one could be in [1G, 4G], another is below 4G. I am fine with this, as
> long as it won't cause confusion that people may think two low memory
> regions you mentioned earlier. Please help confirm if I understand your
> suggestioin correctly. I will start making patch with this clarified.

Yes, you understood correctly. While it may still be slightly confusing,
if the user is not specific about high and low on the cmdline, we just
allow the kernel to find the best places. I assume distros will just use
,high and get a consistent behaviour on all platforms.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-03-23 Thread Catalin Marinas
On Mon, Mar 20, 2023 at 09:12:08PM +0800, Baoquan He wrote:
> On 03/17/23 at 06:05pm, Catalin Marinas wrote:
> > On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote:
> > > In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G
> > > fixedly on arm64, just like what we do on x86_64. As for RPi4 platform,
> > > we leave it to crashkernel=size@offset syntax. Two reasons for this:
> > > 1) crashkernel is needed on enterprise platform, such as workstation or
> > > server. While RPi4 is obviously not the target. I contacted several RPi4
> > > players in Redhat and my friends, none of them ever played kdump
> > > testing. If they really have to, crashkernel=size@offset is enough for
> > > them to set.
> > 
> > I'd like crashkernel=size (without @offset) on RPi4 to still do the
> > right thing: a low allocation at least as we had until recently (or
> > high+low where high here is maybe above 1GB). IOW, no regression for
> > this crashkernel=size case. We can then change the explicit
> > crashkernel=x,high to mean only above 4GB irrespective of the platform
> > and crashkernel=x,low to be below arm64_dma_phys_limit.
> 
> Since crashkernel=,high and crashkernel=size fallback was added in arm64
> recently, with my understanding, you are suggesting:
> 
> on arm64:
> RPi4:
> crashkernel=size
> 0~1G: low memory (no regression introduced)

And, if not enough low memory, fall back to memory above 1GB (for RPi4;
it would be above 4GB for any other system).

> crashkernel=size,high
> 0~1G: low memory | 4G~top: high memory

Yes.

> Other normal system:
> crashkernel=size|crashkernel=size,high
> 0~4G: low memory | 4G~top: high memory

Yes.

IOW, specifying 'high' only forces the high allocation above 4GB instead
of arm64_dma_phys_limit, irrespective of the platform. If no 'high'
specified search_base remains CRASH_ADDR_LOW_MAX (1GB on RPi4, 4GB for
the rest).

> > > 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the
> > > problem of base page mapping for the whole linear mapping if crsahkernel=
> > > is set in kernel parameter shown in [1] at bottom. 
> > 
> > That's a different problem ;). I should re-read that thread, forgot most
> > of the details but I recall one of the counter arguments was that there
> > isn't a strong case to unmap the crashkernel reservation. Now, if we
> > place crashdump kernel image goes in the 'high' reservation, can we not
> > leave the 'low' reservation mapped? We don't really care about it as it
> > wouldn't have any meaningful code/data to be preserved. If the 'high'
> > one goes above 4G always, we don't depend on the arm64_dma_phys_limit.
> 
> Yes, this looks ideal. While it only works when crashkernel=,high case and
> it succeeds to reserve a memory region for the specified size of crashkernel
> high memory. At below, we have 4 cases of crashkernel= syntax:
> 
> crashkernel=size
> 1)first attempt:  low memory under arm64_dma_phys_limit
> 2)fallback:   finding memory above 4G

(2) should be 'finding memory above arm64_dma_phys_limit' to keep the
current behaviour for RPi4.

> crashkernel=size,high
> 3)first attempt:  finding memory above 4G
> 4)fallback:   low memory under arm64_dma_phys_limit

Yes.

> case 3) works with your suggestion. However, 1), 2), 4) all need to
> defer to bootmem_init(). With these cases and different handling,
> reserve_crashkernel() could be too complicated.

Ah, because of the fallback below arm64_dma_phys_limit as in (4), we
still can't move the full crashkernel reservation early. Well, we could
do it in two steps: (a) early attempt at crashkernel reservation above
4G if 'high' was specified and we avoid mapping it if successful and (b)
do the late crashkernel reservation below arm64_dma_phys_limit and skip
unmapping as being too late. This way most server-like platforms would
get a reservation above 4G, unmapped.

> I am wondering if we can cancel the protection of crashkernel memory
> region on arm64 for now. In earlier discussion, people questioned if the
> protection is necessary on arm64. After comparison, I would rather take
> away the protection method of crashkernel region since they try to
> protect in a chance in one million , while the base page mapping for the
> whole linear mapping is mitigating arm64 high end server always.

This works for me. We can add the protection later for addresses above
4GB only as mentioned above.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-03-17 Thread Catalin Marinas
On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote:
> On 03/16/23 at 05:35pm, Catalin Marinas wrote:
> > On Thu, Mar 16, 2023 at 05:47:52PM +0800, Baoquan He wrote:
> > > Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly
> > > since we decide to take 4G as the low memory limit when doing 'high'
> > > reserving or falling back. This is just like what we have been doing in
> > > x86_64. Not sure if I follow you correctly.
> > 
> > On RPi4, we do need the 'low' allocation to be below 1GB, otherwise
> > ZONE_DMA will be empty. But we can leave the 'high' reservation above
> > 4GB (if available). The downside is that we won't get anything between
> > 1GB and 4GB unless explicitly specified with @offset.
> 
> Ah, I see. You want to have high reservation like below when RPi4.
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 307763f97549..9e558fadf483 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -95,6 +95,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = 
> PHYS_MASK + 1;
>  
>  #define CRASH_ADDR_LOW_MAX   arm64_dma_phys_limit
>  #define CRASH_ADDR_HIGH_MAX  (PHYS_MASK + 1)
> +#define CRASH_HIGH_SEARCH_BASE   SZ_4G
>  
>  #define DEFAULT_CRASH_KERNEL_LOW_SIZE(128UL << 20)
>  
> @@ -156,7 +157,7 @@ static void __init reserve_crashkernel(void)
>   else if (ret)
>   return;
>  
> - search_base = CRASH_ADDR_LOW_MAX;
> + search_base = CRASH_HIGH_SEARCH_BASE;
>   crash_max = CRASH_ADDR_HIGH_MAX;
>   high = true;
>   } else if (ret || !crash_size) {
> @@ -193,7 +194,7 @@ static void __init reserve_crashkernel(void)
>*/
>   if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
>   crash_max = CRASH_ADDR_HIGH_MAX;
> - search_base = CRASH_ADDR_LOW_MAX;
> + search_base = CRASH_HIGH_SEARCH_BASE;
>   crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
>   goto retry;
>   }

This would probably do _if_ these are the semantics that we want (well,
more discussion below).

> > I'm not entirely sure what you want to achieve: avoiding the 'high'
> > reservation going across an arbitrary boundary (1GB or 4GB) that the
> > user may not be aware of or just avoiding the 'high' reservation going
> > across a 4G boundary? On RPi4, if the 'high' reservation above 4GB
> > fails, should it fall back to below 1GB reservation or to somewhere
> > between 1GB and 4GB, making sure it doesn't cross any of these two
> > boundaries? For someone unfamiliar with the ZONE_DMA on RPi4, the latter
> > would look like two 'low' allocations below 4GB.
> 
> Currently, with the existing code, the high and low memory is like below
> on arm64:
> RPi4:
> 0~1G: low memory | 1G~top: high memory
> 
> Other normal system:
> 0~4G: low memory | 4G~top: high memory

Yes, that's the current behaviour.

> Now you want RPi4 to have low and high crahskernel reservation like
> this: 
> 0~1G: low memory | 4G~top: high memory

I don't necessarily want this, I just want to clarify what we actually
need to fix. Your initial reasoning for this patch was that the user
gets confused by the high allocation going across the 4G boundary. But
without knowing of the 1GB vs 4GB boundary on RPi4, one can still get
confused by the high allocation across the 4GB boundary on RPi4.

> It's also fine. However, as far as I know, RPi4 usually only has 8G
> memory.

That's the maximum. Lots of them just come with 4GB or less.

> And arm64 only has top down memblock allocation. Its high memory
> above 4G will be fragmentary, the crahskernel high reservation will fail
> if the value is big.
> 
> In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G
> fixedly on arm64, just like what we do on x86_64. As for RPi4 platform,
> we leave it to crashkernel=size@offset syntax. Two reasons for this:
> 1) crashkernel is needed on enterprise platform, such as workstation or
> server. While RPi4 is obviously not the target. I contacted several RPi4
> players in Redhat and my friends, none of them ever played kdump
> testing. If they really have to, crashkernel=size@offset is enough for
> them to set.

I'd like crashkernel=size (without @offset) on RPi4 to still do the
right thing: a low allocation at least as we had until recently (or
high+low where high here is maybe above 1GB). IOW, no regression for
this crashkernel=size case. We can then change the explicit
crashkernel=x,high to mean only above 4GB irrespective of the platform
and crashkerne

Re: [PATCH v4] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-03-16 Thread Catalin Marinas
On Thu, Mar 16, 2023 at 05:47:52PM +0800, Baoquan He wrote:
> On 03/15/23 at 02:52pm, Catalin Marinas wrote:
> > On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote:
> > > + /*
> > > +  * For crashkernel=size[KMG], if the first attempt was for
> > > +  * low memory, fall back to high memory, the minimum required
> > > +  * low memory will be reserved later.
> > > +  */
> > > + if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
> > >   crash_max = CRASH_ADDR_HIGH_MAX;
> > > + search_base = CRASH_ADDR_LOW_MAX;
> > >   crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
> > >   goto retry;
> > >   }
> > 
> > So I'm more tempted to set the search_base to 4G here rather than
> > CRASH_ADDR_LOW_MAX. The crashkernel=x,high option on a RPi4 with all
> > memory below 4G will fall back to low allocation. But RPi4 is the odd
> > one out, so I think we can live with this.
> 
> I totally agree with you that we should take 4G as the fixed boundary of
> low and high memory because kdump is aimed at workstation and server
> platform. We can leave RPi4 to use crashkernel=size[KMG][@offset[KMG]]
> to specify a region if people have to use.
> 
> [PATCH 0/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end
> https://lore.kernel.org/all/20220828005545.94389-1-...@redhat.com/T/#u
> 
> Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly
> since we decide to take 4G as the low memory limit when doing 'high'
> reserving or falling back. This is just like what we have been doing in
> x86_64. Not sure if I follow you correctly.

On RPi4, we do need the 'low' allocation to be below 1GB, otherwise
ZONE_DMA will be empty. But we can leave the 'high' reservation above
4GB (if available). The downside is that we won't get anything between
1GB and 4GB unless explicitly specified with @offset.

I'm not entirely sure what you want to achieve: avoiding the 'high'
reservation going across an arbitrary boundary (1GB or 4GB) that the
user may not be aware of or just avoiding the 'high' reservation going
across a 4G boundary? On RPi4, if the 'high' reservation above 4GB
fails, should it fall back to below 1GB reservation or to somewhere
between 1GB and 4GB, making sure it doesn't cross any of these two
boundaries? For someone unfamiliar with the ZONE_DMA on RPi4, the latter
would look like two 'low' allocations below 4GB.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-03-15 Thread Catalin Marinas
On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote:
> On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> suitable memory region top down. If the 'xM' of crashkernel high memory
> is reserved from high memory successfully, it will try to reserve
> crashkernel low memory later accoringly. Otherwise, it will try to search
> low memory area for the 'xM' suitable region. Please see the details in
> Documentation/admin-guide/kernel-parameters.txt.
> 
> While we observed an unexpected case where a reserved region crosses the
> high and low meomry boundary. E.g on a system with 4G as low memory end,
> user added the kernel parameters like: 'crashkernel=512M,high', it could
> finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
> The crashkernel high region crossing low and high memory boudary will bring
> issues:
[...]
> Note: On arm64, the high and low memory boudary could be 1G if it's RPi4
> system, or 4G if other normal systems.

I'm mostly ok with the reworking but I'm not sure about the non-fixed
low memory boundary. As I mentioned on v2, the user doesn't (need to)
know about the ZONE_DMA limit on a specific platform, that's supposed to
be fairly transparent. So on RPi4, specifying 'high' still allows
allocation below 4G which some users may treat as 'low'. The
kernel-parameters.txt doc also only talks about the 4G limit.

> + /*
> +  * For crashkernel=size[KMG], if the first attempt was for
> +  * low memory, fall back to high memory, the minimum required
> +  * low memory will be reserved later.
> +  */
> + if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
>   crash_max = CRASH_ADDR_HIGH_MAX;
> + search_base = CRASH_ADDR_LOW_MAX;
>   crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
>   goto retry;
>   }

So I'm more tempted to set the search_base to 4G here rather than
CRASH_ADDR_LOW_MAX. The crashkernel=x,high option on a RPi4 with all
memory below 4G will fall back to low allocation. But RPi4 is the odd
one out, so I think we can live with this.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-02-03 Thread Catalin Marinas
On Fri, Feb 03, 2023 at 03:57:23PM +0800, Baoquan He wrote:
> On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> suitable memory region top down. If the 'xM' of crashkernel high memory
> is reserved from high memory successfully, it will try to reserve
> crashkernel low memory later accoringly. Otherwise, it will try to search
> low memory area for the 'xM' suitable region. Please see the details in
> Documentation/admin-guide/kernel-parameters.txt.
> 
> While we observed an unexpected case where a reserved region crosses the
> high and low meomry boundary. E.g on a system with 4G as low memory end,
> user added the kernel parameters like: 'crashkernel=512M,high', it could
> finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
> The crossing 4G boudary of crashkernel high region will bring issues:
> 
> 1) For crashkernel=x,high, if getting crashkernel high region across
> 4G boudary, then user will see two memory regions under 4G, and one
> memory region above 4G. The two crashkernel low memory regions are
> confusing.

Looking at your patch, I just realised that the 4G boundary between
'low' and 'high' reservations is not always true. On RPi4, that would be
1GB, the limit of ZONE_DMA. Are there user-space tools that rely on this
32-bit boundary? If they do, they'd get confused on RPi4, not sure they
have the notion of the actual ZONE_DMA that the kernel has enabled. If
we do want ,high to mean always 4G or higher, we'd need to change the
logic a bit so that the search_base starts from 4G rather than
CRASH_ADDR_LOW_MAX. We could leave the latter when ,high was not
specified.

> 2) If people explicityly specify "crashkernel=x,high crashkernel=y,low"
> and y <= 128M, when crashkernel high region crosses 4G boudary and the
> part below 4G of crashkernel high reservation is bigger than y, the
> expected crahskernel low reservation will be skipped. But the expected
> crashkernel high reservation is shrank and could not satisfy user space
> requirement.

I guess if the user passes both high and low, we should honour that and
ignore any y <= 128M checks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-02-01 Thread Catalin Marinas
On Wed, Feb 01, 2023 at 01:57:17PM +0800, Baoquan He wrote:
> On 01/24/23 at 05:36pm, Catalin Marinas wrote:
> > On Tue, Jan 17, 2023 at 11:49:20AM +0800, Baoquan He wrote:
> > > On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> > > suitable memory region up down. If the 'xM' of crashkernel high memory
> > > is reserved from high memory successfully, it will try to reserve
> > > crashkernel low memory later accoringly. Otherwise, it will try to search
> > > low memory area for the 'xM' suitable region.
> > > 
> > > While we observed an unexpected case where a reserved region crosses the
> > > high and low meomry boundary. E.g on a system with 4G as low memory end,
> > > user added the kernel parameters like: 'crashkernel=512M,high', it could
> > > finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
> > > This looks very strange because we have two low memory regions
> > > [4G-126M, 4G] and [1G, 1G+128M]. Much explanation need be given to tell
> > > why that happened.
> > > 
> > > Here, for crashkernel=xM,high, search the high memory for the suitable
> > > region above the high and low memory boundary. If failed, try reserving
> > > the suitable region below the boundary. Like this, the crashkernel high
> > > region will only exist in high memory, and crashkernel low region only
> > > exists in low memory. The reservation behaviour for crashkernel=,high is
> > > clearer and simpler.
> > 
> > Well, I guess it depends on how you look at the 'high' option: is it
> > permitting to go into high addresses or forcing high addresses only?
> > IIUC the x86 implementation has a similar behaviour to the arm64 one, it
> > allows allocation across boundary.
> 
> Hmm, x86 has no chance to allocate a memory region across 4G boundary
> because it reserves many small regions to map firmware, pci bus, etc
> near 4G. E.g one x86 system has /proc/iomem as below. I haven't seen a
> x86 system which doesn't look like this.
> 
> [root@ ~]# cat /proc/iomem 
[...]
> fffc- : Reserved
> 1-13fff : System RAM

Ah, that's why we don't see this problem on x86.

Alright, for consistency I'm fine with having the same logic on arm64. I
guess we don't need the additional check on whether the 'high'
allocation reserved at least 128MB in the 'low' range. If it succeeded
and the start is below 4GB, it's guaranteed that it got the full
allocation in the 'low' range. I haven't checked whether your patch
cleaned this up already, if not please do in the next version.

And as already asked, please fold the comments with the same patch, it's
easier to read.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

2023-01-24 Thread Catalin Marinas
On Tue, Jan 17, 2023 at 11:49:20AM +0800, Baoquan He wrote:
> On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> suitable memory region up down. If the 'xM' of crashkernel high memory
> is reserved from high memory successfully, it will try to reserve
> crashkernel low memory later accoringly. Otherwise, it will try to search
> low memory area for the 'xM' suitable region.
> 
> While we observed an unexpected case where a reserved region crosses the
> high and low meomry boundary. E.g on a system with 4G as low memory end,
> user added the kernel parameters like: 'crashkernel=512M,high', it could
> finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
> This looks very strange because we have two low memory regions
> [4G-126M, 4G] and [1G, 1G+128M]. Much explanation need be given to tell
> why that happened.
> 
> Here, for crashkernel=xM,high, search the high memory for the suitable
> region above the high and low memory boundary. If failed, try reserving
> the suitable region below the boundary. Like this, the crashkernel high
> region will only exist in high memory, and crashkernel low region only
> exists in low memory. The reservation behaviour for crashkernel=,high is
> clearer and simpler.

Well, I guess it depends on how you look at the 'high' option: is it
permitting to go into high addresses or forcing high addresses only?
IIUC the x86 implementation has a similar behaviour to the arm64 one, it
allows allocation across boundary.

What x86 seems to do though is that if crash_base of the high allocation
is below 4G, it gives up on further low allocation. On arm64 we had this
initially but improved it slightly to check whether the low allocation
is of sufficient size. In your example above, it is 126MB instead of
128MB, hence an explicit low allocation.

Is the only problem that some users get confused? I don't see this as a
significant issue. However, with your patch, there is a potential
failure if there isn't sufficient memory to accommodate the request in
either high or low ranges.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 2/2] arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones

2022-11-16 Thread Catalin Marinas
On Wed, Nov 16, 2022 at 08:10:44PM +0800, Zhen Lei wrote:
> For crashkernel=X without '@offset', select a region within DMA zones
> first, and fall back to reserve region above DMA zones. This allows
> users to use the same configuration on multiple platforms.
> 
> Signed-off-by: Zhen Lei 
> Acked-by: Baoquan He 

Reviewed-by: Catalin Marinas 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 1/2] arm64: kdump: Provide default size when crashkernel=Y,low is not specified

2022-11-16 Thread Catalin Marinas
On Wed, Nov 16, 2022 at 08:10:43PM +0800, Zhen Lei wrote:
> Try to allocate at least 128 MiB low memory automatically for the case
> that crashkernel=,high is explicitly specified, while crashkenrel=,low
> is omitted. This allows users to focus more on the high memory
> requirements of their business rather than the low memory requirements
> of the crash kernel booting.
> 
> Signed-off-by: Zhen Lei 
> Acked-by: Baoquan He 

Reviewed-by: Catalin Marinas 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 0/2] arm64: kdump: Function supplement and performance optimization

2022-11-15 Thread Catalin Marinas
On Tue, Nov 15, 2022 at 08:18:21PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/11/15 19:58, Will Deacon wrote:
> > On Mon, Jul 11, 2022 at 05:03:17PM +0800, Zhen Lei wrote:
> >> v2 --> v3:
> >> 1. Discard patch 3 in v2, a cleanup patch.
> > 
> > Do you plan to respin this series, addressing the various comments on v3?
> 
> Yes, I haven't figured out where to make DEFAULT_CRASH_KERNEL_LOW_SIZE 
> generic.

Do we need to? I'd just go with something like 128MB, specific to arm64,
and we can increase it later if anyone comes up with a good argument.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 1/2] arm64: kdump: Provide default size when crashkernel=Y,low is not specified

2022-11-07 Thread Catalin Marinas
On Mon, Jul 11, 2022 at 05:03:18PM +0800, Zhen Lei wrote:
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 2522b11e593f239..65a2c3a22a4b57d 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -843,7 +843,7 @@
>   available.
>   It will be ignored if crashkernel=X is specified.
>   crashkernel=size[KMG],low
> - [KNL, X86-64] range under 4G. When crashkernel=X,high
> + [KNL, X86-64, ARM64] range under 4G. When 
> crashkernel=X,high
>   is passed, kernel could allocate physical memory region
>   above 4G, that cause second kernel crash on system
>   that require some amount of low memory, e.g. swiotlb
> @@ -857,12 +857,6 @@
>   It will be ignored when crashkernel=X,high is not used
>   or memory reserved is below 4G.
>  
> - [KNL, ARM64] range in low memory.
> - This one lets the user specify a low range in the
> - DMA zone for the crash dump kernel.
> - It will be ignored when crashkernel=X,high is not used
> - or memory reserved is located in the DMA zones.
> -
>   cryptomgr.notests
>   [KNL] Disable crypto self-tests
>  
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 339ee84e5a61a0b..5390f361208ccf7 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -96,6 +96,14 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = 
> PHYS_MASK + 1;
>  #define CRASH_ADDR_LOW_MAX   arm64_dma_phys_limit
>  #define CRASH_ADDR_HIGH_MAX  (PHYS_MASK + 1)
>  
> +/*
> + * This is an empirical value in x86_64 and taken here directly. Please
> + * refer to the code comment in reserve_crashkernel_low() of x86_64 for more
> + * details.
> + */
> +#define DEFAULT_CRASH_KERNEL_LOW_SIZE\
> + max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20)
> +
>  static int __init reserve_crashkernel_low(unsigned long long low_size)
>  {
>   unsigned long long low_base;
> @@ -147,7 +155,9 @@ static void __init reserve_crashkernel(void)
>* is not allowed.
>*/
>   ret = parse_crashkernel_low(cmdline, 0, _low_size, 
> _base);
> - if (ret && (ret != -ENOENT))
> + if (ret == -ENOENT)
> + crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
> + else if (ret)
>   return;

BTW, since we want a default low allocation, I think we should change
the checking logic slightly. Currently we have:

if ((crash_base >= CRASH_ADDR_LOW_MAX) &&
 crash_low_size && reserve_crashkernel_low(crash_low_size)) {
...

If crash_base is just below CRASH_ADDR_LOW_MAX, we deem it sufficient
but a crashkernel trying to allocate 64MB of swiotlb may fail. So maybe
change this to crash_base >= CRASH_ADDR_LOW_MAX - crash_low_size.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 2/2] arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones

2022-11-07 Thread Catalin Marinas
On Mon, Jul 11, 2022 at 05:03:19PM +0800, Zhen Lei wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 5390f361208ccf7..8539598f9e58b4d 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -138,6 +138,7 @@ static void __init reserve_crashkernel(void)
>   unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>   char *cmdline = boot_command_line;
>   int ret;
> + bool fixed_base;
>  
>   if (!IS_ENABLED(CONFIG_KEXEC_CORE))
>   return;
> @@ -166,15 +167,28 @@ static void __init reserve_crashkernel(void)
>   return;
>   }
>  
> + fixed_base = !!crash_base;
>   crash_size = PAGE_ALIGN(crash_size);
>  
>   /* User specifies base address explicitly. */
> - if (crash_base)
> + if (fixed_base)
>   crash_max = crash_base + crash_size;

Not a fan of '!!', it is converted automatically. If you don't like the
conversion, just initialise fixed_base to false and here:

if (crash_base) {
fixed_base = true;
crash_max = crash_base + crash_size;
}

> +retry:
>   crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>  crash_base, crash_max);
>   if (!crash_base) {
> + /*
> +  * Attempt to fully allocate low memory failed, fall back
> +  * to high memory, the minimum required low memory will be
> +  * reserved later.
> +  */

I'm not sure this comment makes sense. If !crash_base, it doesn't mean
the kernel failed to fully allocate low memory. crash_max here could be
CRASH_ADDR_HIGH_MAX if crashkerne=X,high was specified. Maybe says
something like "If the first attempt was for low memory, fall back to
high ..."

> + if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
> + crash_max = CRASH_ADDR_HIGH_MAX;
> + crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
> + goto retry;
> + }

The retry logic looks fine, it only happens once as crash_max is
updated.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 1/2] arm64: kdump: Provide default size when crashkernel=Y,low is not specified

2022-11-07 Thread Catalin Marinas
On Mon, Jul 11, 2022 at 05:03:18PM +0800, Zhen Lei wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 339ee84e5a61a0b..5390f361208ccf7 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -96,6 +96,14 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = 
> PHYS_MASK + 1;
>  #define CRASH_ADDR_LOW_MAX   arm64_dma_phys_limit
>  #define CRASH_ADDR_HIGH_MAX  (PHYS_MASK + 1)
>  
> +/*
> + * This is an empirical value in x86_64 and taken here directly. Please
> + * refer to the code comment in reserve_crashkernel_low() of x86_64 for more
> + * details.
> + */
> +#define DEFAULT_CRASH_KERNEL_LOW_SIZE\
> + max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20)

I agree with Will here, we need a better comment and we might as well
change the default value to something else until someone tells us that
the default is not large enough. The default swiotlb size is 64M, so we
need to cover that. The extra 8MB for any additional low allocations are
ok as well but the 256MB doesn't make much sense to me, or at least not
together with the rest.

If the main kernel got a command line option for a larger swiotlb, does
the crash kernel boot with the same command line? If not, we can just go
for a fixed 128M value here, which is double the default swiotlb buffer.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 0/2] arm64: kdump: Function supplement and performance optimization

2022-10-14 Thread Catalin Marinas
On Thu, Oct 13, 2022 at 06:46:35PM +0800, Baoquan He wrote:
> On 10/06/22 at 09:55am, john.p.donne...@oracle.com wrote:
> > What is the progress of this series ?
> > 
> > Without this patch set we are seeing  larger crashkernel=896M failures on
> > Arm  with Linux-6.0.rc7.  This larger value is needed for
> > iSCSI booted systems with certain network adapters.
> 
> This change is located in arch/arm64 folder, I have pinged arm64
> maintainer to consider merging this patchset. Not sure if they are
> still thinking, or ignore this.
> 
> Hi Catalin, Will,
> 
> Ping again!
> 
> Do you have plan to accept this patchset? It's very important for
> crashkernel setting on arm64 with a simple and default syntax.

I'll look at it once the merging window closes. I saw discussions on
this thread and I ignored it until you all agreed ;).

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2-RESEND] arm64/kexec: Fix missing extra range for crashkres_low.

2022-08-31 Thread Catalin Marinas
On Wed, Aug 31, 2022 at 10:49:30AM +0900, Yun Levi wrote:
> Like crashk_res, Calling crash_exclude_mem_range function with
> crashk_low_res area would need extra crash_mem range too.
> 
> Add one more extra cmem slot in case of crashk_low_res is used.
> 
> Signed-off-by: Levi Yun 
> Fixes: 944a45abfabc ("arm64: kdump: Reimplement crashkernel=X")
> Cc: sta...@vger.kernel.org

To be even more precise, it should be:

Cc:  # 5.19.x

> Acked-by: Baoquan He 
> Reviewed-by: Catalin Marinas 
> ---
>  arch/arm64/kernel/machine_kexec_file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/machine_kexec_file.c
> b/arch/arm64/kernel/machine_kexec_file.c
> index 889951291cc0..a11a6e14ba89 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -47,7 +47,7 @@ static int prepare_elf_headers(void **addr, unsigned long 
> *sz)
>   u64 i;
>   phys_addr_t start, end;
> 
> - nr_ranges = 1; /* for exclusion of crashkernel region */
> + nr_ranges = 2; /* for exclusion of crashkernel region */
>   for_each_mem_range(i, , )
>   nr_ranges++;

Something happened to the whitespace here, so the patch can't be
applied.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2] arm64/kexec: Fix missing extra range for crashkres_low.

2022-08-30 Thread Catalin Marinas
On Tue, Aug 30, 2022 at 06:28:39PM +0900, Levi Yun wrote:
> Like crashk_res, Calling crash_exclude_mem_range function with
> crashk_low_res area would need extra crash_mem range too.
> 
> Add one more extra cmem range slot in case of crashk_low_res is used.
> 
> Signed-off-by: Levi Yun 

Reviewed-by: Catalin Marinas 

Does this need a Fixes tag and cc stable?

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 5/5] arm64: kdump: Don't defer the reservation of crash high memory

2022-06-23 Thread Catalin Marinas
On Wed, Jun 22, 2022 at 04:35:16PM +0800, Baoquan He wrote:
> On 06/21/22 at 07:04pm, Catalin Marinas wrote:
> > The problem with splitting is that you can end up with two entries in
> > the TLB for the same VA->PA mapping (e.g. one for a 4KB page and another
> > for a 2MB block). In the lucky case, the CPU will trigger a TLB conflict
> > abort (but can be worse like loss of coherency).
> 
> Thanks for this explanation. Is this a drawback of arm64 design? X86
> code do the same thing w/o issue, is there way to overcome this on
> arm64 from hardware or software side?

It is a drawback of the arm64 implementations. Having multiple TLB
entries for the same VA would need additional logic in hardware to
detect, so the microarchitects have pushed back. In ARMv8.4, some
balanced was reached with FEAT_BBM so that the only visible side-effect
is a potential TLB conflict abort that could be resolved by software.

> I ever got a arm64 server with huge memory, w or w/o crashkernel setting 
> have different bootup time. And the more often TLB miss and flush will
> cause performance cost. It is really a pity if we have very powerful
> arm64 cpu and system capacity, but bottlenecked by this drawback.

Is it only the boot time affected or the runtime performance as well?

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 5/5] arm64: kdump: Don't defer the reservation of crash high memory

2022-06-23 Thread Catalin Marinas
On Wed, Jun 22, 2022 at 08:03:21PM +0800, Kefeng Wang wrote:
> On 2022/6/22 2:04, Catalin Marinas wrote:
> > On Tue, Jun 21, 2022 at 02:24:01PM +0800, Kefeng Wang wrote:
> > > On 2022/6/21 13:33, Baoquan He wrote:
> > > > On 06/13/22 at 04:09pm, Zhen Lei wrote:
> > > > > If the crashkernel has both high memory above DMA zones and low memory
> > > > > in DMA zones, kexec always loads the content such as Image and dtb to 
> > > > > the
> > > > > high memory instead of the low memory. This means that only high 
> > > > > memory
> > > > > requires write protection based on page-level mapping. The allocation 
> > > > > of
> > > > > high memory does not depend on the DMA boundary. So we can reserve the
> > > > > high memory first even if the crashkernel reservation is deferred.
> > > > > 
> > > > > This means that the block mapping can still be performed on other 
> > > > > kernel
> > > > > linear address spaces, the TLB miss rate can be reduced and the system
> > > > > performance will be improved.
> > > > Ugh, this looks a little ugly, honestly.
> > > > 
> > > > If that's for sure arm64 can't split large page mapping of linear
> > > > region, this patch is one way to optimize linear mapping. Given kdump
> > > > setting is necessary on arm64 server, the booting speed is truly
> > > > impacted heavily.
> > > Is there some conclusion or discussion that arm64 can't split large page
> > > mapping?
> > > 
> > > Could the crashkernel reservation (and Kfence pool) be splited 
> > > dynamically?
> > > 
> > > I found Mark replay "arm64: remove page granularity limitation from
> > > KFENCE"[1],
> > > 
> > >    "We also avoid live changes from block<->table mappings, since the
> > >    archtitecture gives us very weak guarantees there and generally 
> > > requires
> > >    a Break-Before-Make sequence (though IIRC this was tightened up
> > >    somewhat, so maybe going one way is supposed to work). Unless it's
> > >    really necessary, I'd rather not split these block mappings while
> > >    they're live."
> > The problem with splitting is that you can end up with two entries in
> > the TLB for the same VA->PA mapping (e.g. one for a 4KB page and another
> > for a 2MB block). In the lucky case, the CPU will trigger a TLB conflict
> > abort (but can be worse like loss of coherency).
> Thanks for your explanation,
> > Prior to FEAT_BBM (added in ARMv8.4), such scenario was not allowed at
> > all, the software would have to unmap the range, TLBI, remap. With
> > FEAT_BBM (level 2), we can do this without tearing the mapping down but
> > we still need to handle the potential TLB conflict abort. The handler
> > only needs a TLBI but if it touches the memory range being changed it
> > risks faulting again. With vmap stacks and the kernel image mapped in
> > the vmalloc space, we have a small window where this could be handled
> > but we probably can't go into the C part of the exception handling
> > (tracing etc. may access a kmalloc'ed object for example).
> 
> So if without FEAT_BBM,we can only guarantee BBM sequence via
> "unmap the range, TLBI, remap" or the following option,

Yes, that's the break-before-make sequence.

> and with FEAT_BBM (level 2), we could have easy way to avoid TLB
> conflict for some vmalloc space, but still hard to deal with other
> scence?

It's not too hard in theory. Basically there's a small risk of getting a
TLB conflict abort for the mappings you change without a BBM sequence (I
think it's nearly non-existed when going from large block to smaller
pages, though the architecture states that it's still possible). Since
we only want to do this for the linear map and the kernel and stack are
in the vmalloc space, we can handle such trap as an safety measure (it
just needs a TLBI). It may help to tweak a model to force it to generate
such conflict aborts, otherwise we'd not be able to test the code.

It's possible that such trap is raised at EL2 if a guest caused the
conflict abort (the architecture left this as IMP DEF). The hypervisors
may need to be taught to do a TLBI VMALLS12E1 instead of killing the
guest. I haven't checked what KVM does.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 5/5] arm64: kdump: Don't defer the reservation of crash high memory

2022-06-21 Thread Catalin Marinas
On Tue, Jun 21, 2022 at 02:24:01PM +0800, Kefeng Wang wrote:
> On 2022/6/21 13:33, Baoquan He wrote:
> > On 06/13/22 at 04:09pm, Zhen Lei wrote:
> > > If the crashkernel has both high memory above DMA zones and low memory
> > > in DMA zones, kexec always loads the content such as Image and dtb to the
> > > high memory instead of the low memory. This means that only high memory
> > > requires write protection based on page-level mapping. The allocation of
> > > high memory does not depend on the DMA boundary. So we can reserve the
> > > high memory first even if the crashkernel reservation is deferred.
> > > 
> > > This means that the block mapping can still be performed on other kernel
> > > linear address spaces, the TLB miss rate can be reduced and the system
> > > performance will be improved.
> > 
> > Ugh, this looks a little ugly, honestly.
> > 
> > If that's for sure arm64 can't split large page mapping of linear
> > region, this patch is one way to optimize linear mapping. Given kdump
> > setting is necessary on arm64 server, the booting speed is truly
> > impacted heavily.
> 
> Is there some conclusion or discussion that arm64 can't split large page
> mapping?
> 
> Could the crashkernel reservation (and Kfence pool) be splited dynamically?
> 
> I found Mark replay "arm64: remove page granularity limitation from
> KFENCE"[1],
> 
>   "We also avoid live changes from block<->table mappings, since the
>   archtitecture gives us very weak guarantees there and generally requires
>   a Break-Before-Make sequence (though IIRC this was tightened up
>   somewhat, so maybe going one way is supposed to work). Unless it's
>   really necessary, I'd rather not split these block mappings while
>   they're live."

The problem with splitting is that you can end up with two entries in
the TLB for the same VA->PA mapping (e.g. one for a 4KB page and another
for a 2MB block). In the lucky case, the CPU will trigger a TLB conflict
abort (but can be worse like loss of coherency).

Prior to FEAT_BBM (added in ARMv8.4), such scenario was not allowed at
all, the software would have to unmap the range, TLBI, remap. With
FEAT_BBM (level 2), we can do this without tearing the mapping down but
we still need to handle the potential TLB conflict abort. The handler
only needs a TLBI but if it touches the memory range being changed it
risks faulting again. With vmap stacks and the kernel image mapped in
the vmalloc space, we have a small window where this could be handled
but we probably can't go into the C part of the exception handling
(tracing etc. may access a kmalloc'ed object for example).

Another option is to do a stop_machine() (if multi-processor at that
point), disable the MMUs, modify the page tables, re-enable the MMU but
it's also complicated.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] arm64: kdump: Do not allocate crash low memory if not needed

2022-05-16 Thread Catalin Marinas
On Wed, 11 May 2022 11:20:32 +0800, Zhen Lei wrote:
> When "crashkernel=X,high" is specified, the specified "crashkernel=Y,low"
> memory is not required in the following corner cases:
> 1. If both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, it means
>that the devices can access any memory.
> 2. If the system memory is small, the crash high memory may be allocated
>from the DMA zones. If that happens, there's no need to allocate
>another crash low memory because there's already one.
> 
> [...]

Applied to arm64 (for-next/crashkernel), thanks!

[1/1] arm64: kdump: Do not allocate crash low memory if not needed
  https://git.kernel.org/arm64/c/8f0f104e2ab6

-- 
Catalin


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v24 0/6] support reserving crashkernel above 4G on arm64 kdump

2022-05-07 Thread Catalin Marinas
On Fri, 6 May 2022 19:43:56 +0800, Zhen Lei wrote:
> Changes since [v23]:
> Do two minor updates:
> 1. Change the value of CRASH_ADDR_HIGH_MAX from memblock.current_limit to 
> (PHYS_MASK + 1).
> 2. The 'high' and 'low' are only take effect when crashkernel=X[@offset] does 
> not exist.
>If the value of crashkernel=X is invalid, the reservation of crash fails, 
> even if 'high'
>and 'low' are configured correctly.
> 
> [...]

Applied to arm64 (for-next/crashkernel), finally ;). Thanks!

Only patches on top please for the fall-back implementation.

[1/6] kdump: return -ENOENT if required cmdline option does not exist
  https://git.kernel.org/arm64/c/2e5920bb073a
[2/6] arm64: Use insert_resource() to simplify code
  https://git.kernel.org/arm64/c/e6b394425c61
[3/6] arm64: kdump: Reimplement crashkernel=X
  https://git.kernel.org/arm64/c/944a45abfabc
[4/6] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
  https://git.kernel.org/arm64/c/fb319e77a0e7
[5/6] of: Support more than one crash kernel regions for kexec -s
  https://git.kernel.org/arm64/c/8af6b91f5834
[6/6] docs: kdump: Update the crashkernel description for arm64
  https://git.kernel.org/arm64/c/5832f1ae5060

-- 
Catalin


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v24 3/6] arm64: kdump: Reimplement crashkernel=X

2022-05-07 Thread Catalin Marinas
On Sat, May 07, 2022 at 07:10:32AM +0800, Baoquan He wrote:
> On 05/06/22 at 07:43pm, Zhen Lei wrote:
> ..  
> > @@ -118,8 +162,7 @@ static void __init reserve_crashkernel(void)
> > if (crash_base)
> > crash_max = crash_base + crash_size;
> >  
> > -   /* Current arm64 boot protocol requires 2MB alignment */
> > -   crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
> > +   crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> >crash_base, crash_max);
> > if (!crash_base) {
> > pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> > @@ -127,6 +170,11 @@ static void __init reserve_crashkernel(void)
> > return;
> > }
> >  
> 
> There's corner case missed, e.g
> 1) ,high and ,low are specified, CONFIG_ZONE_DMA|DMA32 is not enabled;
> 2) ,high and ,low are specified, the whole system memory is under 4G.

My view of ,low is that it should only used to override the default
ZONE_DMA allocation if that one is not suitable. If no ZONE_DMA exists
or everything is ZONE_DMA, ignore it altogether. That's a specialist
case for people that know more about the memory layout, otherwise
crashkernel=X works in most case with crashkernel=X,high as an
alternative to allow high allocation.

> I would suggest merging this series, Lei can add this corner case
> handling on top. Since this is a newly added support, we don't have
> to make it one step. Doing step by step can make reviewing easier.

I agree.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v24 6/6] docs: kdump: Update the crashkernel description for arm64

2022-05-07 Thread Catalin Marinas
On Sat, May 07, 2022 at 11:30:59AM +0800, Baoquan He wrote:
> On 05/07/22 at 09:41am, Leizhen (ThunderTown) wrote:
> > On 2022/5/7 7:14, Baoquan He wrote:
> > > On 05/06/22 at 07:44pm, Zhen Lei wrote:
> > >> Now arm64 has added support for "crashkernel=X,high" and
> > >> "crashkernel=Y,low". Unlike x86, crash low memory is not allocated if
> > >> "crashkernel=Y,low" is not specified.
> > >>
> > >> Signed-off-by: Zhen Lei 
> > >> ---
> > >>  Documentation/admin-guide/kernel-parameters.txt | 9 +++--
> > >>  1 file changed, 7 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> > >> b/Documentation/admin-guide/kernel-parameters.txt
> > >> index 3f1cc5e317ed4a5..aa44c61114aa4b8 100644
> > >> --- a/Documentation/admin-guide/kernel-parameters.txt
> > >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> > >> @@ -808,7 +808,7 @@
> > >>  Documentation/admin-guide/kdump/kdump.rst for 
> > >> an example.
> > >>  
> > >>  crashkernel=size[KMG],high
> > >> -[KNL, X86-64] range could be above 4G. Allow 
> > >> kernel
> > >> +[KNL, X86-64, ARM64] range could be above 4G. 
> > >> Allow kernel
> > >>  to allocate physical memory region from top, so 
> > >> could
> > >>  be above 4G if system have more than 4G ram 
> > >> installed.
> > >>  Otherwise memory region will be allocated below 
> > >> 4G, if
> > >> @@ -821,7 +821,7 @@
> > >>  that require some amount of low memory, e.g. 
> > >> swiotlb
> > >>  requires at least 64M+32K low memory, also 
> > >> enough extra
> > >>  low memory is needed to make sure DMA buffers 
> > >> for 32-bit
> > >> -devices won't run out. Kernel would try to 
> > >> allocate at
> > >> +devices won't run out. Kernel would try to 
> > >> allocate
> > >>  at least 256M below 4G automatically.
> > >>  This one let user to specify own low range 
> > >> under 4G
> > >>  for second kernel instead.
> > >> @@ -829,6 +829,11 @@
> > >>  It will be ignored when crashkernel=X,high is 
> > >> not used
> > >>  or memory reserved is below 4G.
> > >>  
> > >> +[KNL, ARM64] range in low memory.
> > >> +This one let user to specify a low range in DMA 
> > >> zone for
> > >   ^ not needed,
> > > Maybe Catalin can fix it when merging.
> > 
> > Delete "This one let user to" or the entire sentence? I understand it to be 
> > the former.
> 
> Oh, I mean the 'to' is not needed. "This one let user specify ". The
> 'to' is a grammer mistake.

Slightly more correct is "This one lets the user specify..."

I can fix them up when applying.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v23 3/6] arm64: kdump: Reimplement crashkernel=X

2022-05-06 Thread Catalin Marinas
On Fri, May 06, 2022 at 09:16:08PM +0800, Baoquan He wrote:
> On 05/06/22 at 11:22am, Leizhen (ThunderTown) wrote:
> ..  
> > >> @@ -118,8 +159,7 @@ static void __init reserve_crashkernel(void)
> > >>  if (crash_base)
> > >>  crash_max = crash_base + crash_size;
> > >>  
> > >> -/* Current arm64 boot protocol requires 2MB alignment */
> > >> -crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
> > >> +crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> > >> crash_base, crash_max);
> > >>  if (!crash_base) {
> > >>  pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> > > 
> > > I personally like this but let's see how the other thread goes. I guess
> > 
> > Me too. This fallback complicates code logic more than just a little.
> > I'm not sure why someone would rather add fallback than change the bootup
> > options to crashkernel=X,[high|low]. Perhaps fallback to high/low is a 
> > better
> > compatible and extended mode when crashkernel=X fails to reserve memory. And
> > the code logic will be much clearer.
> 
> The fallback does complicates code, while it was not made at the
> beginning, but added later. The original crahskernel=xM can only reserve
> low memory under 896M on x86 to be back compatible with the case in which
> normal kernel is x86_64, while kdump kernel could be i386. Then customer
> complained why crashkernel=xM can't be put anywhere so that they don't
> need to know the details of limited low memory and huge high memory fact 
> in system.
> 
> The implementation of fallback is truly complicated, but its use is
> quite simple. And it makes crashkernel reservation setting simple.
> Most of users don't need to know crashkernel=,high, ,low things, unless
> the crashkernel region is too big. Nobody wants to take away 1G or more
> from low memory for kdump just in case bad thing happens, while normal
> kernel itself is seriously impacted by limited low memory.

IIUC, that's exactly what happens even on x86, it may take away a
significant chunk of the low memory. Let's say we have 1.2GB of 'low'
memory (below 4GB) on an arm64 platform. A crashkernel=1G would succeed
in a low allocation, pretty much affecting the whole system. It would
only fall back to 'high' _if_ you pass something like crashkernel=1.2G
so that the low allocation fails. So if I got this right, I find the
fall-back from crashkernel=X pretty useless, we shouldn't even try it.

It makes more sense if crashkernel=X,high is a hint to attempt a high
allocation first with a default low (overridden by a ,low option) or
even fall-back to low if there's no memory above 4GB.

Could you please have a look at Zhen Lei's latest series without any
fall-backs? I'd like to queue that if you are happy with it. We can then
look at adding some fall-back options on top.

IMO, we should only aim for:

crashkernel=X   ZONE_DMA allocation, no fall-back
crashkernel=X,high  hint for high allocation, small default
low, fall back to low if alloc fails
crashkernel=X,low   control the default low allocation, only
high is passed

With the above, I'd expect admins to just go for crashkernel=X,high on
modern hardware with up to date kexec tools and it does the right thing.
The crashkernel=X can lead to unexpected results if it eats up all the
low memory. Let's say this option is for backwards compatibility only.

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v23 3/6] arm64: kdump: Reimplement crashkernel=X

2022-05-06 Thread Catalin Marinas
On Fri, May 06, 2022 at 11:22:51AM +0800, Leizhen (ThunderTown) wrote:
> On 2022/5/6 1:01, Catalin Marinas wrote:
> > On Thu, May 05, 2022 at 05:18:42PM +0800, Zhen Lei wrote:
> >> From: Chen Zhou 
> >>
> >> There are following issues in arm64 kdump:
> >> 1. We use crashkernel=X to reserve crashkernel in DMA zone, which
> >> will fail when there is not enough low memory.
> >> 2. If reserving crashkernel above DMA zone, in this case, crash dump
> >> kernel will fail to boot because there is no low memory available
> >> for allocation.
> >>
> >> To solve these issues, introduce crashkernel=X,[high,low].
> >> The "crashkernel=X,high" is used to select a region above DMA zone, and
> >> the "crashkernel=Y,low" is used to allocate specified size low memory.
> > 
> > Thanks for posting the simplified version, though the discussion with
> > Baoquan is still ongoing. AFAICT there is no fallback if crashkernel=
> > fails. The advantage with this series is cleaner code, we set the limits
> > during parsing and don't have to adjust them if some of the first
> > allocation failed.
> 
> Yes, I'm currently implementing it in the simplest version, providing only
> the most basic functions. Because the conclusions of this part of the 
> discussion
> are clear. I think I can send the fallback, default low size, and mapping 
> optimization
> patches separately after this basic version is merged. These three functions 
> can
> be discussed separately.

This works for me. If we decide to go for fallbacks, it can be done as a
separate patch.

> >> +  ret = parse_crashkernel_high(cmdline, 0, _size, 
> >> _base);
> >> +  if (ret || !crash_size)
> >> +  return;
> >> +
> >> +  /*
> >> +   * crashkernel=Y,low can be specified or not, but invalid value
> >> +   * is not allowed.
> >> +   */
> >> +  ret = parse_crashkernel_low(cmdline, 0, _low_size, 
> >> _base);
> >> +  if (ret && (ret != -ENOENT))
> >> +  return;
> >> +
> >> +  crash_max = CRASH_ADDR_HIGH_MAX;
> >> +  }
> >>  
> >>crash_size = PAGE_ALIGN(crash_size);
> >>  
> >> @@ -118,8 +159,7 @@ static void __init reserve_crashkernel(void)
> >>if (crash_base)
> >>crash_max = crash_base + crash_size;
> >>  
> >> -  /* Current arm64 boot protocol requires 2MB alignment */
> >> -  crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
> >> +  crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> >>   crash_base, crash_max);
> >>if (!crash_base) {
> >>pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> > 
> > I personally like this but let's see how the other thread goes. I guess
> 
> Me too. This fallback complicates code logic more than just a little.
> I'm not sure why someone would rather add fallback than change the bootup
> options to crashkernel=X,[high|low]. Perhaps fallback to high/low is a better
> compatible and extended mode when crashkernel=X fails to reserve memory. And
> the code logic will be much clearer.
> 
> //parse crashkernel=X //To simplify the discussion, Ignore [@offset]
> crash_base = memblock_phys_alloc_range()
> if (!crash_base || /* crashkernel=X is not specified */) {
>   //parse crashkernel=X,[high,low]
>   //reserve high/low memory
> }
> 
> So that, the following three modes are supported:
> 1) crashkernel=X[@offset]
> 2) crashkernel=X,high crashkernel=X,low
> 3) crashkernel=X[@offset] crashkernel=X,high [crashkernel=Y,low]

The whole interface isn't great but if we add fall-back options, I'd
rather stick close to what x86 does. IOW, if crashkernel=X is provided,
ignore explicit high/low (so 3 does not exist).

(if I had added it from the beginning, I'd have removed 'high'
completely and allow crashkernel=X to fall-back to 'high' with an
optional explicit 'low' or 'dma' if the default is not sufficient; but I
think there's too much bikeshedding already)

> > if we want a fallback, it would come just before the check the above:
> > 
> > if (!crash_base && crash_max != CRASH_ADDR_HIGH_MAX) {
> > /* attempt high allocation with default low */
> > if (!crash_low_size)
> > crash_low_size = some default;
> > crash_max = CRASH_ADDR_LOW_MAX;
> 
> crash_max = CRASH_ADDR_HIGH_MAX; We should fallback to high memory now.

Yes, that's the idea.

Anyway, please post the current series with the minor updates I
mentioned and we can add a fallback patch (or two) on top.

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v23 3/6] arm64: kdump: Reimplement crashkernel=X

2022-05-05 Thread Catalin Marinas
On Thu, May 05, 2022 at 05:18:42PM +0800, Zhen Lei wrote:
> From: Chen Zhou 
> 
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel in DMA zone, which
> will fail when there is not enough low memory.
> 2. If reserving crashkernel above DMA zone, in this case, crash dump
> kernel will fail to boot because there is no low memory available
> for allocation.
> 
> To solve these issues, introduce crashkernel=X,[high,low].
> The "crashkernel=X,high" is used to select a region above DMA zone, and
> the "crashkernel=Y,low" is used to allocate specified size low memory.

Thanks for posting the simplified version, though the discussion with
Baoquan is still ongoing. AFAICT there is no fallback if crashkernel=
fails. The advantage with this series is cleaner code, we set the limits
during parsing and don't have to adjust them if some of the first
allocation failed.

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 51863f1448c6989..11406f3e1443168 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -90,6 +90,32 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit;
>  phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
>  #endif
>  
> +/* Current arm64 boot protocol requires 2MB alignment */
> +#define CRASH_ALIGN  SZ_2M
> +
> +#define CRASH_ADDR_LOW_MAX   arm64_dma_phys_limit
> +#define CRASH_ADDR_HIGH_MAX  memblock.current_limit

Better use memblock_get_current_limit() if you need to or just
MEMBLOCK_ALLOC_ANYWHERE, memblock.current_limit is just a memblock
internal. But I think we can go for (PHYS_MASK + 1) if you need
something other than MEMBLOCK_ALLOC_ANYWHERE, memblock knows what to
allocate anyway.

> +static int __init reserve_crashkernel_low(unsigned long long low_size)
> +{
> + unsigned long long low_base;
> +
> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, 
> CRASH_ADDR_LOW_MAX);
> + if (!low_base) {
> + pr_err("cannot allocate crashkernel low memory 
> (size:0x%llx).\n", low_size);
> + return -ENOMEM;
> + }
> +
> + pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld 
> MB)\n",
> + low_base, low_base + low_size, low_size >> 20);
> +
> + crashk_low_res.start = low_base;
> + crashk_low_res.end   = low_base + low_size - 1;
> + insert_resource(_resource, _low_res);
> +
> + return 0;
> +}
> +
>  /*
>   * reserve_crashkernel() - reserves memory for crash kernel
>   *
> @@ -100,17 +126,32 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = 
> PHYS_MASK + 1;
>  static void __init reserve_crashkernel(void)
>  {
>   unsigned long long crash_base, crash_size;
> - unsigned long long crash_max = arm64_dma_phys_limit;
> + unsigned long long crash_low_size = 0;
> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> + char *cmdline = boot_command_line;
>   int ret;
>  
>   if (!IS_ENABLED(CONFIG_KEXEC_CORE))
>   return;
>  
> - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> + /* crashkernel=X[@offset] */
> + ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
>   _size, _base);
> - /* no crashkernel= or invalid value specified */
> - if (ret || !crash_size)
> - return;
> + if (ret || !crash_size) {

I think we should check for ret == -ENOENT only. If the crashkernel=
exists but is malformed or the size is 0, we shouldn't bother with
high/low at all.

> + ret = parse_crashkernel_high(cmdline, 0, _size, 
> _base);
> + if (ret || !crash_size)
> + return;
> +
> + /*
> +  * crashkernel=Y,low can be specified or not, but invalid value
> +  * is not allowed.
> +  */
> + ret = parse_crashkernel_low(cmdline, 0, _low_size, 
> _base);
> + if (ret && (ret != -ENOENT))
> + return;
> +
> + crash_max = CRASH_ADDR_HIGH_MAX;
> + }
>  
>   crash_size = PAGE_ALIGN(crash_size);
>  
> @@ -118,8 +159,7 @@ static void __init reserve_crashkernel(void)
>   if (crash_base)
>   crash_max = crash_base + crash_size;
>  
> - /* Current arm64 boot protocol requires 2MB alignment */
> - crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>  crash_base, crash_max);
>   if (!crash_base) {
>   pr_warn("cannot allocate crashkernel (size:0x%llx)\n",

I personally like this but let's see how the other thread goes. I guess
if we want a fallback, it would come just before the check the above:

if (!crash_base && crash_max != CRASH_ADDR_HIGH_MAX) {
/* attempt high allocation with default low */
if (!crash_low_size)

Re: [PATCH v22 5/9] arm64: kdump: Reimplement crashkernel=X

2022-05-05 Thread Catalin Marinas
On Thu, May 05, 2022 at 11:00:19AM +0800, Baoquan He wrote:
> On 05/03/22 at 11:00pm, Catalin Marinas wrote:
> > So, to recap, IIUC you are fine with:
> > 
> > crashkernel=Y   - allocate within ZONE_DMA with fallback
> >   above with a default in ZONE_DMA (like
> >   x86, 256M or swiotlb size)
> 
> Ack to this one.
> 
> 
> > crashkernel=Y,high  - allocate from above ZONE_DMA
> 
> Not exactly. If there's only ZONE_DMA, crashkernel,high will
> be reserved in ZONE_DMA, and crashkernel,low will be ignored.
> Other than this, ack.

Yes, that's fine.

> > crashkernel=Y,low   - allocate within ZONE_DMA
> 
> Ack to this one.
> > 
> > 'crashkernel' overrides the high and low while the latter two can be
> > passed independently.
> 
> crashkernel=,high can be passed independently, then a crashkernel=,low
> is needed implicitly. If people don't want crashkernel=,low
> explicitly, crashkernel=0,low need be specified.

I find this complicating the interface. I don't know the background to
the x86 implementation but we diverge already on arm64 since we talk
about ZONE_DMA rather than 4G limit (though for most platforms these
would be the same).

I guess we could restate the difference between crashkernel= and
crashkernel=,high as the hint to go for allocation above ZONE_DMA first.

> An independent crashkernel=,low makes no sense. Crashkernel=,low
> should be paird with crashkernel=,high.

You could argue that crashkernel=,low gives the current crashkernel=
behaviour, i.e. either all within ZONE_DMA or fail to allocate. So it
may have some value on its own.

> My personal opinion according to the existed senmantics on x86.
> Otherwise, the guidance of crashkernel= |,high|,low reservation
> will be complicated to write.

It's more that I find the current semantics unnecessarily confusing. But
even reading the x86_64 text it's not that clear. For example the
default low allocation for crashkernel= and crashkernel=,high is only
mentioned in the crashkernel=,low description.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v22 5/9] arm64: kdump: Reimplement crashkernel=X

2022-05-03 Thread Catalin Marinas
On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/4/29 16:02, Leizhen (ThunderTown) wrote:
> > On 2022/4/29 11:24, Baoquan He wrote:
> >> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
> >>> On 2022/4/28 11:52, Baoquan He wrote:
> >>>> On 04/28/22 at 11:40am, Baoquan He wrote:
> >>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
> >>>>>> There will be some difference as the 4G limit doesn't always hold for
> >>>>>> arm64 (though it's true in most cases). Anyway, we can probably 
> >>>>>> simplify
> >>>>>> things a bit while following the documented behaviour:
> >>>>>>
> >>>>>>crashkernel=Y   - current behaviour within ZONE_DMA
> >>>>>>crashkernel=Y,high  - allocate from above ZONE_DMA
> >>>>>>crashkernel=Y,low   - allocate within ZONE_DMA
[...]
> >>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
> >>>>> about this version. And I have the same concerns about them which comes
> >>>>> from below points:
> >>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
> >>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
> >>>>> deploy/configure kdump on different machines of different ARCHes in the
> >>>>> same LAB. I think we should try to avoid the confusion.

I guess by all arches you mean just x86 here. Since the code is not
generic, all arches do their own stuff.

> > OK, I plan to remove optimization, fallback and default low size, to follow 
> > the
> > suggestion of Catalin first. But there's one minor point of contention.
> > 
> > 1)Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
> > 2)Both "crashkernel=X,high" and "crashkernel=X,low" are present.
> >or
> >   Allow "crashkernel=X,high" to be present alone. Unlike x86, the 
> > default low size is zero.
> > 
> > I prefer 2), how about you?

(2) works for me as well. We keep these simple as "expert" options and
allow crashkernel= to fall back to 'high' if not sufficient memory in
ZONE_DMA. That would be a slight change from the current behaviour but,
as Zhen Lei said, with the old tools it's just moving the error around,
the crashkernel wouldn't be available in either case.

> >>>>> 2) Fallback behaviour is important to our distros. The reason is we will
> >>>>> provide default value with crashkernel=xxxM along kernel of distros. In
> >>>>> this case, we hope the reservation will succeed by all means. The ,high
> >>>>> and ,low is an option if customer likes to take with expertise.

OK, that's good feedback.

So, to recap, IIUC you are fine with:

crashkernel=Y   - allocate within ZONE_DMA with fallback
  above with a default in ZONE_DMA (like
  x86, 256M or swiotlb size)
crashkernel=Y,high  - allocate from above ZONE_DMA
crashkernel=Y,low   - allocate within ZONE_DMA

'crashkernel' overrides the high and low while the latter two can be
passed independently.

> >>>>> After going through arm64 memory init code, I got below summary about
> >>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
> >>>>> can make use of it to facilitate to simplify code.
> >>>>> 
> >>>>> DMA  DMA32  
> >>>>>   NORMAL
> >>>>> 1)Raspberry Pi4 0~1G 3G~4G  
> >>>>>   (above 4G)
> >>>>> 2)Normal machine0~4G 0  
> >>>>>   (above 4G)
> >>>>> 3)Special machine   (above 4G)~MAX
> >>>>> 4)No DMA|DMA32  
> >>>>>   (above 4G)~MAX
> >>>
> >>> arm64_memblock_init()
> >>>   reserve_crashkernel()---   0a30c53573b0 ("arm64: 
> >>> mm: Move reserve_crashkernel() into mem_init()")
> >> We don't need different code for this place of reservation as you are
> >> doing in this patchset, since arm64_dma_phys_limit is initialized as 
> >&

Re: [PATCH v22 5/9] arm64: kdump: Reimplement crashkernel=X

2022-04-27 Thread Catalin Marinas
On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/4/27 20:32, Catalin Marinas wrote:
> > I think one could always pass a default command line like:
> > 
> > crashkernel=1G,high crashkernel=128M,low
> > 
> > without much knowledge of the SoC memory layout.
> 
> Yes, that's what the end result is. The user specify crashkernel=128M,low
> and the implementation ensure the 128M low memory is allocated from DMA zone.
> We use arm64_dma_phys_limit as the upper limit for crash low memory.
> 
> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit
> +   unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> +   crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>crash_base, crash_max);
> 
> > Another option is to only introduce crashkernel=Y,low and, when that is
> > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
> > 'high' option at all:
> > 
> > crashkernel=1G  - all within ZONE_DMA
> > crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA
> >   1G above ZONE_DMA
> > 
> > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
> > the 'low' option.
> 
> I think although the code is hard to make generic, the interface is better to
> be relatively uniform. A user might have to maintain both x86 and arm64, and
> so on. It's not a good thing that the difference is too big.

There will be some difference as the 4G limit doesn't always hold for
arm64 (though it's true in most cases). Anyway, we can probably simplify
things a bit while following the documented behaviour:

crashkernel=Y   - current behaviour within ZONE_DMA
crashkernel=Y,high  - allocate from above ZONE_DMA
crashkernel=Y,low   - allocate within ZONE_DMA

There is no fallback from crashkernel=Y.

The question is whether we still want a default low allocation if
crashkernel=Y,low is missing but 'high' is present. If we add this, I
think we'd be consistent with kernel-parameters.txt for the 'low'
description. A default 'low' is probably not that bad but I'm tempted to
always mandate both 'high' and 'low'.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v22 5/9] arm64: kdump: Reimplement crashkernel=X

2022-04-27 Thread Catalin Marinas
On Wed, Apr 27, 2022 at 02:54:52PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/4/27 2:02, Catalin Marinas wrote:
> > On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote:
> >>  /*
> >>   * reserve_crashkernel() - reserves memory for crash kernel
> >>   *
> >>   * This function reserves memory area given in "crashkernel=" kernel 
> >> command
> >>   * line parameter. The memory reserved is used by dump capture kernel when
> >>   * primary kernel is crashing.
> >> + *
> >> + * NOTE: Reservation of crashkernel,low is special since its existence
> >> + * is not independent, need rely on the existence of crashkernel,high.
> >> + * Here, four cases of crashkernel low memory reservation are summarized:
> >> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel 
> >> low
> >> + *memory takes Y;
> >> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified,
> >> + *take the default crashkernel low memory size;
> >> + * 3) crashkernel=X is specified, while fallback to get a memory region
> >> + *in high memory, take the default crashkernel low memory size;
> >> + * 4) crashkernel='invalid value',low is specified, failed the whole
> >> + *crashkernel reservation and bail out.
> > 
> > Following the x86 behaviour made sense when we were tried to get that
> > code generic. Now that we moved the logic under arch/arm64, we can
> > diverge a bit. I lost track of the original (v1/v2) proposal but I
> > wonder whether we still need the fallback to high for crashkernel=Y.
> 
> I don't think anyone has raised this demand yet! If it weren't for the
> fact that crashkernel=X appeared earlier, it would probably have been
> enough for a combination of crashkernel=X,high and crashkernel=Y,low.
> 
> In fact, I also tend not to support "fallback to high for crashkernel=Y".
> I took over this from Chen Zhou. In the absence of any objection, I had
> to inherit. Now that you've brought it up, I'm happy to delete it.
> Supporting this feature complicates the code logic a lot. The point is,
> it's not fully backwards compatible yet. For example, someone may want
> crashkernel=3G to report failure, but the the new support make it work.

BTW, prior to v20, this patch had this line:

crashk_low_res.name = "Crash kernel (low)";

I can't find it anymore. Do the kexec tools need to distinguish between
low and high or they can cope with multiple "Crash kernel" entries?

> > Maybe simpler, no fallbacks:
> > 
> > crashkernel=Y - keep the current behaviour, ignore high,low
> > crashkernel=Y,high - allocate above ZONE_DMA
> > crashkernel=Y,low - allocate within ZONE_DMA
> > 
> > From your proposal, the difference is that the Y,high option won't
> > have any default ZONE_DMA fallback, one would have to explicitly pass
> > the Y,low option if needed.
> 
> I agree with you. Now we don't need code generic, so there is no need to
> carry the historical burden of other ARCHs. arm64 does not need to delve
> into that empirical value(the default size of crash low memory).
> 
> > Just a thought, maybe it makes the code simpler. But I'm open to
> > discussion if there are good arguments for the proposed (x86-like)
> > behaviour. One argument could be for crashkernel=Y to fall back to high
> > if distros don't want to bother with high/low settings.
> 
> I think distros should take precedence over "crashkernel=Y,high". After all,
> ZONE_DMA memory is more valuable than high memory.

My point is whether an admin configuring the kernel command line needs
to know the layout of ZONE_DMA etc. to figure out how much to pass in
high and low. The fallbacks in this case have some value but they also
complicate the code logic. The 4GB limit does not always make sense
either for some platforms (RPi4 has a ZONE_DMA limit of 1GB).

I think one could always pass a default command line like:

crashkernel=1G,high crashkernel=128M,low

without much knowledge of the SoC memory layout.

Another option is to only introduce crashkernel=Y,low and, when that is
passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
'high' option at all:

crashkernel=1G  - all within ZONE_DMA
crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA
  1G above ZONE_DMA

If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
the 'low' option.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v22 5/9] arm64: kdump: Reimplement crashkernel=X

2022-04-26 Thread Catalin Marinas
On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote:
>  /*
>   * reserve_crashkernel() - reserves memory for crash kernel
>   *
>   * This function reserves memory area given in "crashkernel=" kernel command
>   * line parameter. The memory reserved is used by dump capture kernel when
>   * primary kernel is crashing.
> + *
> + * NOTE: Reservation of crashkernel,low is special since its existence
> + * is not independent, need rely on the existence of crashkernel,high.
> + * Here, four cases of crashkernel low memory reservation are summarized:
> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low
> + *memory takes Y;
> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified,
> + *take the default crashkernel low memory size;
> + * 3) crashkernel=X is specified, while fallback to get a memory region
> + *in high memory, take the default crashkernel low memory size;
> + * 4) crashkernel='invalid value',low is specified, failed the whole
> + *crashkernel reservation and bail out.

Following the x86 behaviour made sense when we were tried to get that
code generic. Now that we moved the logic under arch/arm64, we can
diverge a bit. I lost track of the original (v1/v2) proposal but I
wonder whether we still need the fallback to high for crashkernel=Y.
Maybe simpler, no fallbacks:

crashkernel=Y - keep the current behaviour, ignore high,low
crashkernel=Y,high - allocate above ZONE_DMA
crashkernel=Y,low - allocate within ZONE_DMA

>From your proposal, the difference is that the Y,high option won't
have any default ZONE_DMA fallback, one would have to explicitly pass
the Y,low option if needed.

Just a thought, maybe it makes the code simpler. But I'm open to
discussion if there are good arguments for the proposed (x86-like)
behaviour. One argument could be for crashkernel=Y to fall back to high
if distros don't want to bother with high/low settings.

Another thing I may have asked in the past, what happens if we run a new
kernel with these patches with old kexec user tools. I suspect the
crashkernel=Y with the fallback to high will confuse the tools.

BTW, please separate the NO_BLOCK_MAPPINGS optimisations from the
crashkernel above 4G. Let's get the crashkernel reservations sorted
first, it's been around for too long.

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v22 4/9] arm64: kdump: Don't force page-level mappings for memory above 4G

2022-04-26 Thread Catalin Marinas
On Thu, Apr 14, 2022 at 07:57:15PM +0800, Zhen Lei wrote:
> @@ -540,13 +540,31 @@ static void __init map_mem(pgd_t *pgdp)
>   for_each_mem_range(i, , ) {
>   if (start >= end)
>   break;
> +
> +#ifdef CONFIG_KEXEC_CORE
> + if (eflags && (end >= SZ_4G)) {
> + /*
> +  * The memory block cross the 4G boundary.
> +  * Forcibly use page-level mappings for memory under 4G.
> +  */
> + if (start < SZ_4G) {
> + __map_memblock(pgdp, start, SZ_4G - 1,
> +pgprot_tagged(PAGE_KERNEL), 
> flags | eflags);
> + start  = SZ_4G;
> + }
> +
> + /* Page-level mappings is not mandatory for memory 
> above 4G */
> + eflags = 0;
> + }
> +#endif

That's a bit tricky if a SoC has all RAM above 4G. IIRC AMD Seattle had
this layout. See max_zone_phys() for how we deal with this, basically
extending ZONE_DMA to the whole range if RAM starts above 4GB. In that
case, crashkernel reservation would fall in the range above 4GB.

BTW, we changed the max_zone_phys() logic with commit 791ab8b2e3db
("arm64: Ignore any DMA offsets in the max_zone_phys() calculation").

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

2021-12-15 Thread Catalin Marinas
On Wed, Dec 15, 2021 at 11:42:19AM +0800, Baoquan He wrote:
> On 12/14/21 at 07:24pm, Catalin Marinas wrote:
> > On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> > > On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > > > From: Chen Zhou 
> > > > 
> > > > The lower bounds of crash kernel reservation and crash kernel low
> > > > reservation are different, use the consistent value CRASH_ALIGN.
> > > 
> > > A big WHY is missing here to explain why the lower bound of the
> > > allocation range needs to be 16M and why was 0 wrong?
> > 
> > I asked the same here:
> > 
> > https://lore.kernel.org/r/20210224143547.gb28...@arm.com
> > 
> > IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
> > lower part, so that's equivalent in practice to starting from
> > CRASH_ALIGN.
> 
> Yeah, even for i386, there's area reserved by BIOS inside low 1M.
> Considering the existing alignment CRASH_ALIGN which is 16M, we
> definitely have no chance to get memory starting from 0. So starting
> from 16M can skip the useless memblock searching, and make the
> crashkernel low reservation consisten with crashkernel reservation on
> allocation code.

That's the x86 assumption. Is it valid for other architectures once the
code has been made generic in patch 6? It should be ok for arm64, RAM
tends to start from higher up but other architectures may start using
this common code.

If you want to keep the same semantics as before, just leave it as 0.
It's not that the additional lower bound makes the search slower.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

2021-12-14 Thread Catalin Marinas
On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > From: Chen Zhou 
> > 
> > The lower bounds of crash kernel reservation and crash kernel low
> > reservation are different, use the consistent value CRASH_ALIGN.
> 
> A big WHY is missing here to explain why the lower bound of the
> allocation range needs to be 16M and why was 0 wrong?

I asked the same here:

https://lore.kernel.org/r/20210224143547.gb28...@arm.com

IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
lower part, so that's equivalent in practice to starting from
CRASH_ALIGN.

Anyway, I agree the commit log should describe this.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v17 00/10] support reserving crashkernel above 4G on arm64 kdump

2021-12-13 Thread Catalin Marinas
On Mon, Dec 13, 2021 at 08:37:48AM -0600, john.p.donne...@oracle.com wrote:
> On 12/10/21 12:55 AM, Zhen Lei wrote:
> > There are following issues in arm64 kdump:
> > 1. We use crashkernel=X to reserve crashkernel below 4G, which
> > will fail when there is no enough low memory.
> > 2. If reserving crashkernel above 4G, in this case, crash dump
> > kernel will boot failure because there is no low memory available
> > for allocation.
> > 
> > To solve these issues, change the behavior of crashkernel=X.
> > crashkernel=X tries low allocation in DMA zone and fall back to high
> > allocation if it fails.
> > 
> > We can also use "crashkernel=X,high" to select a high region above
> > DMA zone, which also tries to allocate at least 256M low memory in
> > DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> > specified size low memory.
> > 
> > When reserving crashkernel in high memory, some low memory is reserved
> > for crash dump kernel devices. So there may be two regions reserved for
> > crash dump kernel.
> > In order to distinct from the high region and make no effect to the use
> > of existing kexec-tools, rename the low region as "Crash kernel (low)",
> > and pass the low region by reusing DT property
> > "linux,usable-memory-range". We made the low memory region as the last
> > range of "linux,usable-memory-range" to keep compatibility with existing
> > user-space and older kdump kernels.
> > 
> > Besides, we need to modify kexec-tools:
> > arm64: support more than one crash kernel regions(see [1])
> > 
> > Another update is document about DT property 'linux,usable-memory-range':
> > schemas: update 'linux,usable-memory-range' node schema(see [2])
> > 
> > This patchset contains the following 10 patches:
> > 
> > 0001-0004 are some x86 cleanups which prepares for making 
> > functionsreserve_crashkernel[_low]() generic.
> > 0005 makes functions reserve_crashkernel[_low]() generic.
> > 0006-0007 reimplements arm64 crashkernel=X.
> > 0008-0009 adds memory for devices by DT property linux,usable-memory-range.
> > 0010 updates the doc.
> > 
> > Changes since [v16]
> > - Because no functional changes in this version, so add
> >"Tested-by: Dave Kleikamp " for patch 1-9
> > - Add "Reviewed-by: Rob Herring " for patch 8
> > - Update patch 9 based on the review comments of Rob Herring
> > - As Catalin Marinas's suggestion, merge the implementation of
> >ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
> >contents of X86 and ARM64 do not overlap, and reduce unnecessary
> >temporary differences.
> > 
> > Changes since [v15]
> > -  Aggregate the processing of "linux,usable-memory-range" into one 
> > function.
> > Only patch 9-10 have been updated.
> > 
> > Changes since [v14]
> > - Recovering the requirement that the CrashKernel memory regions on X86
> >only requires 1 MiB alignment.
> > - Combine patches 5 and 6 in v14 into one. The compilation warning fixed
> >by patch 6 was introduced by patch 5 in v14.
> > - As with crashk_res, crashk_low_res is also processed by
> >crash_exclude_mem_range() in patch 7.
> > - Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom 
> > linux,usable-memory-range handling")
> >has removed the architecture-specific code, extend the property 
> > "linux,usable-memory-range"
> >in the platform-agnostic FDT core code. See patch 9.
> > - Discard the x86 description update in the document, because the 
> > description
> >has been updated by commit b1f4c363666c ("Documentation: kdump: update 
> > kdump guide").
> > - Change "arm64" to "ARM64" in Doc.
> > 
> > 
> > Changes since [v13]
> > - Rebased on top of 5.11-rc5.
> > - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
> > Since reserve_crashkernel[_low]() implementations are quite similar on
> > other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
> > arch/Kconfig and select this by X86 and ARM64.
> > - Some minor cleanup.
> > 
> > Changes since [v12]
> > - Rebased on top of 5.10-rc1.
> > - Keep CRASH_ALIGN as 16M suggested by Dave.
> > - Drop patch "kdump: add threshold for the required memory".
> > - Add Tested-by from John.
> > 
> > Changes since [v11]
> > - Rebased on top of 5.9-rc4.
> > - Make the function reserve_crashkernel() of x86 generic.
> > Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> > and arm64 use the generic version to reimplement crashkernel=X.
> > 
> > Changes since [v10]
> > - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
> > 
> > Changes since [v9]
> > - Patch 1 add Acked-by from Dave.
> > - Update patch 5 according to Dave's comments.
> > - Update chosen schema.
> > 
> > Changes since [v8]
> > - Reuse DT property "linux,usable-memory-range".
> > Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the 
> > low
> > memory region.
> > - Fix kdump broken with ZONE_DMA reintroduced.
> > - Update chosen schema.
> > 
> > Changes since [v7]

Re: [PATCH v16 00/11] support reserving crashkernel above 4G on arm64 kdump

2021-12-13 Thread Catalin Marinas
On Fri, Dec 10, 2021 at 01:53:59PM -0600, john.p.donne...@oracle.com wrote:
> On 12/8/21 11:13 AM, Catalin Marinas wrote:
> > On Tue, Nov 23, 2021 at 08:46:35PM +0800, Zhen Lei wrote:
> > > Chen Zhou (10):
> > >x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
> > >x86: kdump: make the lower bound of crash kernel reservation
> > >  consistent
> > >x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> > >  reserve_crashkernel()
> > >x86: kdump: move xen_pv_domain() check and insert_resource() to
> > >  setup_arch()
> > >x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> > >arm64: kdump: introduce some macros for crash kernel reservation
> > >arm64: kdump: reimplement crashkernel=X
> > >x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config
> > >of: fdt: Add memory for devices by DT property
> > >  "linux,usable-memory-range"
> > >kdump: update Documentation about crashkernel
> > > 
> > > Zhen Lei (1):
> > >of: fdt: Aggregate the processing of "linux,usable-memory-range"
> > 
> > Apart from a minor comment I made on patch 8 and some comments from Rob
> > that need addressing, the rest looks fine to me.
> > 
> > Ingo stated in the past that he's happy to ack the x86 changes as long
> > as there's no functional change (and that's the case AFAICT). Ingo, does
> > your conditional ack still stand?
> > 
> > In terms of merging, I'm happy to take it all through the arm64 tree
> > with acks from the x86 maintainers. Alternatively, with the change I
> > mentioned for patch 8, the first 5 patches could be queued via the tip
> > tree on a stable branch and I can base the rest of the arm64 on top.
> > 
> > Thomas, Ingo, Peter, any preference?
> > 
> > Thanks.
> 
> If you notice the trend over the past year , some of additional review
> requests are because the submitter had to rebase to the next version.

Normally the reviews stand even if the series is rebased. You only drop
them if there were non-trivial changes to the patches, not just because
of a rebase.

Ingo acked the x86 changes in the past but there were some functional
changes to the minimum alignment, so the series had to be reworked and
the ack not applied.

https://lore.kernel.org/r/20200806133627.gb2077...@gmail.com

> Can we get this acked and placed in a build so others can test and start
> using it ?

If Ingo's conditional ack (no functional changes, just a move) still
stands over the five x86 patches now, I'm happy to queue them all via
the arm64 tree. The latest series here:

https://lore.kernel.org/r/20211210065533.2023-1-thunder.leiz...@huawei.com

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v16 00/11] support reserving crashkernel above 4G on arm64 kdump

2021-12-08 Thread Catalin Marinas
On Tue, Nov 23, 2021 at 08:46:35PM +0800, Zhen Lei wrote:
> Chen Zhou (10):
>   x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
>   x86: kdump: make the lower bound of crash kernel reservation
> consistent
>   x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> reserve_crashkernel()
>   x86: kdump: move xen_pv_domain() check and insert_resource() to
> setup_arch()
>   x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
>   arm64: kdump: introduce some macros for crash kernel reservation
>   arm64: kdump: reimplement crashkernel=X
>   x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config
>   of: fdt: Add memory for devices by DT property
> "linux,usable-memory-range"
>   kdump: update Documentation about crashkernel
> 
> Zhen Lei (1):
>   of: fdt: Aggregate the processing of "linux,usable-memory-range"

Apart from a minor comment I made on patch 8 and some comments from Rob
that need addressing, the rest looks fine to me.

Ingo stated in the past that he's happy to ack the x86 changes as long
as there's no functional change (and that's the case AFAICT). Ingo, does
your conditional ack still stand?

In terms of merging, I'm happy to take it all through the arm64 tree
with acks from the x86 maintainers. Alternatively, with the change I
mentioned for patch 8, the first 5 patches could be queued via the tip
tree on a stable branch and I can base the rest of the arm64 on top.

Thomas, Ingo, Peter, any preference?

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v16 08/11] x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config

2021-12-08 Thread Catalin Marinas
On Tue, Nov 23, 2021 at 08:46:43PM +0800, Zhen Lei wrote:
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 26b8ed11639da46..19256aa924c3b2c 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -24,6 +24,9 @@ config KEXEC_ELF
>  config HAVE_IMA_KEXEC
>   bool
>  
> +config ARCH_WANT_RESERVE_CRASH_KERNEL
> + bool
> +
>  config SET_FS
>   bool
>  
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c4207cf9bb17ffb..4b99efa36da3793 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -95,6 +95,7 @@ config ARM64
>   select ARCH_WANT_FRAME_POINTERS
>   select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES 
> && !ARM64_VA_BITS_36)
>   select ARCH_WANT_LD_ORPHAN_WARN
> + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
>   select ARCH_WANTS_NO_INSTR
>   select ARCH_HAS_UBSAN_SANITIZE_ALL
>   select ARM_AMBA
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 7399327d1eff79d..528034b4276ecf8 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -12,6 +12,7 @@ config X86_32
>   depends on !64BIT
>   # Options that are inherently 32-bit kernel only:
>   select ARCH_WANT_IPC_PARSE_VERSION
> + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
>   select CLKSRC_I8253
>   select CLONE_BACKWARDS
>   select GENERIC_VDSO_32
> @@ -28,6 +29,7 @@ config X86_64
>   select ARCH_HAS_GIGANTIC_PAGE
>   select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
>   select ARCH_USE_CMPXCHG_LOCKREF
> + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
>   select HAVE_ARCH_SOFT_DIRTY
>   select MODULES_USE_ELF_RELA
>   select NEED_DMA_MAP_STATE
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 4dc2643fcbccf99..b23cfc0ca8905fd 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -321,9 +321,7 @@ int __init parse_crashkernel_low(char *cmdline,
>   * - Crashkernel reservation --
>   */
>  
> -#ifdef CONFIG_KEXEC_CORE
> -
> -#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
> +#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
>  static int __init reserve_crashkernel_low(void)
>  {
>  #ifdef CONFIG_64BIT
> @@ -451,8 +449,7 @@ void __init reserve_crashkernel(void)
>   crashk_res.start = crash_base;
>   crashk_res.end   = crash_base + crash_size - 1;
>  }
> -#endif
> -#endif /* CONFIG_KEXEC_CORE */
> +#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */

Nitpick mostly but it may simplify the patches if the x86, arch/Kconfig
and crash_core.c changes here could be moved to patch 5. The remaining
select for arm64 should be moved to patch 7 and drop the #if change in
that patch.

This way we can keep the x86 patches on a separate branch.

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC PATCH 4/5] arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef

2021-12-03 Thread Catalin Marinas
On Fri, Dec 03, 2021 at 01:11:56PM +0800, Jisheng Zhang wrote:
> Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE"
> by a check for "IS_ENABLED(CONFIG_BLK_DEV_INITRD)", to simplify the
> code and increase compile coverage.
> 
> Signed-off-by: Jisheng Zhang 

As long as it still compiles:

Acked-by: Catalin Marinas 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v17 00/15] arm64: MMU enabled kexec relocation

2021-09-29 Thread Catalin Marinas
On Thu, Sep 16, 2021 at 07:13:10PM -0400, Pasha Tatashin wrote:
> Pasha Tatashin (15):
>   arm64: kernel: add helper for booted at EL2 and not VHE
>   arm64: trans_pgd: hibernate: Add trans_pgd_copy_el2_vectors
>   arm64: hibernate: abstract ttrb0 setup function
>   arm64: kexec: flush image and lists during kexec load time
>   arm64: kexec: skip relocation code for inplace kexec
>   arm64: kexec: Use dcache ops macros instead of open-coding
>   arm64: kexec: pass kimage as the only argument to relocation function
>   arm64: kexec: configure EL2 vectors for kexec
>   arm64: kexec: relocate in EL1 mode
>   arm64: kexec: use ld script for relocation function
>   arm64: kexec: install a copy of the linear-map
>   arm64: kexec: keep MMU enabled during kexec relocation
>   arm64: kexec: remove the pre-kexec PoC maintenance
>   arm64: kexec: remove cpu-reset.h
>   arm64: trans_pgd: remove trans_pgd_map_page()

FWIW, I'm fine with this series. I think the only concern raised by
James and not addressed is the possibility of the failure of the memory
allocation for copying the page tables during kexec. If anyone will
complain in the future, we can look into adding a fallback mechanism to
do the relocation with the MMU off.

Acked-by: Catalin Marinas 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v17 15/15] arm64: trans_pgd: remove trans_pgd_map_page()

2021-09-29 Thread Catalin Marinas
On Thu, Sep 16, 2021 at 07:13:25PM -0400, Pasha Tatashin wrote:
> The intend of trans_pgd_map_page() was to map contiguous range of VA
> memory to the memory that is getting relocated during kexec. However,
> since we are now using linear map instead of contiguous range this
> function is not needed
> 
> Suggested-by: Pingfan Liu 
> Signed-off-by: Pasha Tatashin 
> ---
>  arch/arm64/include/asm/trans_pgd.h |  5 +--
>  arch/arm64/mm/trans_pgd.c  | 57 --
>  2 files changed, 1 insertion(+), 61 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/trans_pgd.h 
> b/arch/arm64/include/asm/trans_pgd.h
> index 7b04d32b102c..033d400a4ea4 100644
> --- a/arch/arm64/include/asm/trans_pgd.h
> +++ b/arch/arm64/include/asm/trans_pgd.h
> @@ -15,7 +15,7 @@
>  /*
>   * trans_alloc_page
>   *   - Allocator that should return exactly one zeroed page, if this
> - * allocator fails, trans_pgd_create_copy() and trans_pgd_map_page()
> + * allocator fails, trans_pgd_create_copy() and trans_pgd_idmap_page()
>   * return -ENOMEM error.
>   *
>   * trans_alloc_arg
> @@ -30,9 +30,6 @@ struct trans_pgd_info {
>  int trans_pgd_create_copy(struct trans_pgd_info *info, pgd_t **trans_pgd,
> unsigned long start, unsigned long end);
>  
> -int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
> -void *page, unsigned long dst_addr, pgprot_t pgprot);

So this function never got used in mainline after commit 7018d467ff2d
("arm64: trans_pgd: hibernate: idmap the single page that holds the copy
page routines"). I guess it's because we merged part of v10 of this
series and in v12 you dropped the contiguous VA range in favour of a
copy of the linear map.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v16 00/15] arm64: MMU enabled kexec relocation

2021-09-16 Thread Catalin Marinas
On Thu, Aug 26, 2021 at 11:03:21AM -0400, Pavel Tatashin wrote:
> On Tue, Aug 24, 2021 at 2:06 PM Catalin Marinas  
> wrote:
> > > Enable MMU during kexec relocation in order to improve reboot performance.
> > >
> > > If kexec functionality is used for a fast system update, with a minimal
> > > downtime, the relocation of kernel + initramfs takes a significant portion
> > > of reboot.
> > >
> > > The reason for slow relocation is because it is done without MMU, and thus
> > > not benefiting from D-Cache.
> >
> > The performance improvements are indeed significant on some platforms
> > (going from 7s to ~40ms), so I think the merging the series is worth it.
> > Some general questions so I better understand the impact:
> >
> > - Is the kdump path affected in any way? IIUC that doesn't need any
> >   relocation but we should also make sure we don't create the additional
> >   page table unnecessarily (should keep as much memory intact as
> >   possible). Maybe that's already handled.
> 
> Because kdump does not need relocation, we do not reserve pages for
> the page table in the kdump reboot case. In fact, with this series,
> kdump reboot becomes more straightforward as we skip the relocation
> function entirely, and jump directly into the crash kernel (or
> purgatory if kexec tools loaded them).
> 
> > - What happens if trans_pgd_create_copy() fails to allocate memory. Does
> >   it fall back to an MMU-off relocation?
> 
> In case we are so low on memory that trans_pgd_create_copy() fails to
> allocate the linear map that uses the large pages (the size of the
> page table is tiny) the kexec fails during kexec load time (not during
> reboot time), as out of memory. The MMU enabled kexec reboot is always
> on, and we should not have several ways to do kexec reboot as it makes
> the kexec reboot unpredictable in terms of performance, and also prone
> to bugs by having a common MMU enabled path and less common path when
> we are low on memory which is never tested.

I think this makes sense, especially since it will fail during the kexec
load time rather than reboot.

I'm ok in principle with this series but I'd need to convince James
Morse to have a another look since he followed it more closely than me.
Could you please rebase it against 5.15-rc1?

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v16 00/15] arm64: MMU enabled kexec relocation

2021-08-24 Thread Catalin Marinas
Hi Pavel,

This series is still missing reviews from those who understand kexec
better than me.

On Mon, Aug 02, 2021 at 05:53:53PM -0400, Pavel Tatashin wrote:
> Enable MMU during kexec relocation in order to improve reboot performance.
> 
> If kexec functionality is used for a fast system update, with a minimal
> downtime, the relocation of kernel + initramfs takes a significant portion
> of reboot.
> 
> The reason for slow relocation is because it is done without MMU, and thus
> not benefiting from D-Cache.

The performance improvements are indeed significant on some platforms
(going from 7s to ~40ms), so I think the merging the series is worth it.
Some general questions so I better understand the impact:

- Is the kdump path affected in any way? IIUC that doesn't need any
  relocation but we should also make sure we don't create the additional
  page table unnecessarily (should keep as much memory intact as
  possible). Maybe that's already handled.

- What happens if trans_pgd_create_copy() fails to allocate memory. Does
  it fall back to an MMU-off relocation?

And I presume this series does not introduce any changes to the kexec
tools ABI.

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5 8/9] arm64: kdump: Remove custom linux,usable-memory-range handling

2021-08-23 Thread Catalin Marinas
On Wed, Aug 11, 2021 at 10:51:06AM +0200, Geert Uytterhoeven wrote:
> Remove the architecture-specific code for handling the
> "linux,usable-memory-range" property under the "/chosen" node in DT, as
> the platform-agnostic FDT core code already takes care of this.
> 
> Signed-off-by: Geert Uytterhoeven 

Acked-by: Catalin Marinas 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5 7/9] arm64: kdump: Remove custom linux,elfcorehdr handling

2021-08-23 Thread Catalin Marinas
On Wed, Aug 11, 2021 at 10:51:05AM +0200, Geert Uytterhoeven wrote:
> Remove the architecture-specific code for handling the
> "linux,elfcorehdr" property under the "/chosen" node in DT, as the
> platform-agnostic handling in the FDT core code already takes care of
> this.
> 
> Signed-off-by: Geert Uytterhoeven 

Acked-by: Catalin Marinas 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/4] arm64: Make kexec_file_load honor iomem reservations

2021-05-27 Thread Catalin Marinas
On Wed, May 26, 2021 at 08:05:27PM +0100, Marc Zyngier wrote:
> This series is a complete departure from the approach I initially sent
> almost a month ago[1]. Instead of trying to teach EFI, ACPI and other
> subsystem to use memblock, I've decided to stick with the iomem
> resource tree and use that exclusively for arm64.
> 
> This means that my current approach is (despite what I initially
> replied to both Dave and Catalin) to provide an arm64-specific
> implementation of arch_kexec_locate_mem_hole() which walks the
> resource tree and excludes ranges of RAM that have been registered for
> any odd purpose. This is exactly what the userspace implementation
> does, and I don't really see a good reason to diverge from it.
> 
> Again, this allows my Synquacer board to reliably use kexec_file_load
> with as little as 256M, something that would always fail before as it
> would overwrite most of the reserved tables.
> 
> Obviously, this is now at least 5.14 material. Given how broken
> kexec_file_load is for non-crash kernels on arm64 at the moment,
> should we at least disable it in 5.13 and all previous stable kernels?

I think it makes sense to disable it in the current and earlier kernels.

For this series:

Acked-by: Catalin Marinas 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 4/4] arm64: kexec_image: Implement arch_kexec_locate_mem_hole()

2021-05-27 Thread Catalin Marinas
On Wed, May 26, 2021 at 08:05:31PM +0100, Marc Zyngier wrote:
> Provide an arm64-specific implementation for arch_kexec_locate_mem_hole(),
> using the resource tree instead of memblock, and respecting
> the reservations added by EFI.
> 
> This ensures that kexec_file is finally reliable.
> 
> Reported-by: Moritz Fischer 
> Signed-off-by: Marc Zyngier 

It would have been clearer if __walk_iomem_res_desc() was able to do
such child res excluding callback (if asked via a new flag/arg) directly
but it's too late in the day to figure out if it's possible. It would
save us from another callback in the arch code. But if it's not possible
or you want to stick to this approach, fine by me:

Reviewed-by: Catalin Marinas 

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 2/4] kernel/resource: Populate child pointer in find_next_iomem_res()

2021-05-27 Thread Catalin Marinas
On Wed, May 26, 2021 at 08:05:29PM +0100, Marc Zyngier wrote:
> When find_next_iomem_res() returns a resource, it doesn't
> populate the child pointer (but does so with the parent).
> 
> As we are about to need to arse child resources, populate
> this pointer as well.

Did you mean "parse"? ;)

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations

2021-05-19 Thread Catalin Marinas
On Thu, Apr 29, 2021 at 02:35:31PM +0100, Marc Zyngier wrote:
> It recently became apparent that using kexec with kexec_file_load() on
> arm64 is pretty similar to playing Russian roulette.
> 
> Depending on the amount of memory, the HW supported and the firmware
> interface used, your secondary kernel may overwrite critical memory
> regions without which the secondary kernel cannot boot (the GICv3 LPI
> tables being a prime example of such reserved regions).
> 
> It turns out that there is at least two ways for reserved memory
> regions to be described to kexec: /proc/iomem for the userspace
> implementation, and memblock.reserved for kexec_file. And of course,
> our LPI tables are only reserved using the resource tree, leading to
> the aforementioned stamping. Similar things could happen with ACPI
> tables as well.

So which one of these (/proc/iomem and memblock.reserved) would be the
correct option? If none of them, is their intersection any better?
Looking at the default kexec_locate_mem_hole(), it uses the resources
tree if !CONFIG_ARCH_KEEP_MEMBLOCK, otherwise memblock.

PowerPC implements its own arch_kexec_locate_mem_hole() to skip specific
arch regions. We could do something similar for arm64 if the arch code
knows where the LPI reservation is or the ACPI tables.

If we conclude that we need some intersection of resource reservations
and memblock, maybe we should change the default kexec_locate_mem_hole()
implementation to check for both (e.g. start with the resource tree and
only consider a range valid if not in memblock.reserved).

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v14 02/11] x86: kdump: make the lower bound of crash kernel reservation consistent

2021-02-25 Thread Catalin Marinas
On Thu, Feb 25, 2021 at 03:08:46PM +0800, Baoquan He wrote:
> On 02/24/21 at 02:35pm, Catalin Marinas wrote:
> > On Sat, Jan 30, 2021 at 03:10:16PM +0800, Chen Zhou wrote:
> > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > index da769845597d..27470479e4a3 100644
> > > --- a/arch/x86/kernel/setup.c
> > > +++ b/arch/x86/kernel/setup.c
> > > @@ -439,7 +439,8 @@ static int __init reserve_crashkernel_low(void)
> > >   return 0;
> > >   }
> > >  
> > > - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, 
> > > CRASH_ADDR_LOW_MAX);
> > > + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> > > + CRASH_ADDR_LOW_MAX);
> > >   if (!low_base) {
> > >   pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
> > > smaller size.\n",
> > >  (unsigned long)(low_size >> 20));
> > 
> > Is there any reason why the lower bound can't be 0 in all low cases
> > here? (Sorry if it's been already discussed, I lost track)
> 
> Seems like a good question.
> 
> This reserve_crashkernel_low(), paired with reserve_crashkernel_high(), is
> used to reserve memory under 4G so that kdump kernel owns memory for dma
> buffer allocation. In that case, kernel usually is loaded in high
> memory. In x86_64, kernel loading need be aligned to 16M because of
> CONFIG_PHYSICAL_START, please see commit 32105f7fd8faa7b ("x86: find
> offset for crashkernel reservation automatically"). But for crashkernel
> low memory, there seems to be no reason to ask for 16M alignment, if
> it's taken as dma buffer memory.
> 
> So we can make a different alignment for low memory only, e.g 2M. But
> 16M alignment consistent with crashkernel,high is also fine to me. The
> only affect is smaller alignment can increase the possibility of
> crashkernel low reservation.

I don't mind the 16M alignment in both low and high base. But is there
any reason that the lower bound (third argument) cannot be 0 in both
reserve_crashkernel() (the low attempt) and reserve_crashkernel_low()
cases? The comment in reserve_crashkernel() only talks about the 4G
upper bound but not why we need a 16M lower bound.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v14 08/11] arm64: kdump: reimplement crashkernel=X

2021-02-24 Thread Catalin Marinas
On Sat, Jan 30, 2021 at 03:10:22PM +0800, Chen Zhou wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
> 
> To solve these issues, change the behavior of crashkernel=X and
> introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
> in DMA zone, and fall back to high allocation if it fails.
> We can also use "crashkernel=X,high" to select a region above DMA zone,
> which also tries to allocate at least 256M in DMA zone automatically.
> "crashkernel=Y,low" can be used to allocate specified size low memory.
> 
> Another minor change, there may be two regions reserved for crash
> dump kernel, in order to distinct from the high region and make no
> effect to the use of existing kexec-tools, rename the low region as
> "Crash kernel (low)".

I think we discussed this but I don't remember the conclusion. Is this
only renamed conditionally so that we don't break current kexec-tools?

IOW, assuming that the full crashkernel region is reserved below 4GB,
does the "(low)" suffix still appear or it's only if a high region is
additionally reserved?

> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> index 3f6ecae0bc68..f0caed0cb5e1 100644
> --- a/arch/arm64/include/asm/kexec.h
> +++ b/arch/arm64/include/asm/kexec.h
> @@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {}
>  static inline void crash_post_resume(void) {}
>  #endif
>  
> +#ifdef CONFIG_KEXEC_CORE
> +extern void __init reserve_crashkernel(void);
> +#endif

Why not have this in some generic header?

> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index c18aacde8bb0..69c592c546de 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -238,7 +238,18 @@ static void __init request_standard_resources(void)
>   kernel_data.end <= res->end)
>   request_resource(res, _data);
>  #ifdef CONFIG_KEXEC_CORE
> - /* Userspace will find "Crash kernel" region in /proc/iomem. */
> + /*
> +  * Userspace will find "Crash kernel" or "Crash kernel (low)"
> +  * region in /proc/iomem.
> +  * In order to distinct from the high region and make no effect
> +  * to the use of existing kexec-tools, rename the low region as
> +  * "Crash kernel (low)".
> +  */
> + if (crashk_low_res.end && crashk_low_res.start >= res->start &&
> + crashk_low_res.end <= res->end) {
> + crashk_low_res.name = "Crash kernel (low)";
> + request_resource(res, _low_res);
> + }
>   if (crashk_res.end && crashk_res.start >= res->start &&
>   crashk_res.end <= res->end)
>   request_resource(res, _res);

My reading of the new generic reserve_crashkernel() is that
crashk_low_res will only be populated if crask_res is above 4GB. If
that's correct, I'm fine with the renaming here since current systems
would not get a renamed low reservation (as long as they don't change
the kernel cmdline).

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 912f64f505f7..d20f5c444ebf 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -61,66 +62,11 @@ EXPORT_SYMBOL(memstart_addr);
>   */
>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>  
> -#ifdef CONFIG_KEXEC_CORE
> -/*
> - * reserve_crashkernel() - reserves memory for crash kernel
> - *
> - * This function reserves memory area given in "crashkernel=" kernel command
> - * line parameter. The memory reserved is used by dump capture kernel when
> - * primary kernel is crashing.
> - */
> +#ifndef CONFIG_KEXEC_CORE
>  static void __init reserve_crashkernel(void)
>  {
[...]
>  }
> +#endif

Can we not have the dummy reserve_crashkernel() in the generic code as
well and avoid the #ifndef here?

>  #ifdef CONFIG_CRASH_DUMP
>  static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
> @@ -446,6 +392,14 @@ void __init bootmem_init(void)
>* reserved, so do it here.
>*/
>   reserve_crashkernel();
> +#ifdef CONFIG_KEXEC_CORE
> + /*
> +  * The low region is intended to be used for crash dump kernel devices,
> +  * just mark the low region as "nomap" simply.
> +  */
> + if (crashk_low_res.end)
> + memblock_mark_nomap(crashk_low_res.start, 
> resource_size(_low_res));
> +#endif

Do we do something similar for crashk_res?

Also, I can see we call crash_exclude_mem_range() only for crashk_res.
Do we need to do this for crashk_low_res as well?

-- 
Catalin


Re: [PATCH v14 02/11] x86: kdump: make the lower bound of crash kernel reservation consistent

2021-02-24 Thread Catalin Marinas
On Sat, Jan 30, 2021 at 03:10:16PM +0800, Chen Zhou wrote:
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index da769845597d..27470479e4a3 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -439,7 +439,8 @@ static int __init reserve_crashkernel_low(void)
>   return 0;
>   }
>  
> - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, 
> CRASH_ADDR_LOW_MAX);
> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_LOW_MAX);
>   if (!low_base) {
>   pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
> smaller size.\n",
>  (unsigned long)(low_size >> 20));

Is there any reason why the lower bound can't be 0 in all low cases
here? (Sorry if it's been already discussed, I lost track)

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v14 01/11] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

2021-02-24 Thread Catalin Marinas
On Sat, Jan 30, 2021 at 03:10:15PM +0800, Chen Zhou wrote:
> Move CRASH_ALIGN to header asm/kexec.h for later use. Besides, the
> alignment of crash kernel regions in x86 is 16M(CRASH_ALIGN), but
> function reserve_crashkernel() also used 1M alignment. So just
> replace hard-coded alignment 1M with macro CRASH_ALIGN.
[...]
> @@ -510,7 +507,7 @@ static void __init reserve_crashkernel(void)
>   } else {
>   unsigned long long start;
>  
> - start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
> + start = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, 
> crash_base,
> crash_base + crash_size);
>   if (start != crash_base) {
>   pr_info("crashkernel reservation failed - memory is in 
> use.\n");

There is a small functional change here for x86. Prior to this patch,
crash_base passed by the user on the command line is allowed to be 1MB
aligned. With this patch, such reservation will fail.

Is the current behaviour a bug in the current x86 code or it does allow
1MB-aligned reservations?

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v12 0/9] support reserving crashkernel above 4G on arm64 kdump

2020-10-07 Thread Catalin Marinas
On Wed, Oct 07, 2020 at 12:37:49PM +0530, Bhupesh Sharma wrote:
> On Tue, Oct 6, 2020 at 11:30 PM Catalin Marinas  
> wrote:
> > On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
> > > I think my earlier email with the test results on this series bounced
> > > off the mailing list server (for some weird reason), but I still see
> > > several issues with this patchset. I will add specific issues in the
> > > review comments for each patch again, but overall, with a crashkernel
> > > size of say 786M, I see the following issue:
> > >
> > > # cat /proc/cmdline
> > > BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> 
> > > rd.lvm.lv=<..snip..> crashkernel=786M
> > >
> > > I see two regions of size 786M and 256M reserved in low and high
> > > regions respectively, So we reserve a total of 1042M of memory, which
> > > is an incorrect behaviour:
> > >
> > > # dmesg | grep -i crash
> > > [0.00] Reserving 256MB of low memory at 2816MB for crashkernel 
> > > (System low RAM: 768MB)
> > > [0.00] Reserving 786MB of memory at 654158MB for crashkernel 
> > > (System RAM: 130816MB)
> > > [0.00] Kernel command line: 
> > > BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ 
> > > root=/dev/mapper/rhel_ampere--hr330a--03-root ro 
> > > rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap 
> > > crashkernel=786M cma=1024M
> > >
> > > # cat /proc/iomem | grep -i crash
> > >   b000-bfff : Crash kernel (low)
> > >   bfcbe0-bffcff : Crash kernel
> >
> > As Chen said, that's the intended behaviour and how x86 works. The
> > requested 768M goes in the high range if there's not enough low memory
> > and an additional buffer for swiotlb is allocated, hence the low 256M.
> 
> I understand, but why 256M (as low) for arm64? x86_64 setups usually
> have more system memory available as compared to several commercially
> available arm64 setups. So is the intent, just to keep the behavior
> similar between arm64 and x86_64?

Similar in the sense of the fallback to high memory and some low memory
allocation but the amounts can vary per architecture.

> Should we have a CONFIG option / bootarg to help one select the max
> 'low_size'? Currently the ' low_size' value is calculated as:
> 
> /*
>  * two parts from kernel/dma/swiotlb.c:
>  * -swiotlb size: user-specified with swiotlb= or default.
>  *
>  * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>  * to 8M for other buffers that may need to stay low too. Also
>  * make sure we allocate enough extra low memory so that we
>  * don't run out of DMA buffers for 32-bit devices.
>  */
> low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> 
> Since many arm64 boards ship with swiotlb=0 (turned off) via kernel
> bootargs, the low_size, still ends up being 256M in such cases,
> whereas this 256M can be used for some other purposes - so should we
> be limiting this to 64M and failing the crash kernel allocation
> request (gracefully) otherwise?

I think it makes sense to set a low_size = 0 if
swiotlb_size_or_default() is 0. The assumption would be that if the main
kernel doesn't need an swiotlb, the crashdump one wouldn't need it
either. But this probably needs the ZONE_DMA for non-RPi4 platforms
addressed as well (expanded to the whole ZONE_DMA32).

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v12 9/9] kdump: update Documentation about crashkernel

2020-10-07 Thread Catalin Marinas
On Tue, Oct 06, 2020 at 10:10:54AM +0800, chenzhou wrote:
> On 2020/10/6 1:19, Catalin Marinas wrote:
> > On Mon, Sep 07, 2020 at 09:47:45PM +0800, Chen Zhou wrote:
> >> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> >> b/Documentation/admin-guide/kdump/kdump.rst
> >> index 2da65fef2a1c..549611abc581 100644
> >> --- a/Documentation/admin-guide/kdump/kdump.rst
> >> +++ b/Documentation/admin-guide/kdump/kdump.rst
> > [...]
> >> @@ -316,8 +325,18 @@ Boot into System Kernel
> >> kernel will automatically locate the crash kernel image within the
> >> first 512MB of RAM if X is not given.
> >>  
> >> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
> >> -   the kernel, X if explicitly specified, must be aligned to 2MiB 
> >> (0x20).
> >> +   On arm64, use "crashkernel=X" to try low allocation in DMA zone, and
> >> +   fall back to high allocation if it fails. And go for high allocation
> >> +   directly if the required size is too large.
> >> +   We can also use "crashkernel=X,high" to select a high region above
> >> +   DMA zone, which also tries to allocate at least 256M low memory in
> >> +   DMA zone automatically.
> >> +   "crashkernel=Y,low" can be used to allocate specified size low memory
> >> +   in DMA zone.
> >> +   For non-RPi4 platforms, change DMA zone memtioned above to DMA32 zone.
> > I don't think we should mention non-RPi4 explicitly here. I don't even
> > understand what the suggestion is since the only way is to disable
> > ZONE_DMA in the kernel config. I'd just stick to ZONE_DMA description
> > here.
> How about like this:
> If the kernel config ZONE_DMA is disabled, just try low allocation in DMA32 
> zone
> and high allocation above DMA32 zone.

Something like: "allocate 256M low memory in the DMA zone automatically
(or the DMA32 zone if CONFIG_ZONE_DMA is disabled)".

I'd keep it short.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v12 0/9] support reserving crashkernel above 4G on arm64 kdump

2020-10-06 Thread Catalin Marinas
On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
> I think my earlier email with the test results on this series bounced
> off the mailing list server (for some weird reason), but I still see
> several issues with this patchset. I will add specific issues in the
> review comments for each patch again, but overall, with a crashkernel
> size of say 786M, I see the following issue:
> 
> # cat /proc/cmdline
> BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> rd.lvm.lv=<..snip..> 
> crashkernel=786M
> 
> I see two regions of size 786M and 256M reserved in low and high
> regions respectively, So we reserve a total of 1042M of memory, which
> is an incorrect behaviour:
> 
> # dmesg | grep -i crash
> [0.00] Reserving 256MB of low memory at 2816MB for crashkernel 
> (System low RAM: 768MB)
> [0.00] Reserving 786MB of memory at 654158MB for crashkernel (System 
> RAM: 130816MB)
> [0.00] Kernel command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ 
> root=/dev/mapper/rhel_ampere--hr330a--03-root ro 
> rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap 
> crashkernel=786M cma=1024M
> 
> # cat /proc/iomem | grep -i crash
>   b000-bfff : Crash kernel (low)
>   bfcbe0-bffcff : Crash kernel

As Chen said, that's the intended behaviour and how x86 works. The
requested 768M goes in the high range if there's not enough low memory
and an additional buffer for swiotlb is allocated, hence the low 256M.

We could (as an additional patch), subtract the 256M from the high
allocation so that you'd get a low 256M and a high 512M, not sure it's
worth it. Note that with a "crashkernel=768M,high" option, you still get
the additional low 256M, otherwise the crashkernel won't be able to
boot as there's no memory in ZONE_DMA. In the explicit ",high" request
case, I'm not sure subtracted the 256M is more intuitive.

In 5.11, we also hope to fix the ZONE_DMA layout for non-RPi4 platforms
to cover the entire 32-bit address space (i.e. identical to the current
ZONE_DMA32).

> IMO, we should test this feature more before including this in 5.11

Definitely. That's one of the reasons we haven't queued it yet. So any
help with testing here is appreciated.

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v12 3/9] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel[_low]()

2020-10-05 Thread Catalin Marinas
On Fri, Sep 18, 2020 at 05:06:37PM +0800, chenzhou wrote:
> On 2020/9/18 16:59, chenzhou wrote:
> > On 2020/9/18 15:25, Baoquan He wrote:
> >> On 09/07/20 at 09:47pm, Chen Zhou wrote:
> >>> To make the functions reserve_crashkernel[_low]() as generic,
> >>> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
> >>>
> >>> Signed-off-by: Chen Zhou 
> >>> ---
> >>>  arch/x86/kernel/setup.c | 11 ++-
> >>>  1 file changed, 6 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >>> index d7fd90c52dae..71a6a6e7ca5b 100644
> >>> --- a/arch/x86/kernel/setup.c
> >>> +++ b/arch/x86/kernel/setup.c
> >>> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
> >>>   unsigned long total_low_mem;
> >>>   int ret;
> >>>  
> >>> - total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> >>> + total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
> >> Just note that the replacement has been done in another patch from Mike
> >> Rapoport, partially. He seems to have done reserve_crashkernel_low()
> >> part, there's one left in reserve_crashkernel(), you might want to check
> >> that. 
> >>
> >> Mike's patch which is from a patchset has been merged into Andrew's next
> >> tree.
> >>
> >> commit 6e50f7672ffa362e9bd4bc0c0d2524ed872828c5
> >> Author: Mike Rapoport 
> >> Date:   Wed Aug 26 15:22:32 2020 +1000
> >>
> >> x86/setup: simplify reserve_crashkernel()
> As Baoquan said, some functions have been changed in the next tree,
> if i need to rebase on top of the next tree.

Please rebase at 5.10-rc1 when the x86 change will probably be in and
aim to queue this series for 5.11.

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v12 9/9] kdump: update Documentation about crashkernel

2020-10-05 Thread Catalin Marinas
On Mon, Sep 07, 2020 at 09:47:45PM +0800, Chen Zhou wrote:
> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> b/Documentation/admin-guide/kdump/kdump.rst
> index 2da65fef2a1c..549611abc581 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
[...]
> @@ -316,8 +325,18 @@ Boot into System Kernel
> kernel will automatically locate the crash kernel image within the
> first 512MB of RAM if X is not given.
>  
> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
> -   the kernel, X if explicitly specified, must be aligned to 2MiB (0x20).
> +   On arm64, use "crashkernel=X" to try low allocation in DMA zone, and
> +   fall back to high allocation if it fails. And go for high allocation
> +   directly if the required size is too large.
> +   We can also use "crashkernel=X,high" to select a high region above
> +   DMA zone, which also tries to allocate at least 256M low memory in
> +   DMA zone automatically.
> +   "crashkernel=Y,low" can be used to allocate specified size low memory
> +   in DMA zone.
> +   For non-RPi4 platforms, change DMA zone memtioned above to DMA32 zone.

I don't think we should mention non-RPi4 explicitly here. I don't even
understand what the suggestion is since the only way is to disable
ZONE_DMA in the kernel config. I'd just stick to ZONE_DMA description
here.

> +   Use "crashkernel=Y@X" if you really have to reserve memory from
> +   specified start address X. Note that the start address of the kernel,
> +   X if explicitly specified, must be aligned to 2MiB (0x20).
>  
>  Load the Dump-capture Kernel
>  
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index a1068742a6df..f7df572d8f64 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -727,6 +727,10 @@
>   [KNL, X86-64] Select a region under 4G first, and
>   fall back to reserve region above 4G when '@offset'
>   hasn't been specified.
> + [KNL, arm64] Try low allocation in DMA zone, fall back
> + to high allocation if it fails when '@offset' hasn't 
> been
> + specified. For non-RPi4 platforms, change DMA zone to
> + DMA32 zone.

Same here, unclear what "change DMA zone to DMA32 zone" means.

>   See Documentation/admin-guide/kdump/kdump.rst for 
> further details.
>  
>   crashkernel=range1:size1[,range2:size2,...][@offset]
> @@ -743,6 +747,8 @@
>   Otherwise memory region will be allocated below 4G, if
>   available.
>   It will be ignored if crashkernel=X is specified.
> + [KNL, arm64] range in high memory.
> + Allow kernel to allocate physical memory region from 
> top.
>   crashkernel=size[KMG],low
>   [KNL, X86-64] range under 4G. When crashkernel=X,high
>   is passed, kernel could allocate physical memory region
> @@ -751,13 +757,16 @@
>   requires at least 64M+32K low memory, also enough extra
>   low memory is needed to make sure DMA buffers for 32-bit
>   devices won't run out. Kernel would try to allocate at
> - at least 256M below 4G automatically.
> + least 256M below 4G automatically.
>   This one let user to specify own low range under 4G
>   for second kernel instead.
>   0: to disable low allocation.
>   It will be ignored when crashkernel=X,high is not used
>   or memory reserved is below 4G.
> -
> + [KNL, arm64] range in low memory.
> + This one let user to specify a low range in DMA zone for
> + crash dump kernel. For non-RPi4 platforms, change DMA 
> zone
> + to DMA32 zone.

And again here.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v12 6/9] arm64: kdump: reimplement crashkernel=X

2020-10-05 Thread Catalin Marinas
On Mon, Sep 07, 2020 at 09:47:42PM +0800, Chen Zhou wrote:
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 53acbeca4f57..1b24072f2bae 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -238,7 +238,18 @@ static void __init request_standard_resources(void)
>   kernel_data.end <= res->end)
>   request_resource(res, _data);
>  #ifdef CONFIG_KEXEC_CORE
> - /* Userspace will find "Crash kernel" region in /proc/iomem. */
> + /*
> +  * Userspace will find "Crash kernel" or "Crash kernel (low)"
> +  * region in /proc/iomem.
> +  * In order to distinct from the high region and make no effect
> +  * to the use of existing kexec-tools, rename the low region as
> +  * "Crash kernel (low)".
> +  */
> + if (crashk_low_res.end && crashk_low_res.start >= res->start &&
> + crashk_low_res.end <= res->end) {
> + crashk_low_res.name = "Crash kernel (low)";
> + request_resource(res, _low_res);
> + }

With the changes in this series (including the above), how do the
current kexec-tools behave? Do they pick just the high region and the
loaded kernel will subsequently fail to boot?

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v12 7/9] kdump: add threshold for the required memory

2020-10-05 Thread Catalin Marinas
On Mon, Sep 07, 2020 at 09:47:43PM +0800, Chen Zhou wrote:
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 3f735cb37ace..d11d597a470d 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -378,6 +378,15 @@ int __init reserve_crashkernel_low(void)
>  }
>  
>  #if defined(CONFIG_X86) || defined(CONFIG_ARM64)
> +
> +/*
> + * Add a threshold for required memory size of crashkernel. If required 
> memory
> + * size is greater than threshold, just go for high allocation directly. The
> + * value of threshold is set as half of the total low memory.
> + */
> +#define REQUIRED_MEMORY_THRESHOLD(memblock_mem_size(CRASH_ADDR_LOW_MAX 
> >> \
> + PAGE_SHIFT) >> 1)
> +
>  #ifdef CONFIG_KEXEC_CORE
>  /*
>   * reserve_crashkernel() - reserves memory for crash kernel
> @@ -422,7 +431,7 @@ void __init reserve_crashkernel(void)
>* So try low memory first and fall back to high memory
>* unless "crashkernel=size[KMG],high" is specified.
>*/
> - if (!high)
> + if (!high && crash_size <= REQUIRED_MEMORY_THRESHOLD)
>   crash_base = memblock_find_in_range(CRASH_ALIGN,
>   CRASH_ADDR_LOW_MAX,
>   crash_size, CRASH_ALIGN);

Since any change now is affecting the x86 semantics slightly, I'd
suggest you drop this patch. We can add it later if needed, once the
core changes are in.

Thinking about this, if one requires a crashkernel reservation that
allocates all of the ZONE_DMA, it would probably be noticed and explicit
,high/,low options can be used.

Note that we are also trying to make ZONE_DMA full 32-bit on non-RPi4
hardware.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v12 0/9] support reserving crashkernel above 4G on arm64 kdump

2020-10-05 Thread Catalin Marinas
On Sat, Sep 12, 2020 at 06:44:29AM -0500, John Donnelly wrote:
> On 9/7/20 8:47 AM, Chen Zhou wrote:
> > Chen Zhou (9):
> >x86: kdump: move CRASH_ALIGN to 2M
> >x86: kdump: make the lower bound of crash kernel reservation
> >  consistent
> >x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> >  reserve_crashkernel[_low]()
> >x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> >arm64: kdump: introduce some macroes for crash kernel reservation
> >arm64: kdump: reimplement crashkernel=X
> >kdump: add threshold for the required memory
> >arm64: kdump: add memory for devices by DT property
> >  linux,usable-memory-range
> >kdump: update Documentation about crashkernel
[...]
> I did a brief unit-test on 5.9-rc4.
> 
> Please add:
> 
> Tested-by:  John Donnelly 

Thanks for testing.

> This activity is over a year old. It needs accepted.

It's getting there, hopefully in 5.11. There are some minor tweaks to
address.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-09-02 Thread Catalin Marinas
On Sat, Aug 01, 2020 at 09:08:56PM +0800, Chen Zhou wrote:
> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> b/Documentation/admin-guide/kdump/kdump.rst
> index 2da65fef2a1c..4b58f97351d5 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -299,7 +299,15 @@ Boot into System Kernel
> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
> starting at physical address 0x0100 (16MB) for the dump-capture 
> kernel.
>  
> -   On x86 and x86_64, use "crashkernel=64M@16M".
> +   On x86 use "crashkernel=64M@16M".
> +
> +   On x86_64, use "crashkernel=X" to select a region under 4G first, and
> +   fall back to reserve region above 4G.
> +   We can also use "crashkernel=X,high" to select a region above 4G, which
> +   also tries to allocate at least 256M below 4G automatically and
> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
> +   Use "crashkernel=Y@X" if you really have to reserve memory from specified
> +   start address X.
>  
> On ppc64, use "crashkernel=128M@32M".
>  
> @@ -316,8 +324,15 @@ Boot into System Kernel
> kernel will automatically locate the crash kernel image within the
> first 512MB of RAM if X is not given.
>  
> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
> -   the kernel, X if explicitly specified, must be aligned to 2MiB (0x20).
> +   On arm64, use "crashkernel=X" to try low allocation in ZONE_DMA, and
> +   fall back to high allocation if it fails. And go for high allocation
> +   directly if the required size is too large. If crash_base is outside

I wouldn't mention crash_base in the admin guide. That's an
implementation detail really and admins are not supposed to read the
source code to make sense of the documentation. ZONE_DMA is also a
kernel internal, so you'd need to define what it is for arm64. At least
the DMA and DMA32 zones are printed during kernel boot.

> +   ZONE_DMA, try to allocate at least 256M in ZONE_DMA automatically.
> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
> +   For non-RPi4 platforms, change ZONE_DMA memtioned above to ZONE_DMA32.
> +   Use "crashkernel=Y@X" if you really have to reserve memory from
> +   specified start address X. Note that the start address of the kernel,
> +   X if explicitly specified, must be aligned to 2MiB (0x20).

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v11 3/5] arm64: kdump: reimplement crashkernel=X

2020-09-02 Thread Catalin Marinas
On Sat, Aug 01, 2020 at 09:08:54PM +0800, Chen Zhou wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
> the devices in crash dump kernel need to use ZONE_DMA will alloc
> fail.
> 
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in ZONE_DMA, and fall back to
> high allocation if it fails.
> 
> If requized size X is too large and leads to very little free memory
> in ZONE_DMA after low allocation, the system may not work normally.
> So add a threshold and go for high allocation directly if the required
> size is too large. The value of threshold is set as the half of
> the low memory.
> 
> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
> specified size low memory.

Except for the threshold to keep zone ZONE_DMA memory,
reserve_crashkernel() looks very close to the x86 version. Shall we try
to make this generic as well? In the first instance, you could avoid the
threshold check if it takes an explicit ",high" option.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-09-02 Thread Catalin Marinas
On Tue, Aug 18, 2020 at 03:07:04PM +0800, chenzhou wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index a8e34d97a894..4df18c7ea438 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -147,7 +147,7 @@ static void __init reserve_crashkernel(void)
> }
> memblock_reserve(crash_base, crash_size);
>  
> -   if (crash_base >= CRASH_ADDR_LOW_MAX) {
> +   if (memstart_addr < CRASH_ADDR_LOW_MAX && crash_base >= 
> CRASH_ADDR_LOW_MAX) {
> const char *rename = "Crash kernel (low)";

Since CRASH_ADDR_LOW_MAX is defined as arm64_dma32_phys_limit and such
limit is always greater than memstart_addr, this additional check
doesn't do anything. See my other reply on how ZONE_DMA32 is created on
arm64.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-09-01 Thread Catalin Marinas
On Tue, Aug 18, 2020 at 03:07:04PM +0800, chenzhou wrote:
> On 2020/8/10 14:03, Dave Young wrote:
> >>> Previously I remember we talked about to use similar logic as X86, but I
> >>> remember you mentioned on some arm64 platform there could be no low
> >>> memory at all.  Is this not a problem now for the fallback?  Just be
> >>> curious, thanks for the update, for the common part looks good.
> >>
> >> Did you mean this discuss: https://lkml.org/lkml/2019/12/27/122?
> > I meant about this reply instead :)
> > https://lkml.org/lkml/2020/1/16/616
> 
> Sorry for not repley in time, I was on holiday last week.
> 
> The platform James mentioned may exist for which have no devices and
> need no low memory.

If there is no memory below 4GB, the arm64 kernel assumes that the
32-bit devices will have some DMA offsets shifting the addresses to the
bottom of the available RAM. So even if RAM starts above 4GB, we
ZONE_DMA32 will be allocated in the bottom 4GB of the high memory (and
if the hardware designers forgot to shift those DMA accesses, we don't
have to support the platform ;)).

So the arm64 notion of low memory differs slightly from the x86 one.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v11 2/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c

2020-09-01 Thread Catalin Marinas
On Thu, Aug 06, 2020 at 03:36:27PM +0200, Ingo Molnar wrote:
> 
> * Chen Zhou  wrote:
> 
> > In preparation for supporting reserve_crashkernel_low in arm64 as
> > x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
> > 
> > BTW, move x86_64 CRASH_ALIGN to 2M suggested by Dave. CONFIG_PHYSICAL_ALIGN
> > can be selected from 2M to 16M, move to the same as arm64.
> > 
> > Signed-off-by: Chen Zhou 
> > ---
> >  arch/x86/include/asm/kexec.h | 24 ++
> >  arch/x86/kernel/setup.c  | 86 +++-
> >  include/linux/crash_core.h   |  3 ++
> >  include/linux/kexec.h|  2 -
> >  kernel/crash_core.c  | 74 +++
> >  kernel/kexec_core.c  | 17 ---
> >  6 files changed, 107 insertions(+), 99 deletions(-)
> 
> Since the changes are centered around arm64, I suppose the arm64 tree 
> will carry this patchset?
> 
> Assuming that this is a 100% invariant moving of code that doesn't 
> regress on x86:
> 
>   Acked-by: Ingo Molnar 

Thanks Ingo. The only difference I see is that CRASH_ALIGN has been
reduced to 2M here from 16M for x86. Would this break configs that have
PHYSICAL_ALIGN > 2M?

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v10 4/5] arm64: kdump: fix kdump broken with ZONE_DMA reintroduced

2020-07-29 Thread Catalin Marinas
On Wed, Jul 29, 2020 at 10:14:32PM +0800, chenzhou wrote:
> On 2020/7/29 19:58, Catalin Marinas wrote:
> > On Wed, Jul 29, 2020 at 11:52:39AM +0800, chenzhou wrote:
> >> How about like this:
> >> 1. For ZONE_DMA issue, use Bhupesh's solution, keep the crashkernel=
> >>behaviour to ZONE_DMA allocations.
> >> 2. For this patch series, make the reserve_crashkernel_low() to
> >>ZONE_DMA allocations.
> > 
> > So you mean rebasing your series on top of Bhupesh's? I guess you can
> > combine the two, I really don't care which way as long as we fix both
> > issues and agree on the crashkernel= semantics. I think with some tweaks
> > we can go with your series alone.
> >
> > IIUC from the x86 code (especially the part you #ifdef'ed out for
> > arm64), if ",low" is not passed (so just standard crashkernel=X), it
> > still allocates sufficient low memory for the swiotlb in ZONE_DMA. The
> > rest can go in a high region. Why can't we do something similar on
> > arm64? Of course, you can keep the ",low" argument for explicit
> > allocation but I don't want to mandate it.
> 
> It is a good idea to combine the two.
> 
> For parameter crashkernel=X, we do like this:
> 1. allocate some low memory in ZONE_DMA(or ZONE_DMA32 if CONFIG_ZONE_DMA=n)
> 2. allocate X size memory in a high region
> 
> ",low" argument can be used to specify the low memory.
> 
> Do i understand correctly?

Yes, although we could follow the x86 approach:

1. Try low (ZONE_DMA for arm64) allocation, fallback to high allocation
   if it fails.

2. If crash_base is outside ZONE_DMA, call reserve_crashkernel_low()
   which either honours the ,low option or allocates some small amount
   in ZONE_DMA.

If at some point we have platforms failing step 2, we'll look at
changing ZONE_DMA to the full 4GB on non-RPi4 platforms.

It looks to me like x86 ignores the ,low option if the first step
managed to get some low memory. Shall we do the same on arm64?

> > So with an implicit ZONE_DMA allocation similar to the x86 one, we
> > probably don't need Bhupesh's series at all. In addition, we can limit
> > crashkernel= to the first 4G with a fall-back to high like x86 (not sure
> > if memblock_find_in_range() is guaranteed to search in ascending order).
> > I don't think we need an explicit ",high" annotation.
> >
> > So with the above, just a crashkernel=1G gives you at least 256MB in
> > ZONE_DMA followed by the rest anywhere, with a preference for
> > ZONE_DMA32. This way we can also keep the reserve_crashkernel_low()
> > mostly intact from x86 (less #ifdef's).
> 
> Yes. We can let crashkernel=X  try to reserve low memory and fall back to use 
> high memory
> if failing to find a low range.

The only question is whether we need to preserve some more ZONE_DMA on
the current system. If for example we pass a crashkernel=512M and some
cma=, we may end up with very little free memory in ZONE_DMA. That's
mostly an issue for RPi4 since other platforms would work with
ZONE_DMA32. We could add a threshold and go for high allocation directly
if the required size is too large.

> About the function reserve_crashkernel_low(), if we put it in arch/arm64, 
> there is some common
> code with x86_64. Some suggestions about this?

If we can use this function almost intact, just move it in a common
place. But if it gets sprinkled with #ifdef CONFIG_ARM64, I'd rather
duplicate it. I'd still prefer to move it to a common place if possible.

You can go a step further and also move the x86 reserve_crashkernel() to
common code. I don't think there a significant difference between arm64
and x86 here. You'd have to define arch-specific specific
CRASH_ADDR_LOW_MAX etc.

Also patches moving code should not have any functional change. The
CRASH_ALIGN change from 16M to 2M on x86 should be a separate patch as
it needs to be acked by the x86 maintainers (IIRC, Ingo only acked the
function move if there was no functional change; CRASH_ALIGN is used for
the start address, not just alignment, on x86).

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v10 4/5] arm64: kdump: fix kdump broken with ZONE_DMA reintroduced

2020-07-29 Thread Catalin Marinas
Hi Chen,

On Wed, Jul 29, 2020 at 11:52:39AM +0800, chenzhou wrote:
> On 2020/7/28 1:30, Catalin Marinas wrote:
> > Anyway, there are two series solving slightly different issues with
> > kdump reservations:
> >
> > 1. This series which relaxes the crashkernel= allocation to go anywhere
> >in the accessible space while having a dedicated crashkernel=X,low
> >option for ZONE_DMA.
> >
> > 2. Bhupesh's series [1] forcing crashkernel=X allocations only from
> >ZONE_DMA.
> >
> > For RPi4 support, we limited ZONE_DMA allocations to the 1st GB.
> > Existing crashkernel= uses may no longer work, depending on where the
> > allocation falls. Option (2) above is a quick fix assuming that the
> > crashkernel reservation is small enough. What's a typical crashkernel
> > option here? That series is probably more prone to reservation failures.
> >
> > Option (1), i.e. this series, doesn't solve the problem raised by
> > Bhupesh unless one uses the crashkernel=X,low argument. It can actually
> > make it worse even for ZONE_DMA32 since the allocation can go above 4G
> > (assuming that we change the ZONE_DMA configuration to only limit it to
> > 1GB on RPi4).
> >
> > I'm more inclined to keep the crashkernel= behaviour to ZONE_DMA
> > allocations. If this is too small for typical kdump, we can look into
> > expanding ZONE_DMA to 4G on non-RPi4 hardware (we had patches on the
> > list). In addition, if Chen thinks allocations above 4G are still needed
> > or if RPi4 needs a sufficiently large crashkernel=, I'd rather have a
> > ",high" option to explicitly require such access.
> 
> Thanks for your reply and exhaustive explanation.
> 
> In our ARM servers, we need to to reserve a large chunk for kdump(512M
> or 1G), there is no enough low memory. So we proposed this patch
> series "support reserving crashkernel above 4G on arm64 kdump" In
> April 2019.

Trying to go through the discussions last year, hopefully things get
clearer.

So prior to the ZONE_DMA change, you still couldn't reserve 1G in the
first 4GB? It shouldn't be sparsely populated during early boot.

> I introduce parameters "crashkernel=X,[high,low]" as x86_64 does in earlier 
> versions.
> Suggested by James, to simplify, we call reserve_crashkernel_low() at the 
> beginning of
> reserve_crashkernel() and then relax the arm64_dma32_phys_limit if 
> reserve_crashkernel_low()
> allocated something.
> That is, just the parameter "crashkernel=X,low" is ok and i deleted 
> "crashkernel=X,high".

The problem I see is that with your patches we diverge from x86
behaviour (and the arm64 behaviour prior to the ZONE_DMA reduction) as
we now require that crashkernel=X,low is always passed if you want
something in ZONE_DMA (and you do want, otherwise the crashdump kernel
fails to boot).

My main requirement is that crashkernel=X, without any suffix, still
works which I don't think is guaranteed with your patches (well,
ignoring RPi4 ZONE_DMA). Bhupesh's series is a quick fix but doesn't
solve your large allocation requirements (that may have worked prior to
the ZONE_DMA change).

> After the ZONE_DMA introduced in December 2019, the issue occurred as
> you said above. In fact, we didn't have RPi4 machine.

You don't even need to have a RPi4 machine, ZONE_DMA has been set to 1GB
unconditionally. And while we could move it back to 4GB on non-RPi4
hardware, I'd rather have a solution that fixes kdump for RPi4 as well.

> Originally, i suggested to fix this based on this patch series and
> used the dedicated option.
> 
> According to your clarify, for typical kdump, there are other
> solutions. In this case, "keep the crashkernel= behaviour to ZONE_DMA
> allocations" looks much better.
> 
> How about like this:
> 1. For ZONE_DMA issue, use Bhupesh's solution, keep the crashkernel=
>behaviour to ZONE_DMA allocations.
> 2. For this patch series, make the reserve_crashkernel_low() to
>ZONE_DMA allocations.

So you mean rebasing your series on top of Bhupesh's? I guess you can
combine the two, I really don't care which way as long as we fix both
issues and agree on the crashkernel= semantics. I think with some tweaks
we can go with your series alone.

IIUC from the x86 code (especially the part you #ifdef'ed out for
arm64), if ",low" is not passed (so just standard crashkernel=X), it
still allocates sufficient low memory for the swiotlb in ZONE_DMA. The
rest can go in a high region. Why can't we do something similar on
arm64? Of course, you can keep the ",low" argument for explicit
allocation but I don't want to mandate it.

So with an implicit ZONE_DMA allocation similar to the x86 one, we
probably don't need Bhupesh's

Re: [PATCH v10 4/5] arm64: kdump: fix kdump broken with ZONE_DMA reintroduced

2020-07-27 Thread Catalin Marinas
On Fri, Jul 03, 2020 at 11:58:15AM +0800, Chen Zhou wrote:
> commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32")
> broken the arm64 kdump. If the memory reserved for crash dump kernel
> falled in ZONE_DMA32, the devices in crash dump kernel need to use
> ZONE_DMA will alloc fail.
> 
> This patch addressed the above issue based on "reserving crashkernel
> above 4G". Originally, we reserve low memory below 4G, and now just need
> to adjust memory limit to arm64_dma_phys_limit in reserve_crashkernel_low
> if ZONE_DMA is enabled. That is, if there are devices need to use ZONE_DMA
> in crash dump kernel, it is a good choice to use parameters
> "crashkernel=X crashkernel=Y,low".
> 
> Signed-off-by: Chen Zhou 
> ---
>  kernel/crash_core.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index a7580d291c37..e8ecbbc761a3 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -320,6 +320,7 @@ int __init reserve_crashkernel_low(void)
>   unsigned long long base, low_base = 0, low_size = 0;
>   unsigned long total_low_mem;
>   int ret;
> + phys_addr_t crash_max = 1ULL << 32;
>  
>   total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>  
> @@ -352,7 +353,11 @@ int __init reserve_crashkernel_low(void)
>   return 0;
>   }
>  
> - low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> +#ifdef CONFIG_ARM64
> + if (IS_ENABLED(CONFIG_ZONE_DMA))
> + crash_max = arm64_dma_phys_limit;
> +#endif
> + low_base = memblock_find_in_range(0, crash_max, low_size, CRASH_ALIGN);
>   if (!low_base) {
>   pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
> smaller size.\n",
>  (unsigned long)(low_size >> 20));

Given the number of #ifdefs we end up with in this function, I think
it's better to simply copy to the code to arch/arm64 and tailor it
accordingly.

Anyway, there are two series solving slightly different issues with
kdump reservations:

1. This series which relaxes the crashkernel= allocation to go anywhere
   in the accessible space while having a dedicated crashkernel=X,low
   option for ZONE_DMA.

2. Bhupesh's series [1] forcing crashkernel=X allocations only from
   ZONE_DMA.

For RPi4 support, we limited ZONE_DMA allocations to the 1st GB.
Existing crashkernel= uses may no longer work, depending on where the
allocation falls. Option (2) above is a quick fix assuming that the
crashkernel reservation is small enough. What's a typical crashkernel
option here? That series is probably more prone to reservation failures.

Option (1), i.e. this series, doesn't solve the problem raised by
Bhupesh unless one uses the crashkernel=X,low argument. It can actually
make it worse even for ZONE_DMA32 since the allocation can go above 4G
(assuming that we change the ZONE_DMA configuration to only limit it to
1GB on RPi4).

I'm more inclined to keep the crashkernel= behaviour to ZONE_DMA
allocations. If this is too small for typical kdump, we can look into
expanding ZONE_DMA to 4G on non-RPi4 hardware (we had patches on the
list). In addition, if Chen thinks allocations above 4G are still needed
or if RPi4 needs a sufficiently large crashkernel=, I'd rather have a
",high" option to explicitly require such access.

[1] http://lists.infradead.org/pipermail/kexec/2020-July/020777.html

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] arm64/defconfig: Enable CONFIG_KEXEC_FILE

2020-07-03 Thread Catalin Marinas
On Fri, Jul 03, 2020 at 12:55:03AM +0530, Bhupesh Sharma wrote:
> On Fri, May 15, 2020 at 2:44 PM Bhupesh Sharma  wrote:
> > On Thu, Apr 30, 2020 at 10:05 AM Bhupesh Sharma  wrote:
> > > On Tue, Apr 28, 2020 at 3:37 PM Catalin Marinas  
> > > wrote:
> > > >
> > > > On Tue, Apr 28, 2020 at 01:55:58PM +0530, Bhupesh Sharma wrote:
> > > > > On Wed, Apr 8, 2020 at 4:17 PM Mark Rutland  
> > > > > wrote:
> > > > > > On Tue, Apr 07, 2020 at 04:01:40AM +0530, Bhupesh Sharma wrote:
> > > > > > >  arch/arm64/configs/defconfig | 1 +
> > > > > > >  1 file changed, 1 insertion(+)
> > > > > > >
> > > > > > > diff --git a/arch/arm64/configs/defconfig 
> > > > > > > b/arch/arm64/configs/defconfig
> > > > > > > index 24e534d85045..fa122f4341a2 100644
> > > > > > > --- a/arch/arm64/configs/defconfig
> > > > > > > +++ b/arch/arm64/configs/defconfig
> > > > > > > @@ -66,6 +66,7 @@ CONFIG_SCHED_SMT=y
> > > > > > >  CONFIG_NUMA=y
> > > > > > >  CONFIG_SECCOMP=y
> > > > > > >  CONFIG_KEXEC=y
> > > > > > > +CONFIG_KEXEC_FILE=y
> > > > > > >  CONFIG_CRASH_DUMP=y
> > > > > > >  CONFIG_XEN=y
> > > > > > >  CONFIG_COMPAT=y
> > > > > > > --
> > > > > > > 2.7.4
> > > > >
> > > > > Thanks a lot  Mark.
> > > > >
> > > > > Hi Catalin, Will,
> > > > >
> > > > > Can you please help pick this patch in the arm tree. We have an
> > > > > increasing number of user-cases from distro users
> > > > > who want to use kexec_file_load() as the default interface for
> > > > > kexec/kdump on arm64.
> > > >
> > > > We could pick it up if it doesn't conflict with the arm-soc tree. They
> > > > tend to pick most of the defconfig changes these days (and could as well
> > > > pick this one).
> > >
> > > Thanks Catalin.
> > > (+Cc Arnd)
> > >
> > > Hi Arnd,
> > >
> > > Can you please help pick this change via the arm-soc tree?
> >
> > Ping. Any updates on this defconfig patch.
> 
> Ping. Seems there is no reply from Arnd on this patch.
> Can you please help pull in this one as well. It has been pending for
> quite some time now.

I can queue it for 5.9.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v6 0/2] Append new variables to vmcoreinfo (TCR_EL1.T1SZ for arm64 and MAX_PHYSMEM_BITS for all archs)

2020-07-02 Thread Catalin Marinas
On Thu, 14 May 2020 00:22:35 +0530, Bhupesh Sharma wrote:
> Apologies for the delayed update. Its been quite some time since I
> posted the last version (v5), but I have been really caught up in some
> other critical issues.
> 
> Changes since v5:
> 
> - v5 can be viewed here:
>   http://lists.infradead.org/pipermail/kexec/2019-November/024055.html
> - Addressed review comments from James Morse and Boris.
> - Added Tested-by received from John on v5 patchset.
> - Rebased against arm64 (for-next/ptr-auth) branch which has Amit's
>   patchset for ARMv8.3-A Pointer Authentication feature vmcoreinfo
>   applied.
> 
> [...]

Applied to arm64 (for-next/vmcoreinfo), thanks!

[1/2] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
  https://git.kernel.org/arm64/c/1d50e5d0c505
[2/2] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
  https://git.kernel.org/arm64/c/bbdbc11804ff

-- 
Catalin


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v6 1/2] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo

2020-07-02 Thread Catalin Marinas
On Thu, Jul 02, 2020 at 08:08:55PM +0800, Dave Young wrote:
> Hi Catalin,
> On 07/02/20 at 12:00pm, Catalin Marinas wrote:
> > On Thu, May 14, 2020 at 12:22:36AM +0530, Bhupesh Sharma wrote:
> > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > > index 9f1557b98468..18175687133a 100644
> > > --- a/kernel/crash_core.c
> > > +++ b/kernel/crash_core.c
> > > @@ -413,6 +413,7 @@ static int __init crash_save_vmcoreinfo_init(void)
> > >   VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
> > >   VMCOREINFO_STRUCT_SIZE(mem_section);
> > >   VMCOREINFO_OFFSET(mem_section, section_mem_map);
> > > + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
> > >  #endif
> > >   VMCOREINFO_STRUCT_SIZE(page);
> > >   VMCOREINFO_STRUCT_SIZE(pglist_data);
> > 
> > I can queue this patch via the arm64 tree (together with the second one)
> > but I'd like an ack from the kernel/crash_core.c maintainers. They don't
> > seem to have been cc'ed either (only the kexec list).
> 
> For the VMCOREINFO part, I'm fine with the changes, but since I do not
> understand the arm64 pieces so I would like to leave to arm64 people to
> review.  If arm64 bits are good enough, feel free to add:
> 
> Acked-by: Dave Young 

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v6 1/2] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo

2020-07-02 Thread Catalin Marinas
On Thu, May 14, 2020 at 12:22:36AM +0530, Bhupesh Sharma wrote:
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 9f1557b98468..18175687133a 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -413,6 +413,7 @@ static int __init crash_save_vmcoreinfo_init(void)
>   VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
>   VMCOREINFO_STRUCT_SIZE(mem_section);
>   VMCOREINFO_OFFSET(mem_section, section_mem_map);
> + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
>  #endif
>   VMCOREINFO_STRUCT_SIZE(page);
>   VMCOREINFO_STRUCT_SIZE(pglist_data);

I can queue this patch via the arm64 tree (together with the second one)
but I'd like an ack from the kernel/crash_core.c maintainers. They don't
seem to have been cc'ed either (only the kexec list).

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] arm64/defconfig: Enable CONFIG_KEXEC_FILE

2020-04-28 Thread Catalin Marinas
On Tue, Apr 28, 2020 at 01:55:58PM +0530, Bhupesh Sharma wrote:
> On Wed, Apr 8, 2020 at 4:17 PM Mark Rutland  wrote:
> > On Tue, Apr 07, 2020 at 04:01:40AM +0530, Bhupesh Sharma wrote:
> > >  arch/arm64/configs/defconfig | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > > index 24e534d85045..fa122f4341a2 100644
> > > --- a/arch/arm64/configs/defconfig
> > > +++ b/arch/arm64/configs/defconfig
> > > @@ -66,6 +66,7 @@ CONFIG_SCHED_SMT=y
> > >  CONFIG_NUMA=y
> > >  CONFIG_SECCOMP=y
> > >  CONFIG_KEXEC=y
> > > +CONFIG_KEXEC_FILE=y
> > >  CONFIG_CRASH_DUMP=y
> > >  CONFIG_XEN=y
> > >  CONFIG_COMPAT=y
> > > --
> > > 2.7.4
> 
> Thanks a lot  Mark.
> 
> Hi Catalin, Will,
> 
> Can you please help pick this patch in the arm tree. We have an
> increasing number of user-cases from distro users
> who want to use kexec_file_load() as the default interface for
> kexec/kdump on arm64.

We could pick it up if it doesn't conflict with the arm-soc tree. They
tend to pick most of the defconfig changes these days (and could as well
pick this one).

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] arm64: mm: Remove MAX_USER_VA_BITS definition

2019-11-06 Thread Catalin Marinas
On Tue, Nov 05, 2019 at 03:26:46AM +0530, Bhupesh Sharma wrote:
> commit 9b31cf493ffa ("arm64: mm: Introduce MAX_USER_VA_BITS definition")
> introduced the MAX_USER_VA_BITS definition, which was used to support
> the arm64 mm use-cases where the user-space could use 52-bit virtual
> addresses whereas the kernel-space would still could a maximum of 48-bit
> virtual addressing.
> 
> But, now with commit b6d00d47e81a ("arm64: mm: Introduce 52-bit Kernel
> VAs"), we removed the 52-bit user/48-bit kernel kconfig option and hence
> there is no longer any scenario where user VA != kernel VA size
> (even with CONFIG_ARM64_FORCE_52BIT enabled, the same is true).
> 
> Hence we can do away with the MAX_USER_VA_BITS macro as it is equal to
> VA_BITS (maximum VA space size) in all possible use-cases. Note that
> even though the 'vabits_actual' value would be 48 for arm64 hardware
> which don't support LVA-8.2 extension (even when CONFIG_ARM64_VA_BITS_52
> is enabled), VA_BITS would still be set to a value 52. Hence this change
> would be safe in all possible VA address space combinations.
> 
> Cc: James Morse 
> Cc: Mark Rutland 
> Cc: Will Deacon 
> Cc: Steve Capper 
> Cc: Catalin Marinas 
> Cc: Ard Biesheuvel 
> Cc: linux-ker...@vger.kernel.org
> Cc: kexec@lists.infradead.org
> Signed-off-by: Bhupesh Sharma 

Queued for 5.5. Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 00/12] 52-bit kernel + user VAs

2019-06-10 Thread Catalin Marinas
On Mon, Jun 10, 2019 at 04:10:50PM +0530, Bhupesh Sharma wrote:
> On 05/28/2019 09:40 PM, Steve Capper wrote:
> >   2) How can this memory map be advertised to kdump tools/documentation?
> >  I was planning on getting the kernel VA structure agreed on, then I
> >  would add the relevant exports/documentation.
> 
> Indeed, in the absence of corresponding changes to the Documentation
> section, it is hard to visualize the changes being made in the memory
> map.

We used to have some better documentation in the arm64 memory.txt until
commit 08375198b010 ("arm64: Determine the vmalloc/vmemmap space at
build time based on VA_BITS") which removed it in favour of what the
kernel was printing. Subsequently, the kernel VA layout printing was
also removed. It would be nice to bring back the memory.txt, even if it
is for a single configuration as per defconfig.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 2/2] arm64: Fix power-of-ten vs. power-of-two prefixes in comments etc.

2017-04-18 Thread Catalin Marinas
On Mon, Apr 17, 2017 at 01:54:49PM +0200, Geert Uytterhoeven wrote:
> On Mon, Apr 17, 2017 at 1:12 AM, Simon Horman <ho...@verge.net.au> wrote:
> > On Thu, Apr 13, 2017 at 01:18:58PM +0100, David Woodhouse wrote:
> >> Less important than in user-visible messages, but still good practice as
> >> there's still no excuse for ARM64 code to look like it was written before
> >> 1996.
> >>
> >> Signed-off-by: David Woodhouse <d...@amazon.co.uk>
> >
> > Hi David,
> >
> > I'd be happy to take the Renesas portions of this change if they were
> > broken out into a separate patch. The reason I would prefer a separate
> > patch is to avoid the possibility of conflicts, even trivial ones.
> 
> s/possibility/reality/, unless this series goes in before v4.12-rc1.

Maybe the arm-soc guys can take both patches on top of the other series
they pull, in which case, for both patches:

Acked-by: Catalin Marinas <catalin.mari...@arm.com>

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v35 02/14] memblock: add memblock_cap_memory_range()

2017-04-05 Thread Catalin Marinas
On Mon, Apr 03, 2017 at 11:23:55AM +0900, AKASHI Takahiro wrote:
> Add memblock_cap_memory_range() which will remove all the memblock regions
> except the memory range specified in the arguments. In addition, rework is
> done on memblock_mem_limit_remove_map() to re-implement it using
> memblock_cap_memory_range().
> 
> This function, like memblock_mem_limit_remove_map(), will not remove
> memblocks with MEMMAP_NOMAP attribute as they may be mapped and accessed
> later as "device memory."
> See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
> address the mem limit issue").
> 
> This function is used, in a succeeding patch in the series of arm64 kdump
> suuport, to limit the range of usable memory, or System RAM, on crash dump
> kernel.
> (Please note that "mem=" parameter is of little use for this purpose.)
> 
> Signed-off-by: AKASHI Takahiro <takahiro.aka...@linaro.org>
> Reviewed-by: Will Deacon <will.dea...@arm.com>
> Acked-by: Catalin Marinas <catalin.mari...@arm.com>
> Acked-by: Dennis Chen <dennis.c...@arm.com>
> Cc: linux...@kvack.org
> Cc: Andrew Morton <a...@linux-foundation.org>
> Reviewed-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> ---
>  include/linux/memblock.h |  1 +
>  mm/memblock.c| 44 +---
>  2 files changed, 30 insertions(+), 15 deletions(-)

Andrew, are you ok with patches 1 and 2 in this series (touching
mm/memblock.c and include/linux/memblock.h) to go in via the arm64 tree?

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v28 4/9] arm64: kdump: implement machine_crash_shutdown()

2016-11-28 Thread Catalin Marinas
On Thu, Nov 24, 2016 at 06:58:05PM +0900, AKASHI Takahiro wrote:
> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> index 04744dc..b5168e8 100644
> --- a/arch/arm64/include/asm/kexec.h
> +++ b/arch/arm64/include/asm/kexec.h
> @@ -40,7 +40,47 @@
>  static inline void crash_setup_regs(struct pt_regs *newregs,
>   struct pt_regs *oldregs)
>  {
> - /* Empty routine needed to avoid build errors. */
> + if (oldregs) {
> + memcpy(newregs, oldregs, sizeof(*newregs));
> + } else {
> + u64 tmp1, tmp2;
> +
> + __asm__ __volatile__ (
> + "stp x0,   x1, [%2, #16 *  0]\n"
> + "stp x2,   x3, [%2, #16 *  1]\n"
> + "stp x4,   x5, [%2, #16 *  2]\n"
> + "stp x6,   x7, [%2, #16 *  3]\n"
> + "stp x8,   x9, [%2, #16 *  4]\n"
> + "stpx10,  x11, [%2, #16 *  5]\n"
> + "stpx12,  x13, [%2, #16 *  6]\n"
> + "stpx14,  x15, [%2, #16 *  7]\n"
> + "stpx16,  x17, [%2, #16 *  8]\n"
> + "stpx18,  x19, [%2, #16 *  9]\n"
> + "stpx20,  x21, [%2, #16 * 10]\n"
> + "stpx22,  x23, [%2, #16 * 11]\n"
> + "stpx24,  x25, [%2, #16 * 12]\n"
> + "stpx26,  x27, [%2, #16 * 13]\n"
> + "stpx28,  x29, [%2, #16 * 14]\n"
> + "mov %0,  sp\n"
> + "stpx30,  %0,  [%2, #16 * 15]\n"
> +
> + "/* faked current PSTATE */\n"
> + "mrs %0, CurrentEL\n"
> + "mrs %1, SPSEL\n"
> + "orr %0, %0, %1\n"
> + "mrs %1, DAIF\n"
> + "orr %0, %0, %1\n"
> + "mrs %1, NZCV\n"
> + "orr %0, %0, %1\n"
> + /* pc */
> + "adr %1, 1f\n"
> + "1:\n"
> + "stp %1, %0,   [%2, #16 * 16]\n"
> + : "+r" (tmp1), "+r" (tmp2)
> + : "r" (newregs)
> + : "memory"

tmp1 and tmp2 are not input arguments here, so you should use the "=&"
modifier. With my compiler, I get warnings of these variables being used
uninitialised.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v28 0/9] arm64: add kdump support

2016-11-25 Thread Catalin Marinas
On Thu, Nov 24, 2016 at 06:55:23PM +0900, AKASHI Takahiro wrote:
> AKASHI Takahiro (8):
>   arm64: kdump: reserve memory for crash dump kernel
>   memblock: add memblock_cap_memory_range()
>   arm64: limit memory regions based on DT property, usable-memory-range
>   arm64: kdump: implement machine_crash_shutdown()
>   arm64: kdump: add kdump support
>   arm64: kdump: add VMCOREINFO's for user-space coredump tools
>   arm64: kdump: enable kdump in the arm64 defconfig
>   arm64: kdump: update a kernel doc

Given that it's nearly -rc7, holiday in the US and more acks needed, we
should defer merging this series to 4.11. In the meantime, since Will is
going to handle the 4.11 merging window, I'm acking the whole series:

Acked-by: Catalin Marinas <catalin.mari...@arm.com>

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v28 9/9] Documentation: dt: chosen properties for arm64 kdump

2016-11-24 Thread Catalin Marinas
On Thu, Nov 24, 2016 at 06:59:44PM +0900, AKASHI Takahiro wrote:
> From: James Morse 
> 
> Add documentation for
>   linux,crashkernel-base and crashkernel-size,
>   linux,usable-memory-range
>   linux,elfcorehdr
> used by arm64 kdump to decribe the kdump reserved area, and
> the elfcorehdr's location within it.
> 
> Signed-off-by: James Morse 
> [takahiro.aka...@linaro.org: added "linux,crashkernel-base" and "-size" ]
> Signed-off-by: AKASHI Takahiro 
> Cc: devicet...@vger.kernel.org
> Cc: Rob Herring 
> Cc: Mark Rutland 

Rob, Mark, are you ok with this patch?

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v28 1/9] memblock: add memblock_cap_memory_range()

2016-11-24 Thread Catalin Marinas
Hi Andrew,

On Thu, Nov 24, 2016 at 06:57:17PM +0900, AKASHI Takahiro wrote:
> Add memblock_cap_memory_range() which will remove all the memblock regions
> except the memory range specified in the arguments. In addition, rework is
> done on memblock_mem_limit_remove_map() to re-implement it using
> memblock_cap_memory_range().
> 
> This function, like memblock_mem_limit_remove_map(), will not remove
> memblocks with MEMMAP_NOMAP attribute as they may be mapped and accessed
> later as "device memory."
> See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
> address the mem limit issue").
> 
> This function is used, in a succeeding patch in the series of arm64 kdump
> suuport, to limit the range of usable memory, or System RAM, on crash dump
> kernel.
> (Please note that "mem=" parameter is of little use for this purpose.)
> 
> Signed-off-by: AKASHI Takahiro 
> Reviewed-by: Will Deacon 
> Cc: linux...@kvack.org
> Cc: Andrew Morton 
> ---
>  include/linux/memblock.h |  1 +
>  mm/memblock.c| 44 +---
>  2 files changed, 30 insertions(+), 15 deletions(-)

Are you OK with this patch to go in via the arm64 tree (together with
the other patches in this series)?

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2] arm64: Add support for additional relocations in the kexec purgatory code

2016-10-20 Thread Catalin Marinas
When compiling the kexec-tools with gcc6, the following additional
reolcations are generated in the purgatory.ro file:

R_AARCH64_ADR_PREL_PG_HI21
R_AARCH64_ADD_ABS_LO12_NC
R_AARCH64_LDST64_ABS_LO12_NC

This patch modifies the arm64 machine_apply_elf_rel() function to handle
these relocations.

Cc: Geoff Levand <ge...@infradead.org>
Signed-off-by: Catalin Marinas <catalin.mari...@arm.com>
---

Changes for v2:
- Fixed the type string to drop the "R_AARCH64_" prefix

 kexec/arch/arm64/kexec-arm64.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
index 2e8839a..e067a23 100644
--- a/kexec/arch/arm64/kexec-arm64.c
+++ b/kexec/arch/arm64/kexec-arm64.c
@@ -550,6 +550,14 @@ void machine_apply_elf_rel(struct mem_ehdr *ehdr, struct 
mem_sym *UNUSED(sym),
 # define R_AARCH64_ADR_PREL_LO21 274
 #endif
 
+#if !defined(R_AARCH64_ADR_PREL_PG_HI21)
+# define R_AARCH64_ADR_PREL_PG_HI21 275
+#endif
+
+#if !defined(R_AARCH64_ADD_ABS_LO12_NC)
+# define R_AARCH64_ADD_ABS_LO12_NC 277
+#endif
+
 #if !defined(R_AARCH64_JUMP26)
 # define R_AARCH64_JUMP26 282
 #endif
@@ -558,10 +566,15 @@ void machine_apply_elf_rel(struct mem_ehdr *ehdr, struct 
mem_sym *UNUSED(sym),
 # define R_AARCH64_CALL26 283
 #endif
 
+#if !defined(R_AARCH64_LDST64_ABS_LO12_NC)
+# define R_AARCH64_LDST64_ABS_LO12_NC 286
+#endif
+
uint64_t *loc64;
uint32_t *loc32;
uint64_t *location = (uint64_t *)ptr;
uint64_t data = *location;
+   uint64_t imm;
const char *type = NULL;
 
switch(r_type) {
@@ -585,6 +598,19 @@ void machine_apply_elf_rel(struct mem_ehdr *ehdr, struct 
mem_sym *UNUSED(sym),
*loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+ (((value - address) << 3) & 0xe0));
break;
+   case R_AARCH64_ADR_PREL_PG_HI21:
+   type = "ADR_PREL_PG_HI21";
+   imm = ((value & ~0xfff) - (address & ~0xfff)) >> 12;
+   loc32 = ptr;
+   *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+   + ((imm & 3) << 29) + ((imm & 0x1c) << (5 - 2)));
+   break;
+   case R_AARCH64_ADD_ABS_LO12_NC:
+   type = "ADD_ABS_LO12_NC";
+   loc32 = ptr;
+   *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+   + ((value & 0xfff) << 10));
+   break;
case R_AARCH64_JUMP26:
type = "JUMP26";
loc32 = ptr;
@@ -597,6 +623,15 @@ void machine_apply_elf_rel(struct mem_ehdr *ehdr, struct 
mem_sym *UNUSED(sym),
*loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+ (((value - address) >> 2) & 0x3ff));
break;
+   case R_AARCH64_LDST64_ABS_LO12_NC:
+   if (value & 7)
+   die("%s: ERROR Unaligned value: %lx\n", __func__,
+   value);
+   type = "LDST64_ABS_LO12_NC";
+   loc32 = ptr;
+   *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+   + ((value & 0xff8) << (10 - 3)));
+   break;
default:
die("%s: ERROR Unknown type: %lu\n", __func__, r_type);
break;
-- 
2.10.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] arm64: Add support for additional relocations in the kexec purgatory code

2016-10-20 Thread Catalin Marinas
On Wed, Oct 19, 2016 at 03:52:30PM -0700, Geoff Levand wrote:
> Hi Catalin,
> 
> On 10/19/2016 08:58 AM, Catalin Marinas wrote:
> > diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> > index 2e8839a..e067a23 100644
> > --- a/kexec/arch/arm64/kexec-arm64.c
> > +++ b/kexec/arch/arm64/kexec-arm64.c
> > @@ -585,6 +598,19 @@ void machine_apply_elf_rel(struct mem_ehdr *ehdr, 
> > struct mem_sym *UNUSED(sym),
> > *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
> > + (((value - address) << 3) & 0xe0));
> > break;
> > +   case R_AARCH64_ADR_PREL_PG_HI21:
> > +   type = "ADR_PREL_PG_HI21";
> > +   imm = ((value & ~0xfff) - (address & ~0xfff)) >> 12;
> > +   loc32 = ptr;
> > +   *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
> > +   + ((imm & 3) << 29) + ((imm & 0x1c) << (5 - 2)));
> > +   break;
> > +   case R_AARCH64_ADD_ABS_LO12_NC:
> > +   type = "R_AARCH64_ADD_ABS_LO12_NC";
> 
> Following with the others, this should be 'type = "ADD_ABS_LO12_NC"'.

Ah, I missed this detail. I'll post a v2 shortly.

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: Query about ARM64 virt_to_phys and vice versa implementation

2016-08-10 Thread Catalin Marinas
On Wed, Aug 10, 2016 at 01:58:08PM +0530, Pratyush Anand wrote:
> Would like to discuss virt_to_phys() and phys_to_virt() conversion in latest
> kernel. I can see that if VA < PAGE_OFFSET, then we just subtract 
> kimage_voffset
> from VA and we get the PA. However, there is no such condition for
> phys_to_virt(). phys_to_virt is always done by (PA - PHYS_OFFSET + 
> PAGE_OFFSET).
> So, how does phys_to_virt() work for all the cases? I must be missing 
> something.

Because we recently relaxed where the kernel can be loaded in RAM, we
decoupled the actual RAM linear mapping from the kernel image virtual
address. The latter, when KASLR is disabled, is fixed but the
corresponding PA is not. So for various reasons, we need to map retrieve
the PA of a kernel image address/symbol, hence the virt_to_phys() needs
to take kimage_offset into account. The corresponding phys_to_virt()
would return an address in the linear mapping rather than the kernel
image VA but we didn't find a reason where we need phys_to_virt() to
return the latter.

> -- I am using a platform with VA bits = 42.
>Therefore PAGE_OFFSET on my platform is 0xfe00
>PHYS_OFFSET is 0x80
>kimage_voffset is 0xfb800600
> 
> Now lets find physical address of log_buf:
> # cat /proc/kallsyms | grep -w "d log_buf"
> fc0008ca6b40 d log_buf
> 
> Since VA < PAGE_OFFSET (fc0008ca6b40 < fe00)
> Therefore as per definition of virt_to_phys() from asm/memory.h:
> PA = VA - kimage_voffset = fc0008ca6b40 - fb800600 = 8002CA6B40
> 
> Now convert PA=8002CA6B40 to VA.
> VA = PA - PHYS_OFFSET + PAGE_OFFSET = 8002CA6B40 - 80 + 
> fe00
> = fe0002CA6B40 which not equal to fc0008ca6b40.
> 
> So, do we have dual mapping for all physical addresses? If not, how does it
> work? What I am missing. 

Yes, we have dual mapping for the memory containing the kernel image. I
think the kernel memory layout print during boot should give you an idea
of how things look like from a VA perspective.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v20 00/14] arm64 kexec kernel patches

2016-06-29 Thread Catalin Marinas
On Wed, Jun 29, 2016 at 09:54:15AM +0900, AKASHI Takahiro wrote:
> On Mon, Jun 27, 2016 at 06:00:39PM +0100, Catalin Marinas wrote:
> > On Thu, Jun 23, 2016 at 05:54:47PM +, Geoff Levand wrote:
> > > AKASHI Takahiro (7):
> > >   arm64: kdump: reserve memory for crash dump kernel
> > >   arm64: limit memory regions based on DT property, usable-memory
> > >   arm64: kdump: implement machine_crash_shutdown()
> > >   arm64: kdump: add kdump support
> > >   arm64: kdump: add VMCOREINFO's for user-space coredump tools
> > >   arm64: kdump: enable kdump in the arm64 defconfig
> > >   arm64: kdump: update a kernel doc
> > > 
> > > Geoff Levand (4):
> > >   arm64: Add back cpu reset routines
> > >   arm64/kexec: Add core kexec support
> > >   arm64/kexec: Enable kexec in the arm64 defconfig
> > >   arm64/kexec: Add pr_debug output
> > > 
> > > James Morse (3):
> > >   arm64: hibernate: Don't hibernate on systems with stuck CPUs
> > >   arm64: smp: Add function to determine if cpus are stuck in the 
> > > kernel
> > >   Documentation: dt: usable-memory and elfcorehdr nodes for arm64 
> > > kexec
> > 
> > I dropped patch 1 (hibernate), cherry-picked patch 2 from mainline
> > ("cpus stuck in the kernel", already pushed by Will to 4.7-rc5) and
> > merged patches 3-6, with the amendments that James mentioned.
> > 
> > The kdump patches, including the "Documentation" one from James require
> > more review, testing and acks by the corresponding maintainers (kdump,
> > DT).
> 
> I see what you mean, but even for Geoff's kexec part, the maintainer
> (Eric) have not given his ack.

I just assumed he won't object ;). The only generic change is the
KEXEC_ARCH_AARCH64 in the uapi/linux/kexec.h file and the value chosen
matches EM_AARCH64.

The kdump patches make such changes to Documentation and I would like
those acked by the kdump and DT maintainers.

Pratyush Anand also had comments on the kdump patches in v19, so it's
fair to wait for him to complete the reviewing of the latest series.

> > You also don't seem to have added a s-o-b for those patches (since
> > you are submitting them; unless you plan to let Akashi-san submit them
> > in the future).
> 
> Basically Geoff has not reviewed nor tested kdump part, and
> so I'm not sure that adding his s-o-b is appropriate.

It is appropriate. See the "Developer's Certificate of Origin" in
Documentation/SubmittingPatches. A s-o-b from Geoff doesn't necessarily
mean that he reviewed the patches but that he has the right to pass them
on under the open source license indicated in the file (and AFAICT it is
legally binding but IANAL).

> Anyhow, I will post my future version of kdump by myself as kexec has
> been queued in for-next/core.

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v20 00/14] arm64 kexec kernel patches

2016-06-27 Thread Catalin Marinas
Hi Geoff,

On Thu, Jun 23, 2016 at 05:54:47PM +, Geoff Levand wrote:
> AKASHI Takahiro (7):
>   arm64: kdump: reserve memory for crash dump kernel
>   arm64: limit memory regions based on DT property, usable-memory
>   arm64: kdump: implement machine_crash_shutdown()
>   arm64: kdump: add kdump support
>   arm64: kdump: add VMCOREINFO's for user-space coredump tools
>   arm64: kdump: enable kdump in the arm64 defconfig
>   arm64: kdump: update a kernel doc
> 
> Geoff Levand (4):
>   arm64: Add back cpu reset routines
>   arm64/kexec: Add core kexec support
>   arm64/kexec: Enable kexec in the arm64 defconfig
>   arm64/kexec: Add pr_debug output
> 
> James Morse (3):
>   arm64: hibernate: Don't hibernate on systems with stuck CPUs
>   arm64: smp: Add function to determine if cpus are stuck in the kernel
>   Documentation: dt: usable-memory and elfcorehdr nodes for arm64 kexec

I dropped patch 1 (hibernate), cherry-picked patch 2 from mainline
("cpus stuck in the kernel", already pushed by Will to 4.7-rc5) and
merged patches 3-6, with the amendments that James mentioned.

The kdump patches, including the "Documentation" one from James require
more review, testing and acks by the corresponding maintainers (kdump,
DT). You also don't seem to have added a s-o-b for those patches (since
you are submitting them; unless you plan to let Akashi-san submit them
in the future).

Thanks.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v20 04/14] arm64/kexec: Add core kexec support

2016-06-27 Thread Catalin Marinas
On Mon, Jun 27, 2016 at 12:18:42PM +0100, James Morse wrote:
> On 23/06/16 18:54, Geoff Levand wrote:
> > "Can't kexec: secondary CPUs are stuck in the kernel.\n"
> 
> > +   return -EBUSY;
> > +   }
> > +
> 
> > +   if (num_online_cpus() > 1) {
> > +   if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
> > +   /* any_cpu as we don't mind being preempted */
> > +   int any_cpu = raw_smp_processor_id();
> > +
> > +   if (cpu_ops[any_cpu]->cpu_die)
> > +   return 0;
> > +   }
> > +
> > +   pr_err("Can't kexec: no mechanism to offline secondary 
> > CPUs.\n");
> > +   return -EBUSY;
> > +   }
> 
> This if() {} hunk isn't necessary with the version of 
> cpus_are_stuck_in_kernel()
> you have in patch 2. This logic is the '|| smp_spin_tables' part of that 
> helper
> function. (hibernate needed it too)

I can make the changes locally but just to be clear I understand what
you meant, here's the diff on top of Geoff's patch:

8<-

diff --git a/arch/arm64/kernel/machine_kexec.c 
b/arch/arm64/kernel/machine_kexec.c
index 945ce326654c..bc96c8a7fc79 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -68,24 +68,9 @@ int machine_kexec_prepare(struct kimage *kimage)
 
kexec_image_info(kimage);
 
-   if (kimage->type != KEXEC_TYPE_CRASH) {
-   if (cpus_are_stuck_in_kernel()) {
-   pr_err("Can't kexec: failed CPUs are stuck in the 
kernel.\n");
-   return -EBUSY;
-   }
-
-   if (num_online_cpus() > 1) {
-   if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
-   /* any_cpu as we don't mind being preempted */
-   int any_cpu = raw_smp_processor_id();
-
-   if (cpu_ops[any_cpu]->cpu_die)
-   return 0;
-   }
-
-   pr_err("Can't kexec: no mechanism to offline secondary 
CPUs.\n");
-   return -EBUSY;
-   }
+   if (kimage->type != KEXEC_TYPE_CRASH && cpus_are_stuck_in_kernel()) {
+   pr_err("Can't kexec: CPUs are stuck in the kernel.\n");
+   return -EBUSY;
}
 
return 0;
@@ -163,7 +148,7 @@ void machine_kexec(struct kimage *kimage)
/*
 * New cpus may have become stuck_in_kernel after we loaded the image.
 */
-   BUG_ON(cpus_are_stuck_in_kernel() && (num_online_cpus() > 1));
+   BUG_ON(cpus_are_stuck_in_kernel() || (num_online_cpus() > 1));
 
reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 2/8] arm64: Convert hcalls to use ISS field

2015-02-02 Thread Catalin Marinas
On Fri, Jan 30, 2015 at 11:31:21PM +, Geoff Levand wrote:
 On Mon, 2015-01-26 at 18:26 +, Catalin Marinas wrote:
  On Sat, Jan 17, 2015 at 12:23:34AM +, Geoff Levand wrote:
/*
   diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
   index a272f33..e3db3fd 100644
   --- a/arch/arm64/kernel/hyp-stub.S
   +++ b/arch/arm64/kernel/hyp-stub.S
   @@ -22,6 +22,7 @@
#include linux/irqchip/arm-gic-v3.h

#include asm/assembler.h
   +#include asm/kvm_arm.h
#include asm/ptrace.h
#include asm/virt.h

   @@ -53,14 +54,22 @@ ENDPROC(__hyp_stub_vectors)
 .align 11

el1_sync:
   - mrs x1, esr_el2
   - lsr x1, x1, #26
   - cmp x1, #0x16
   - b.ne2f  // Not an HVC trap
   - cbz x0, 1f
   - msr vbar_el2, x0// Set vbar_el2
   + mrs x18, esr_el2
   + lsr x17, x18, #ESR_ELx_EC_SHIFT
   + and x18, x18, #ESR_ELx_ISS_MASK
   +
   + cmp x17, #ESR_ELx_EC_HVC64
   + b.ne2f  // Not an HVC trap
   +
   + cmp x18, #HVC_GET_VECTORS
   + b.ne1f
   + mrs x0, vbar_el2
 b   2f
   -1:   mrs x0, vbar_el2// Return vbar_el2
   +
   +1:   cmp x18, #HVC_SET_VECTORS
   + b.ne2f
   + msr vbar_el2, x0
   +
2:   eret
ENDPROC(el1_sync)
  
  You seem to be using x17 and x18 here freely. Do you have any guarantees
  that the caller saved/restored those registers? I guess you assume they
  are temporary registers and the caller first branches to a function
  (like __kvm_hyp_call) and expects them to be corrupted. But I'm not sure
  that's always the case. Take for example the __invoke_psci_fn_hvc where
  the function is in C (we should change this for other reasons).
 
 Yes, I assume the compiler will not expect them to be preserved.  I
 missed __invoke_psci_fn_hvc.  Can we just add x17 and x18 to the
 clobbered list?
 
 asm volatile(
 __asmeq(%0, x0)
 __asmeq(%1, x1)
 __asmeq(%2, x2)
 __asmeq(%3, x3)
 hvc#0\n
 : +r (function_id)
 -   : r (arg0), r (arg1), r (arg2));
 +   : r (arg0), r (arg1), r (arg2)
 +   : x17, x18);

I think we can ignore these because they would be called from a guest
context and IIUC we would only clobber x18 on the host HVC side.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 3/8] arm64: Add new hcall HVC_CALL_FUNC

2015-01-27 Thread Catalin Marinas
On Sat, Jan 17, 2015 at 12:23:34AM +, Geoff Levand wrote:
 diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
 index 99c319c..4f23a48 100644
 --- a/arch/arm64/include/asm/virt.h
 +++ b/arch/arm64/include/asm/virt.h
 @@ -41,6 +41,19 @@
  
  #define HVC_CALL_HYP 3
  
 +/*
 + * HVC_CALL_FUNC - Execute a function at EL2.
 + *
 + * @x0: Physical address of the function to be executed.
 + * @x1: Passed as the first argument to the function.
 + * @x2: Passed as the second argument to the function.
 + * @x3: Passed as the third argument to the function.
 + *
 + * The called function must preserve the contents of register x18.

Can you pick a register that's normally callee saved?

 + */
 +
 +#define HVC_CALL_FUNC 4
 +
  #ifndef __ASSEMBLY__
  
  /*
 diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
 index e3db3fd..b5d36e7 100644
 --- a/arch/arm64/kernel/hyp-stub.S
 +++ b/arch/arm64/kernel/hyp-stub.S
 @@ -66,9 +66,20 @@ el1_sync:
   mrs x0, vbar_el2
   b   2f
  
 -1:   cmp x18, #HVC_SET_VECTORS
 - b.ne2f
 - msr vbar_el2, x0
 +1:   cmp x18, #HVC_SET_VECTORS

This line doesn't seem to have any change, apart from some whitespace.
Or did you want to drop the label?

 + b.ne1f
 + msr vbar_el2, x0
 + b   2f
 +
 +1:   cmp x18, #HVC_CALL_FUNC
 + b.ne2f
 + mov x18, lr
 + mov lr, x0
 + mov x0, x1
 + mov x1, x2
 + mov x2, x3
 + blr lr
 + mov lr, x18
  
  2:   eret
  ENDPROC(el1_sync)

What is the calling convention for this HVC? You mentioned x18 above but
what about other registers that the called function may corrupt (x18 is
a temporary register, so it's not expected to be callee saved).

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 2/8] arm64: Convert hcalls to use ISS field

2015-01-26 Thread Catalin Marinas
On Sat, Jan 17, 2015 at 12:23:34AM +, Geoff Levand wrote:
 To allow for additional hcalls to be defined and to make the arm64 hcall API
 more consistent across exception vector routines, change the hcall 
 implementations
 to use the ISS field of the ESR_EL2 register to specify the hcall type.
 
 The existing arm64 hcall implementations are limited in that they only allow
 for two distinct hcalls; with the x0 register either zero, or not zero.  Also,
 the API of the hyp-stub exception vector routines and the KVM exception vector
 routines differ; hyp-stub uses a non-zero value in x0 to implement
 __hyp_set_vectors, whereas KVM uses it to implement kvm_call_hyp.
 
 Define three new preprocessor macros HVC_GET_VECTORS, HVC_SET_VECTORS and
 HVC_CALL_HYP and to be used as hcall type specifiers and convert the
 existing __hyp_get_vectors(), __hyp_set_vectors() and kvm_call_hyp() routines
 to use these new macros when executing an HVC call.  Also change the
 corresponding hyp-stub and KVM el1_sync exception vector routines to use these
 new macros.
 
 Signed-off-by: Geoff Levand ge...@infradead.org

Using the #imm value for HVC to separate what gets called looks fine to
me. However, I'd like to see a review from Marc/Christoffer on this
patch.

Some comments below:

 diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
 index 7a5df52..99c319c 100644
 --- a/arch/arm64/include/asm/virt.h
 +++ b/arch/arm64/include/asm/virt.h
 @@ -21,6 +21,26 @@
  #define BOOT_CPU_MODE_EL1(0xe11)
  #define BOOT_CPU_MODE_EL2(0xe12)
  
 +/*
 + * HVC_GET_VECTORS - Return the value of the vbar_el2 register.
 + */
 +
 +#define HVC_GET_VECTORS 1
 +
 +/*
 + * HVC_SET_VECTORS - Set the value of the vbar_el2 register.
 + *
 + * @x0: Physical address of the new vector table.
 + */
 +
 +#define HVC_SET_VECTORS 2
 +
 +/*
 + * HVC_CALL_HYP - Execute a hyp routine.
 + */
 +
 +#define HVC_CALL_HYP 3

I think you can ignore this case (make it the default), just define it
as 0 as that's the normal use-case after initialisation and avoid
checking it explicitly.

  /*
 diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
 index a272f33..e3db3fd 100644
 --- a/arch/arm64/kernel/hyp-stub.S
 +++ b/arch/arm64/kernel/hyp-stub.S
 @@ -22,6 +22,7 @@
  #include linux/irqchip/arm-gic-v3.h
  
  #include asm/assembler.h
 +#include asm/kvm_arm.h
  #include asm/ptrace.h
  #include asm/virt.h
  
 @@ -53,14 +54,22 @@ ENDPROC(__hyp_stub_vectors)
   .align 11
  
  el1_sync:
 - mrs x1, esr_el2
 - lsr x1, x1, #26
 - cmp x1, #0x16
 - b.ne2f  // Not an HVC trap
 - cbz x0, 1f
 - msr vbar_el2, x0// Set vbar_el2
 + mrs x18, esr_el2
 + lsr x17, x18, #ESR_ELx_EC_SHIFT
 + and x18, x18, #ESR_ELx_ISS_MASK
 +
 + cmp x17, #ESR_ELx_EC_HVC64
 + b.ne2f  // Not an HVC trap
 +
 + cmp x18, #HVC_GET_VECTORS
 + b.ne1f
 + mrs x0, vbar_el2
   b   2f
 -1:   mrs x0, vbar_el2// Return vbar_el2
 +
 +1:   cmp x18, #HVC_SET_VECTORS
 + b.ne2f
 + msr vbar_el2, x0
 +
  2:   eret
  ENDPROC(el1_sync)

You seem to be using x17 and x18 here freely. Do you have any guarantees
that the caller saved/restored those registers? I guess you assume they
are temporary registers and the caller first branches to a function
(like __kvm_hyp_call) and expects them to be corrupted. But I'm not sure
that's always the case. Take for example the __invoke_psci_fn_hvc where
the function is in C (we should change this for other reasons).

 diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
 index c0d8202..1916c89 100644
 --- a/arch/arm64/kvm/hyp.S
 +++ b/arch/arm64/kvm/hyp.S
 @@ -27,6 +27,7 @@
  #include asm/kvm_asm.h
  #include asm/kvm_mmu.h
  #include asm/memory.h
 +#include asm/virt.h
  
  #define CPU_GP_REG_OFFSET(x) (CPU_GP_REGS + x)
  #define CPU_XREG_OFFSET(x)   CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
 @@ -1106,12 +1107,9 @@ __hyp_panic_str:
   * in Hyp mode (see init_hyp_mode in arch/arm/kvm/arm.c).  Return values are
   * passed in r0 and r1.
   *
 - * A function pointer with a value of 0 has a special meaning, and is
 - * used to implement __hyp_get_vectors in the same way as in
 - * arch/arm64/kernel/hyp_stub.S.
   */
  ENTRY(kvm_call_hyp)
 - hvc #0
 + hvc #HVC_CALL_HYP
   ret
  ENDPROC(kvm_call_hyp)
  
 @@ -1142,6 +1140,7 @@ el1_sync:   // 
 Guest trapped into EL2
  
   mrs x1, esr_el2
   lsr x2, x1, #ESR_ELx_EC_SHIFT
 + and x0, x1, #ESR_ELx_ISS_MASK
  
   cmp x2, #ESR_ELx_EC_HVC64
   b.neel1_trap
 @@ -1150,15 +1149,19 @@ el1_sync: // 
 Guest trapped into EL2
   cbnzx3, el1_trap// called HVC
  
   /* Here, we're pretty sure the host called HVC. */
 +

Re: [PATCH 0/8] arm64 kexec kernel patches V7

2015-01-26 Thread Catalin Marinas
Hi Geoff,

On Sat, Jan 17, 2015 at 12:23:34AM +, Geoff Levand wrote:
 This series adds the core support for kexec re-boots on arm64.  This v7 of the
 series is mainly just a rebase to the latest arm64 for-next/core branch
 (v3.19-rc4), and a few very minor changes requested for v6.

I haven't looked at the series in detail before, so some of my comments
may have already been discussed.

 Several things are known to have problems on kexec re-boot:
 
 spin-table

I think that's not too bad, for complete kexec support (SMP-SMP) we can
require some CPU unplug mechanism and PSCI is one of them.

 FIX: Upgrade system firmware to provide PSCI enable method support or add
 missing spin-table support to the kernel.

What's the missing spin-table support?

 ACPI
 
 
 PROBLEM: The kernel for ACPI based systems does not export a device tree to 
 the
 standard user space location of 'proc/device-tree'.  Current applications
 expect to access device tree information from this standard location.
 
 WORK-AROUND: Disable ACPI in firmware, OR pass 'acpi=off' on the first stage
 kernel command line, OR pass a user specified DTB using the kexec --dtb 
 option.
 
 FIX: FIX: An interface to expose a binary device tree to user space has been
 proposed.  User kexec utilities will need to be updated to add support for 
 this
 new interface.

So the fix here is to boot the second stage kernel with dtb, which means
that we mandate the existence of a DT file for any ACPI system. Are
there plans to make the kexec'ed kernel reuse the ACPI tables?

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/8] arm64: Move proc-macros.S to include/asm

2015-01-26 Thread Catalin Marinas
On Sat, Jan 17, 2015 at 12:23:34AM +, Geoff Levand wrote:
 To allow the assembler macros defined in proc-macros.S to be used outside
 the mm code move the proc-macros.S file from arch/arm64/mm/ to
 arch/arm64/include/asm/ and fix up any preprocessor includes to use the new
 file location.
 
 Signed-off-by: Geoff Levand ge...@infradead.org
 ---
  arch/arm64/include/asm/proc-macros.S | 54 
 
  arch/arm64/mm/cache.S|  3 +-
  arch/arm64/mm/proc-macros.S  | 54 
 
  arch/arm64/mm/proc.S |  3 +-

Actually, I would just merge proc-macros.S into assembler.h. Not wirth
keeping the former just for a few macros.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 7/8] arm64/kexec: Add core kexec support

2014-05-14 Thread Catalin Marinas
On Fri, May 09, 2014 at 01:48:17AM +0100, Geoff Levand wrote:
 +KEXEC FOR ARM64
 +M: Geoff Levand ge...@infradead.org
 +W: http://kernel.org/pub/linux/utils/kernel/kexec/
 +L: kexec@lists.infradead.org
 +L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
 +S: Maintained
 +F: arch/arm64/machine_kexec.c
 +F: arch/arm64/relocate_kernel.S

These entries missed the full directory name.

Anyway, this code already comes under the core arm64 MAINTAINERS entry
and it doesn't make sense to have a special kexec case. Please add a
proper header to the new files you introduce, including copyright and
author information.

-- 
Catalin

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


  1   2   >