Re: [PATCH v2 3/5] add slurp_proc_file()

2021-12-15 Thread Sven Schnelle
Philipp Rudo  writes:

> On Wed, 15 Dec 2021 11:18:34 +0100
> Sven Schnelle  wrote:
>
>> slurp_file() cannot be used to read proc files, as they return
>> a size of zero in stat(). Add a function slurp_proc_file() which is
>> similar to slurp_file(), but doesn't require the size of the file to
>> be known.
>> 
>> Signed-off-by: Sven Schnelle 
>> ---
>>  kexec/kexec.c | 32 
>>  1 file changed, 32 insertions(+)
>> 
>> diff --git a/kexec/kexec.c b/kexec/kexec.c
>> index f63b36b771eb..a1acba2adf2a 100644
>> --- a/kexec/kexec.c
>> +++ b/kexec/kexec.c
>> @@ -1106,6 +1106,38 @@ static void remove_parameter(char *line, const char *param_name)
>>  }
>>  }
>>  
>> +static char *slurp_proc_file(const char *filename, size_t *len)
>> +{
>> +	ssize_t ret, startpos = 0;
>> +	unsigned int size = 64;
>> +	char *buf = NULL, *tmp;
>> +	int fd;
>> +
>> +	fd = open(filename, O_RDONLY);
>> +	if (fd == -1)
>> +		return NULL;
>> +
>> +	do {
>> +		size *= 2;
>> +		tmp = realloc(buf, size);
>> +		if (!tmp) {
>> +			free(buf);
>> +			return NULL;
>> +		}
>> +		buf = tmp;
>> +
>> +		ret = read(fd, buf + startpos, size - startpos);
>> +		if (ret < 0) {
>> +			free(buf);
>> +			return NULL;
>> +		}
>> +		startpos += ret;
>> +		*len = startpos;
>> +	} while (ret == size);
>
> I don't think this will work as intended. 
>
> 1) read returns the bytes read. So ret has a maximum value of
>    size - startpos and thus can only be equal to size on the first pass
>    when startpos = 0.
>
> 2) it's not an error when read reads fewer bytes than requested, but that
>    can happen when, e.g., it gets interrupted.
>
> The simplest solution I see is to simply use 'while (ret)'. Even when
> that means that there is an extra pass in the loop with an extra
> call to realloc.
>
> The cleaner solution probably would be to put the read into a second
> loop. I.e. the following should work (not tested)
>
>   do {
>   [...]
>   do {
>   ret = read(fd, buf + startpos, size - startpos);
>   if (ret < 0) {
>   free(buf);
>   return NULL;
>   }
>   startpos += ret;
>   } while (ret && startpos != size);
>
>   *len = startpos;
>   } while (ret);
>
> Thanks
> Philipp

Sigh. True. I'll prepare a v3...
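For reference, folding the read into an inner loop as suggested in the review yields roughly the following (untested sketch, not the eventual v3; note the error paths here also close the file descriptor, which the v2 patch did not):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch of slurp_proc_file() with a nested read loop: the inner loop
 * refills the buffer after short reads (e.g. on signal interruption)
 * until the buffer is full or EOF; the outer loop grows the buffer. */
static char *slurp_proc_file(const char *filename, size_t *len)
{
	size_t startpos = 0, size = 64;
	char *buf = NULL, *tmp;
	ssize_t ret = 0;
	int fd;

	fd = open(filename, O_RDONLY);
	if (fd == -1)
		return NULL;

	do {
		size *= 2;
		tmp = realloc(buf, size);
		if (!tmp) {
			free(buf);
			close(fd);
			return NULL;
		}
		buf = tmp;

		/* inner loop: retry until the buffer is full or EOF */
		do {
			ret = read(fd, buf + startpos, size - startpos);
			if (ret < 0) {
				free(buf);
				close(fd);
				return NULL;
			}
			startpos += ret;
		} while (ret && startpos != size);

		*len = startpos;
	} while (ret);

	close(fd);
	return buf;
}
```

The same code also works for regular files, since it never consults the stat() size.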

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 3/3] arm64: read VA_BITS from kcore for 52-bits VA kernel

2021-12-15 Thread Pingfan Liu
On Thu, Dec 16, 2021 at 10:46 AM Pingfan Liu  wrote:
>
> On Wed, Dec 15, 2021 at 9:35 PM Philipp Rudo  wrote:
> >
> > Hi Pingfan,
> >
> > On Fri, 10 Dec 2021 11:07:35 +0800
> > Pingfan Liu  wrote:
> >
> > > phys_to_virt() calculates the virtual address. As an important factor,
> > > page_offset is expected to be accurate.
> > >
> > > Since the arm64 kernel exposes va_bits through vmcore, use it.
> > >
> > > Signed-off-by: Pingfan Liu 
> > > ---
> > >  kexec/arch/arm64/kexec-arm64.c | 31 +++
> > >  kexec/arch/arm64/kexec-arm64.h |  1 +
> > >  util_lib/elf_info.c|  5 +
> > >  3 files changed, 33 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> > > index bd650e6..ccc92db 100644
> > > --- a/kexec/arch/arm64/kexec-arm64.c
> > > +++ b/kexec/arch/arm64/kexec-arm64.c
> > > @@ -54,7 +54,7 @@
> > >  static bool try_read_phys_offset_from_kcore = false;
> > >
> > >  /* Machine specific details. */
> > > -static int va_bits;
> > > +static int va_bits = -1;
> > >  static unsigned long page_offset;
> > >
> > >  /* Global varables the core kexec routines expect. */
> > > @@ -876,7 +876,15 @@ static inline void set_phys_offset(long v, char *set_method)
> > >
> > >  static int get_va_bits(void)
> > >  {
> > > - unsigned long long stext_sym_addr = get_kernel_sym("_stext");
> > > + unsigned long long stext_sym_addr;
> > > +
> > > + /*
> > > +  * if already got from kcore
> > > +  */
> > > + if (va_bits != -1)
> > > + goto out;
> > > +
> > > + stext_sym_addr = get_kernel_sym("_stext");
> > >
> > >   if (stext_sym_addr == 0) {
> > >   fprintf(stderr, "Can't get the symbol of _stext.\n");
> > > @@ -900,6 +908,7 @@ static int get_va_bits(void)
> > >   return -1;
> > >   }
> > >
> > > +out:
> > >   dbgprintf("va_bits : %d\n", va_bits);
> > >
> > >   return 0;
> > > @@ -917,14 +926,27 @@ int get_page_offset(unsigned long *page_offset)
> > >   if (ret < 0)
> > >   return ret;
> > >
> > > - page_offset = (0xffffffffffffffffUL) << (va_bits - 1);
> > > + if (va_bits < 52)
> > > + *page_offset = (0xffffffffffffffffUL) << (va_bits - 1);
> > > + else
> > > + *page_offset = (0xffffffffffffffffUL) << va_bits;
> >
> > wouldn't it make sense to use ULONG_MAX here? At least for me it would
> > be much better readable.
> >
>
> Yes, I tend to agree and will update it in V2 (if there is no need to
> compile it on a 32-bit machine, which I consider a rare case
> nowadays).
>

I think UINT64_MAX would be free of this issue.
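To make the readability point concrete, here is a small standalone illustration (a hypothetical helper, not the kexec-tools code) of the two formulas with a fixed-width type, which sidesteps the 32-bit `unsigned long` concern:

```c
#include <stdint.h>

/* Illustrates the page_offset computation under discussion using
 * uint64_t, so the shift is always 64-bit even on a 32-bit host:
 * pre-5.4 layouts use (va_bits - 1), 52-bit VA layouts use va_bits. */
static uint64_t arm64_page_offset(int va_bits)
{
	if (va_bits < 52)
		return UINT64_MAX << (va_bits - 1);
	return UINT64_MAX << va_bits;
}
```

For a 48-bit VA this yields 0xffff800000000000, for a 52-bit VA 0xfff0000000000000.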




Re: [PATCH 2/3] arm64/crashdump: unify routine to get page_offset

2021-12-15 Thread Pingfan Liu
On Wed, Dec 15, 2021 at 9:35 PM Philipp Rudo  wrote:
>
> Hi Pingfan,
>
> On Fri, 10 Dec 2021 11:07:34 +0800
> Pingfan Liu  wrote:
>
> > There are two funcs to get page_offset:
> >   get_kernel_page_offset()
> >   get_page_offset()
> >
> > Since get_kernel_page_offset() does not follow the kernel's formula,
> > remove it. Unify them in order to introduce the 52-bit VA kernel more
> > easily in the coming patch.
> >
> > Signed-off-by: Pingfan Liu 
> > ---
> >  kexec/arch/arm64/crashdump-arm64.c | 23 +--
> >  kexec/arch/arm64/kexec-arm64.c |  4 ++--
> >  kexec/arch/arm64/kexec-arm64.h |  1 +
> >  3 files changed, 4 insertions(+), 24 deletions(-)
> >
> > diff --git a/kexec/arch/arm64/crashdump-arm64.c b/kexec/arch/arm64/crashdump-arm64.c
> > index a02019a..0a8d44c 100644
> > --- a/kexec/arch/arm64/crashdump-arm64.c
> > +++ b/kexec/arch/arm64/crashdump-arm64.c
>
> [...]
>
> > diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> > index 7373fa3..bd650e6 100644
> > --- a/kexec/arch/arm64/kexec-arm64.c
> > +++ b/kexec/arch/arm64/kexec-arm64.c
> > @@ -909,7 +909,7 @@ static int get_va_bits(void)
> >   * get_page_offset - Helper for getting PAGE_OFFSET
> >   */
> >
> > -static int get_page_offset(void)
> > +int get_page_offset(unsigned long *page_offset)
> >  {
> >   int ret;
> >
>
> The new page_offset is never used in this patch. There's still the line
> left with the global page_offset but that will cause a type mismatch
> (unsigned long vs unsigned long *). In the next patch you do the update
> (page_offset -> *page_offset) but in the meantime the code is broken.
>
I will fix it in V2.

Thanks,

Pingfan




Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

2021-12-15 Thread Leizhen (ThunderTown)



On 2021/12/16 9:10, Baoquan He wrote:
> On 12/15/21 at 02:28pm, Borislav Petkov wrote:
>> On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote:
>>> @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
>>> }
>>> }
>>>  
>>> -   if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>>> +   if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>>> memblock_phys_free(crash_base, crash_size);
>>> return;
>>> }
>>
>> That's not an equivalent transformation on X86_32.

The original value (1ULL << 32) is inaccurate, and it enlarged the
CRASH_ADDR_LOW upper limit. This is because when the memory is allocated
from the low end, the address cannot exceed CRASH_ADDR_LOW_MAX; see the
"if (!high)" branch. If the memory is allocated from the high end,
'crash_base' is greater than or equal to (1ULL << 32), and naturally it
is greater than CRASH_ADDR_LOW_MAX.

I think I should update the description, thanks.

if (!high)
crash_base = memblock_phys_alloc_range(crash_size,
CRASH_ALIGN, CRASH_ALIGN,
CRASH_ADDR_LOW_MAX);
if (!crash_base)
crash_base = memblock_phys_alloc_range(crash_size,
CRASH_ALIGN, CRASH_ALIGN,
CRASH_ADDR_HIGH_MAX);

> 
> reserve_crashkernel_low() always returns 0 on x86_32, so the non-equivalent
> transformation for x86_32 doesn't matter, I think.
> 
> .
> 



Re: [PATCH 3/3] arm64: read VA_BITS from kcore for 52-bits VA kernel

2021-12-15 Thread Pingfan Liu
On Wed, Dec 15, 2021 at 9:35 PM Philipp Rudo  wrote:
>
> Hi Pingfan,
>
> On Fri, 10 Dec 2021 11:07:35 +0800
> Pingfan Liu  wrote:
>
> > phys_to_virt() calculates the virtual address. As an important factor,
> > page_offset is expected to be accurate.
> >
> > Since the arm64 kernel exposes va_bits through vmcore, use it.
> >
> > Signed-off-by: Pingfan Liu 
> > ---
> >  kexec/arch/arm64/kexec-arm64.c | 31 +++
> >  kexec/arch/arm64/kexec-arm64.h |  1 +
> >  util_lib/elf_info.c|  5 +
> >  3 files changed, 33 insertions(+), 4 deletions(-)
> >
> > diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> > index bd650e6..ccc92db 100644
> > --- a/kexec/arch/arm64/kexec-arm64.c
> > +++ b/kexec/arch/arm64/kexec-arm64.c
> > @@ -54,7 +54,7 @@
> >  static bool try_read_phys_offset_from_kcore = false;
> >
> >  /* Machine specific details. */
> > -static int va_bits;
> > +static int va_bits = -1;
> >  static unsigned long page_offset;
> >
> >  /* Global varables the core kexec routines expect. */
> > @@ -876,7 +876,15 @@ static inline void set_phys_offset(long v, char *set_method)
> >
> >  static int get_va_bits(void)
> >  {
> > - unsigned long long stext_sym_addr = get_kernel_sym("_stext");
> > + unsigned long long stext_sym_addr;
> > +
> > + /*
> > +  * if already got from kcore
> > +  */
> > + if (va_bits != -1)
> > + goto out;
> > +
> > + stext_sym_addr = get_kernel_sym("_stext");
> >
> >   if (stext_sym_addr == 0) {
> >   fprintf(stderr, "Can't get the symbol of _stext.\n");
> > @@ -900,6 +908,7 @@ static int get_va_bits(void)
> >   return -1;
> >   }
> >
> > +out:
> >   dbgprintf("va_bits : %d\n", va_bits);
> >
> >   return 0;
> > @@ -917,14 +926,27 @@ int get_page_offset(unsigned long *page_offset)
> >   if (ret < 0)
> >   return ret;
> >
> > - page_offset = (0xffffffffffffffffUL) << (va_bits - 1);
> > + if (va_bits < 52)
> > + *page_offset = (0xffffffffffffffffUL) << (va_bits - 1);
> > + else
> > + *page_offset = (0xffffffffffffffffUL) << va_bits;
>
> wouldn't it make sense to use ULONG_MAX here? At least for me it would
> be much better readable.
>

Yes, I tend to agree and will update it in V2 (if there is no need to
compile it on a 32-bit machine, which I consider a rare case
nowadays).


> >   dbgprintf("page_offset : %lx\n", page_offset);
> >
> >   return 0;
> >  }
> >
> > +static void arm64_scan_vmcoreinfo(char *pos)
> > +{
> > + const char *str;
> > +
> > + str = "NUMBER(VA_BITS)=";
> > + if (memcmp(str, pos, strlen(str)) == 0)
> > + va_bits = strtoul(pos + strlen(str), NULL, 10);
> > +}
> > +
> >  /**
> > - * get_phys_offset_from_vmcoreinfo_pt_note - Helper for getting PHYS_OFFSET
> > + * get_phys_offset_from_vmcoreinfo_pt_note - Helper for getting PHYS_OFFSET (and va_bits)
> >   * from VMCOREINFO note inside 'kcore'.
> >   */
> >
> > @@ -937,6 +959,7 @@ static int get_phys_offset_from_vmcoreinfo_pt_note(long *phys_offset)
> >   return EFAILED;
> >   }
> >
> > + arch_scan_vmcoreinfo = arm64_scan_vmcoreinfo;
> >   ret = read_phys_offset_elf_kcore(fd, phys_offset);
> >
> >   close(fd);
> > diff --git a/kexec/arch/arm64/kexec-arm64.h b/kexec/arch/arm64/kexec-arm64.h
> > index ed99d9d..d291705 100644
> > --- a/kexec/arch/arm64/kexec-arm64.h
> > +++ b/kexec/arch/arm64/kexec-arm64.h
> > @@ -66,6 +66,7 @@ struct arm64_mem {
> >
> >  #define arm64_mem_ngv UINT64_MAX
> >  extern struct arm64_mem arm64_mem;
> > +extern void (*arch_scan_vmcoreinfo)(char *pos);
>
> This definition isn't arm64 specific. I think this should go to
> util_lib/include/elf_info.h.
>
Good suggestion; I will update it in V2.


Thanks,

Pingfan




Re: [PATCH 3/3] arm64: read VA_BITS from kcore for 52-bits VA kernel

2021-12-15 Thread Pingfan Liu
On Wed, Dec 15, 2021 at 9:06 PM Simon Horman wrote:
>
> On Fri, Dec 10, 2021 at 11:07:35AM +0800, Pingfan Liu wrote:
> > phys_to_virt() calculates the virtual address. As an important factor,
> > page_offset is expected to be accurate.
> >
> > Since the arm64 kernel exposes va_bits through vmcore, use it.
> >
> > Signed-off-by: Pingfan Liu 
> > ---
> >  kexec/arch/arm64/kexec-arm64.c | 31 +++
> >  kexec/arch/arm64/kexec-arm64.h |  1 +
> >  util_lib/elf_info.c|  5 +
> >  3 files changed, 33 insertions(+), 4 deletions(-)
> >
> > diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> > index bd650e6..ccc92db 100644
> > --- a/kexec/arch/arm64/kexec-arm64.c
> > +++ b/kexec/arch/arm64/kexec-arm64.c
> > @@ -54,7 +54,7 @@
> >  static bool try_read_phys_offset_from_kcore = false;
> >
> >  /* Machine specific details. */
> > -static int va_bits;
> > +static int va_bits = -1;
> >  static unsigned long page_offset;
> >
> >  /* Global varables the core kexec routines expect. */
> > @@ -876,7 +876,15 @@ static inline void set_phys_offset(long v, char *set_method)
> >
> >  static int get_va_bits(void)
> >  {
> > - unsigned long long stext_sym_addr = get_kernel_sym("_stext");
> > + unsigned long long stext_sym_addr;
> > +
> > + /*
> > +  * if already got from kcore
> > +  */
> > + if (va_bits != -1)
> > + goto out;
>
> If va_bits is exposed by the kernel then it will be used.
> Else we continue here. Are there actually cases (old kernels) where
> we expect to continue? Or could we get rid of the fallback code here?
>

va_bits has been exposed since kernel commit 84c57dbd3c48 ("arm64: kernel:
arch_crash_save_vmcoreinfo() should depend on CONFIG_CRASH_CORE"),
and the first kernel release which contains VA_BITS is v4.19.
I have no idea about the need for older kernels. Maybe just keep the
compatibility code and throw a warning to remind users?

> > +
> > + stext_sym_addr = get_kernel_sym("_stext");
> >
> >   if (stext_sym_addr == 0) {
> >   fprintf(stderr, "Can't get the symbol of _stext.\n");
> > @@ -900,6 +908,7 @@ static int get_va_bits(void)
> >   return -1;
> >   }
> >
> > +out:
> >   dbgprintf("va_bits : %d\n", va_bits);
> >
> >   return 0;
> > @@ -917,14 +926,27 @@ int get_page_offset(unsigned long *page_offset)
> >   if (ret < 0)
> >   return ret;
> >
>
> I'm confused about why there is both a (va_bits - 1)
> and va_bits case here.
>

It originates from the changes to the memory layout on arm64,
mostly contributed by kernel commit 14c127c957c1 ("arm64:
mm: Flip kernel VA space"), which made the following change:

-#define PAGE_OFFSET		(UL(0xffffffffffffffff) - \
-				(UL(1) << (VA_BITS - 1)) + 1)
+#define PAGE_OFFSET		(UL(0xffffffffffffffff) - \
+				(UL(1) << VA_BITS) + 1)


Thanks,

Pingfan
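The VMCOREINFO scan discussed in this thread boils down to matching lines like "NUMBER(VA_BITS)=48". A standalone sketch, mirroring the quoted arm64_scan_vmcoreinfo() (hypothetical helper, not the kexec-tools function):

```c
#include <stdlib.h>
#include <string.h>

/* Parses a VMCOREINFO line of the form "NUMBER(VA_BITS)=<n>" and
 * returns n, or -1 when the line is something else. */
static int vmcoreinfo_va_bits(const char *pos)
{
	const char *str = "NUMBER(VA_BITS)=";

	if (strncmp(pos, str, strlen(str)) != 0)
		return -1;
	return (int)strtoul(pos + strlen(str), NULL, 10);
}
```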




Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

2021-12-15 Thread Baoquan He
On 12/15/21 at 02:28pm, Borislav Petkov wrote:
> On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote:
> > @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
> > }
> > }
> >  
> > -   if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> > +   if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> > memblock_phys_free(crash_base, crash_size);
> > return;
> > }
> 
> That's not an equivalent transformation on X86_32.

reserve_crashkernel_low() always returns 0 on x86_32, so the non-equivalent
transformation for x86_32 doesn't matter, I think.


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCHv2 1/1] s390: handle R_390_PLT32DBL reloc entries in machine_apply_elf_rel()

2021-12-15 Thread Heiko Carstens
From: Alexander Egorenkov 

Starting with gcc 11.3, the C compiler will generate PLT-relative function
calls even if they are local and do not require it. Later on during linking,
the linker will replace all PLT-relative calls to local functions with
PC-relative ones. Unfortunately, the purgatory code of kexec/kdump is
not being linked as a regular executable or shared library would have been,
and therefore, all PLT-relative addresses remain in the generated purgatory
object code unresolved. This in turn lets kexec-tools fail with
"Unknown rela relocation: 0x14 0x73c0901c" for such relocation types.

Furthermore, the clang C compiler has always behaved as described above,
and this commit should fix the purgatory code built with the latter as well.

Because the purgatory code is no regular executable or shared library,
contains only calls to local functions and has no PLT, all R_390_PLT32DBL
relocation entries can be resolved just like a R_390_PC32DBL one.

* https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSeries/x1633.html#AEN1699

Relocation entries of purgatory code generated with gcc 11.3


$ readelf -r purgatory/purgatory.o

Relocation section '.rela.text' at offset 0x6e8 contains 27 entries:
  Offset  Info   Type   Sym. ValueSym. Name + Addend
000c  00030013 R_390_PC32DBL  .data + 2
001a  00100014 R_390_PLT32DBL sha256_starts + 2
0030  00110014 R_390_PLT32DBL sha256_update + 2
0046  00120014 R_390_PLT32DBL sha256_finish + 2
0050  00030013 R_390_PC32DBL  .data + 102
005a  00130014 R_390_PLT32DBL memcmp + 2
...
0118  00160014 R_390_PLT32DBL setup_arch + 2
011e  00030013 R_390_PC32DBL  .data + 2
012c  000f0014 R_390_PLT32DBL verify_sha256_digest + 2
0142  00170014 R_390_PLT32DBL post_verification[...] + 2

Relocation entries of purgatory code generated with gcc 11.2


$ readelf -r purgatory/purgatory.o

Relocation section '.rela.text' at offset 0x6e8 contains 27 entries:
  Offset  Info   Type   Sym. ValueSym. Name + Addend
000e  00030013 R_390_PC32DBL  .data + 2
001c  00100013 R_390_PC32DBL  sha256_starts + 2
0036  00110013 R_390_PC32DBL  sha256_update + 2
0048  00120013 R_390_PC32DBL  sha256_finish + 2
0052  00030013 R_390_PC32DBL  .data + 102
005c  00130013 R_390_PC32DBL  memcmp + 2
...
011a  00160013 R_390_PC32DBL  setup_arch + 2
0120  00030013 R_390_PC32DBL  .data + 122
0130  000f0013 R_390_PC32DBL  verify_sha256_digest + 2
0146  00170013 R_390_PC32DBL  post_verification[...] + 2

Corresponding s390 kernel discussion:
* https://lore.kernel.org/linux-s390/20211208105801.188140-1-egore...@linux.ibm.com/T/#u

Signed-off-by: Alexander Egorenkov 
Reported-by: Tao Liu 
Suggested-by: Philipp Rudo 
Reviewed-by: Philipp Rudo 
[h...@linux.ibm.com: changed commit message as requested by Philipp Rudo]
Signed-off-by: Heiko Carstens 
---
 kexec/arch/s390/kexec-elf-rel-s390.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kexec/arch/s390/kexec-elf-rel-s390.c b/kexec/arch/s390/kexec-elf-rel-s390.c
index a5e1b7345578..91ba86a9991d 100644
--- a/kexec/arch/s390/kexec-elf-rel-s390.c
+++ b/kexec/arch/s390/kexec-elf-rel-s390.c
@@ -56,6 +56,7 @@ void machine_apply_elf_rel(struct mem_ehdr *UNUSED(ehdr),
case R_390_PC16:/* PC relative 16 bit.  */
case R_390_PC16DBL: /* PC relative 16 bit shifted by 1.  */
case R_390_PC32DBL: /* PC relative 32 bit shifted by 1.  */
+   case R_390_PLT32DBL:/* 32 bit PC rel. PLT shifted by 1.  */
case R_390_PC32:/* PC relative 32 bit.  */
case R_390_PC64:/* PC relative 64 bit.  */
val -= address;
@@ -63,7 +64,7 @@ void machine_apply_elf_rel(struct mem_ehdr *UNUSED(ehdr),
*(unsigned short *) loc = val;
else if (r_type == R_390_PC16DBL)
*(unsigned short *) loc = val >> 1;
-   else if (r_type == R_390_PC32DBL)
+   else if (r_type == R_390_PC32DBL || r_type == R_390_PLT32DBL)
*(unsigned int *) loc = val >> 1;
else if (r_type == R_390_PC32)
*(unsigned int *) loc = val;
-- 
2.32.0



[PATCHv2 0/1] s390: handle R_390_PLT32DBL reloc entries in machine_apply_elf_rel()

2021-12-15 Thread Heiko Carstens
Version 2 of Alexander Egorenkov's fix with a changed commit message
as pointed out by Philipp Rudo, and requested by Simon Horman.

Please apply.

Alexander Egorenkov (1):
  s390: handle R_390_PLT32DBL reloc entries in machine_apply_elf_rel()

 kexec/arch/s390/kexec-elf-rel-s390.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.32.0




Re: [PATCH 1/1] s390: handle R_390_PLT32DBL reloc entries in machine_apply_elf_rel()

2021-12-15 Thread Heiko Carstens
On Wed, Dec 15, 2021 at 01:43:00PM +0100, Simon Horman wrote:
> On Mon, Dec 13, 2021 at 11:44:30AM +0100, Philipp Rudo wrote:
> > Hi Alexander,
> > 
> > @Alexander: Thanks for taking care of this.
> > 
> > On Wed,  8 Dec 2021 13:53:55 +0100
> > Alexander Egorenkov  wrote:
> > 
> > > Starting with gcc 11.3, the C compiler will generate PLT-relative function
> > > calls even if they are local and do not require it. Later on during 
> > > linking,
> > > the linker will replace all PLT-relative calls to local functions with
> > > PC-relative ones. Unfortunately, the purgatory code of kexec/kdump is
> > > not being linked as a regular executable or shared library would have 
> > > been,
> > > and therefore, all PLT-relative addresses remain in the generated 
> > > purgatory
> > > object code unresolved. This leads to the situation where the purgatory
> > > code is being executed during kdump with all PLT-relative addresses
> > > unresolved. And this results in endless loops within the purgatory code.
> > 
> > Tiny nit. The last two sentences describe the situation in the kernel.
> > Luckily the kexec-tools do proper error checking and die with
> > 
> > "Unknown rela relocation: 0x14 0x73c0901c"
> > 
> > when they encounter an unknown relocation type.
> > 
> > Anyway, the code is correct
> > 
> > Reviewed-by: Philipp Rudo 
> 
> Thanks.
> 
> Alexander would you care to post a v2 with an updated patch description?

Given that Alexander is currently not available, I will resend his
patch with an updated commit message.



Re: [PATCH v2 3/5] add slurp_proc_file()

2021-12-15 Thread Philipp Rudo
Hi Sven,

sorry, i need to nag again.

On Wed, 15 Dec 2021 11:18:34 +0100
Sven Schnelle  wrote:

> slurp_file() cannot be used to read proc files, as they return
> a size of zero in stat(). Add a function slurp_proc_file() which is
> similar to slurp_file(), but doesn't require the size of the file to
> be known.
> 
> Signed-off-by: Sven Schnelle 
> ---
>  kexec/kexec.c | 32 
>  1 file changed, 32 insertions(+)
> 
> diff --git a/kexec/kexec.c b/kexec/kexec.c
> index f63b36b771eb..a1acba2adf2a 100644
> --- a/kexec/kexec.c
> +++ b/kexec/kexec.c
> @@ -1106,6 +1106,38 @@ static void remove_parameter(char *line, const char *param_name)
>   }
>  }
>  
> +static char *slurp_proc_file(const char *filename, size_t *len)
> +{
> +	ssize_t ret, startpos = 0;
> +	unsigned int size = 64;
> +	char *buf = NULL, *tmp;
> +	int fd;
> +
> +	fd = open(filename, O_RDONLY);
> +	if (fd == -1)
> +		return NULL;
> +
> +	do {
> +		size *= 2;
> +		tmp = realloc(buf, size);
> +		if (!tmp) {
> +			free(buf);
> +			return NULL;
> +		}
> +		buf = tmp;
> +
> +		ret = read(fd, buf + startpos, size - startpos);
> +		if (ret < 0) {
> +			free(buf);
> +			return NULL;
> +		}
> +		startpos += ret;
> +		*len = startpos;
> +	} while (ret == size);

I don't think this will work as intended. 

1) read returns the bytes read. So ret has a maximum value of
   size - startpos and thus can only be equal to size on the first pass
   when startpos = 0.

2) it's not an error when read reads fewer bytes than requested, but that
   can happen when, e.g., it gets interrupted.

The simplest solution I see is to simply use 'while (ret)'. Even when
that means that there is an extra pass in the loop with an extra
call to realloc.

The cleaner solution probably would be to put the read into a second
loop. I.e. the following should work (not tested)

do {
[...]
do {
ret = read(fd, buf + startpos, size - startpos);
if (ret < 0) {
free(buf);
return NULL;
}
startpos += ret;
        } while (ret && startpos != size);

*len = startpos;
} while (ret);

Thanks
Philipp


> +
> + return buf;
> +}
> +
>  /*
>   * Returns the contents of the current command line to be used with
>   * --reuse-cmdline option.  The function gets called from architecture 
> specific




Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-15 Thread Baoquan He
On 12/15/21 at 07:03am, Hyeonggon Yoo wrote:
> On Wed, Dec 15, 2021 at 04:48:26AM +, Hyeonggon Yoo wrote:
> > 
> > Hello Baoquan and Vlastimil.
> > 
> > I'm not sure allowing ZONE_DMA32 for kdump kernel is nice way to solve
> > this problem. Devices that requires ZONE_DMA is rare but we still
> > support them.
> > 
> > If we allow ZONE_DMA32 for ZONE_DMA in kdump kernels,
> > the problem will be hard to find.
> > 
> 
> Sorry, I sometimes forget validating my english writing :(
> 
> What I meant:
> 
> I'm not sure that allocating from ZONE_DMA32 instead of ZONE_DMA
> for kdump kernel is nice way to solve this problem.

Yeah, if there really is a <32-bit addressing limit on the device, it doesn't
solve the problem. Not sure if devices really have that limitation when
kmalloc(GFP_DMA) is invoked by a kernel driver.

> 
> Devices that requires ZONE_DMA memory is rare but we still support them.
> 
> If we use ZONE_DMA32 memory instead of ZONE_DMA in kdump kernels,
> It will be hard to the problem when we use devices that can use only
> ZONE_DMA memory.
> 
> > What about one of those?:
> > 
> > 1) Do not call warn_alloc in page allocator if will always fail
> > to allocate ZONE_DMA pages.
> > 

It seems we can do something like the below.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7c7a0b5de2ff..843bc8e5550a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
va_list args;
static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
 
-	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
+	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
+	    ((gfp_mask & __GFP_DMA) && !has_managed_dma()))
 		return;
 
> > 
> > 2) let's check all callers of kmalloc with GFP_DMA
> > if they really need GFP_DMA flag and replace those by DMA API or
> > just remove GFP_DMA from kmalloc()

I grepped and got a list. I will try to start with several easy places
and see if we can do something to improve.


> > 
> > 3) Drop support for allocating DMA memory from slab allocator
> > (as Christoph Hellwig said) and convert them to use DMA32
> 
>   (as Christoph Hellwig said) and convert them to use *DMA API*

Yes, that would be the ideal result. This is equivalent to 2), or depends
on 2).

> 
> > and see what happens
> > 
> > Thanks,
> > Hyeonggon.
> > 
> > > >> 
> > > >> Maybe the function get_capabilities() want to allocate memory
> > > >> even if it's not from DMA zone, but other callers will not expect that.
> > > > 
> > > > Yeah, I have the same guess too for get_capabilities(), not sure about 
> > > > other
> > > > callers. Or, as ChristophL and ChristophH said(Sorry, not sure if this 
> > > > is
> > > > the right way to call people when the first name is the same. Correct 
> > > > me if
> > > > it's wrong), any buffer requested from kmalloc can be used by device 
> > > > driver.
> > > > Means device enforces getting memory inside addressing limit for those
> > > > DMA transferring buffer which is usually large, Megabytes level with
> > > > vmalloc() or alloc_pages(), but doesn't care about this kind of small
> > > > piece buffer memory allocated with kmalloc()? Just a guess, please tell
> > > > a counter example if anyone happens to know, it could be easy.
> > > > 
> > > > 
> > > >> 
> > > >> >  kmalloc_caches[KMALLOC_DMA][i] = 
> > > >> > create_kmalloc_cache(
> > > >> >  kmalloc_info[i].name[KMALLOC_DMA],
> > > >> >  kmalloc_info[i].size,
> > > >> > -- 
> > > >> > 2.17.2
> > > >> > 
> > > >> > 
> > > >> 
> > > > 
> > > 
> 




Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-15 Thread Baoquan He
On 12/15/21 at 11:34am, Vlastimil Babka wrote:
> On 12/15/21 08:27, Christoph Hellwig wrote:
> > On Wed, Dec 15, 2021 at 07:03:35AM +, Hyeonggon Yoo wrote:
> >> I'm not sure that allocating from ZONE_DMA32 instead of ZONE_DMA
> >> for kdump kernel is nice way to solve this problem.
> > 
> > What is the problem with zones in kdump kernels?
> 
> My understanding is that kdump kernel can only use physical memory that it
> got reserved by the main kernel, and the main kernel will reserve some block
> of memory that doesn't include any pages from ZONE_DMA (first 16MB of
> physical memory or whatnot). By looking at the "crashkernel" parameter
> documentation in kernel-parameters.txt it seems we only care about
> below-4GB/above-4GB split.
> So it can easily happen that ZONE_DMA in the kdump kernel will be completely
> empty because the main kernel was using all of it.

Exactly as you said. Even before the regression commit below was added, we
only had 0~640K reused in the kdump kernel. We reused the 1st 640K not
because we need it for zone DMA, but because the 1st 640K is needed by
BIOS/firmware during the early stage of system bootup. So there are tens
or several hundred KB left for managed pages in zone DMA, except for those
firmware-reserved areas in the 1st 640K. After the commit below, the 1st 1M
is reserved with memblock_reserve(), so no physical memory is added to
zone DMA at all. Then we see the allocation failure.

When we prepare the environment for the kdump kernel, usually we customize
an initramfs to include the necessary kernel modules. E.g. if a storage
device is the dump target, its driver must be loaded. If a network dump is
specified, the network driver is needed. I have never seen an ISA device or
a device with a 24-bit addressing limit needed in a kdump kernel.

6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")

> 
> >> Devices that requires ZONE_DMA memory is rare but we still support them.
> > 
> > Indeed.
> > 
> >> > 1) Do not call warn_alloc in page allocator if will always fail
> >> > to allocate ZONE_DMA pages.
> >> > 
> >> > 
> >> > 2) let's check all callers of kmalloc with GFP_DMA
> >> > if they really need GFP_DMA flag and replace those by DMA API or
> >> > just remove GFP_DMA from kmalloc()
> >> > 
> >> > 3) Drop support for allocating DMA memory from slab allocator
> >> > (as Christoph Hellwig said) and convert them to use DMA32
> >> 
> >>(as Christoph Hellwig said) and convert them to use *DMA API*
> >> 
> >> > and see what happens
> > 
> > This is the right thing to do, but it will take a while.  In fact
> > I don't think we really need the warning in step 1, a simple grep
> > already allows to go over them.  I just looked at the uses of GFP_DMA
> > in drivers/scsi for example, and all but one look bogus.
> > 
> >> > > > Yeah, I have the same guess too for get_capabilities(), not sure 
> >> > > > about other
> >> > > > callers. Or, as ChristophL and ChristophH said(Sorry, not sure if 
> >> > > > this is
> >> > > > the right way to call people when the first name is the same. 
> >> > > > Correct me if
> >> > > > it's wrong), any buffer requested from kmalloc can be used by device 
> >> > > > driver.
> >> > > > Means device enforces getting memory inside addressing limit for 
> >> > > > those
> >> > > > DMA transferring buffer which is usually large, Megabytes level with
> >> > > > vmalloc() or alloc_pages(), but doesn't care about this kind of small
> >> > > > piece buffer memory allocated with kmalloc()? Just a guess, please 
> >> > > > tell
> >> > > > a counter example if anyone happens to know, it could be easy.
> > 
> > The way this works is that the dma_map* calls will bounce buffer memory
> 
> But if ZONE_DMA is not populated, where will it get the bounce buffer from?
> I guess nowhere and the problem still exists?

Agree. When I investigated other ARCHs, arm64 has an interesting setup
for ZONE_DMA/ZONE_DMA32. By default it puts all low 4G memory into
ZONE_DMA and leaves ZONE_DMA32 empty. Only if ACPI/DT reports devices
with below-32-bit addressing does it lower the ZONE_DMA limit to match.

        ZONE_DMA    ZONE_DMA32
arm64   0~X         X~4G    (X is read from ACPI or DT. Otherwise X is 4G
                             by default and ZONE_DMA32 is empty.)

> 
> > that does to fall into the addressing limitations.  This is a performance
> > overhead, but allows drivers to address all memory in a system.  If the
> > driver controls memory allocation it should use one of the dma_alloc_*
> > APIs that allocate addressable memory from the start.  The allocator
> > will dip into ZONE_DMA and ZONE_DMA32 when needed.
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 3/3] arm64: read VA_BITS from kcore for 52-bits VA kernel

2021-12-15 Thread Philipp Rudo
Hi Pingfan,

On Fri, 10 Dec 2021 11:07:35 +0800
Pingfan Liu  wrote:

> phys_to_virt() calculates the virtual address. As an important factor,
> page_offset is expected to be accurate.
> 
> Since the arm64 kernel exposes va_bits through vmcore, use it.
> 
> Signed-off-by: Pingfan Liu 
> ---
>  kexec/arch/arm64/kexec-arm64.c | 31 +++
>  kexec/arch/arm64/kexec-arm64.h |  1 +
>  util_lib/elf_info.c|  5 +
>  3 files changed, 33 insertions(+), 4 deletions(-)
> 
> diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> index bd650e6..ccc92db 100644
> --- a/kexec/arch/arm64/kexec-arm64.c
> +++ b/kexec/arch/arm64/kexec-arm64.c
> @@ -54,7 +54,7 @@
>  static bool try_read_phys_offset_from_kcore = false;
>  
>  /* Machine specific details. */
> -static int va_bits;
> +static int va_bits = -1;
>  static unsigned long page_offset;
>  
>  /* Global varables the core kexec routines expect. */
> @@ -876,7 +876,15 @@ static inline void set_phys_offset(long v, char 
> *set_method)
>  
>  static int get_va_bits(void)
>  {
> - unsigned long long stext_sym_addr = get_kernel_sym("_stext");
> + unsigned long long stext_sym_addr;
> +
> + /*
> +  * if already got from kcore
> +  */
> + if (va_bits != -1)
> + goto out;
> +
> + stext_sym_addr = get_kernel_sym("_stext");
>  
>   if (stext_sym_addr == 0) {
>   fprintf(stderr, "Can't get the symbol of _stext.\n");
> @@ -900,6 +908,7 @@ static int get_va_bits(void)
>   return -1;
>   }
>  
> +out:
>   dbgprintf("va_bits : %d\n", va_bits);
>  
>   return 0;
> @@ -917,14 +926,27 @@ int get_page_offset(unsigned long *page_offset)
>   if (ret < 0)
>   return ret;
>  
> - page_offset = (0xffffffffffffffffUL) << (va_bits - 1);
> + if (va_bits < 52)
> + *page_offset = (0xffffffffffffffffUL) << (va_bits - 1);
> + else
> + *page_offset = (0xffffffffffffffffUL) << va_bits;

wouldn't it make sense to use ULONG_MAX here? At least for me that would
be much more readable.

>   dbgprintf("page_offset : %lx\n", page_offset);
>  
>   return 0;
>  }
>  
> +static void arm64_scan_vmcoreinfo(char *pos)
> +{
> + const char *str;
> +
> + str = "NUMBER(VA_BITS)=";
> + if (memcmp(str, pos, strlen(str)) == 0)
> + va_bits = strtoul(pos + strlen(str), NULL, 10);
> +}
> +
>  /**
> - * get_phys_offset_from_vmcoreinfo_pt_note - Helper for getting PHYS_OFFSET
> + * get_phys_offset_from_vmcoreinfo_pt_note - Helper for getting PHYS_OFFSET 
> (and va_bits)
>   * from VMCOREINFO note inside 'kcore'.
>   */
>  
> @@ -937,6 +959,7 @@ static int get_phys_offset_from_vmcoreinfo_pt_note(long 
> *phys_offset)
>   return EFAILED;
>   }
>  
> + arch_scan_vmcoreinfo = arm64_scan_vmcoreinfo;
>   ret = read_phys_offset_elf_kcore(fd, phys_offset);
>  
>   close(fd);
> diff --git a/kexec/arch/arm64/kexec-arm64.h b/kexec/arch/arm64/kexec-arm64.h
> index ed99d9d..d291705 100644
> --- a/kexec/arch/arm64/kexec-arm64.h
> +++ b/kexec/arch/arm64/kexec-arm64.h
> @@ -66,6 +66,7 @@ struct arm64_mem {
>  
>  #define arm64_mem_ngv UINT64_MAX
>  extern struct arm64_mem arm64_mem;
> +extern void (*arch_scan_vmcoreinfo)(char *pos);

This definition isn't arm64 specific. I think this should go to
util_lib/include/elf_info.h.

Thanks
Philipp

>  
>  uint64_t get_phys_offset(void);
>  uint64_t get_vp_offset(void);
> diff --git a/util_lib/elf_info.c b/util_lib/elf_info.c
> index 5574c7f..d252eff 100644
> --- a/util_lib/elf_info.c
> +++ b/util_lib/elf_info.c
> @@ -310,6 +310,8 @@ int get_pt_load(int idx,
>  
>  #define NOT_FOUND_LONG_VALUE (-1)
>  
> +void (*arch_scan_vmcoreinfo)(char *pos);
> +
>  void scan_vmcoreinfo(char *start, size_t size)
>  {
>   char *last = start + size - 1;
> @@ -551,6 +553,9 @@ void scan_vmcoreinfo(char *start, size_t size)
>   }
>   }
>  
> + if (arch_scan_vmcoreinfo != NULL)
> + (*arch_scan_vmcoreinfo)(pos);
> +
>   if (last_line)
>   break;
>   }




Re: [PATCH 2/3] arm64/crashdump: unify routine to get page_offset

2021-12-15 Thread Philipp Rudo
Hi Pingfan,

On Fri, 10 Dec 2021 11:07:34 +0800
Pingfan Liu  wrote:

> There are two funcs to get page_offset:
>   get_kernel_page_offset()
>   get_page_offset()
> 
> Since get_kernel_page_offset() does not follow the kernel's formula,
> remove it. Unify them in order to introduce 52-bit VA kernel support
> more easily in the coming patch.
> 
> Signed-off-by: Pingfan Liu 
> ---
>  kexec/arch/arm64/crashdump-arm64.c | 23 +--
>  kexec/arch/arm64/kexec-arm64.c |  4 ++--
>  kexec/arch/arm64/kexec-arm64.h |  1 +
>  3 files changed, 4 insertions(+), 24 deletions(-)
> 
> diff --git a/kexec/arch/arm64/crashdump-arm64.c 
> b/kexec/arch/arm64/crashdump-arm64.c
> index a02019a..0a8d44c 100644
> --- a/kexec/arch/arm64/crashdump-arm64.c
> +++ b/kexec/arch/arm64/crashdump-arm64.c

[...]

> diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> index 7373fa3..bd650e6 100644
> --- a/kexec/arch/arm64/kexec-arm64.c
> +++ b/kexec/arch/arm64/kexec-arm64.c
> @@ -909,7 +909,7 @@ static int get_va_bits(void)
>   * get_page_offset - Helper for getting PAGE_OFFSET
>   */
>  
> -static int get_page_offset(void)
> +int get_page_offset(unsigned long *page_offset)
>  {
>   int ret;
>  

The new page_offset is never used in this patch. There's still the line
left with the global page_offset, but that will cause a type mismatch
(unsigned long vs unsigned long *). In the next patch you do the update
(page_offset -> *page_offset), but in the meantime the code is broken.

Thanks
Philipp

> @@ -954,7 +954,7 @@ int get_phys_base_from_pt_load(long *phys_offset)
>   unsigned long long phys_start;
>   unsigned long long virt_start;
>  
> - ret = get_page_offset();
> + ret = get_page_offset(&page_offset);
>   if (ret < 0)
>   return ret;
>  
> diff --git a/kexec/arch/arm64/kexec-arm64.h b/kexec/arch/arm64/kexec-arm64.h
> index 1844b67..ed99d9d 100644
> --- a/kexec/arch/arm64/kexec-arm64.h
> +++ b/kexec/arch/arm64/kexec-arm64.h
> @@ -69,6 +69,7 @@ extern struct arm64_mem arm64_mem;
>  
>  uint64_t get_phys_offset(void);
>  uint64_t get_vp_offset(void);
> +int get_page_offset(unsigned long *offset);
>  
>  static inline void reset_vp_offset(void)
>  {




Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

2021-12-15 Thread Borislav Petkov
On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote:
> @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
>   }
>   }
>  
> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>   memblock_phys_free(crash_base, crash_size);
>   return;
>   }

That's not an equivalent transformation on X86_32.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] arm64/crashdump: deduce paddr of _text based on kernel code size

2021-12-15 Thread Simon Horman
On Fri, Dec 10, 2021 at 10:57:33AM +0800, Pingfan Liu wrote:
> kexec-tools commit 61b8c79b0fb7 ("arm64/crashdump-arm64: deduce the
> paddr of _text") tries to deduce the paddr of _text, but it turns out
> to work only partially.
> 
> That commit is based on "The Image must be placed text_offset bytes from
> a 2MB aligned base address anywhere in usable system RAM and called
> there" in linux/Documentation/arm64/booting.rst, plus text_offset field
> is zero.
> 
> But in practice, some boot loaders do not obey the convention, and the
> kernel still boots up successfully.
> 
> Revisiting kernel commit e2a073dde921 ("arm64: omit [_text, _stext) from
> permanent kernel mapping"), the kernel code size changes from (unsigned
> long)__init_begin - (unsigned long)_text to (unsigned long)__init_begin
> - (unsigned long)_stext
> 
> And it should be a better factor to decide which label starts the
> "Kernel code" in /proc/iomem.
> 
> Signed-off-by: Pingfan Liu 
> Cc: Simon Horman 
> To: kexec@lists.infradead.org

Thanks, applied.



Re: [PATCH 1/1] s390: handle R_390_PLT32DBL reloc entries in machine_apply_elf_rel()

2021-12-15 Thread Simon Horman
On Mon, Dec 13, 2021 at 11:44:30AM +0100, Philipp Rudo wrote:
> Hi Alexander,
> 
> @Alexander: Thanks for taking care of this.
> 
> On Wed,  8 Dec 2021 13:53:55 +0100
> Alexander Egorenkov  wrote:
> 
> > Starting with gcc 11.3, the C compiler will generate PLT-relative function
> > calls even if they are local and do not require it. Later on during linking,
> > the linker will replace all PLT-relative calls to local functions with
> > PC-relative ones. Unfortunately, the purgatory code of kexec/kdump is
> > not being linked as a regular executable or shared library would have been,
> > and therefore, all PLT-relative addresses remain in the generated purgatory
> > object code unresolved. This leads to the situation where the purgatory
> > code is being executed during kdump with all PLT-relative addresses
> > unresolved. And this results in endless loops within the purgatory code.
> 
> Tiny nit. The last two sentences describe the situation in the kernel.
> Luckily the kexec-tools do proper error checking and die with
> 
>   "Unknown rela relocation: 0x14 0x73c0901c"
> 
> when they encounter an unknown relocation type.
> 
> Anyway, the code is correct
> 
> Reviewed-by: Philipp Rudo 

Thanks.

Alexander would you care to post a v2 with an updated patch description?



RE: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-15 Thread David Laight
From: Vlastimil Babka
> Sent: 15 December 2021 10:34
> 
> On 12/15/21 08:27, Christoph Hellwig wrote:
> > On Wed, Dec 15, 2021 at 07:03:35AM +, Hyeonggon Yoo wrote:
> >> I'm not sure that allocating from ZONE_DMA32 instead of ZONE_DMA
> >> for kdump kernel is nice way to solve this problem.
> >
> > What is the problem with zones in kdump kernels?
> 
> My understanding is that kdump kernel can only use physical memory that it
> got reserved by the main kernel, and the main kernel will reserve some block
> of memory that doesn't include any pages from ZONE_DMA (first 16MB of
> physical memory or whatnot). 
...

Is there still any support for any of the very old hardware that could only
support 24bit DMA?

I think the AMD PCnet-ISA and PCnet-PCI ethernet (lance) were both 32bit 
masters.
(I don't remember ever having to worry about physical addresses.)
I'm sure I remember some old SCSI boards only being able to do 24bit DMA.
But I can't remember which bus interface they were.
Unlikely to be ISA because it has always been hard to get a motherboard
DMA channel into 'cascade mode'.

Might have been some EISA boards - anyone still use those?
So we are left with early PCI boards.

It really is worth looking at what actually needs it at all.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

2021-12-15 Thread Leizhen (ThunderTown)



On 2021/12/15 19:16, Baoquan He wrote:
> On 12/15/21 at 11:01am, Catalin Marinas wrote:
>> On Wed, Dec 15, 2021 at 11:42:19AM +0800, Baoquan He wrote:
>>> On 12/14/21 at 07:24pm, Catalin Marinas wrote:
 On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
>> From: Chen Zhou 
>>
>> The lower bounds of crash kernel reservation and crash kernel low
>> reservation are different, use the consistent value CRASH_ALIGN.
>
> A big WHY is missing here to explain why the lower bound of the
> allocation range needs to be 16M and why was 0 wrong?

 I asked the same here:

 https://lore.kernel.org/r/20210224143547.gb28...@arm.com

 IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
 lower part, so that's equivalent in practice to starting from
 CRASH_ALIGN.
>>>
>>> Yeah, even for i386, there's area reserved by BIOS inside low 1M.
>>> Considering the existing alignment CRASH_ALIGN which is 16M, we
>>> definitely have no chance to get memory starting from 0. So starting
>>> from 16M can skip the useless memblock searching, and make the
>>> crashkernel low reservation consistent with crashkernel reservation on
>>> allocation code.
>>
>> That's the x86 assumption. Is it valid for other architectures once the
>> code has been made generic in patch 6? It should be ok for arm64, RAM
>> tends to start from higher up but other architectures may start using
>> this common code.
> 
> Good point. I didn't think of this from generic code side, then let's
> keep it as 0.
> 
>>
>> If you want to keep the same semantics as before, just leave it as 0.
>> It's not that the additional lower bound makes the search slower.
> 
> Agree.

OK, I will drop this patch.

> 
> .
> 



Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

2021-12-15 Thread Baoquan He
On 12/15/21 at 11:01am, Catalin Marinas wrote:
> On Wed, Dec 15, 2021 at 11:42:19AM +0800, Baoquan He wrote:
> > On 12/14/21 at 07:24pm, Catalin Marinas wrote:
> > > On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> > > > On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > > > > From: Chen Zhou 
> > > > > 
> > > > > The lower bounds of crash kernel reservation and crash kernel low
> > > > > reservation are different, use the consistent value CRASH_ALIGN.
> > > > 
> > > > A big WHY is missing here to explain why the lower bound of the
> > > > allocation range needs to be 16M and why was 0 wrong?
> > > 
> > > I asked the same here:
> > > 
> > > https://lore.kernel.org/r/20210224143547.gb28...@arm.com
> > > 
> > > IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
> > > lower part, so that's equivalent in practice to starting from
> > > CRASH_ALIGN.
> > 
> > Yeah, even for i386, there's area reserved by BIOS inside low 1M.
> > Considering the existing alignment CRASH_ALIGN which is 16M, we
> > definitely have no chance to get memory starting from 0. So starting
> > from 16M can skip the useless memblock searching, and make the
> > crashkernel low reservation consistent with crashkernel reservation on
> > allocation code.
> 
> That's the x86 assumption. Is it valid for other architectures once the
> code has been made generic in patch 6? It should be ok for arm64, RAM
> tends to start from higher up but other architectures may start using
> this common code.

Good point. I didn't think of this from generic code side, then let's
keep it as 0.

> 
> If you want to keep the same semantics as before, just leave it as 0.
> It's not that the additional lower bound makes the search slower.

Agree.




Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

2021-12-15 Thread Catalin Marinas
On Wed, Dec 15, 2021 at 11:42:19AM +0800, Baoquan He wrote:
> On 12/14/21 at 07:24pm, Catalin Marinas wrote:
> > On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> > > On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > > > From: Chen Zhou 
> > > > 
> > > > The lower bounds of crash kernel reservation and crash kernel low
> > > > reservation are different, use the consistent value CRASH_ALIGN.
> > > 
> > > A big WHY is missing here to explain why the lower bound of the
> > > allocation range needs to be 16M and why was 0 wrong?
> > 
> > I asked the same here:
> > 
> > https://lore.kernel.org/r/20210224143547.gb28...@arm.com
> > 
> > IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
> > lower part, so that's equivalent in practice to starting from
> > CRASH_ALIGN.
> 
> Yeah, even for i386, there's area reserved by BIOS inside low 1M.
> Considering the existing alignment CRASH_ALIGN which is 16M, we
> definitely have no chance to get memory starting from 0. So starting
> from 16M can skip the useless memblock searching, and make the
> crashkernel low reservation consistent with crashkernel reservation on
> allocation code.

That's the x86 assumption. Is it valid for other architectures once the
code has been made generic in patch 6? It should be ok for arm64, RAM
tends to start from higher up but other architectures may start using
this common code.

If you want to keep the same semantics as before, just leave it as 0.
It's not that the additional lower bound makes the search slower.

-- 
Catalin



Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-15 Thread Vlastimil Babka
On 12/15/21 08:27, Christoph Hellwig wrote:
> On Wed, Dec 15, 2021 at 07:03:35AM +, Hyeonggon Yoo wrote:
>> I'm not sure that allocating from ZONE_DMA32 instead of ZONE_DMA
>> for kdump kernel is nice way to solve this problem.
> 
> What is the problem with zones in kdump kernels?

My understanding is that kdump kernel can only use physical memory that it
got reserved by the main kernel, and the main kernel will reserve some block
of memory that doesn't include any pages from ZONE_DMA (first 16MB of
physical memory or whatnot). By looking at the "crashkernel" parameter
documentation in kernel-parameters.txt it seems we only care about
below-4GB/above-4GB split.
So it can easily happen that ZONE_DMA in the kdump kernel will be completely
empty because the main kernel was using all of it.

>> Devices that require ZONE_DMA memory are rare but we still support them.
> 
> Indeed.
> 
>> > 1) Do not call warn_alloc in page allocator if will always fail
>> > to allocate ZONE_DMA pages.
>> > 
>> > 
>> > 2) let's check all callers of kmalloc with GFP_DMA
>> > if they really need GFP_DMA flag and replace those by DMA API or
>> > just remove GFP_DMA from kmalloc()
>> > 
>> > 3) Drop support for allocating DMA memory from slab allocator
>> > (as Christoph Hellwig said) and convert them to use DMA32
>> 
>>  (as Christoph Hellwig said) and convert them to use *DMA API*
>> 
>> > and see what happens
> 
> This is the right thing to do, but it will take a while.  In fact
> I dont think we really need the warning in step 1, a simple grep
> already allows to go over them.  I just looked at the uses of GFP_DMA
> in drivers/scsi for example, and all but one look bogus.
> 
>> > > > Yeah, I have the same guess too for get_capabilities(), not sure about 
>> > > > other
>> > > > callers. Or, as ChristophL and ChristophH said(Sorry, not sure if this 
>> > > > is
>> > > > the right way to call people when the first name is the same. Correct 
>> > > > me if
>> > > > it's wrong), any buffer requested from kmalloc can be used by device 
>> > > > driver.
>> > > > Means device enforces getting memory inside addressing limit for those
>> > > > DMA transferring buffer which is usually large, Megabytes level with
>> > > > vmalloc() or alloc_pages(), but doesn't care about this kind of small
>> > > > piece buffer memory allocated with kmalloc()? Just a guess, please tell
>> > > > a counter example if anyone happens to know, it could be easy.
> 
> The way this works is that the dma_map* calls will bounce buffer memory

But if ZONE_DMA is not populated, where will it get the bounce buffer from?
I guess nowhere and the problem still exists?

> that does to fall into the addressing limitations.  This is a performance
> overhead, but allows drivers to address all memory in a system.  If the
> driver controls memory allocation it should use one of the dma_alloc_*
> APIs that allocate addressable memory from the start.  The allocator
> will dip into ZONE_DMA and ZONE_DMA32 when needed.




[PATCH v2 0/5] s390: add support for extended cmdline length

2021-12-15 Thread Sven Schnelle
Hi,

this patchset adds support for extended command line lengths on s390.
The former limit of 896 bytes has been too short in some configurations. The
linux kernel now provides an additional field in the parameter area
which contains the maximum allowed command line length. In older kernels
this field is zero. In that case the old limit of 896 bytes is used.
This was introduced with 5ecb2da660ab ("s390: support command lines
longer than 896 bytes") in linux.

And while at it, also add the --reuse-cmdline option and use
KEXEC_ALL_OPTIONS.

Changes in v2:

- add slurp_proc_file() to read proc files without knowing the size upfront
- move command_line string from global to function scope

Sven Schnelle (5):
  s390: add variable command line size
  s390: use KEXEC_ALL_OPTIONS
  add slurp_proc_file()
  use slurp_proc_file() in get_command_line()
  s390: add support for --reuse-cmdline

 kexec/arch/s390/crashdump-s390.c   |  3 +-
 kexec/arch/s390/include/arch/options.h | 10 +--
 kexec/arch/s390/kexec-image.c  | 84 --
 kexec/arch/s390/kexec-s390.h   | 21 ---
 kexec/kexec.c  | 58 +-
 5 files changed, 114 insertions(+), 62 deletions(-)

-- 
2.32.0




[PATCH v2 3/5] add slurp_proc_file()

2021-12-15 Thread Sven Schnelle
slurp_file() cannot be used to read proc files, as they are returning
a size of zero in stat(). Add a function slurp_proc_file() which is
similar to slurp_file(), but doesn't require the size of the file to
be known.

Signed-off-by: Sven Schnelle 
---
 kexec/kexec.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/kexec/kexec.c b/kexec/kexec.c
index f63b36b771eb..a1acba2adf2a 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -1106,6 +1106,38 @@ static void remove_parameter(char *line, const char 
*param_name)
}
 }
 
+static char *slurp_proc_file(const char *filename, size_t *len)
+{
+   ssize_t ret, startpos = 0;
+   unsigned int size = 64;
+   char *buf = NULL, *tmp;
+   int fd;
+
+   fd = open(filename, O_RDONLY);
+   if (fd == -1)
+   return NULL;
+
+   do {
+   size *= 2;
+   tmp = realloc(buf, size);
+   if (!tmp) {
+   free(buf);
+   return NULL;
+   }
+   buf = tmp;
+
+   ret = read(fd, buf + startpos, size - startpos);
+   if (ret < 0) {
+   free(buf);
+   return NULL;
+   }
+   startpos += ret;
+   *len = startpos;
+   } while (ret == size);
+
+   return buf;
+}
+
 /*
  * Returns the contents of the current command line to be used with
  * --reuse-cmdline option.  The function gets called from architecture specific
-- 
2.32.0




[PATCH v2 5/5] s390: add support for --reuse-cmdline

2021-12-15 Thread Sven Schnelle
--reuse-cmdline reads the command line of the currently
running kernel from /proc/cmdline and uses that for the
kernel that should be kexec'd.

Signed-off-by: Sven Schnelle 
Reviewed-by: Alexander Egorenkov 
---
 kexec/arch/s390/include/arch/options.h | 10 ++
 kexec/arch/s390/kexec-image.c  |  9 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/kexec/arch/s390/include/arch/options.h 
b/kexec/arch/s390/include/arch/options.h
index 76044a301ceb..c150244996c7 100644
--- a/kexec/arch/s390/include/arch/options.h
+++ b/kexec/arch/s390/include/arch/options.h
@@ -1,9 +1,10 @@
 #ifndef KEXEC_ARCH_S390_OPTIONS_H
 #define KEXEC_ARCH_S390_OPTIONS_H
 
-#define OPT_ARCH_MAX   (OPT_MAX+0)
-#define OPT_APPEND OPT_MAX+0
-#define OPT_RAMDISKOPT_MAX+1
+#define OPT_ARCH_MAX   (OPT_MAX+0)
+#define OPT_APPEND (OPT_MAX+0)
+#define OPT_RAMDISK(OPT_MAX+1)
+#define OPT_REUSE_CMDLINE  (OPT_MAX+2)
 
 /* Options relevant to the architecture (excluding loader-specific ones),
  * in this case none:
@@ -31,7 +32,8 @@
KEXEC_ARCH_OPTIONS  \
{"command-line", 1, 0, OPT_APPEND}, \
{"append",   1, 0, OPT_APPEND}, \
-   {"initrd",   1, 0, OPT_RAMDISK},
+   {"initrd",   1, 0, OPT_RAMDISK},\
+   {"reuse-cmdline",0, 0, OPT_REUSE_CMDLINE },
 
 #define KEXEC_ALL_OPT_STR KEXEC_ARCH_OPT_STR
 
diff --git a/kexec/arch/s390/kexec-image.c b/kexec/arch/s390/kexec-image.c
index 209ab77ddccb..69aaf96812f7 100644
--- a/kexec/arch/s390/kexec-image.c
+++ b/kexec/arch/s390/kexec-image.c
@@ -71,6 +71,10 @@ int image_s390_load_file(int argc, char **argv, struct 
kexec_info *info)
case OPT_RAMDISK:
ramdisk = optarg;
break;
+   case OPT_REUSE_CMDLINE:
+   free(info->command_line);
+   info->command_line = get_command_line();
+   break;
}
}
 
@@ -123,6 +127,10 @@ image_s390_load(int argc, char **argv, const char 
*kernel_buf,
if (command_line_add(info, optarg))
return -1;
break;
+   case OPT_REUSE_CMDLINE:
+   free(info->command_line);
+   info->command_line = get_command_line();
+   break;
case OPT_RAMDISK:
ramdisk = optarg;
break;
@@ -223,5 +231,6 @@ image_s390_usage(void)
printf("--command-line=STRING Set the kernel command line to STRING.\n"
   "--append=STRING   Set the kernel command line to STRING.\n"
   "--initrd=FILENAME Use the file FILENAME as a ramdisk.\n"
+  "--reuse-cmdline   Use kernel command line from running 
system.\n"
);
 }
-- 
2.32.0




[PATCH v2 4/5] use slurp_proc_file() in get_command_line()

2021-12-15 Thread Sven Schnelle
This way the size of the command line that get_command_line() can handle
is no longer fixed.

Signed-off-by: Sven Schnelle 
---
 kexec/kexec.c | 26 ++
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/kexec/kexec.c b/kexec/kexec.c
index a1acba2adf2a..e2c58de48565 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -1153,25 +1153,19 @@ static char *slurp_proc_file(const char *filename, 
size_t *len)
  */
 char *get_command_line(void)
 {
-   FILE *fp;
-   char *line;
-   const int sizeof_line = 2048;
-
-   line = malloc(sizeof_line);
-   if (line == NULL)
-   die("Could not allocate memory to read /proc/cmdline.");
-
-   fp = fopen("/proc/cmdline", "r");
-   if (!fp)
-   die("Could not open /proc/cmdline.");
-
-   if (fgets(line, sizeof_line, fp) == NULL)
-   die("Can't read /proc/cmdline.");
+   char *p, *line;
+   size_t size;
 
-   fclose(fp);
+   line = slurp_proc_file("/proc/cmdline", &size);
+   if (!line || !size)
+   die("Failed to read /proc/cmdline\n");
 
/* strip newline */
-   line[strlen(line) - 1] = '\0';
+   line[size-1] = '\0';
+
+   p = strpbrk(line, "\r\n");
+   if (p)
+   *p = '\0';
 
remove_parameter(line, "BOOT_IMAGE");
if (kexec_flags & KEXEC_ON_CRASH)
-- 
2.32.0




[PATCH v2 1/5] s390: add variable command line size

2021-12-15 Thread Sven Schnelle
Newer s390 kernels support a command line size longer than 896
bytes. Such kernels contain a new member in the parameter area,
which might be utilized by tools like kexec. Older kernels have
the location initialized to zero, so we check whether there's a
non-zero number present and use that. If there isn't, we fall back
to the legacy command line size of 896 bytes.

Signed-off-by: Sven Schnelle 
Reviewed-by: Alexander Egorenkov 
---
 kexec/arch/s390/crashdump-s390.c |  3 +-
 kexec/arch/s390/kexec-image.c| 65 +---
 kexec/arch/s390/kexec-s390.h | 21 ++-
 3 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/kexec/arch/s390/crashdump-s390.c b/kexec/arch/s390/crashdump-s390.c
index 10f4d607bbcc..3bd9efe6dafe 100644
--- a/kexec/arch/s390/crashdump-s390.c
+++ b/kexec/arch/s390/crashdump-s390.c
@@ -52,7 +52,8 @@ static int create_elf_header(struct kexec_info *info, 
unsigned long crash_base,
elfcorehdr_size = bufsz;
snprintf(str, sizeof(str), " elfcorehdr=%ld@%ldK\n",
 elfcorehdr_size, elfcorehdr / 1024);
-   command_line_add(str);
+   if (command_line_add(info, str))
+   return -1;
 #endif
return 0;
 }
diff --git a/kexec/arch/s390/kexec-image.c b/kexec/arch/s390/kexec-image.c
index 3c24fdfe3c7c..a52399eafd2a 100644
--- a/kexec/arch/s390/kexec-image.c
+++ b/kexec/arch/s390/kexec-image.c
@@ -25,7 +25,6 @@
 #include 
 
 static uint64_t crash_base, crash_end;
-static char command_line[COMMAND_LINESIZE];
 
 static void add_segment_check(struct kexec_info *info, const void *buf,
  size_t bufsz, unsigned long base, size_t memsz)
@@ -36,13 +35,18 @@ static void add_segment_check(struct kexec_info *info, 
const void *buf,
add_segment(info, buf, bufsz, crash_base + base, memsz);
 }
 
-int command_line_add(const char *str)
+int command_line_add(struct kexec_info *info, const char *str)
 {
-   if (strlen(command_line) + strlen(str) + 1 > COMMAND_LINESIZE) {
-   fprintf(stderr, "Command line too long.\n");
+   char *tmp = NULL;
+
+   tmp = concat_cmdline(info->command_line, str);
+   if (!tmp) {
+   fprintf(stderr, "out of memory\n");
return -1;
}
-   strcat(command_line, str);
+
+   free(info->command_line);
+   info->command_line = tmp;
return 0;
 }
 
@@ -64,7 +68,7 @@ int image_s390_load_file(int argc, char **argv, struct 
kexec_info *info)
while ((opt = getopt_long(argc, argv, short_options, options, 0)) != 
-1) {
switch(opt) {
case OPT_APPEND:
-   if (command_line_add(optarg))
+   if (command_line_add(info, optarg))
return -1;
break;
case OPT_RAMDISK:
@@ -78,13 +82,16 @@ int image_s390_load_file(int argc, char **argv, struct 
kexec_info *info)
if (info->initrd_fd == -1) {
fprintf(stderr, "Could not open initrd file %s:%s\n",
ramdisk, strerror(errno));
+   free(info->command_line);
+   info->command_line = NULL;
return -1;
}
}
 
-   info->command_line = command_line;
-   info->command_line_len = strlen (command_line) + 1;
-
+   if (info->command_line)
+   info->command_line_len = strlen(info->command_line) + 1;
+   else
+   info->command_line_len = 0;
return 0;
 }
 
@@ -97,7 +104,7 @@ image_s390_load(int argc, char **argv, const char 
*kernel_buf,
const char *ramdisk;
off_t ramdisk_len;
unsigned int ramdisk_origin;
-   int opt;
+   int opt, ret = -1;
 
if (info->file_mode)
return image_s390_load_file(argc, argv, info);
@@ -112,7 +119,6 @@ image_s390_load(int argc, char **argv, const char 
*kernel_buf,
};
static const char short_options[] = KEXEC_OPT_STR "";
 
-   command_line[0] = 0;
ramdisk = NULL;
ramdisk_len = 0;
ramdisk_origin = 0;
@@ -120,7 +126,7 @@ image_s390_load(int argc, char **argv, const char 
*kernel_buf,
while ((opt = getopt_long(argc,argv,short_options,options,0)) != -1) {
switch(opt) {
case OPT_APPEND:
-   if (command_line_add(optarg))
+   if (command_line_add(info, optarg))
return -1;
break;
case OPT_RAMDISK:
@@ -132,7 +138,7 @@ image_s390_load(int argc, char **argv, const char *kernel_buf,
if (info->kexec_flags & KEXEC_ON_CRASH) {
if (parse_iomem_single("Crash kernel\n", _base,
   _end))
-   return -1;
+   goto out;
}
 
/* Add kernel segment 

[PATCH v2 2/5] s390: use KEXEC_ALL_OPTIONS

2021-12-15 Thread Sven Schnelle
KEXEC_ALL_OPTIONS can be used instead of defining the same
array several times. This makes the code easier to maintain when
new options are added.

Suggested-by: Alexander Egorenkov 
Signed-off-by: Sven Schnelle 
Reviewed-by: Alexander Egorenkov 
---
 kexec/arch/s390/kexec-image.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/kexec/arch/s390/kexec-image.c b/kexec/arch/s390/kexec-image.c
index a52399eafd2a..209ab77ddccb 100644
--- a/kexec/arch/s390/kexec-image.c
+++ b/kexec/arch/s390/kexec-image.c
@@ -57,10 +57,7 @@ int image_s390_load_file(int argc, char **argv, struct kexec_info *info)
 
static const struct option options[] =
{
-   KEXEC_OPTIONS
-   {"command-line", 1, 0, OPT_APPEND},
-   {"append",   1, 0, OPT_APPEND},
-   {"initrd",   1, 0, OPT_RAMDISK},
+   KEXEC_ALL_OPTIONS
{0,  0, 0, 0},
};
static const char short_options[] = KEXEC_OPT_STR "";
@@ -111,10 +108,7 @@ image_s390_load(int argc, char **argv, const char *kernel_buf,
 
static const struct option options[] =
{
-   KEXEC_OPTIONS
-   {"command-line", 1, 0, OPT_APPEND},
-   {"append",   1, 0, OPT_APPEND},
-   {"initrd",   1, 0, OPT_RAMDISK},
+   KEXEC_ALL_OPTIONS
{0,  0, 0, 0},
};
static const char short_options[] = KEXEC_OPT_STR "";
-- 
2.32.0
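The deduplication works because KEXEC_ALL_OPTIONS can expand to the generic table plus the per-arch entries that each architecture was repeating by hand. A hypothetical sketch of the layering (option values and the exact macro contents in kexec-tools may differ):

```c
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Assumed option values for illustration only. */
#define OPT_APPEND  256
#define OPT_RAMDISK 257

/* Generic options shared by every loader (abbreviated sketch). */
#define KEXEC_OPTIONS \
	{"help",    0, 0, 'h'}, \
	{"version", 0, 0, 'v'},

/* The convenience macro: generic table plus the common loader options. */
#define KEXEC_ALL_OPTIONS \
	KEXEC_OPTIONS \
	{"command-line", 1, 0, OPT_APPEND}, \
	{"append",       1, 0, OPT_APPEND}, \
	{"initrd",       1, 0, OPT_RAMDISK},

static const struct option options[] = {
	KEXEC_ALL_OPTIONS
	{NULL, 0, NULL, 0},
};
```

Adding a new common option then means touching one macro instead of every per-arch option array.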


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-15 Thread Baoquan He
On 12/14/21 at 11:09am, Vlastimil Babka wrote:
> On 12/14/21 06:32, Baoquan He wrote:
> > On 12/13/21 at 01:43pm, Hyeonggon Yoo wrote:
> >> Hello Baoquan. I have a question on your code.
> >> 
> >> On Mon, Dec 13, 2021 at 08:27:12PM +0800, Baoquan He wrote:
> >> > Dma-kmalloc will be created as long as CONFIG_ZONE_DMA is enabled.
> >> > However, it will fail if DMA zone has no managed pages. The failure
> >> > can be seen in kdump kernel of x86_64 as below:
> >> > 
> 
> Could have included the warning headline too.

Sure, I will paste the whole warning when I repost.

> 
> >> >  CPU: 0 PID: 65 Comm: kworker/u2:1 Not tainted 5.14.0-rc2+ #9
> >> >  Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS RMLSDP.86I.R2.28.D690.1306271008 06/27/2013
> >> >  Workqueue: events_unbound async_run_entry_fn
> >> >  Call Trace:
> >> >   dump_stack_lvl+0x57/0x72
> >> >   warn_alloc.cold+0x72/0xd6
> >> >   __alloc_pages_slowpath.constprop.0+0xf56/0xf70
> >> >   __alloc_pages+0x23b/0x2b0
> >> >   allocate_slab+0x406/0x630
> >> >   ___slab_alloc+0x4b1/0x7e0
> >> >   ? sr_probe+0x200/0x600
> >> >   ? lock_acquire+0xc4/0x2e0
> >> >   ? fs_reclaim_acquire+0x4d/0xe0
> >> >   ? lock_is_held_type+0xa7/0x120
> >> >   ? sr_probe+0x200/0x600
> >> >   ? __slab_alloc+0x67/0x90
> >> >   __slab_alloc+0x67/0x90
> >> >   ? sr_probe+0x200/0x600
> >> >   ? sr_probe+0x200/0x600
> >> >   kmem_cache_alloc_trace+0x259/0x270
> >> >   sr_probe+0x200/0x600
> >> >   ..
> >> >   bus_probe_device+0x9f/0xb0
> >> >   device_add+0x3d2/0x970
> >> >   ..
> >> >   __scsi_add_device+0xea/0x100
> >> >   ata_scsi_scan_host+0x97/0x1d0
> >> >   async_run_entry_fn+0x30/0x130
> >> >   process_one_work+0x2b0/0x5c0
> >> >   worker_thread+0x55/0x3c0
> >> >   ? process_one_work+0x5c0/0x5c0
> >> >   kthread+0x149/0x170
> >> >   ? set_kthread_struct+0x40/0x40
> >> >   ret_from_fork+0x22/0x30
> >> >  Mem-Info:
> >> >  ..
> >> > 
> >> > The above failure happened when calling kmalloc() to allocate buffer with
> >> > GFP_DMA. It requests a slab page from the DMA zone while there are no
> >> > managed pages there.
> >> >  sr_probe()
> >> >  --> get_capabilities()
> >> >  --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
> >> > 
> >> > The DMA zone should be checked for managed pages before trying to
> >> > create dma-kmalloc.
> >> >
> >> 
> >> What is problem here?
> >> 
> >> The slab allocator requested a page from the buddy allocator with
> >> GFP_DMA, the buddy allocator failed because there were no pages in the
> >> DMA zone, and it then called warn_alloc because the allocation failed.
> >> 
> >> Looking at warn, I don't understand what the problem is.
> > 
> > The problem is this is a generic issue on x86_64, and will be warned out
> > always on all x86_64 systems, but not on a certain machine or a certain
> > type of machine. If not fixed, we can always see it in kdump kernel. The
> > way things are, it doesn't cause system or device collapse even if
> > dma-kmalloc can't provide buffer or provide buffer from zone NORMAL.
> > 
> > 
> > I have got bug reports several times from different people, and we have
> > several bugs tracking this inside Redhat. I think nobody wants to see
> > this appearing in customers' monitor with or without a note. If we have
> > to leave it with that, it's a little embarrassing.
> > 
> > 
> >> 
> >> > ---
> >> >  mm/slab_common.c | 9 +
> >> >  1 file changed, 9 insertions(+)
> >> > 
> >> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> > index e5d080a93009..ae4ef0f8903a 100644
> >> > --- a/mm/slab_common.c
> >> > +++ b/mm/slab_common.c
> >> > @@ -878,6 +878,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
> >> >  {
> >> >  int i;
> >> >  enum kmalloc_cache_type type;
> >> > +#ifdef CONFIG_ZONE_DMA
> >> > +bool managed_dma;
> >> > +#endif
> >> >  
> >> >  /*
> >> >   * Including KMALLOC_CGROUP if CONFIG_MEMCG_KMEM defined
> >> > @@ -905,10 +908,16 @@ void __init create_kmalloc_caches(slab_flags_t flags)
> >> >  slab_state = UP;
> >> >  
> >> >  #ifdef CONFIG_ZONE_DMA
> >> > +managed_dma = has_managed_dma();
> >> > +
> >> >  for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
> >> >  struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];
> >> >  
> >> >  if (s) {
> >> > +if (!managed_dma) {
> >> > +kmalloc_caches[KMALLOC_DMA][i] = kmalloc_caches[KMALLOC_NORMAL][i];
> 
> The right side could be just 's'?

Right, I will see if we take another approach; I will change it if we keep
this way.

> 
> >> > +continue;
> >> > +}
> >> 
> >> This code is copying normal kmalloc caches to DMA kmalloc caches.
> >> With this code, the kmalloc() with GFP_DMA will succeed even if allocated
> >> memory is not actually 

Re: [PATCH v17 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()

2021-12-15 Thread Baoquan He
On 12/15/21 at 04:56pm, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/12/14 19:40, Leizhen (ThunderTown) wrote:
> > 
> > 
> > On 2021/12/10 14:55, Zhen Lei wrote:
> >> From: Chen Zhou 
> >>
> >> We will make the function reserve_crashkernel() generic; the
> >> xen_pv_domain() check in reserve_crashkernel() is relevant only to
> >> x86, the same as insert_resource() in reserve_crashkernel[_low]().
> >> So move xen_pv_domain() check and insert_resource() to setup_arch()
> >> to keep them in x86.
> >>
> >> Suggested-by: Mike Rapoport 
> >> Signed-off-by: Chen Zhou 
> >> Signed-off-by: Zhen Lei 
> >> Tested-by: John Donnelly 
> >> Tested-by: Dave Kleikamp 
> >> Acked-by: Baoquan He 
> >> ---
> >>  arch/x86/kernel/setup.c | 19 +++
> >>  1 file changed, 11 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >> index bb2a0973b98059e..7ae00716a208f82 100644
> >> --- a/arch/x86/kernel/setup.c
> >> +++ b/arch/x86/kernel/setup.c
> >> @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
> >>  
> >>crashk_low_res.start = low_base;
> >>crashk_low_res.end   = low_base + low_size - 1;
> >> -  insert_resource(&iomem_resource, &crashk_low_res);
> >>  #endif
> >>return 0;
> >>  }
> >> @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
> >>high = true;
> >>}
> >>  
> >> -  if (xen_pv_domain()) {
> >> -  pr_info("Ignoring crashkernel for a Xen PV domain\n");
> >> -  return;
> >> -  }
> >> -
> >>/* 0 means: find the address automatically */
> >>if (!crash_base) {
> >>/*
> >> @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
> >>  
> >>crashk_res.start = crash_base;
> >>crashk_res.end   = crash_base + crash_size - 1;
> >> -  insert_resource(&iomem_resource, &crashk_res);
> >>  }
> >>  #else
> >>  static void __init reserve_crashkernel(void)
> >> @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
> >> * Reserve memory for crash kernel after SRAT is parsed so that it
> >> * won't consume hotpluggable memory.
> >> */
> >> -  reserve_crashkernel();
> > 
> > Hi Baoquan:
> >   How about moving "#ifdef CONFIG_KEXEC_CORE" here, so that we can remove
> > the empty reserve_crashkernel()? In fact, xen_pv_domain() was only invoked
> > when CONFIG_KEXEC_CORE was enabled before.
> 
> Hi Baoquan:
>   Did you miss this email? If there is no reply, I will keep it unchanged.

I checked this patch, and there's no update since I acked it.

Moving reserve_crashkernel() into the CONFIG_KEXEC_CORE ifdeffery is
also fine to me.




Re: [PATCH v17 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()

2021-12-15 Thread Leizhen (ThunderTown)



On 2021/12/14 19:40, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/12/10 14:55, Zhen Lei wrote:
>> From: Chen Zhou 
>>
>> We will make the function reserve_crashkernel() generic; the
>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>> x86, the same as insert_resource() in reserve_crashkernel[_low]().
>> So move xen_pv_domain() check and insert_resource() to setup_arch()
>> to keep them in x86.
>>
>> Suggested-by: Mike Rapoport 
>> Signed-off-by: Chen Zhou 
>> Signed-off-by: Zhen Lei 
>> Tested-by: John Donnelly 
>> Tested-by: Dave Kleikamp 
>> Acked-by: Baoquan He 
>> ---
>>  arch/x86/kernel/setup.c | 19 +++
>>  1 file changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index bb2a0973b98059e..7ae00716a208f82 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
>>  
>>  crashk_low_res.start = low_base;
>>  crashk_low_res.end   = low_base + low_size - 1;
>> -insert_resource(&iomem_resource, &crashk_low_res);
>>  #endif
>>  return 0;
>>  }
>> @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
>>  high = true;
>>  }
>>  
>> -if (xen_pv_domain()) {
>> -pr_info("Ignoring crashkernel for a Xen PV domain\n");
>> -return;
>> -}
>> -
>>  /* 0 means: find the address automatically */
>>  if (!crash_base) {
>>  /*
>> @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
>>  
>>  crashk_res.start = crash_base;
>>  crashk_res.end   = crash_base + crash_size - 1;
>> -insert_resource(&iomem_resource, &crashk_res);
>>  }
>>  #else
>>  static void __init reserve_crashkernel(void)
>> @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
>>   * Reserve memory for crash kernel after SRAT is parsed so that it
>>   * won't consume hotpluggable memory.
>>   */
>> -reserve_crashkernel();
> 
> Hi Baoquan:
>   How about moving "#ifdef CONFIG_KEXEC_CORE" here, so that we can remove
> the empty reserve_crashkernel()? In fact, xen_pv_domain() was only invoked
> when CONFIG_KEXEC_CORE was enabled before.

Hi Baoquan:
  Did you miss this email? If there is no reply, I will keep it unchanged.

> 
>> +if (xen_pv_domain())
>> +pr_info("Ignoring crashkernel for a Xen PV domain\n");
>> +else {
>> +reserve_crashkernel();
>> +#ifdef CONFIG_KEXEC_CORE
>> +if (crashk_res.end > crashk_res.start)
>> +insert_resource(&iomem_resource, &crashk_res);
>> +if (crashk_low_res.end > crashk_low_res.start)
>> +insert_resource(&iomem_resource, &crashk_low_res);
>> +#endif
>> +}
>>  
>>  memblock_find_dma_reserve();
>>  
>>
> 
> 
