from:"Atsushi Kumagai"

RE: [PATCH 4.14 023/159] mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y

2018-01-16 Thread Atsushi Kumagai

>Hm, this fix means that the vmlinux symbol table and vmcoreinfo have
>different values for mem_section. That seems... odd. I had to patch
>makedumpfile to fix the case of an explicit vmlinux being passed on the
>command line (which I realized I don't need to do, but it should still
>work):

Looks good to me, I'll merge this into makedumpfile-1.6.4.

Thanks,
Atsushi Kumagai

>From 542a11a8f28b0f0a989abc3adff89da22f44c719 Mon Sep 17 00:00:00 2001
>Message-Id: <542a11a8f28b0f0a989abc3adff89da22f44c719.1515995400.git.osan...@fb.com>
>From: Omar Sandoval 
>Date: Sun, 14 Jan 2018 17:10:30 -0800
>Subject: [PATCH] Fix SPARSEMEM_EXTREME support on Linux v4.15 when passing
> vmlinux
>
>Since kernel commit 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at
>runtime for CONFIG_SPARSEMEM_EXTREME=y"), mem_section is a dynamically
>allocated array of pointers to mem_section instead of a static one
>(i.e., struct mem_section ** instead of struct mem_section * []). This
>adds an extra layer of indirection that breaks makedumpfile, which will
>end up with a bunch of bogus mem_maps.
>
>Since kernel commit a0b1280368d1 ("kdump: write correct address of
>mem_section into vmcoreinfo"), the mem_section symbol in vmcoreinfo
>contains the address of the actual struct mem_section * array instead of
>the address of the pointer in .bss, which gets rid of the extra
>indirection. However, makedumpfile still uses the debugging symbol from
>the vmlinux image. Fix this by allowing symbols from the vmcore to
>override symbols from the vmlinux image. As the comment in initial()
>says, "vmcoreinfo in /proc/vmcore is more reliable than -x/-i option".
>
>Signed-off-by: Omar Sandoval 
>---
> makedumpfile.h | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
>diff --git a/makedumpfile.h b/makedumpfile.h
>index 57cf4d9..d68c798 100644
>--- a/makedumpfile.h
>+++ b/makedumpfile.h
>@@ -274,8 +274,10 @@ do { \
> } while (0)
> #define READ_SYMBOL(str_symbol, symbol) \
> do { \
>-  if (SYMBOL(symbol) == NOT_FOUND_SYMBOL) { \
>-  SYMBOL(symbol) = read_vmcoreinfo_symbol(STR_SYMBOL(str_symbol)); \
>+  unsigned long _tmp_symbol; \
>+  _tmp_symbol = read_vmcoreinfo_symbol(STR_SYMBOL(str_symbol)); \
>+  if (_tmp_symbol != NOT_FOUND_SYMBOL) { \
>+  SYMBOL(symbol) = _tmp_symbol; \
>   if (SYMBOL(symbol) == INVALID_SYMBOL_DATA) \
>   return FALSE; \
>   } \
>--
>2.9.5
>
>
>___
>kexec mailing list
>ke...@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/kexec
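The override rule described in the quoted commit message can be sketched in isolation. This is a minimal illustration, not makedumpfile's code: resolve_symbol() and the two sentinel values are hypothetical stand-ins for the READ_SYMBOL macro and its constants.

```c
/* Sentinel values chosen for this sketch only; makedumpfile defines its own. */
#define NOT_FOUND_SYMBOL     0UL
#define INVALID_SYMBOL_DATA  (~0UL)

/*
 * Choose the address to use for a symbol.
 *   vmlinux_addr    - value taken from the vmlinux debug information
 *   vmcoreinfo_addr - value exported by the crashed kernel, or
 *                     NOT_FOUND_SYMBOL when vmcoreinfo has no entry
 * Returns 0 on success, -1 when vmcoreinfo holds unparsable data.
 */
static int resolve_symbol(unsigned long vmlinux_addr,
                          unsigned long vmcoreinfo_addr,
                          unsigned long *out)
{
    *out = vmlinux_addr;
    if (vmcoreinfo_addr != NOT_FOUND_SYMBOL) {
        if (vmcoreinfo_addr == INVALID_SYMBOL_DATA)
            return -1;
        /* vmcoreinfo is considered more reliable, so it wins. */
        *out = vmcoreinfo_addr;
    }
    return 0;
}
```

With this rule, a mem_section entry in vmcoreinfo replaces the address found in vmlinux, while symbols that are absent from vmcoreinfo keep their vmlinux value.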

RE: [PATCH 1/2] kexec: update VMCOREINFO for compound_order/dtor

2016-04-03 Thread Atsushi Kumagai

>On Tue, 1 Mar 2016 06:14:32 +0000 Atsushi Kumagai wrote:
>
>> makedumpfile refers to page.lru.next to get the order of compound pages
>> for page filtering. However, the order is now stored in page.compound_order,
>> hence VMCOREINFO should be updated to export the offset of
>> page.compound_order.
>>
>> In fact, page.compound_order was already introduced in kernel 4.0,
>> but its offset was the same as that of page.lru.next until kernel 4.3,
>> so this was not an actual problem.
>>
>> The same applies to page.lru.prev and page.compound_dtor, which is
>> needed to detect hugetlbfs pages. Further, the content was changed
>> from the direct address of the destructor to an ID that identifies it.
>
>It's unclear which kernels need the patch and why.  I *think* that the
>patch is needed in 4.3.x, 4.4.x, 4.5.x and 4.6 in order to make
>makedumpfile work correctly.  Is that right?

The patch is necessary for 4.4.x, 4.5.x and 4.6.
4.3.x doesn't have the problem.

>And it appears that [patch 2/2] is needed in 4.0+?

[patch 2/2] is for 4.5.x and later:

  $ git name-rev 1c290f642101e6
  1c290f642101e6 tags/v4.5-rc1~77^2~129
  $

>However in both cases I am uncertain - what are the end-user visible
>effects of these regressions?  Why can bugs remain in place for so long
>without having been observed?

The problem is that unnecessary hugepages wouldn't be removed from
a dump file in the older kernels. This means that extra disk space
would be consumed.
It's a problem, but not critical.

>Please make all these things clear when preparing changelogs for
>bugfixes: which kernel versions need fixing and why (ie: what are the
>end-user visible effects of the bug).

Sure, I will be careful about it.


Thanks,
Atsushi Kumagai

RE: [PATCH 0/2] Export new VMCOREINFO about compound page

2016-03-31 Thread Atsushi Kumagai

Hello Andrew,

>> This patch set follows up on modifications to struct page for
>> makedumpfile, which filters dump files.
>> It is needed so that unnecessary compound pages can still be filtered
>> out on newer kernels.
>>
>> Incidentally, [PATCH 1/2] was posted in:
>>
>>   https://lkml.org/lkml/2016/1/27/92
>>
>> but it didn't get any response, so I am reposting it here.
>>
>> Atsushi Kumagai (2):
>>   kexec: update VMCOREINFO for compound_order/dtor
>>   kexec: export OFFSET(page.compound_head) to check anonymous page
>>
>> kernel/kexec_core.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>>
>
>Acked-by: Dave Young 
>
>Thanks
>Dave

Could you pick up these patches?

  https://lkml.org/lkml/2016/3/1/25
  https://lkml.org/lkml/2016/3/1/26

Thanks,
Atsushi Kumagai

[PATCH 2/2] kexec: export OFFSET(page.compound_head) to find out compound tail page

2016-02-29 Thread Atsushi Kumagai

PageAnon() always looks at the head page to check PAGE_MAPPING_ANON, and
a tail page's page->mapping holds only poisoned data since commit
1c290f642101e6.

If makedumpfile checks page->mapping of a compound tail page to
identify anonymous pages as before, it will fail on newer kernels.
So it's necessary to export OFFSET(page.compound_head) so that
makedumpfile can avoid checking compound tail pages.

Signed-off-by: Atsushi Kumagai 
---
 kernel/kexec_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 7c0b61d..4726999 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1415,6 +1415,7 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_OFFSET(page, private);
VMCOREINFO_OFFSET(page, compound_dtor);
VMCOREINFO_OFFSET(page, compound_order);
+   VMCOREINFO_OFFSET(page, compound_head);
VMCOREINFO_OFFSET(pglist_data, node_zones);
VMCOREINFO_OFFSET(pglist_data, nr_zones);
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
-- 
2.7.2
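How a dump filter might use the exported offset can be sketched as follows. This assumes that the kernels in question mark a tail page by setting bit 0 of page.compound_head, and that the caller has already read the two fields from the dump; both helper names are hypothetical and are not makedumpfile functions.

```c
#include <stdbool.h>

#define PAGE_MAPPING_ANON 1UL  /* low bit of page->mapping marks an anonymous page */

/* Assumption of this sketch: bit 0 of page.compound_head is set for tail pages. */
static bool is_compound_tail(unsigned long compound_head)
{
    return (compound_head & 1UL) != 0;
}

/*
 * Decide whether a page may be classified as anonymous from its own
 * fields. A tail page is never judged by its own mapping, because that
 * field only holds poison; a real tool would consult the head page.
 */
static bool is_anon_page(unsigned long mapping, unsigned long compound_head)
{
    if (is_compound_tail(compound_head))
        return false;
    return (mapping & PAGE_MAPPING_ANON) != 0;
}
```

Returning false for tail pages is the conservative choice here: the page is kept in the dump rather than filtered on the basis of poisoned data.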

[PATCH 1/2] kexec: update VMCOREINFO for compound_order/dtor

2016-02-29 Thread Atsushi Kumagai

makedumpfile refers to page.lru.next to get the order of compound pages
for page filtering. However, the order is now stored in page.compound_order,
hence VMCOREINFO should be updated to export the offset of
page.compound_order.

In fact, page.compound_order was already introduced in kernel 4.0,
but its offset was the same as that of page.lru.next until kernel 4.3,
so this was not an actual problem.

The same applies to page.lru.prev and page.compound_dtor, which is
needed to detect hugetlbfs pages. Further, the content was changed
from the direct address of the destructor to an ID that identifies it.

Signed-off-by: Atsushi Kumagai 
---
 kernel/kexec_core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 8dc6591..7c0b61d 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1413,6 +1413,8 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_OFFSET(page, lru);
VMCOREINFO_OFFSET(page, _mapcount);
VMCOREINFO_OFFSET(page, private);
+   VMCOREINFO_OFFSET(page, compound_dtor);
+   VMCOREINFO_OFFSET(page, compound_order);
VMCOREINFO_OFFSET(pglist_data, node_zones);
VMCOREINFO_OFFSET(pglist_data, nr_zones);
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
@@ -1445,8 +1447,8 @@ static int __init crash_save_vmcoreinfo_init(void)
 #ifdef CONFIG_X86
VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE);
 #endif
-#ifdef CONFIG_HUGETLBFS
-   VMCOREINFO_SYMBOL(free_huge_page);
+#ifdef CONFIG_HUGETLB_PAGE
+   VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
 #endif
 
arch_crash_save_vmcoreinfo();
-- 
2.7.2
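Because the destructor field changed from an address to an ID, a reader of the dump has to support both encodings. The following is a rough sketch under stated assumptions: struct dtor_info, NOT_FOUND and is_hugetlbfs_head() are hypothetical names, and the caller is assumed to have read the destructor value from whichever field the dumped kernel uses (page.compound_dtor or page.lru.prev).

```c
#include <stdbool.h>

#define NOT_FOUND (-1L)  /* hypothetical marker: value absent from VMCOREINFO */

struct dtor_info {
    long hugetlb_page_dtor;        /* NUMBER(HUGETLB_PAGE_DTOR), or NOT_FOUND */
    unsigned long free_huge_page;  /* SYMBOL(free_huge_page), or 0 if absent */
};

/* Return true when the destructor value identifies a hugetlbfs head page. */
static bool is_hugetlbfs_head(const struct dtor_info *vi,
                              unsigned long dtor_field)
{
    /* Newer kernels: the field holds a small ID. */
    if (vi->hugetlb_page_dtor != NOT_FOUND)
        return dtor_field == (unsigned long)vi->hugetlb_page_dtor;

    /* Older kernels: the field holds the address of free_huge_page(). */
    if (vi->free_huge_page != 0)
        return dtor_field == vi->free_huge_page;

    /* Neither value was exported, so hugetlbfs pages cannot be detected. */
    return false;
}
```

Preferring the ID when it is present avoids comparing an ID against a stale function address when both happen to be available.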

[PATCH 0/2] Export new VMCOREINFO about compound page

2016-02-29 Thread Atsushi Kumagai

Hello,

This patch set follows up on modifications to struct page for
makedumpfile, which filters dump files.
It is needed so that unnecessary compound pages can still be filtered
out on newer kernels.

Incidentally, [PATCH 1/2] was posted in:

  https://lkml.org/lkml/2016/1/27/92

but it didn't get any response, so I am reposting it here.

Atsushi Kumagai (2):
  kexec: update VMCOREINFO for compound_order/dtor
  kexec: export OFFSET(page.compound_head) to check anonymous page

 kernel/kexec_core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)


Thanks,
Atsushi Kumagai

[PATCH] kexec: Add compound_order/compound_dtor to VMCOREINFO

2016-01-26 Thread Atsushi Kumagai

makedumpfile refers to page.lru.next to get the order of compound pages
for page filtering. However, now the order is stored in page.compound_order,
hence VMCOREINFO should be updated to export the offset of page.compound_order.

In fact, page.compound_order was already introduced in kernel 4.0,
but its offset was the same as that of page.lru.next until kernel 4.3,
so this was not an actual problem.

The same can be said of page.lru.prev: it contained the address of the
destructor for compound pages, but that was moved to page.compound_dtor.
It's necessary to detect hugetlbfs pages. Further, the content of
page.compound_dtor was changed from the direct address of the destructor
to its ID.

Signed-off-by: Atsushi Kumagai 
---
 kernel/kexec_core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 11b64a6..8c7a6e8 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1388,6 +1388,8 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_OFFSET(page, lru);
VMCOREINFO_OFFSET(page, _mapcount);
VMCOREINFO_OFFSET(page, private);
+   VMCOREINFO_OFFSET(page, compound_dtor);
+   VMCOREINFO_OFFSET(page, compound_order);
VMCOREINFO_OFFSET(pglist_data, node_zones);
VMCOREINFO_OFFSET(pglist_data, nr_zones);
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
@@ -1420,8 +1422,8 @@ static int __init crash_save_vmcoreinfo_init(void)
 #ifdef CONFIG_X86
VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE);
 #endif
-#ifdef CONFIG_HUGETLBFS
-   VMCOREINFO_SYMBOL(free_huge_page);
+#ifdef CONFIG_HUGETLB_PAGE
+   VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
 #endif
 
arch_crash_save_vmcoreinfo();
-- 
1.9.0

[PATCH] kexec: Export free_huge_page to VMCOREINFO

2014-07-11 Thread Atsushi Kumagai

PG_head_mask was added to VMCOREINFO to filter huge pages in
commit b3acc56bfe1 ("kexec: save PG_head_mask in VMCOREINFO"), but
makedumpfile still needs another symbol to filter *hugetlbfs* pages.

If a user wants to filter user pages, makedumpfile tries to exclude
them by checking whether each page is anonymous, but hugetlbfs pages
aren't anonymous even though they are also user pages.

We know it's possible to detect them in the same way as PageHuge(),
so we need the start address of free_huge_page():

int PageHuge(struct page *page)
{
	if (!PageCompound(page))
		return 0;

	page = compound_head(page);
	return get_compound_page_dtor(page) == free_huge_page;
}

For that reason, this patch makes free_huge_page() public
so that it can be exported to VMCOREINFO.

Signed-off-by: Atsushi Kumagai 
---
 include/linux/hugetlb.h | 1 +
 kernel/kexec.c  | 2 ++
 mm/hugetlb.c| 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 255cd5c..a23c096 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -80,6 +80,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page);
 bool isolate_huge_page(struct page *page, struct list_head *list);
 void putback_active_hugepage(struct page *page);
 bool is_hugepage_active(struct page *page);
+void free_huge_page(struct page *page);
 
 #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
 pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud);
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 369f41a..23a088f 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1619,6 +1620,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 #endif
VMCOREINFO_NUMBER(PG_head_mask);
VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
+   VMCOREINFO_SYMBOL(free_huge_page);
 
arch_crash_save_vmcoreinfo();
update_vmcoreinfo_note();
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2024bbd..d5437eb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -856,7 +856,7 @@ struct hstate *size_to_hstate(unsigned long size)
return NULL;
 }
 
-static void free_huge_page(struct page *page)
+void free_huge_page(struct page *page)
 {
/*
 * Can't pass hstate in here because it is called from the
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC PATCH 2/2] makedumpfile: get additional information from vmcore

2014-05-14 Thread Atsushi Kumagai

>>>>> Now we define MAX_PHYSMEM_BITS and SECTION_SIZE_BITS as
>>>>> macros. So if we deal with vmcores with different values
>>>>> of these two macros, we have to recompile makedumpfile.
>>>>
>>>> There are other macros which have architecture-specific values
>>>> (e.g. __PAGE_OFFSET), and some functions are specific to each
>>>> architecture (e.g. vaddr_to_paddr()), so we need recompilation
>>>> eventually.
>>>>
>>>> OTOH, we already don't need recompilation for the same architecture
>>>> since the values of such macros are defined for each kernel version
>>>> like below:
>>>>
>>>> #ifdef __x86_64__
>>>> ...
>>>> #define _MAX_PHYSMEM_BITS_ORIG    (40)
>>>> #define _MAX_PHYSMEM_BITS_2_6_26  (44)
>>>> #define _MAX_PHYSMEM_BITS_2_6_31  (46)
>>>>
>>>> So I don't think this patch is valuable.
>>>
>>> Hi Atsushi,
>>>
>>> For x86, it is not necessary. But for arm, different vendors
>>> may define different SECTION_SIZE_BITS. For example:
>>>
>>>   1 arch/arm/mach-clps711x/include/mach/memory.h
>>> #define SECTION_SIZE_BITS 24
>>>   2 arch/arm/mach-exynos/include/mach/memory.h
>>> #define SECTION_SIZE_BITS 28
>>>   4 arch/arm/mach-hisi/include/mach/memory.h
>>> #define SECTION_SIZE_BITS 26
>>>   8 arch/arm/mach-sa1100/include/mach/memory.h
>>> #define SECTION_SIZE_BITS 27
>>>
>>> Perhaps we should find another way to let the userspace tools
>>> to get the architecture-specific values.
>>
>> I see, I think this description is better than the first one.
>>
>> Now, makedumpfile can't get appropriate values for the two macros since the
>> values vary even when the architecture and the kernel version are fixed
>> (at least for arm), and we can't solve this without *manual code fixing*,
>> right?
>>
>> In practice, the current code expects that all arm machines adopt Exynos
>> processors, which is definitely a problem.
>>
>>   #ifdef __arm__
>>   #define KVBASE_MASK (0x)
>>   #define KVBASE  (SYMBOL(_stext) & ~KVBASE_MASK)
>>   #define _SECTION_SIZE_BITS  (28)
>>   #define _MAX_PHYSMEM_BITS   (32)
>>
>> I think the description should be improved to make the patch easier to accept,
>> but the patch is necessary from makedumpfile's point of view.
>> So I recommend that you repost this patch set; then I'll accept it.
>>
>OK, thanks for your suggestion. I will repost this patch. So far no one
>has replied to my kernel patch related to this issue, named "[RFC PATCH 1/2]
>kdump: add sparse memory related values to vmcore". Did I fail to CC
>the right person, or is it something else?

You should CC Eric Biederman (ebied...@xmission.com) as the maintainer
of kexec.

The kernel side doesn't need that patch because the kernel isn't in trouble
even without it, so we have to highlight the necessity from the user space
side. Now that it's clear, I hope it will be accepted.

>BTW, for the patch "[PATCH] makedumpfile: ARM: get correct mem_map offset",
>did I explain my idea clearly? If not, I would like to repost it with
>more details.

I need more explanation; I'll mention it in that thread.


Thanks
Atsushi Kumagai

>>
>> Thanks
>> Atsushi Kumagai
>>
>>>>
>>>>> This patch makes makedumpfile get these two values from
>>>>> vmcoreinfo, if they exist. This makes makedumpfile more
>>>>> compatible with vmcores that use different section sizes.
>>>>>
>>>>> Signed-off-by: Liu Hua 
>>>>> ---
>>>>> makedumpfile.c | 17 +++++++++++++++++
>>>>> makedumpfile.h |  2 ++
>>>>> 2 files changed, 19 insertions(+)
>>>>>
>>>>> diff --git a/makedumpfile.c b/makedumpfile.c
>>>>> index 6cf6e24..3cdf323 100644
>>>>> --- a/makedumpfile.c
>>>>> +++ b/makedumpfile.c
>>>>> @@ -2111,6 +2111,8 @@ read_vmcoreinfo(void)
>>>>>   READ_NUMBER("PG_slab", PG_slab);
>>>>>   READ_NUMBER("PG_buddy", PG_buddy);
>>>>>   READ_NUMBER("PG_hwpoison", PG_hwpoison);
>>>>> + READ_NUMBER("SECTION_SIZE_BITS", SECTION_SIZE_BITS);
>>>>> + READ_NUMBER("MAX_PHYSMEM_BITS", MAX_PHYSMEM_BITS);
>>>>>
>>>>>   READ_SRCFILE("pud_t",

RE: [RFC PATCH 2/2] makedumpfile: get additional information from vmcore

2014-05-14 Thread Atsushi Kumagai

>On 2014/5/13 14:21, Atsushi Kumagai wrote:
>> Hello Liu,
>>
>>> Now we define MAX_PHYSMEM_BITS and SECTION_SIZE_BITS as
>>> macros. So if we deal with vmcores with different values
>>> of these two macros, we have to recompile makedumpfile.
>>
>> There are other macros which have architecture-specific values
>> (e.g. __PAGE_OFFSET), and some functions are specific to each
>> architecture (e.g. vaddr_to_paddr()), so we need recompilation
>> eventually.
>>
>> OTOH, we already don't need recompilation for the same architecture
>> since the values of such macros are defined for each kernel version
>> like below:
>>
>> #ifdef __x86_64__
>> ...
>> #define _MAX_PHYSMEM_BITS_ORIG    (40)
>> #define _MAX_PHYSMEM_BITS_2_6_26  (44)
>> #define _MAX_PHYSMEM_BITS_2_6_31  (46)
>>
>> So I don't think this patch is valuable.
>
>Hi Atsushi,
>
>For x86, it is not necessary. But for arm, different vendors
>may define different SECTION_SIZE_BITS. For example:
>
>   1 arch/arm/mach-clps711x/include/mach/memory.h
> #define SECTION_SIZE_BITS 24
>   2 arch/arm/mach-exynos/include/mach/memory.h
> #define SECTION_SIZE_BITS 28
>   4 arch/arm/mach-hisi/include/mach/memory.h
> #define SECTION_SIZE_BITS 26
>   8 arch/arm/mach-sa1100/include/mach/memory.h
> #define SECTION_SIZE_BITS 27
>
>Perhaps we should find another way to let the userspace tools
>to get the architecture-specific values.

I see, I think this description is better than the first one.

Now, makedumpfile can't get appropriate values for the two macros since the
values vary even when the architecture and the kernel version are fixed
(at least for arm), and we can't solve this without *manual code fixing*, right?

In practice, the current code expects that all arm machines adopt Exynos
processors, which is definitely a problem.

  #ifdef __arm__
  #define KVBASE_MASK (0x)
  #define KVBASE  (SYMBOL(_stext) & ~KVBASE_MASK)
  #define _SECTION_SIZE_BITS  (28)
  #define _MAX_PHYSMEM_BITS   (32)

I think the description should be improved to make the patch easier to accept,
but the patch is necessary from makedumpfile's point of view.
So I recommend that you repost this patch set; then I'll accept it.


Thanks
Atsushi Kumagai
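To make the effect of a wrong SECTION_SIZE_BITS concrete, here is a small sketch of the arithmetic involved. struct machdep, calibrate(), nr_mem_sections() and pfn_to_section_nr() are hypothetical names for this illustration; the "> 0" test mirrors the quoted patch, and representing an absent vmcoreinfo value as 0 is an assumption.

```c
/* Values that the dump tool either hard-codes or reads from vmcoreinfo. */
struct machdep {
    long max_physmem_bits;
    long section_size_bits;
};

/* Override the compiled-in defaults when vmcoreinfo supplied a value (> 0). */
static void calibrate(struct machdep *m, long vmci_max_physmem_bits,
                      long vmci_section_size_bits)
{
    if (vmci_max_physmem_bits > 0)
        m->max_physmem_bits = vmci_max_physmem_bits;
    if (vmci_section_size_bits > 0)
        m->section_size_bits = vmci_section_size_bits;
}

/* Number of sparse memory sections implied by the two values. */
static unsigned long nr_mem_sections(const struct machdep *m)
{
    return 1UL << (m->max_physmem_bits - m->section_size_bits);
}

/* Section that a page frame number falls into, for a given page shift. */
static unsigned long pfn_to_section_nr(const struct machdep *m,
                                       unsigned long pfn, int page_shift)
{
    return pfn >> (m->section_size_bits - page_shift);
}
```

With the hard-coded arm defaults (28 and 32) there are 16 sections. A kernel built with SECTION_SIZE_BITS 24 has 256 sections, so the same page frame number maps to a different section and the wrong mem_map would be looked up.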

>>
>>> This patch makes makedumpfile get these two values from
>>> vmcoreinfo, if they exist. This makes makedumpfile more
>>> compatible with vmcores that use different section sizes.
>>>
>>> Signed-off-by: Liu Hua 
>>> ---
>>> makedumpfile.c | 17 +++++++++++++++++
>>> makedumpfile.h |  2 ++
>>> 2 files changed, 19 insertions(+)
>>>
>>> diff --git a/makedumpfile.c b/makedumpfile.c
>>> index 6cf6e24..3cdf323 100644
>>> --- a/makedumpfile.c
>>> +++ b/makedumpfile.c
>>> @@ -2111,6 +2111,8 @@ read_vmcoreinfo(void)
>>> READ_NUMBER("PG_slab", PG_slab);
>>> READ_NUMBER("PG_buddy", PG_buddy);
>>> READ_NUMBER("PG_hwpoison", PG_hwpoison);
>>> +   READ_NUMBER("SECTION_SIZE_BITS", SECTION_SIZE_BITS);
>>> +   READ_NUMBER("MAX_PHYSMEM_BITS", MAX_PHYSMEM_BITS);
>>>
>>> READ_SRCFILE("pud_t", pud_t);
>>>
>>> @@ -2998,6 +3000,18 @@ initialize_bitmap_memory(void)
>>> }
>>>
>>> int
>>> +calibrate_machdep_info(void)
>>> +{
>>> +   if (NUMBER(MAX_PHYSMEM_BITS) > 0)
>>> +   info->max_physmem_bits = NUMBER(MAX_PHYSMEM_BITS);
>>> +
>>> +   if (NUMBER(SECTION_SIZE_BITS) > 0)
>>> +   info->section_size_bits = NUMBER(SECTION_SIZE_BITS);
>>> +
>>> +   return TRUE;
>>> +}
>>> +
>>> +int
>>> initial(void)
>>> {
>>> off_t offset;
>>> @@ -3214,6 +3228,9 @@ out:
>>> if (debug_info && !get_machdep_info())
>>> return FALSE;
>>>
>>> +   if (debug_info && !calibrate_machdep_info())
>>> +   return FALSE;
>>> +
>>> if (is_xen_memory() && !get_dom0_mapnr())
>>> return FALSE;
>>>
>>> diff --git a/makedumpfile.h b/makedumpfile.h
>>> index eb03688..7acb23a 100644
>>> --- a/makedumpfile.h
>>> +++ b/makedumpfile.h
>>> @@ -1434,6 +1434,8 @@ struct number_table {
>>> long PG_hwpoison;
>>>
>>> long PAGE_BUDDY_MAPCOUNT_VALUE;
>>> +   long SECTION_SIZE_BITS;
>>> +   long MAX_PHYSMEM_BITS;
>>> };
>>>
>>> struct srcfile_table {
>>> --
>>> 1.9.0
>>
>> .
>>
>

RE: [RFC PATCH 2/2] makedumpfile: get additional information from vmcore

2014-05-13 Thread Atsushi Kumagai

Hello Liu,

>Now we define MAX_PHYSMEM_BITS and SECTION_SIZE_BITS as
>macros. So if we deal with vmcores with different values
>of these two macros, we have to recompile makedumpfile.

There are other macros which have architecture-specific values
(e.g. __PAGE_OFFSET), and some functions are specific to each
architecture (e.g. vaddr_to_paddr()), so we need recompilation
eventually.

OTOH, we already don't need recompilation for the same architecture
since the values of such macros are defined for each kernel version
like below:

#ifdef __x86_64__
...
#define _MAX_PHYSMEM_BITS_ORIG    (40)
#define _MAX_PHYSMEM_BITS_2_6_26  (44)
#define _MAX_PHYSMEM_BITS_2_6_31  (46)

So I don't think this patch is valuable.


Thanks
Atsushi Kumagai

>This patch makes makedumpfile get these two values from
>vmcoreinfo, if they exist. This makes makedumpfile more
>compatible with vmcores that use different section sizes.
>
>Signed-off-by: Liu Hua 
>---
> makedumpfile.c | 17 +++++++++++++++++
> makedumpfile.h |  2 ++
> 2 files changed, 19 insertions(+)
>
>diff --git a/makedumpfile.c b/makedumpfile.c
>index 6cf6e24..3cdf323 100644
>--- a/makedumpfile.c
>+++ b/makedumpfile.c
>@@ -2111,6 +2111,8 @@ read_vmcoreinfo(void)
>   READ_NUMBER("PG_slab", PG_slab);
>   READ_NUMBER("PG_buddy", PG_buddy);
>   READ_NUMBER("PG_hwpoison", PG_hwpoison);
>+  READ_NUMBER("SECTION_SIZE_BITS", SECTION_SIZE_BITS);
>+  READ_NUMBER("MAX_PHYSMEM_BITS", MAX_PHYSMEM_BITS);
>
>   READ_SRCFILE("pud_t", pud_t);
>
>@@ -2998,6 +3000,18 @@ initialize_bitmap_memory(void)
> }
>
> int
>+calibrate_machdep_info(void)
>+{
>+  if (NUMBER(MAX_PHYSMEM_BITS) > 0)
>+  info->max_physmem_bits = NUMBER(MAX_PHYSMEM_BITS);
>+
>+  if (NUMBER(SECTION_SIZE_BITS) > 0)
>+  info->section_size_bits = NUMBER(SECTION_SIZE_BITS);
>+
>+  return TRUE;
>+}
>+
>+int
> initial(void)
> {
>   off_t offset;
>@@ -3214,6 +3228,9 @@ out:
>   if (debug_info && !get_machdep_info())
>   return FALSE;
>
>+  if (debug_info && !calibrate_machdep_info())
>+  return FALSE;
>+
>   if (is_xen_memory() && !get_dom0_mapnr())
>   return FALSE;
>
>diff --git a/makedumpfile.h b/makedumpfile.h
>index eb03688..7acb23a 100644
>--- a/makedumpfile.h
>+++ b/makedumpfile.h
>@@ -1434,6 +1434,8 @@ struct number_table {
>   long PG_hwpoison;
>
>   long PAGE_BUDDY_MAPCOUNT_VALUE;
>+  long SECTION_SIZE_BITS;
>+  long MAX_PHYSMEM_BITS;
> };
>
> struct srcfile_table {
>--
>1.9.0

Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-12-03 Thread Atsushi Kumagai

On 2013/12/03 18:06:13, kexec  wrote:
> >> This is a suggestion from different point of view...
> >>
> >> In general, data on crash dump can be corrupted. Thus, order contained in a page
> >> descriptor can also be corrupted. For example, if the corrupted value were a huge
> >> number, wide range of pages after buddy page would be filtered falsely.
> >>
> >> So, actually we should sanity check data in crash dump before using them for application
> >> level feature. I've picked up order contained in page descriptor, so there would be other
> >> data used in makedumpfile that are not checked.
> > 
> > What you said is reasonable, but how would you do such a sanity check?
> > Some reference values are necessary for a sanity check; how would
> > you prepare such values?
> > (Get them from the kernel source and hard-code them in makedumpfile?)
> > 
> >> Unlike diskdump, we no longer need to care about kernel/hardware level data integrity
> >> outside of user-land, but we still care about data its own integrity.
> >>
> >> On the other hand, if we do it, we might face some difficulty, for example, hardness of
> >> maintenance or performance bottleneck; it might be the reason why we don't see sanity
> >> check in makedumpfile now.
> > 
> > There are many values which should be checked, e.g. page.flags, page._count,
> > page.mapping, list_head.next and so on.
> > If we introduce sanity checks for them, the issues you mentioned will
> > clearly appear.
> > 
> > So I think makedumpfile has to trust the crash dump in practice.
> > 
> 
> Yes, I don't mean such very drastic checking; I understand hardness because I often
> handle/write this kind of code; I don't want to fight tremendously many dependencies...
> 
> So we need to concentrate on things that can affect makedumpfile's behavior significantly,
> e.g. infinite loop caused by broken linked list objects, buffer overrun caused by large values
> from broken data, etc. We should be able to deal with them by carefully handling
> dump data against makedumpfile's runtime data structure, e.g., buffer size.
> 
> Is it OK to consider this is a policy of makedumpfile for data corruption?

Right.
Of course, if there is a very simple and effective check for dump data,
then we can adopt it.


Thanks
Atsushi Kumagai
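One example of the "very simple and effective" kind of check discussed above is bounding the page order before it is turned into a page count. This is only a sketch: checked_nr_pages() is a hypothetical helper, and max_order is assumed to be supplied by the caller (for instance from a value the tool already knows for the dumped kernel).

```c
#include <limits.h>

/*
 * Return the number of pages covered by a block of the given order
 * starting at pfn, or 0 when the order read from the dump cannot be
 * trusted. A block is rejected when its order exceeds max_order or
 * when it would run past pfn_end, the end of the range being scanned.
 */
static unsigned long checked_nr_pages(unsigned long order,
                                      unsigned long max_order,
                                      unsigned long pfn,
                                      unsigned long pfn_end)
{
    unsigned long nr;

    /* Reject implausible orders, including ones that would overflow the shift. */
    if (order > max_order || order >= sizeof(unsigned long) * CHAR_BIT)
        return 0;

    nr = 1UL << order;

    /* Reject blocks that do not fit into the remaining range. */
    if (pfn >= pfn_end || nr > pfn_end - pfn)
        return 0;

    return nr;
}
```

A caller that gets 0 back would leave the page in the dump instead of excluding a wide, possibly bogus range after it.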

> -- 
> Thanks.
> HATAYAMA, Daisuke

Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-12-03 Thread Atsushi Kumagai

On 2013/11/29 13:57:21, kexec  wrote:
> (2013/11/29 13:23), Atsushi Kumagai wrote:
> > On 2013/11/29 12:24:45, kexec  wrote:
> >> (2013/11/29 12:02), Atsushi Kumagai wrote:
> >>> On 2013/11/28 16:50:21, kexec  wrote:
> >>>>>> ping, in case you overlooked this...
> >>>>>
> >>>>> Sorry for the delayed response, I'm prioritizing the release of v1.5.5 now.
> >>>>>
> >>>>> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
> >>>>> as you said. In addition, I'm considering another way to address such a case,
> >>>>> that is to bring the number of "overflowed pages" to the next cycle and
> >>>>> exclude them at the top of __exclude_unnecessary_pages() like below:
> >>>>>
> >>>>>   /*
> >>>>>* The pages which should be excluded still remain.
> >>>>>*/
> >>>>>   if (remainder >= 1) {
> >>>>>   int i;
> >>>>>   unsigned long tmp = 0;
> >>>>>   for (i = 0; i < remainder; ++i) {
> >>>>>   if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) {
> >>>>>   pfn_user++;
> >>>>>   tmp++;
> >>>>>   }
> >>>>>   }
> >>>>>   pfn += tmp;
> >>>>>   remainder -= tmp;
> >>>>>   mem_map += (tmp - 1) * SIZE(page);
> >>>>>   continue;
> >>>>>   }
> >>>>>
> >>>>> If this way works well, then aligning info->buf_size_cyclic will be
> >>>>> unnecessary.
> >>>>>
> >>>>
> >>>> I selected the current implementation of changing cyclic buffer size because
> >>>> I thought it was simpler than carrying over remaining filtered pages to next cycle
> >>>> in that there was no need to add extra code in filtering processing.
> >>>>
> >>>> I guess the reason why you think this is better now is how to detect maximum order of
> >>>> huge page is hard in some way, right?
> >>>
> >>> The maximum order can be obtained from HUGETLB_PAGE_ORDER or HPAGE_PMD_ORDER,
> >>> so I don't say it's hard. However, the carrying over method doesn't depend on
> >>> such kernel symbols, so I think it's more robust.
> >>>
> >>
> >> Then, it's better to remove check_cyclic_buffer_overrun() and rewrite part of free page
> >> filtering in __exclude_unnecessary_pages(). Could you do that too?
> >
> > Sure, I'll modify it too.
> >
>
> This is a suggestion from different point of view...
>
> In general, data on crash dump can be corrupted. Thus, order contained in a page
> descriptor can also be corrupted. For example, if the corrupted value were a huge
> number, wide range of pages after buddy page would be filtered falsely.
>
> So, actually we should sanity check data in crash dump before using them for application
> level feature. I've picked up order contained in page descriptor, so there would be other
> data used in makedumpfile that are not checked.

What you said is reasonable, but how would you do such a sanity check?
Some reference values are necessary for a sanity check; how would
you prepare such values?
(Get them from the kernel source and hard-code them in makedumpfile?)

> Unlike diskdump, we no longer need to care about kernel/hardware level data integrity
> outside of user-land, but we still care about data its own integrity.
>
> On the other hand, if we do it, we might face some difficulty, for example, hardness of
> maintenance or performance bottleneck; it might be the reason why we don't see sanity
> check in makedumpfile now.

There are many values which should be checked, e.g. page.flags, page._count,
page.mapping, list_head.next and so on.
If we introduce sanity checks for them, the issues you mentioned will
clearly appear.

So I think makedumpfile has to trust the crash dump in practice.


Thanks
Atsushi Kumagai

> --
> Thanks.
> HATAYAMA, Daisuke

Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-28 Thread Atsushi Kumagai

On 2013/11/29 12:24:45, kexec  wrote:
> (2013/11/29 12:02), Atsushi Kumagai wrote:
> > On 2013/11/28 16:50:21, kexec  wrote:
> >>>> ping, in case you overlooked this...
> >>>
> >>> Sorry for the delayed response, I'm prioritizing the release of v1.5.5 now.
> >>>
> >>> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
> >>> as you said. In addition, I'm considering another way to address such a case,
> >>> that is to bring the number of "overflowed pages" to the next cycle and
> >>> exclude them at the top of __exclude_unnecessary_pages() like below:
> >>>
> >>>  /*
> >>>   * The pages which should be excluded still remain.
> >>>   */
> >>>  if (remainder >= 1) {
> >>>  int i;
> >>>  unsigned long tmp = 0;
> >>>  for (i = 0; i < remainder; ++i) {
> >>>  if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) {
> >>>  pfn_user++;
> >>>  tmp++;
> >>>  }
> >>>  }
> >>>  pfn += tmp;
> >>>  remainder -= tmp;
> >>>  mem_map += (tmp - 1) * SIZE(page);
> >>>  continue;
> >>>  }
> >>>
> >>> If this way works well, then aligning info->buf_size_cyclic will be
> >>> unnecessary.
> >>>
> >>
> >> I selected the current implementation of changing cyclic buffer size
> >> because I thought it was simpler than carrying over remaining filtered
> >> pages to next cycle in that there was no need to add extra code in
> >> filtering processing.
> >>
> >> I guess the reason why you think this is better now is how to detect 
> >> maximum order of
> >> huge page is hard in some way, right?
> > 
> > The maximum order will be gotten from HUGETLB_PAGE_ORDER or HPAGE_PMD_ORDER,
> > so I don't say it's hard. However, the carrying over method doesn't depend 
> > on
> > such kernel symbols, so I think it's robuster.
> > 
> 
> Then, it's better to remove check_cyclic_buffer_overrun() and rewrite part of 
> free page
> filtering in __exclude_unnecessary_pages(). Could you do that too?

Sure, I'll modify it too.


Thanks
Atsushi Kumagai

Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-28 Thread Atsushi Kumagai

On 2013/11/28 16:50:21, kexec  wrote:
> >> ping, in case you overlooked this...
> >
> > Sorry for the delayed response, I prioritize the release of v1.5.5 now.
> >
> > Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
> > as you said. In addition, I'm considering other way to address such case,
> > that is to bring the number of "overflowed pages" to the next cycle and
> > exclude them at the top of __exclude_unnecessary_pages() like below:
> >
> > /*
> >  * The pages which should be excluded still remain.
> >  */
> > if (remainder >= 1) {
> > int i;
> > unsigned long tmp;
> > for (i = 0; i < remainder; ++i) {
> > if (clear_bit_on_2nd_bitmap_for_kernel(pfn 
> > + i)) {
> > pfn_user++;
> > tmp++;
> > }
> > }
> > pfn += tmp;
> > remainder -= tmp;
> > mem_map += (tmp - 1) * SIZE(page);
> > continue;
> > }
> >
> > If this way works well, then aligning info->buf_size_cyclic will be
> > unnecessary.
> >
>
> I selected the current implementation of changing cyclic buffer size because
> I thought it was simpler than carrying over remaining filtered pages to next
> cycle in that there was no need to add extra code in filtering processing.
>
> I guess the reason why you think this is better now is how to detect maximum 
> order of
> huge page is hard in some way, right?

The maximum order can be obtained from HUGETLB_PAGE_ORDER or HPAGE_PMD_ORDER,
so I wouldn't say it's hard. However, the carry-over method doesn't depend on
such kernel symbols, so I think it's more robust.
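
As a rough illustration of the carry-over idea, here is a minimal sketch
under simplifying assumptions: one byte per page instead of a real bitmap,
and a cycle modelled as a plain PFN range. The names struct cycle,
clear_bit_in_cycle() and exclude_range() are hypothetical and are not
makedumpfile's API. Pages of a huge page that fall beyond the current cycle
are counted and handed to the next cycle, so the maximum order never has to
be known.

```c
#include <stddef.h>

/* Hypothetical model of one cycle: keep[i] tracks PFN start + i. */
struct cycle {
	unsigned long start;    /* first PFN covered by this cycle */
	unsigned long end;      /* one past the last PFN covered */
	unsigned char *keep;    /* 1 = page is dumped, 0 = page is excluded */
};

/* Mark `pfn` as excluded; return 1 if the PFN belongs to this cycle. */
static int clear_bit_in_cycle(struct cycle *c, unsigned long pfn)
{
	if (pfn < c->start || pfn >= c->end)
		return 0;
	c->keep[pfn - c->start] = 0;
	return 1;
}

/*
 * Exclude `nr_pages` pages starting at `pfn`. Returns the number of
 * pages that lie beyond this cycle and must be carried over to the next.
 */
static unsigned long exclude_range(struct cycle *c, unsigned long pfn,
				   unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		if (!clear_bit_in_cycle(c, pfn + i))
			break;  /* ran off the end of the current cycle */
	}
	return nr_pages - i;
}
```

The next cycle would start by calling exclude_range() on its first PFN with
the returned remainder before looking at any page descriptor.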


Thanks
Atsushi Kumagai

> --
> Thanks.
> HATAYAMA, Daisuke
>
>

Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-27 Thread Atsushi Kumagai

On 2013/11/22 16:18:20, kexec  wrote:
> (2013/11/07 9:54), HATAYAMA Daisuke wrote:
> > (2013/11/06 11:21), Atsushi Kumagai wrote:
> >> (2013/11/06 5:27), Vivek Goyal wrote:
> >>> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
> >>>> This patch set intend to exclude unnecessary hugepages from vmcore dump 
> >>>> file.
> >>>>
> >>>> This patch requires the kernel patch to export necessary data structures 
> >>>> into
> >>>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
> >>>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
> >>>>
> >>>> This patch introduce two new dump levels 32 and 64 to exclude all unused 
> >>>> and
> >>>> active hugepages. The level to exclude all unnecessary pages will be 127 
> >>>> now.
> >>>
> >>> Interesting. Why hugepages should be treated any differently than normal
> >>> pages?
> >>>
> >>> If user asked to filter out free page, then it should be filtered and
> >>> it should not matter whether it is a huge page or not?
> >>
> >> I'm making a RFC patch of hugepages filtering based on such policy.
> >>
> >> I attach the prototype version.
> >> It's able to filter out also THPs, and suitable for cyclic processing
> >> because it depends on mem_map and looking up it can be divided into
> >> cycles. This is the same idea as page_is_buddy().
> >>
> >> So I think it's better.
> >>
> >
> >> @@ -4506,14 +4583,49 @@ __exclude_unnecessary_pages(unsigned long mem_map,
> >>&& !isAnon(mapping)) {
> >>if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
> >>pfn_cache_private++;
> >> +/*
> >> + * NOTE: If THP for cache is introduced, the check for
> >> + *   compound pages is needed here.
> >> + */
> >>}
> >>/*
> >> * Exclude the data page of the user process.
> >> */
> >> -else if ((info->dump_level & DL_EXCLUDE_USER_DATA)
> >> -&& isAnon(mapping)) {
> >> -if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
> >> -pfn_user++;
> >> +else if (info->dump_level & DL_EXCLUDE_USER_DATA) {
> >> +/*
> >> + * Exclude the anonymous pages as user pages.
> >> + */
> >> +if (isAnon(mapping)) {
> >> +if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
> >> +pfn_user++;
> >> +
> >> +/*
> >> + * Check the compound page
> >> + */
> >> +if (page_is_hugepage(flags) && compound_order > 0) {
> >> +int i, nr_pages = 1 << compound_order;
> >> +
> >> +for (i = 1; i < nr_pages; ++i) {
> >> +if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
> >> +pfn_user++;
> >> +}
> >> +pfn += nr_pages - 2;
> >> +mem_map += (nr_pages - 1) * SIZE(page);
> >> +}
> >> +}
> >> +/*
> >> + * Exclude the hugetlbfs pages as user pages.
> >> + */
> >> +else if (hugetlb_dtor == SYMBOL(free_huge_page)) {
> >> +int i, nr_pages = 1 << compound_order;
> >> +
> >> +for (i = 0; i < nr_pages; ++i) {
> >> +if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
> >> +pfn_user++;
> >> +}
> >> +pfn += nr_pages - 1;
> >> +mem_map += (nr_pages - 1) * SIZE(page);
> >> +}
> >>}
> >>/*
> >> * Exclude the hwpoison page.
> >
> > I'm concerned about the case that filtering is not performed to part of 
> > mem_map
> > entries not belonging to the current cyclic range.
> >
> > If maximum value of compound_order is larger than maximum value of
> > CONFIG_FORCE_MAX_ZONEORDER, which makedumpfile obtain

Re: /proc/vmcore mmap() failure issue

2013-11-25 Thread Atsushi Kumagai

On 2013/11/20 15:44:45, kexec  wrote:
> (2013/11/20 14:27), Atsushi Kumagai wrote:
> > On 2013/11/19 18:56:21, kexec  wrote:
> >> (2013/11/18 9:51), Atsushi Kumagai wrote:
> >>> (2013/11/15 23:26), Vivek Goyal wrote:
> >>>> On Fri, Nov 15, 2013 at 06:41:52PM +0900, HATAYAMA Daisuke wrote:
> >>>>
> >>>> [..]
> >>>>>> Given the fact that hpa does not like fixing it in kernel. We are
> >>>>>> left with option of fixing it in following places.
> >>>>>>
> >>>>>> - Drop partial pages in kexec-tools
> >>>>>> - Drop partial pages in makedumpfile.
> >>>>>> - Read partial pages using read() interface in makedumpfile
> >>>>>> - Modify /proc/vmcore to copy partial pages in second kernel's memory.
> >>>>>>
> >>>>>> It is not clear to me that partial pages are really useful.  So I
> >>>>>> want to avoid modifying /proc/vmcore to deal with partial pages and
> >>>>>> increase complexity.
> >>>>>>
> >>>>>> So fixing makedumpfile (either option2 or option 3) seems least
> >>>>>> risky to me. In fact I would say let us keep it simple and truncate
> >>>>>> partial pages in makedumpfile to keep it simple. And look at option
> >>>>>> 3 once we have a strong use case for partial pages.
> >>>>>>
> >>>>>> What do you think?
> >>>>>>
> >>>>>
> >>>>> As you say, it's not clear that partial pages are really useful, but
> >>>>> on the other hand, it seems to me not clear that they are really 
> >>>>> useless.
> >>>>> I think we should get them as long as we have access to them.
> >>>>>
> >>>>> It seems best to me the option 3). Switching between read and mmap
> >>>>> would be not so complex and also it's by far flexible in
> >>>>> makedumpfile than in kernel.
> >>>>
> >>>> Ok, I am fine with option 3. It is more complicated option but safe
> >>>> option.
> >>>
> >>> It sounds reasonable also to me.
> >>>
> >>>> Is there any chance that you could look into fixing this. I have no
> >>>> experience writing code for makedumpfile.
> >>>
> >>> I'll send a patch to fix this soon.
> >>>
> >>
> >> Thanks.
> >>
> >> BTW, now the following patch has been applied on top of makedumpfile in 
> >> kexec-tools package on fedora in order to avoid the issue.
> >>
> >> https://lists.fedoraproject.org/pipermail/kexec/2013-November/000254.html
> >>
> >> I remember prototype version of mmap patch implemented a kind of --no-mmap 
> >> option and we could use it to disable mmap() use and use read() instead, I 
> >> think which is useful when we face this kind of issue.
> > 
> > How about this fallback structure instead of such an extra option ?
> > 
> 
> I think this logic is useful and should be merged together in this fix.
> 
> However, I still think a kind of --no-mmap option is needed. There could 
> happen
> worse case due to mmap() in the future on some system, of course, I don't know
> what the system actually is, but at least it must be behaving differently from
> typical systems... Then, option is more flexible than patching.
> 
> It would also be useful for debugging use. read() is simpler than mmap(), and
> read() is basic in the sense that initially makedumpfile didn't use mmap().
> There might be a situation where we want to avoid using mmap(); for example,
> when makedumpfile works badly and it looks like caused by mmap() code in 
> kernel
> code; Then, we would want to see if makedumpfile works well by disabling 
> mmap(), 

Thanks for your explanation.
Additionally, an option to disable mmap() manually will help my testing,
so I should introduce the option upstream.


Thanks
Atsushi Kumagai

> -- 
> Thanks.
> HATAYAMA, Daisuke
> 
> 

Re: /proc/vmcore mmap() failure issue

2013-11-25 Thread Atsushi Kumagai

On 2013/11/25 23:42:31, kexec  wrote:
> On Mon, Nov 25, 2013 at 06:01:37PM +0900, HATAYAMA Daisuke wrote:
> 
> [..]
> > > I agree to avoid this issue by fixing makedumpfile as workaround while to
> > > fix kernel is so tough and risky. However, it sounds strange to me to fix
> > > userspace side elaborately for such definite kernel issue whose cause is
> > > known, so we should fix the kernel itself.
> > > 
> > 
> > > Otherwise, will you continue to add specific fixes into user tools to
> > > address kernel issues like this case ?
> > > 
> > 
> > makedumpfile supports a wide range of kernel versions and needs to satisfy
> > backward compatibility. mmap() on /proc/vmcore might be backported to some 
> > of
> > the old versions on some distributions if necessary. Then, it's hard to fix
> > each old kernel at each back port. The method that can be applied to all the
> > kernels in general, is necessary.
> > 
> > Also, looking at ia64 case where there's boot loader data on partial pages,
> > there could be other environments where partial pages contain other 
> > important
> > data other components have. So, the issue depends not only on kernels but 
> > also
> > other components such as boot loader and firmwares that can put data on
> > partial pages. We need to get there as long as there's important data there
> > and we have access to there.
> 
> Hi Atsushi, Hatayama,
> 
> So even if we fix the mmap() issue in kernel, looks like it will be a
> good idea to ship the fix in makedumpfile as there have been a kernel
> release where mmap() will cause issues.

OK, I'll make a patch set for makedumpfile to address the mmap() issues:

  1. Fix the partial page issue
  2. Introduce a general fallback structure (already posted):
     http://lists.infradead.org/pipermail/kexec/2013-November/010199.html
  3. Add a --non-mmap option


Thanks
Atsushi Kumagai

> Having said that, I think we need to fix it in kernel also. I was not sure
> that what's the right fix. Should we truncate partial pages or should
> we just copy partial pages from old memory to new kernel's memory and fill
> partial page with zeros. And that's why I was hoping that makedumpfile
> can fill the gap.
> 
> Copying partial pages to new memory seems like a safer approach. So maybe
> we can take a fix in makedumpfile and another in kernel.
> 
> Hatayama, I know that in the past your initial mmap() patches were copying
> partial pages to new kernel's memory. Would you like to resurrect that
> patch again?
> 
> Thanks
> Vivek
> 

Re: /proc/vmcore mmap() failure issue

2013-11-25 Thread Atsushi Kumagai

On 2013/11/22 1:53:14, kexec  wrote:
> On Thu, Nov 21, 2013 at 05:31:46PM +0900, HATAYAMA Daisuke wrote:
> 
> [..]
> > > So I think the patch I sent is enough, the policy will be simpler as
> > > "Don't use mmap() for buggy kernels".
> > > 
> > > [PATCH] Fall back to read() when mmap() fails.
> > > http://lists.infradead.org/pipermail/kexec/2013-November/010199.html
> > > 
> > 
> > I think logic becomes not so complex. For example, if input vmcore
> > format is ELF, then:
> > 
> > > o in update_mmap_range():
> > >   - first calculate a range of the corresponding PT_LOAD entry truncated
> > >     with PAGE_SIZE.
> > >   - Then, truncate range of mmap() by the truncated range of the
> > >     corresponding PT_LOAD entry, i.e., exclude partial pages from mmap()
> > >     target range.
> > >   - Then determine offsets of two partial pages; the number of partial pages
> > >     are always at most two. The offsets can easily be calculated from the
> > >     original range of the corresponding PT_LOAD entry
> > 
> > o in read_from_vmcore(), if a given offset belongs to either of two partial
> >   pages, then go to read() path; if not, go to mmap() path.
> 
> I agree that we should do mmap() on all non-partial pages and do read()
> on all partial pages. Otherwise we lose the benefit of faster speed of
> mmap().
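
For reference, the split between mmap() and read() described above can be
sketched as follows. This is only an illustration under stated assumptions:
a fixed 4 KiB page size, and a PT_LOAD entry reduced to a plain byte range.
struct range, mmap_safe_range() and in_partial_page() are hypothetical names,
not functions from makedumpfile.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumed page size */

/* Byte range [start, end) of one PT_LOAD entry. */
struct range {
	uint64_t start;
	uint64_t end;
};

static uint64_t page_round_up(uint64_t x)
{
	return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

static uint64_t page_round_down(uint64_t x)
{
	return x & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Shrink a PT_LOAD range to whole pages only. Returns 1 and fills
 * `aligned` if at least one whole page remains, otherwise returns 0.
 */
static int mmap_safe_range(const struct range *load, struct range *aligned)
{
	uint64_t s = page_round_up(load->start);
	uint64_t e = page_round_down(load->end);

	if (s >= e)
		return 0;
	aligned->start = s;
	aligned->end = e;
	return 1;
}

/*
 * Return 1 if `addr` lies in the partial page at the head or tail of
 * `load`, i.e. it should be fetched with read() rather than mmap().
 */
static int in_partial_page(const struct range *load, uint64_t addr)
{
	struct range a;

	if (addr < load->start || addr >= load->end)
		return 0;               /* not part of this entry at all */
	if (!mmap_safe_range(load, &a))
		return 1;               /* the entry has no whole page */
	return addr < a.start || addr >= a.end;
}
```

With an entry covering 0x1200 to 0x5800, only 0x2000 to 0x5000 would be
mapped; the head (0x1200 to 0x2000) and the tail (0x5000 to 0x5800) are the
two partial pages left to read().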

I agree with avoiding this issue by fixing makedumpfile as a workaround,
since fixing the kernel is so tough and risky. However, it seems strange to
me to fix the userspace side elaborately for such a definite kernel issue
whose cause is known, so we should fix the kernel itself.

Otherwise, will you continue to add specific fixes to user tools to
address kernel issues like this one?


Thanks
Atsushi Kumagai

> Thanks
> Vivek
> 

Re: /proc/vmcore mmap() failure issue

2013-11-25 Thread Atsushi Kumagai

Hello WANG,

On 2013/11/21 16:15:22, kexec  wrote:
> > How about this fallback structure instead of such an extra option ?
> > 
> > Thanks
> > Atsushi Kumagai
> > 
> > From: Atsushi Kumagai 
> > Date: Wed, 20 Nov 2013 14:10:19 +0900
> > Subject: [PATCH] Fall back to read() when mmap() fails.
> > 
> > Signed-off-by: Atsushi Kumagai 
> > ---
> >  makedumpfile.c | 10 +-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/makedumpfile.c b/makedumpfile.c
> > index ca03440..f583602 100644
> > --- a/makedumpfile.c
> > +++ b/makedumpfile.c
> > @@ -324,7 +324,15 @@ read_from_vmcore(off_t offset, void *bufptr, unsigned 
> > long size)
> > if (!read_with_mmap(offset, bufptr, size)) {
> > ERRMSG("Can't read the dump memory(%s) with mmap().\n",
> >info->name_memory);
> > -   return FALSE;
> > +
> > +   ERRMSG("This kernel might have some problems about 
> > mmap().\n");
> > +   ERRMSG("read() will be used instead of mmap() from 
> > now.\n");
> > +
> > +   /*
> > +* Fall back to read().
> > +*/
> > +   info->flag_usemmap = FALSE;
> > +   read_from_vmcore(offset, bufptr, size);
> 
> Hi, Atsushi
> 
> I've got such a workstation too. And I confirm this patch works for me.

Thanks for your testing !

> However, I have a question:
> Why not switch to mmap() back after read()?

I made this patch as a general safety net, not only for the partial page
issue.
When we face unknown issues related to mmap(), the kernel may have some bugs
and mmap() can fail for every page. In the worst case, almost every mmap()
call would fail and fall back to read() with error messages after every
failure; by not switching back, this patch prevents that flip-flopping
and the flood of error messages.
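
The "switch once and stay switched" policy can be sketched like this. It is
a simplified model, not the patch itself: the two read paths are stand-in
function pointers so the example runs without /proc/vmcore, and struct
reader, read_chunk() and both backends are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef int (*read_fn)(const void *src, size_t offset, void *buf, size_t size);

struct reader {
	const void *src;    /* stands in for /proc/vmcore */
	read_fn fast;       /* stands in for the mmap() path */
	read_fn slow;       /* stands in for the read() path */
	int use_fast;       /* cleared for good after the first failure */
	int fast_failures;  /* failed fast-path attempts, for illustration */
};

/* Stand-in for an mmap() path that fails on a buggy kernel. */
static int failing_backend(const void *src, size_t offset, void *buf,
			   size_t size)
{
	(void)src; (void)offset; (void)buf; (void)size;
	return 0;
}

/* Stand-in for the read() path: a plain copy from memory. */
static int memcpy_backend(const void *src, size_t offset, void *buf,
			  size_t size)
{
	memcpy(buf, (const char *)src + offset, size);
	return 1;
}

static int read_chunk(struct reader *r, size_t offset, void *buf, size_t size)
{
	if (r->use_fast) {
		if (r->fast(r->src, offset, buf, size))
			return 1;
		/* Report once, then never try the fast path again. */
		r->use_fast = 0;
		r->fast_failures++;
	}
	return r->slow(r->src, offset, buf, size);
}
```

Because use_fast is never set back to 1, a kernel on which every mmap()
fails costs one failed attempt in total rather than one per page.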


Thanks
Atsushi Kumagai

> Thanks
> WANG Chao
> 
> > }
> > } else {
> > if (lseek(info->fd_memory, offset, SEEK_SET) == failed) {
> > -- 
> > 1.8.0.2
> 

Re: /proc/vmcore mmap() failure issue

2013-11-19 Thread Atsushi Kumagai

On 2013/11/18 22:56:10, kexec  wrote:
> On Mon, Nov 18, 2013 at 12:51:39AM +0000, Atsushi Kumagai wrote:
> 
> [..]
> > > Is there any chance that you could look into fixing this. I have no 
> > > experience writing code for makedumpfile.
> > 
> > I'll send a patch to fix this soon.
> 
> Thanks Atsushi.
> 
> Vivek

Vivek, could you test this patch ?

Thanks
Atsushi Kumagai


From: Atsushi Kumagai 
Date: Wed, 20 Nov 2013 10:05:03 +0900
Subject: [PATCH] Disable mmap() for reading fractional pages.

Since mmap() was introduced on /proc/vmcore, it fails for fractional
pages, which don't start or end at a page boundary, due to a kernel issue.
This patch temporarily disables mmap() for fractional pages to avoid
this issue, so mmap() will be used only for aligned pages.

Signed-off-by: Atsushi Kumagai 
---
 makedumpfile.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index 3746cf6..ca03440 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -368,6 +368,7 @@ readpage_elf(unsigned long long paddr, void *bufptr)
off_t offset1, offset2;
size_t size1, size2;
unsigned long long phys_start, phys_end, frac_head = 0;
+   int original_usemmap = info->flag_usemmap;
 
offset1 = paddr_to_offset(paddr);
offset2 = paddr_to_offset(paddr + info->page_size);
@@ -392,6 +393,7 @@ readpage_elf(unsigned long long paddr, void *bufptr)
offset1 = paddr_to_offset(phys_start);
frac_head = phys_start - paddr;
memset(bufptr, 0, frac_head);
+   info->flag_usemmap = FALSE;
}
 
/*
@@ -402,6 +404,7 @@ readpage_elf(unsigned long long paddr, void *bufptr)
phys_end = page_head_to_phys_end(paddr);
offset2 = paddr_to_offset(phys_end);
memset(bufptr + (phys_end - paddr), 0, info->page_size - (phys_end - paddr));
+   info->flag_usemmap = FALSE;
}
 
/*
@@ -420,7 +423,7 @@ readpage_elf(unsigned long long paddr, void *bufptr)
if(!read_from_vmcore(offset1, bufptr + frac_head, size1)) {
ERRMSG("Can't read the dump memory(%s).\n",
   info->name_memory);
-   return FALSE;
+   goto error;
}
 
if (size1 + frac_head != info->page_size) {
@@ -429,11 +432,16 @@ readpage_elf(unsigned long long paddr, void *bufptr)
if(!read_from_vmcore(offset2, bufptr + frac_head + size1, size2)) {
ERRMSG("Can't read the dump memory(%s).\n",
   info->name_memory);
-   return FALSE;
+   goto error;
}
}
 
+   info->flag_usemmap = original_usemmap;
return TRUE;
+
+error:
+   info->flag_usemmap = original_usemmap;
+   return FALSE;
 }
 
 static int
-- 
1.8.0.2

Re: /proc/vmcore mmap() failure issue

2013-11-19 Thread Atsushi Kumagai

On 2013/11/19 18:56:21, kexec  wrote:
> (2013/11/18 9:51), Atsushi Kumagai wrote:
> > (2013/11/15 23:26), Vivek Goyal wrote:
> >> On Fri, Nov 15, 2013 at 06:41:52PM +0900, HATAYAMA Daisuke wrote:
> >>
> >> [..]
> >>>> Given the fact that hpa does not like fixing it in kernel. We are 
> >>>> left with option of fixing it in following places.
> >>>>
> >>>> - Drop partial pages in kexec-tools
> >>>> - Drop partial pages in makedumpfile.
> >>>> - Read partial pages using read() interface in makedumpfile
> >>>> - Modify /proc/vmcore to copy partial pages in second kernel's memory.
> >>>>
> >>>> It is not clear to me that partial pages are really useful.  So I 
> >>>> want to avoid modifying /proc/vmcore to deal with partial pages and 
> >>>> increase complexity.
> >>>>
> >>>> So fixing makedumpfile (either option2 or option 3) seems least 
> >>>> risky to me. In fact I would say let us keep it simple and truncate 
> >>>> partial pages in makedumpfile to keep it simple. And look at option 
> >>>> 3 once we have a strong use case for partial pages.
> >>>>
> >>>> What do you think?
> >>>>
> >>>
> >>> As you say, it's not clear that partial pages are really useful, but 
> >>> on the other hand, it seems to me not clear that they are really useless.
> >>> I think we should get them as long as we have access to them.
> >>>
> >>> It seems best to me the option 3). Switching between read and mmap 
> >>> would be not so complex and also it's by far flexible in 
> >>> makedumpfile than in kernel.
> >>
> >> Ok, I am fine with option 3. It is more complicated option but safe 
> >> option.
> > 
> > It sounds reasonable also to me.
> > 
> >> Is there any chance that you could look into fixing this. I have no 
> >> experience writing code for makedumpfile.
> > 
> > I'll send a patch to fix this soon.
> > 
> 
> Thanks.
> 
> BTW, now the following patch has been applied on top of makedumpfile in 
> kexec-tools package on fedora in order to avoid the issue.
> 
> https://lists.fedoraproject.org/pipermail/kexec/2013-November/000254.html
> 
> I remember prototype version of mmap patch implemented a kind of --no-mmap 
> option and we could use it to disable mmap() use and use read() instead, I 
> think which is useful when we face this kind of issue.

How about this fallback structure instead of such an extra option ?

Thanks
Atsushi Kumagai

From: Atsushi Kumagai 
Date: Wed, 20 Nov 2013 14:10:19 +0900
Subject: [PATCH] Fall back to read() when mmap() fails.

Signed-off-by: Atsushi Kumagai 
---
 makedumpfile.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index ca03440..f583602 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -324,7 +324,15 @@ read_from_vmcore(off_t offset, void *bufptr, unsigned long size)
if (!read_with_mmap(offset, bufptr, size)) {
ERRMSG("Can't read the dump memory(%s) with mmap().\n",
   info->name_memory);
-   return FALSE;
+
+   ERRMSG("This kernel might have some problems about mmap().\n");
+   ERRMSG("read() will be used instead of mmap() from now.\n");
+
+   /*
+* Fall back to read().
+*/
+   info->flag_usemmap = FALSE;
+   read_from_vmcore(offset, bufptr, size);
}
} else {
if (lseek(info->fd_memory, offset, SEEK_SET) == failed) {
-- 
1.8.0.2

Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-07 Thread Atsushi Kumagai

Hello Jingbai,

(2013/11/07 17:58), Jingbai Ma wrote:
> On 11/06/2013 10:23 PM, Vivek Goyal wrote:
>> On Wed, Nov 06, 2013 at 02:21:39AM +0000, Atsushi Kumagai wrote:
>>> (2013/11/06 5:27), Vivek Goyal wrote:
>>>> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>>>>> This patch set intend to exclude unnecessary hugepages from vmcore dump 
>>>>> file.
>>>>>
>>>>> This patch requires the kernel patch to export necessary data structures 
>>>>> into
>>>>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>>>>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>>>>
>>>>> This patch introduce two new dump levels 32 and 64 to exclude all unused 
>>>>> and
>>>>> active hugepages. The level to exclude all unnecessary pages will be 127 
>>>>> now.
>>>>
>>>> Interesting. Why hugepages should be treated any differently than normal
>>>> pages?
>>>>
>>>> If user asked to filter out free page, then it should be filtered and
>>>> it should not matter whether it is a huge page or not?
>>>
>>> I'm making a RFC patch of hugepages filtering based on such policy.
>>>
>>> I attach the prototype version.
>>> It's able to filter out also THPs, and suitable for cyclic processing
>>> because it depends on mem_map and looking up it can be divided into
>>> cycles. This is the same idea as page_is_buddy().
>>>
>>> So I think it's better.
>>
>> Agreed. Being able to treat hugepages in same manner as other pages
>> sounds good.
>>
>> Jingbai, looks good to you?
>
> It looks good to me.
>
> My only concern is by this way, we only can exclude all hugepage together, 
> but can't exclude the free hugepages only. I'm not sure if user need to dump 
> out the activated hugepage only.
>
> Kumagai-san, please correct me, if I'm wrong.

Yes, my patch treats all allocated hugetlbfs pages as user pages and
doesn't distinguish whether the pages are actually used or not.
I did so because I guess it's enough for almost all users.

We can introduce a new dump level once it's actually needed,
but I don't think now is the time. Introducing it without
demand would just make this tool more complex.
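
To make the dump-level arithmetic concrete, here is a small sketch. The
first five bit values are the existing DL_EXCLUDE_* flags as I recall them
from makedumpfile.h, so treat them as an assumption; the two hugepage names
are purely hypothetical labels for the proposed values 32 and 64, and which
of the two means "unused" versus "active" is also assumed.

```c
/* Existing dump-level bits (assumed to match makedumpfile.h). */
#define DL_EXCLUDE_ZERO       0x001  /* pages filled with zeros */
#define DL_EXCLUDE_CACHE      0x002  /* cache pages without private pages */
#define DL_EXCLUDE_CACHE_PRI  0x004  /* cache pages with private pages */
#define DL_EXCLUDE_USER_DATA  0x008  /* user process data pages */
#define DL_EXCLUDE_FREE       0x010  /* free pages */

/* Hypothetical names for the two proposed bits (32 and 64). */
#define DL_EXCLUDE_HUGE_UNUSED  0x020
#define DL_EXCLUDE_HUGE_ACTIVE  0x040

/* Highest dump level, with or without the proposed hugepage bits. */
static int max_dump_level(int with_hugepage_bits)
{
	int level = DL_EXCLUDE_ZERO | DL_EXCLUDE_CACHE | DL_EXCLUDE_CACHE_PRI |
		    DL_EXCLUDE_USER_DATA | DL_EXCLUDE_FREE;

	if (with_hugepage_bits)
		level |= DL_EXCLUDE_HUGE_UNUSED | DL_EXCLUDE_HUGE_ACTIVE;
	return level;
}

/* Non-zero if `dump_level` asks for the page class `bit` to be excluded. */
static int excludes(int dump_level, int bit)
{
	return (dump_level & bit) != 0;
}
```

Under the approach in my patch no new bits are needed: hugetlbfs pages and
anonymous THPs are simply covered by the existing user-data bit.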


Thanks
Atsushi Kumagai

Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump

2013-11-05 Thread Atsushi Kumagai

(2013/11/06 5:27), Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>> This patch set intend to exclude unnecessary hugepages from vmcore dump file.
>>
>> This patch requires the kernel patch to export necessary data structures into
>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>
>> This patch introduce two new dump levels 32 and 64 to exclude all unused and
>> active hugepages. The level to exclude all unnecessary pages will be 127 now.
>
> Interesting. Why hugepages should be treated any differently than normal
> pages?
>
> If user asked to filter out free page, then it should be filtered and
> it should not matter whether it is a huge page or not?

I'm making an RFC patch for hugepage filtering based on that policy.

I attach the prototype version.
It can also filter out THPs, and it is suitable for cyclic processing
because it depends on mem_map, and looking it up can be divided into
cycles. This is the same idea as page_is_buddy().

So I think it's better.
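
The core of the idea, stripped of makedumpfile's real data structures, looks
roughly like the sketch below. struct page_sketch and exclude_user_pages()
are hypothetical, and the "bitmap" is one byte per page for readability.
When an anonymous head page with a non-zero compound order is found, all
1 << order pages of the compound page are excluded in one step and the walk
skips past them.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced page descriptor. */
struct page_sketch {
	int is_anon;                 /* mapping marks an anonymous page */
	int is_head;                 /* head page of a compound page */
	unsigned int compound_order; /* valid on head pages only */
};

/*
 * Walk `nr` page descriptors and exclude user pages. keep[pfn] is set
 * to 0 for every excluded page. Returns the number of excluded pages.
 */
static unsigned long exclude_user_pages(const struct page_sketch *map,
					unsigned long nr,
					unsigned char *keep)
{
	unsigned long pfn = 0, excluded = 0;

	while (pfn < nr) {
		unsigned long n = 1, i;

		if (!map[pfn].is_anon) {
			pfn++;
			continue;
		}
		/* A compound page spans 1 << order consecutive pages. */
		if (map[pfn].is_head && map[pfn].compound_order > 0)
			n = 1UL << map[pfn].compound_order;

		for (i = 0; i < n && pfn + i < nr; i++) {
			if (keep[pfn + i]) {
				keep[pfn + i] = 0;
				excluded++;
			}
		}
		pfn += n;   /* skip the tail pages that were just handled */
	}
	return excluded;
}
```

The bound pfn + i < nr is where a real cyclic implementation has to decide
what to do with tail pages that belong to the next cycle.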

-- 
Thanks
Atsushi Kumagai


From: Atsushi Kumagai 
Date: Wed, 6 Nov 2013 10:10:43 +0900
Subject: [PATCH] [RFC] Exclude hugepages.

Signed-off-by: Atsushi Kumagai 
---
   makedumpfile.c | 122 
++---
   makedumpfile.h |   8 
   2 files changed, 125 insertions(+), 5 deletions(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index 428c53e..75b7123 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -63,6 +63,7 @@ do { \
   
   static void check_cyclic_buffer_overrun(void);
   static void setup_page_is_buddy(void);
+static void setup_page_is_hugepage(void);
   
   void
   initialize_tables(void)
@@ -270,6 +271,18 @@ update_mmap_range(off_t offset, int initial) {
   }
   
   static int
+page_is_hugepage(unsigned long flags) {
+   if (NUMBER(PG_head) != NOT_FOUND_NUMBER) {
+   return isHead(flags);
+   } else if (NUMBER(PG_tail) != NOT_FOUND_NUMBER) {
+   return isTail(flags);
+   } else if (NUMBER(PG_compound) != NOT_FOUND_NUMBER) {
+   return isCompound(flags);
+   }
+   return 0;
+}
+
+static int
   is_mapped_with_mmap(off_t offset) {
   
if (info->flag_usemmap
@@ -1107,6 +1120,8 @@ get_symbol_info(void)
SYMBOL_ARRAY_LENGTH_INIT(node_remap_start_pfn,
"node_remap_start_pfn");
   
+   SYMBOL_INIT(free_huge_page, "free_huge_page");
+
return TRUE;
   }
   
@@ -1214,11 +1229,19 @@ get_structure_info(void)
   
ENUM_NUMBER_INIT(PG_lru, "PG_lru");
ENUM_NUMBER_INIT(PG_private, "PG_private");
+   ENUM_NUMBER_INIT(PG_head, "PG_head");
+   ENUM_NUMBER_INIT(PG_tail, "PG_tail");
+   ENUM_NUMBER_INIT(PG_compound, "PG_compound");
ENUM_NUMBER_INIT(PG_swapcache, "PG_swapcache");
ENUM_NUMBER_INIT(PG_buddy, "PG_buddy");
ENUM_NUMBER_INIT(PG_slab, "PG_slab");
ENUM_NUMBER_INIT(PG_hwpoison, "PG_hwpoison");
   
+   if (NUMBER(PG_head) == NOT_FOUND_NUMBER &&
+   NUMBER(PG_compound) == NOT_FOUND_NUMBER)
+   /* Pre-2.6.26 kernels did not have pageflags */
+   NUMBER(PG_compound) = PG_compound_ORIGINAL;
+
ENUM_TYPE_SIZE_INIT(pageflags, "pageflags");
   
TYPEDEF_SIZE_INIT(nodemask_t, "nodemask_t");
@@ -1603,6 +1626,7 @@ write_vmcoreinfo_data(void)
WRITE_SYMBOL("node_remap_start_vaddr", node_remap_start_vaddr);
WRITE_SYMBOL("node_remap_end_vaddr", node_remap_end_vaddr);
WRITE_SYMBOL("node_remap_start_pfn", node_remap_start_pfn);
+   WRITE_SYMBOL("free_huge_page", free_huge_page);
   
/*
 * write the structure size of 1st kernel
@@ -1685,6 +1709,9 @@ write_vmcoreinfo_data(void)
   
WRITE_NUMBER("PG_lru", PG_lru);
WRITE_NUMBER("PG_private", PG_private);
+   WRITE_NUMBER("PG_head", PG_head);
+   WRITE_NUMBER("PG_tail", PG_tail);
+   WRITE_NUMBER("PG_compound", PG_compound);
WRITE_NUMBER("PG_swapcache", PG_swapcache);
WRITE_NUMBER("PG_buddy", PG_buddy);
WRITE_NUMBER("PG_slab", PG_slab);
@@ -1932,6 +1959,7 @@ read_vmcoreinfo(void)
READ_SYMBOL("node_remap_start_vaddr", node_remap_start_vaddr);
READ_SYMBOL("node_remap_end_vaddr", node_remap_end_vaddr);
READ_SYMBOL("node_remap_start_pfn", node_remap_start_pfn);
+   READ_SYMBOL("free_huge_page", free_huge_page);
   
READ_STRUCTURE_SIZE("page", page);
READ_STRUCTURE_SIZE("mem_section&quo

Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore

2013-06-03 Thread Atsushi Kumagai

Hello Maxim,

On Thu, 30 May 2013 14:30:01 +0400
Maxim Uvarov  wrote:

> 2013/5/30 Zhang Yanfei 
> 
> > On 05/30/2013 05:14 PM, Maxim Uvarov wrote:
> > >
> > >
> > >
> > > 2013/5/27 HATAYAMA Daisuke <d.hatay...@jp.fujitsu.com>
> > >
> > > (2013/05/24 18:02), Maxim Uvarov wrote:
> > >
> > >
> > >
> > >
> > > 2013/5/24 Andrew Morton <a...@linux-foundation.org>
> > >
> > >
> > > > On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke <d.hatay...@jp.fujitsu.com> wrote:
> > >
> > >  > This patch introduces mmap_vmcore().
> > >  >
> > >  > Don't permit writable nor executable mapping even with
> > mprotect()
> > >  > because this mmap() is aimed at reading crash dump memory.
> > >  > Non-writable mapping is also requirement of
> > remap_pfn_range() when
> > >  > mapping linear pages on non-consecutive physical pages;
> > see
> > >  > is_cow_mapping().
> > >  >
> > >  > Set VM_MIXEDMAP flag to remap memory by remap_pfn_range
> > and by
> > >  > remap_vmalloc_range_pertial at the same time for a single
> > >  > vma. do_munmap() can correctly clean partially remapped
> > vma with two
> > >  > functions in abnormal case. See zap_pte_range(),
> > vm_normal_page() and
> > >  > their comments for details.
> > >  >
> > >  > On x86-32 PAE kernels, mmap() supports at most 16TB
> > memory only. This
> > >  > limitation comes from the fact that the third argument of
> > >  > remap_pfn_range(), pfn, is of 32-bit length on x86-32:
> > unsigned long.
> > >
> > > More reviewing and testing, please.
> > >
> > >
> > > Do you have git pull for both kernel and userland changes? I would like to do some more testing on my machines.
> > >
> > > Maxim.
> > >
> > >
> > > Thanks! That's very helpful.
> > >
> > > --
> > > Thanks.
> > > HATAYAMA, Daisuke
> > >
> > > Any update for this? Where can I checkout all sources?
> >
> > This series is now in Andrew Morton's -mm tree.
> >
> > Ok, and what about makedumpfile changes? Is it possible to fetch them from somewhere?

You can fetch them from here; the changes are on the "mmap" branch:

  git://git.code.sf.net/p/makedumpfile/code

And they will be merged into v1.5.4.


Thanks
Atsushi Kumagai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux doesn't create /proc/vmcore

2013-05-20 Thread Atsushi Kumagai

Hello matt,

> CONFIG_PROC_VMCORE=y

Is it set for the capture kernel?
By any chance, did you run kdump-config on the system kernel
rather than on the capture kernel?

> DESCRIPTION
>   kdump-config manages the kdump feature of the  Linux  kernel.   When  a
>   kdump  enabled  kernel panics, it immediately boots into a clean kernel
>   called the kdump kernel.  The memory image of the panicked kernel  will
>   be  presented  in  /proc/vmcore  while  the  kdump  kernel (or "capture
>   kernel") is running.

/proc/vmcore isn't created on the system kernel, so this tool should be
run on the capture kernel.

And this document may help you:

  Documentation/kdump/kdump.txt
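
As a rough illustration of the point above, a script or tool running in the
capture kernel could check that the dump file is present before invoking
makedumpfile. The helper name vmcore_available is hypothetical; it is not
part of kdump-config or makedumpfile.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if `path` exists and is readable,
 * 0 otherwise. /proc/vmcore is not created on the system (1st) kernel,
 * so a check like this is expected to fail there and to succeed only
 * in the capture kernel.
 */
int vmcore_available(const char *path)
{
	assert(path != NULL);
	return access(path, R_OK) == 0;
}
```

Calling vmcore_available("/proc/vmcore") before the save step would let the
tool report "not running in the capture kernel" instead of failing inside
makedumpfile and cp.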


Thanks
Atsushi Kumagai

> 
> my error on the kdump:
> kdump-config savecore
> running makedumpfile -c -d 31 /proc/vmcore
> /var/crash/201305180259/dump-incomplete.
> open_dump_memory: Can't open the dump memory(/proc/vmcore). No such
> file or directory
> 
> makedumpfile Failed.
> kdump-config: makedumpfile failed, falling back to 'cp' ... failed!
> cp: cannot stat `/proc/vmcore': No such file or directory
> kdump-config: failed to save vmcore in /var/crash/201305180259 ... failed!

Re: [PATCH v2 7/8] mm, vmalloc: export vmap_area_list, instead of vmlist

2013-03-15 Thread Atsushi Kumagai

Hello,

On Tue, 12 Mar 2013 23:43:48 -0700
ebied...@xmission.com (Eric W. Biederman) wrote:

> Joonsoo Kim  writes:
> 
> > From: Joonsoo Kim 
> >
> > Although our intention is to unexport the internal structures entirely,
> > there is one exception for kexec: kexec dumps the address of vmlist
> > and makedumpfile uses this information.
> >
> > We are about to remove vmlist, then another way to retrieve information
> > of vmalloc layer is needed for makedumpfile. For this purpose,
> > we export vmap_area_list, instead of vmlist.
> 
> That seems entirely reasonable to me.  Usage by kexec should not limit
> the evolution of the kernel, especially usage by makedumpfile.
> 
> Atsushi Kumagai can you make makedumpfile work with this change?

Sure! I'm going to support this change in the next version.
However, I noticed that some necessary information is missing from this
patch. Sorry for the late reply.

Both OFFSET(vmap_area.va_start) and OFFSET(vmap_area.list) are
necessary to get vmalloc_start value from vmap_area_list, but
they aren't exported in this patch.
I understand that the policy of this patch series is "to unexport
internal structure entirely", but this information is necessary
for makedumpfile.
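
To show why both offsets are needed, here is a minimal sketch of the
address calculation. The struct below is a simplified stand-in, not the
kernel's real layout, and va_start_address is a made-up name. A real tool
would take the offsets from VMCOREINFO and read the value from the dump
rather than dereference a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; layouts are illustrative. */
struct list_head {
	struct list_head *next, *prev;
};

struct vmap_area_sketch {
	unsigned long va_start;
	unsigned long va_end;
	unsigned long flags;
	struct list_head list;	/* address sorted list */
};

/*
 * Given the value of vmap_area_list.next and the two exported offsets,
 * return the address that holds va_start of the first vmap_area:
 *   list_next - OFFSET(vmap_area.list) + OFFSET(vmap_area.va_start)
 */
uintptr_t va_start_address(uintptr_t list_next,
			   size_t off_list, size_t off_va_start)
{
	assert(list_next >= off_list);
	return list_next - off_list + off_va_start;
}
```

Subtracting OFFSET(vmap_area.list) steps back from the embedded list node
to the start of the enclosing vmap_area, and adding
OFFSET(vmap_area.va_start) then selects the field. Without either offset
the address cannot be derived from vmap_area_list alone.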

Additionally, OFFSET(vm_struct.addr) is no longer used and should be
removed. It was added for the same purpose as vmlist in the commit
below.

  commit acd99dbf54020f5c80b9aa2f2ea86f43cb285b02
  Author: Ken'ichi Ohmichi 
  Date:   Sat Oct 18 20:28:30 2008 -0700

  kdump: add vmlist.addr to vmcoreinfo for x86 vmalloc translation.

To sum it up, I would like to push the patch below.

Thanks
Atsushi Kumagai

--
From: Atsushi Kumagai 
Date: Fri, 15 Mar 2013 14:19:28 +0900
Subject: [PATCH] kexec, vmalloc: Export additional information of
 vmalloc layer.

Now, vmap_area_list is exported as VMCOREINFO for makedumpfile
to get the start address of vmalloc region (vmalloc_start).
The address which contains vmalloc_start value is represented as
below:

  vmap_area_list.next - OFFSET(vmap_area.list) + OFFSET(vmap_area.va_start)

However, both OFFSET(vmap_area.va_start) and OFFSET(vmap_area.list)
aren't exported as VMCOREINFO.

So, this patch exports them externally with small cleanup.

Signed-off-by: Atsushi Kumagai 
---
 include/linux/vmalloc.h | 12 
 kernel/kexec.c  |  3 ++-
 mm/vmalloc.c| 11 ---
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 8a25f90..62e0354 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include   /* pgprot_t */
+#include 
 
 struct vm_area_struct; /* vma defining user mapping in mm_types.h */
 
@@ -35,6 +36,17 @@ struct vm_struct {
const void  *caller;
 };
 
+struct vmap_area {
+   unsigned long va_start;
+   unsigned long va_end;
+   unsigned long flags;
+   struct rb_node rb_node; /* address sorted rbtree */
+   struct list_head list;  /* address sorted list */
+   struct list_head purge_list;/* "lazy purge" list */
+   struct vm_struct *vm;
+   struct rcu_head rcu_head;
+};
+
 /*
  * Highlevel APIs for driver use
  */
diff --git a/kernel/kexec.c b/kernel/kexec.c
index d9bfc6c..5db0148 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1527,7 +1527,8 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_OFFSET(free_area, free_list);
VMCOREINFO_OFFSET(list_head, next);
VMCOREINFO_OFFSET(list_head, prev);
-   VMCOREINFO_OFFSET(vm_struct, addr);
+   VMCOREINFO_OFFSET(vmap_area, va_start);
+   VMCOREINFO_OFFSET(vmap_area, list);
VMCOREINFO_LENGTH(zone.free_area, MAX_ORDER);
log_buf_kexec_setup();
VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 151da8a..72043d6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -249,17 +249,6 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 #define VM_LAZY_FREEING0x02
 #define VM_VM_AREA 0x04
 
-struct vmap_area {
-   unsigned long va_start;
-   unsigned long va_end;
-   unsigned long flags;
-   struct rb_node rb_node; /* address sorted rbtree */
-   struct list_head list;  /* address sorted list */
-   struct list_head purge_list;/* "lazy purge" list */
-   struct vm_struct *vm;
-   struct rcu_head rcu_head;
-};
-
 static DEFINE_SPINLOCK(vmap_area_lock);
 /* Export for kexec only */
 LIST_HEAD(vmap_area_list);
-- 
1.8.0.2

Re: [PATCH 2/2 v2] kexec: Export PG_hwpoison flag into vmcoreinfo

2013-02-15 Thread Atsushi Kumagai

Hello Tanino-san,

On Fri, 15 Feb 2013 17:48:24 +0900
Mitsuhiro Tanino  wrote:

> Hello Kumagai-san,
> 
> > I'm curious to know the status of this patch because I'll release
> > makedumpfile-1.5.2 with the feature to exclude hwpoison page soon.
> 
> I asked Andrew to merge the kernel side patch, and the patch
> has been added to the -mm tree.
> 
> Please push the makedumpfile side patch into makedumpfile-1.5.2.

Thanks, I've released makedumpfile-1.5.2 with your patch:

http://lists.infradead.org/pipermail/kexec/2013-February/007954.html


Thanks
Atsushi Kumagai

> Regards,
> Mitsuhiro Tanino
> 
> > TO: mm-comm...@vger.kernel.org; 
> > CC: mitsuhiro.tanino...@hitachi.com; ebied...@xmission.com; 
> > vgo...@redhat.com; 
> > Sender: a...@linux-foundation.org
> > Subject: + kexec-export-pg_hwpoison-flag-into-vmcoreinfo.patch added to -mm tree
> > Date: 2013/02/13 07:50
> > 
> > 
> > The patch titled
> >  Subject: kexec: export PG_hwpoison flag into vmcoreinfo
> > has been added to the -mm tree.  Its filename is
> >  kexec-export-pg_hwpoison-flag-into-vmcoreinfo.patch
> > 
> > Before you just go and hit "reply", please:
> >a) Consider who else should be cc'ed
> >b) Prefer to cc a suitable mailing list as well
> >c) Ideally: find the original patch on the mailing list and do a
> >   reply-to-all to that, adding suitable additional cc's
> > 
> > *** Remember to use Documentation/SubmitChecklist when testing your code ***
> > 
> > The -mm tree is included into linux-next and is updated
> > there every 3-4 working days
> > 
> > --
> > From: Mitsuhiro Tanino 
> > Subject: kexec: export PG_hwpoison flag into vmcoreinfo
> > 
> > This patch exports a PG_hwpoison into vmcoreinfo when
> > CONFIG_MEMORY_FAILURE is defined.  "makedumpfile" needs to read
> > information of memory, such as 'mem_section', 'zone', 'pageflags' from
> > vmcore.
> > 
> > We introduce a function into "makedumpfile" to exclude hwpoison page from
> > vmcore dump.  In order to introduce this function, PG_hwpoison flag have
> > to export into vmcoreinfo.
> > 
> > Signed-off-by: Mitsuhiro Tanino 
> > Acked-by: "Eric W. Biederman" 
> > Cc: Mitsuhiro Tanino 
> > Cc: Vivek Goyal 
> > Signed-off-by: Andrew Morton 
> > ---
> > 
> >  kernel/kexec.c |3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff -puN kernel/kexec.c~kexec-export-pg_hwpoison-flag-into-vmcoreinfo 
> > kernel/kexec.c
> > --- a/kernel/kexec.c~kexec-export-pg_hwpoison-flag-into-vmcoreinfo
> > +++ a/kernel/kexec.c
> > @@ -1537,6 +1537,9 @@ static int __init crash_save_vmcoreinfo_
> > VMCOREINFO_NUMBER(PG_private);
> >     VMCOREINFO_NUMBER(PG_swapcache);
> > VMCOREINFO_NUMBER(PG_slab);
> > +#ifdef CONFIG_MEMORY_FAILURE
> > +   VMCOREINFO_NUMBER(PG_hwpoison);
> > +#endif
> > VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
> >  
> > arch_crash_save_vmcoreinfo();
> > _
> > 
> > Patches currently in -mm which might be from 
> > mitsuhiro.tanino...@hitachi.com are
> > 
> > kexec-export-pg_hwpoison-flag-into-vmcoreinfo.patch
> 
> 
> 
> 
> (2013/02/08 17:49), Atsushi Kumagai wrote:
> > Hello,
> > 
> > On Wed, 31 Oct 2012 23:05:05 +0900
> > Mitsuhiro Tanino  wrote:
> > 
> >> This patch exports a PG_hwpoison into vmcoreinfo when
> >> CONFIG_MEMORY_FAILURE is defined.
> >> "makedumpfile" needs to read information of memory, such as
> >> 'mem_section', 'zone', 'pageflags' from vmcore.
> >>
> >> We introduce a function into "makedumpfile" to exclude
> >> hwpoison page from vmcore dump.
> >> In order to introduce this function, PG_hwpoison flag have
> >> to export into vmcoreinfo.
> >>
> >> Signed-off-by: Mitsuhiro Tanino 
> > 
> > I'm curious to know the status of this patch because I'll release
> > makedumpfile-1.5.2 with the feature to exclude hwpoison page soon.
> > 
> > 
> > Thanks
> > Atsushi Kumagai
> > 
> >> ---
> >>  kernel/kexec.c |3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/kernel/kexec.c b/kernel/kexec.c
> >> index 0668d58..0d5d6bc 100644
> >> --- a/kernel/kexec.c
> >> +++ b/kernel/kexec.c
> >> @@ -15

Re: [PATCH 00/13] kdump, vmcore: support mmap() on /proc/vmcore

2013-02-14 Thread Atsushi Kumagai

> ...'s no
>   longer memcpy() from kernel-space to user-space.
> 
> Design
> ==
> 
> = Support Range
> 
> - mmap() on /proc/vmcore is supported on ELF64 interface only. ELF32
>   interface is used only if dump target size is less than 4GB. Then,
>   the existing interface is enough in performance.
> 
> = Change of /proc/vmcore format
> 
> For mmap()'s page-size boundary requirement, /proc/vmcore changed its
> own shape and now put its objects in page-size boundary.
> 
> - Allocate buffer for ELF headers in page-size boundary.
>   => See [PATCH 01/13].
> 
> - Note objects scattered on old memory are copied in a single
>   page-size aligned buffer on 2nd kernel, and it is remapped to
>   user-space.
>   => See [PATCH 09/13].
>   
> - The head and/or tail pages of memory chunks are also copied on 2nd
>   kernel if either of their ends is not page-size aligned.
>   => See [PATCH 12/13].
> 
> = 32-bit PAE limitation
> 
> - On 32-bit PAE, mmap_vmcore() can handle up to 16TB memory only,
>   since remap_pfn_range()'s third argument, pfn, has 32-bit length
>   only, defined as unsigned long type.
> 
> TODO
> ====
> 
> - fix makedumpfile to use mmap() on /proc/vmcore and benchmark it to
>   confirm whether we can see enough performance improvement.

As a first step, I'll make a prototype patch for benchmarking unless you
have already done it.
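
A minimal sketch of what the reading side of such a prototype could look
like is below. The function name read_via_mmap is hypothetical and this is
not the makedumpfile implementation. It only illustrates the page-size
boundary requirement of mmap() mentioned in the design notes.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy `len` bytes found at `offset` in `path` into `buf` through a
 * read-only mapping. mmap() needs a page-aligned file offset, so the
 * offset is rounded down and the difference is skipped inside the
 * mapping. The range must lie within the file.
 * Returns 0 on success, -1 on failure.
 */
int read_via_mmap(const char *path, off_t offset, void *buf, size_t len)
{
	assert(path != NULL && buf != NULL && offset >= 0);
	if (len == 0)
		return 0;

	long page = sysconf(_SC_PAGESIZE);
	if (page <= 0)
		return -1;
	off_t aligned = offset - (offset % page);
	size_t skip = (size_t)(offset - aligned);

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Read-only mapping; the dump memory is only meant to be read. */
	void *map = mmap(NULL, skip + len, PROT_READ, MAP_PRIVATE, fd, aligned);
	close(fd);	/* the mapping stays valid after close() */
	if (map == MAP_FAILED)
		return -1;

	memcpy(buf, (const char *)map + skip, len);
	munmap(map, skip + len);
	return 0;
}
```

For benchmarking, a tool would presumably map much larger windows and reuse
them across reads instead of mapping once per request as this sketch does.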


Thanks
Atsushi Kumagai

> 
> Test
> ====
> 
> Done on x86-64, x86-32 both with 1GB and over 4GB memory environments.
> 
> ---
> 
> HATAYAMA Daisuke (13):
>   vmcore: introduce mmap_vmcore()
>   vmcore: copy non page-size aligned head and tail pages in 2nd kernel
>   vmcore: count holes generated by round-up operation for vmcore size
>   vmcore: round-up offset of vmcore object in page-size boundary
>   vmcore: copy ELF note segments in buffer on 2nd kernel
>   vmcore: remove unused helper function
>   vmcore: modify read_vmcore() to read buffer on 2nd kernel
>   vmcore: modify vmcore clean-up function to free buffer on 2nd kernel
>   vmcore: modify ELF32 code according to new type
>   vmcore: introduce types for objects copied in 2nd kernel
>   vmcore: fill unused part of buffer for ELF headers with 0
>   vmcore: round up buffer size of ELF headers by PAGE_SIZE
>   vmcore: allocate buffer for ELF headers on page-size alignment
> 
> 
>  fs/proc/vmcore.c        |  408 +++
>  include/linux/proc_fs.h |   11 +
>  2 files changed, 313 insertions(+), 106 deletions(-)
> 
> -- 
> 
> Thanks.
> HATAYAMA, Daisuke

Re: [PATCH 2/2 v2] kexec: Export PG_hwpoison flag into vmcoreinfo

2013-02-08 Thread Atsushi Kumagai

Hello,

On Wed, 31 Oct 2012 23:05:05 +0900
Mitsuhiro Tanino  wrote:

> This patch exports a PG_hwpoison into vmcoreinfo when
> CONFIG_MEMORY_FAILURE is defined.
> "makedumpfile" needs to read information of memory, such as
> 'mem_section', 'zone', 'pageflags' from vmcore.
> 
> We introduce a function into "makedumpfile" to exclude
> hwpoison page from vmcore dump.
> In order to introduce this function, PG_hwpoison flag have
> to export into vmcoreinfo.
> 
> Signed-off-by: Mitsuhiro Tanino 

I'm curious to know the status of this patch because I'll soon release
makedumpfile-1.5.2 with the feature to exclude hwpoison pages.


Thanks
Atsushi Kumagai

> ---
>  kernel/kexec.c |3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 0668d58..0d5d6bc 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1513,6 +1513,9 @@ static int __init crash_save_vmcoreinfo_init(void)
> VMCOREINFO_NUMBER(PG_lru);
> VMCOREINFO_NUMBER(PG_private);
> VMCOREINFO_NUMBER(PG_swapcache);
> +#ifdef CONFIG_MEMORY_FAILURE
> +   VMCOREINFO_NUMBER(PG_hwpoison);
> +#endif
> 
> arch_crash_save_vmcoreinfo();
> update_vmcoreinfo_note();
> --
> 1.7.10.1
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v2] Add the values related to buddy system for filtering free pages.

2013-02-07 Thread Atsushi Kumagai

Hello Lisa,

On Thu, 07 Feb 2013 05:29:11 -0700
Lisa Mitchell  wrote:

> > > > Also, I have one question. Can we always think of 1st and 2nd kernels
> > > > are same?
> > > 
> > > Not at all.  Distros frequently implement it with the same kernel in
> > > both role but it should be possible to use an old crusty stable kernel
> > > as the 2nd kernel.
> > > 
> > > > > If I understand correctly, kexec/kdump can use the 2nd kernel different
> > > > > from the 1st's. So, different kernels need to do the same thing as
> > > > > makedumpfile does. If assuming two are same, problem is much simplified.
> > > 
> > > As a developer it becomes attractive to use a known stable kernel to
> > > capture the crash dump even as I experiment with a brand new kernel.
> > 
> > > To allow a 2nd kernel that differs from the 1st, I think makedumpfile
> > > has to keep handling each kernel version with its own logic.
> > > That is to say, makedumpfile goes on as before.
> > 
> > 
> > Thanks
> > Atsushi Kumagai
> 
> 
> Atsushi and Vivek:  
> 
> I'm trying to get the status of whether the patch submitted in
> https://lkml.org/lkml/2012/11/21/90  is going to be accepted upstream
> and get in some version of the Linux 3.8 kernel.   I'm replying to the
> last email thread above on kexec_lists and lkml.org  that I could find
> about this patch.  
> 
> I was counting on this kernel patch to improve performance of
> makedumpfilev1.5.1, so at least it wouldn't be a regression in
> performance over makedumpfile v1.4.   It was listed as recommended in
> the makedumpfilev1.5.1 release posting:
> http://lists.infradead.org/pipermail/kexec/2012-December/007460.html
> 
> 
> All the conversations in the thread since this patch was committed seem
> to voice some reservations now, and reference other fixes being tried to
> improve performance.
> 
> Does that mean you are abandoning getting this patch accepted upstream,
> in favor of pursuing other alternatives?

No, this patch has been merged into -next, so we just need to wait for it
to be merged into Linus's tree.

  http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=0c63e90dd1c7b35ae2ea9475ba67cf68d8801a26

What interests us now is improving the /proc/vmcore interface. That is
not an alternative to this patch but a separate idea that is compatible
with it.


Thanks
Atsushi Kumagai

> 
> I had hoped this patch would be okay to get accepted upstream, and then
> other improvements could be built on top of it.  
> 
> Is that not the case?   
> 
> Or has further review concluded now that this change is a bad idea due
> to adding dependence of this new makedumpfile feature on some deep
> kernel memory internals?
> 
> Thanks,
> 
> Lisa Mitchell
> 
>

Re: [PATCH v2] Add the values related to buddy system for filtering free pages.

2012-12-27 Thread Atsushi Kumagai

Hello,

On Thu, 20 Dec 2012 18:00:11 -0800
ebied...@xmission.com (Eric W. Biederman) wrote:

> "Hatayama, Daisuke"  writes:
> 
> >> From: kexec-boun...@lists.infradead.org
> >> [mailto:kexec-boun...@lists.infradead.org] On Behalf Of Atsushi Kumagai
> >> Sent: Thursday, December 20, 2012 11:21 AM
> >
> >> On Wed, 19 Dec 2012 16:18:56 -0800
> >> Andrew Morton  wrote:
> >> 
> >> > On Mon, 10 Dec 2012 10:39:13 +0900
> >> > Atsushi Kumagai  wrote:
> >> >
> >
> >> >
> >> > We might change the PageBuddy() implementation at any time, and
> >> > makedumpfile will break.  Or in this case, become less efficient.
> >> >
> >> > Is there any way in which we can move some of this logic into the
> >> > kernel?  In this case, add some kernel code which uses PageBuddy() on
> >> > behalf of makedumpfile, rather than replicating the PageBuddy() logic
> >> > in userspace?
> >> 
> >> In last month, Cliff Wickman proposed such idea:
> >> 
> >>   [PATCH v2] makedumpfile: request the kernel do page scans
> >>   http://lists.infradead.org/pipermail/kexec/2012-November/007318.html
> >> 
> >>   [PATCH] scan page tables for makedumpfile, 3.0.13 kernel
> >>   http://lists.infradead.org/pipermail/kexec/2012-November/007319.html
> >> 
> >> In his idea, the kernel does page scans to distinguish unnecessary pages
> >> (free pages and others) and returns the list of PFN's which should be
> >> excluded for makedumpfile.
> >> As a result, makedumpfile doesn't need to consider internal kernel
> >> behavior.
> >> 
> >> I think it's a good idea from the viewpoint of maintainability and
> >> performance.
> 
> > I also think wide part of his code can be reused in this work. But the bad
> > performance is caused by a lot of ioremap, not a lot of copying. See my
> > profiling result I posted some days ago. Two issues, ioremap one and
> > filtering maintainability, should be considered separately. Even on ioremap
> > issue, there is secondary one to consider in memory consumption on the 2nd
> > kernel.
> 
> Thanks.  I was wondering why moving the code into /proc/vmcore would
> make things faster.

Thanks HATAYAMA-san, now I understand the issues correctly.
We should continue working on the ioremap issue, as Cliff and HATAYAMA-san
are doing now.

> 
> > Also, I have one question. Can we always think of 1st and 2nd kernels
> > are same?
> 
> Not at all.  Distros frequently implement it with the same kernel in
> both role but it should be possible to use an old crusty stable kernel
> as the 2nd kernel.
> 
> > If I understand correctly, kexec/kdump can use the 2nd kernel different
> > from the 1st's. So, different kernels need to do the same thing as
> > makedumpfile does. If assuming two are same, problem is much simplified.
> 
> As a developer it becomes attractive to use a known stable kernel to
> capture the crash dump even as I experiment with a brand new kernel.

To allow a 2nd kernel that differs from the 1st, I think makedumpfile
has to keep handling each kernel version with its own logic.
That is to say, makedumpfile goes on as before.


Thanks
Atsushi Kumagai

Re: [PATCH v2] Add the values related to buddy system for filtering free pages.

2012-12-19 Thread Atsushi Kumagai

Hello Andrew,

On Wed, 19 Dec 2012 16:18:56 -0800
Andrew Morton  wrote:

> On Mon, 10 Dec 2012 10:39:13 +0900
> Atsushi Kumagai  wrote:
> 
> > This patch adds the values related to buddy system to vmcoreinfo data
> > so that makedumpfile (dump filtering command) can filter out all free
> > pages with the new logic.
> > It's faster than the current logic because it can distinguish free page
> > by analyzing page structure at the same time as filtering for other
> > unnecessary pages (e.g. anonymous page).
> > OTOH, the current logic has to trace free_list to distinguish free 
> > pages while analyzing page structure to filter out other unnecessary
> > pages.
> > 
> > The new logic uses the fact that buddy page is marked by _mapcount == 
> > PAGE_BUDDY_MAPCOUNT_VALUE. But, _mapcount shares its memory with other
> > fields for SLAB/SLUB when PG_slab is set, so we need to check if PG_slab
> > is set or not before looking up _mapcount value.
> > And we can get the order of buddy system from private field.
> > To sum it up, the values below are required for this logic.
> > 
> > Required values:
> >   - OFFSET(page._mapcount)
> >   - OFFSET(page.private)
> >   - NUMBER(PG_slab)
> >   - NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)
> > 
> > Changelog from v1 to v2:
> > 1. remove SIZE(pageflags)
> >   The new logic was changed after I sent v1 patch.  
> >   Accordingly, SIZE(pageflags) has been unnecessary for makedumpfile.
> > 
> > What's makedumpfile:
> >   makedumpfile creates a small dumpfile by excluding unnecessary pages
> >   for the analysis. To distinguish unnecessary pages, makedumpfile gets
> >   the vmcoreinfo data which has the minimum debugging information only
> >   for dump filtering.
> 
> Gee, this info is getting highly dependent upon deep internal kernel
> behaviour.

Yes. makedumpfile has to be changed depending on the kernel version, and we have done so.
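
For reference, a minimal sketch of how a tool can cope with such version
differences: vmcoreinfo is plain text with one KEY=VALUE entry per line, so
a missing key simply means the crashed kernel did not export that value,
and the tool can fall back to its older logic. The function name
vmcoreinfo_lookup is made up for illustration; this is not makedumpfile's
actual code, and it handles decimal values only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up `key` (e.g. "NUMBER(PG_slab)") in a buffer of "KEY=VALUE\n"
 * lines. On success store the decimal value in *out and return 1.
 * Return 0 if the key is absent, so the caller can fall back to the
 * logic used for kernels that do not export it.
 */
int vmcoreinfo_lookup(const char *text, const char *key, long *out)
{
	assert(text != NULL && key != NULL && out != NULL);
	size_t klen = strlen(key);

	for (const char *line = text; line != NULL && *line != '\0'; ) {
		if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
			*out = strtol(line + klen + 1, NULL, 10);
			return 1;
		}
		line = strchr(line, '\n');
		if (line != NULL)
			line++;	/* move past the newline to the next entry */
	}
	return 0;
}
```

With this patch applied, a lookup of "NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)"
would succeed and the faster filtering could be chosen. On older kernels it
returns 0 and the free_list walk remains the only option.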

> > index 5e4bd78..b27efe4 100644
> > --- a/kernel/kexec.c
> > +++ b/kernel/kexec.c
> > @@ -1490,6 +1490,8 @@ static int __init crash_save_vmcoreinfo_init(void)
> > VMCOREINFO_OFFSET(page, _count);
> > VMCOREINFO_OFFSET(page, mapping);
> > VMCOREINFO_OFFSET(page, lru);
> > +   VMCOREINFO_OFFSET(page, _mapcount);
> > +   VMCOREINFO_OFFSET(page, private);
> > VMCOREINFO_OFFSET(pglist_data, node_zones);
> > VMCOREINFO_OFFSET(pglist_data, nr_zones);
> >  #ifdef CONFIG_FLAT_NODE_MEM_MAP
> > @@ -1512,6 +1514,8 @@ static int __init crash_save_vmcoreinfo_init(void)
> > VMCOREINFO_NUMBER(PG_lru);
> > VMCOREINFO_NUMBER(PG_private);
> > VMCOREINFO_NUMBER(PG_swapcache);
> > +   VMCOREINFO_NUMBER(PG_slab);
> > +   VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
> 
> We might change the PageBuddy() implementation at any time, and
> makedumpfile will break.  Or in this case, become less efficient.
> 
> Is there any way in which we can move some of this logic into the
> kernel?  In this case, add some kernel code which uses PageBuddy() on
> behalf of makedumpfile, rather than replicating the PageBuddy() logic
> in userspace?

Last month, Cliff Wickman proposed such an idea:

  [PATCH v2] makedumpfile: request the kernel do page scans
  http://lists.infradead.org/pipermail/kexec/2012-November/007318.html

  [PATCH] scan page tables for makedumpfile, 3.0.13 kernel
  http://lists.infradead.org/pipermail/kexec/2012-November/007319.html

In his proposal, the kernel scans pages to identify unnecessary ones
(free pages and others) and returns the list of PFNs that makedumpfile
should exclude.
As a result, makedumpfile doesn't need to consider internal kernel
behavior.

I think it's a good idea from the viewpoint of maintainability and
performance.


Thanks
Atsushi Kumagai

Re: [RFC PATCH 0/8] remove vm_struct list management

2012-12-11 Thread Atsushi Kumagai

Hello,

On Tue, 11 Dec 2012 17:17:05 -0500 (EST)
Dave Anderson  wrote:

> 
> 
> - Original Message -
> > On Mon, Dec 10, 2012 at 11:40:47PM +0900, JoonSoo Kim wrote:
> > 
> > [..]
> > > > > So without knowing details of both the data structures, I think if
> > > > > vmlist is going away, then user space tools should be able to traverse
> > > > > vmap_area_root rb tree. I am assuming it is sorted using ->addr field
> > > > > and we should be able to get vmalloc area start from there. It will
> > > > > just be a matter of exporting right fields to user space (instead of
> > > > > vmlist).
> > > 
> > > There is address sorted list of vmap_area, vmap_area_list.
> > > So we can use it for traversing vmalloc areas if it is necessary.
> > > > But, as I mentioned before, kexec write *just* address of vmlist and
> > > > offset of vm_struct's address field.  It implies that they don't
> > > > traverse vmlist, because they didn't write vm_struct's next field which
> > > > is needed for traversing.
> > > > Without vm_struct's next field, they have no method for traversing.
> > > > So, IMHO, assigning dummy vm_struct to vmlist which is implemented by
> > > > [7/8] is a safe way to maintain a compatibility of userspace tool. :)
> > 
> > Actually the design of "makedumpfile" and "crash" tool is that they know
> > about kernel data structures and they adopt to changes. So for major
> > changes they keep track of kernel version numbers and if access the
> > data structures accordingly.
> > 
> > Currently we access first element of vmlist to determine start of vmalloc
> > address. True we don't have to traverse the list.
> > 
> > But as you mentioned we should be able to get same information by
> > traversing to left most element of vmap_area_list rb tree. So I think
> > instead of trying to retain vmlist first element just for backward
> > compatibility, I will rather prefer get rid of that code completely
> > from kernel and let user space tool traverse rbtree. Just export
> > minimum needed info for traversal in user space.
> 
> There's no need to traverse the rbtree.  There is a vmap_area_list
> linked list of vmap_area structures that is also sorted by virtual
> address.
> 
> All that makedumpfile would have to do is to access the first vmap_area
> in the vmap_area_list -- as opposed to the way that it does now, which is
> by accessing the first vm_struct in the to-be-obsoleted vmlist list.
> 
> So it seems silly to keep the dummy "vmlist" around.

I think so too. I will modify makedumpfile to get the start address of
vmalloc from vmap_area_list if the related symbols are provided as
VMCOREINFO, as vmlist is now.

BTW, do we have to consider other tools?
Once that is clear, I think we can get rid of the dummy vmlist.


Thanks
Atsushi Kumagai

Re: [PATCH] Add the values related to buddy system for filtering free pages

2012-12-09 Thread Atsushi Kumagai

Hello Vivek,

On Fri, 7 Dec 2012 10:08:05 -0500
Vivek Goyal  wrote:

> On Wed, Nov 21, 2012 at 05:02:47PM +0900, Atsushi Kumagai wrote:
> > This patch adds the values related to buddy system to vmcoreinfo data
> > so that makedumpfile (dump filtering command) can filter out all free
> > pages with the new logic.
> > It's faster than the current logic because it can distinguish free page
> > by analyzing page structure at the same time as filtering for other
> > unnecessary pages (e.g. anonymous page).
> > OTOH, the current logic has to trace free_list to distinguish free 
> > pages while analyzing page structure to filter out other unnecessary
> > pages.
> > 
> > The new logic uses the fact that buddy page is marked by _mapcount == 
> > PAGE_BUDDY_MAPCOUNT_VALUE. The values below are required to distinguish
> > it.
> > 
> > Required values:
> >   - OFFSET(page._mapcount)
> >   - OFFSET(page.private)
> >   - SIZE(pageflags)
> >   - NUMBER(PG_slab)
> >   - NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)
> >   
> 
> As per your explanation, you should just need to export page._mapcount
> offset and PAGE_BUDDY_MAPCOUNT_VALUE value so that you can figure out
> if a page is free or not.
> 
> Why do we need rest of the three fields.
> 
> - OFFSET(page.private)
> - SIZE(pageflags)
> - NUMBER(PG_slab)

Thanks for your comment.

SIZE(pageflags) is unnecessary as you said, but the other two are
certainly necessary.
I modified the description in v2 to make it clear, please see below:

  https://lkml.org/lkml/2012/12/9/138
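
In short, the check looks roughly like the sketch below. The struct
buddy_info and the function free_buddy_order are made-up names for
illustration, and the sketch assumes that OFFSET(page.flags) is also
available from vmcoreinfo. It is not the code in makedumpfile.

```c
#include <assert.h>
#include <string.h>

/* Values a tool would read from vmcoreinfo; names are hypothetical. */
struct buddy_info {
	size_t off_flags;	/* OFFSET(page.flags) */
	size_t off_mapcount;	/* OFFSET(page._mapcount) */
	size_t off_private;	/* OFFSET(page.private) */
	int pg_slab;		/* NUMBER(PG_slab), a bit number */
	int buddy_mapcount;	/* NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE) */
};

/*
 * Inspect one raw struct page image read from the dump. Returns the
 * buddy order (>= 0) if the page is a free buddy page, -1 otherwise.
 */
long free_buddy_order(const unsigned char *page, const struct buddy_info *bi)
{
	unsigned long flags, private;
	int mapcount;

	assert(page != NULL && bi != NULL);
	memcpy(&flags, page + bi->off_flags, sizeof(flags));
	memcpy(&mapcount, page + bi->off_mapcount, sizeof(mapcount));
	memcpy(&private, page + bi->off_private, sizeof(private));

	/* _mapcount overlaps SLAB/SLUB fields, so test PG_slab first. */
	if (flags & (1UL << bi->pg_slab))
		return -1;
	if (mapcount != bi->buddy_mapcount)
		return -1;

	/* For a buddy page, the private field holds the order. */
	return (long)private;
}
```

This is why NUMBER(PG_slab) and OFFSET(page.private) are needed in addition
to the two values you listed: the first guards the _mapcount comparison,
and the second gives the number of pages (1 << order) that can be excluded
at once.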


Thanks
Atsushi Kumagai

[PATCH v2] Add the values related to buddy system for filtering free pages.

2012-12-09 Thread Atsushi Kumagai

This patch adds values related to the buddy system to the vmcoreinfo data
so that makedumpfile (the dump filtering command) can filter out all free
pages with new logic.
The new logic is faster than the current one because it can identify free
pages by analyzing the page structure in the same pass that filters other
unnecessary pages (e.g. anonymous pages).
The current logic, on the other hand, has to walk free_list to identify
free pages in addition to analyzing the page structure to filter out the
other unnecessary pages.

The new logic relies on the fact that a buddy page is marked by
_mapcount == PAGE_BUDDY_MAPCOUNT_VALUE. However, _mapcount shares its
memory with other fields for SLAB/SLUB when PG_slab is set, so we need to
check whether PG_slab is set before looking at the _mapcount value.
We can also get the order of the buddy page from the private field.
To sum up, the values below are required for this logic.

Required values:
  - OFFSET(page._mapcount)
  - OFFSET(page.private)
  - NUMBER(PG_slab)
  - NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)
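
The check described above could be sketched roughly as follows. This only
illustrates the logic and is not the makedumpfile code: struct
vmcoreinfo_values and page_is_buddy() are hypothetical names, and the field
widths (unsigned long for flags and private, a 32-bit int for _mapcount, all
in host byte order) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for values parsed from vmcoreinfo. */
struct vmcoreinfo_values {
	size_t off_flags;      /* OFFSET(page.flags) */
	size_t off_mapcount;   /* OFFSET(page._mapcount) */
	size_t off_private;    /* OFFSET(page.private) */
	unsigned int pg_slab;  /* NUMBER(PG_slab), a bit number */
	int buddy_mapcount;    /* NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE) */
};

/*
 * Return true if the raw struct page image at 'page' describes a free
 * (buddy) page, and store the order read from page.private in *order.
 * Assumption: flags/private are unsigned long and _mapcount is an int.
 */
static bool page_is_buddy(const unsigned char *page,
			  const struct vmcoreinfo_values *v,
			  unsigned long *order)
{
	unsigned long flags, private;
	int mapcount;

	assert(page != NULL && v != NULL && order != NULL);

	memcpy(&flags, page + v->off_flags, sizeof(flags));
	/* _mapcount shares storage with SLAB/SLUB fields: skip slab pages. */
	if (flags & (1UL << v->pg_slab))
		return false;

	memcpy(&mapcount, page + v->off_mapcount, sizeof(mapcount));
	if (mapcount != v->buddy_mapcount)
		return false;

	/* For a buddy page, private holds the order. */
	memcpy(&private, page + v->off_private, sizeof(private));
	*order = private;
	return true;
}
```

The PG_slab test comes first on purpose: without it, slab data that happens
to alias _mapcount could be mistaken for the buddy marker.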

Changelog from v1 to v2:
1. Remove SIZE(pageflags)
  The new logic changed after I sent the v1 patch, so makedumpfile
  no longer needs SIZE(pageflags).

What's makedumpfile:
  makedumpfile creates a small dumpfile by excluding pages that are
  unnecessary for analysis. To identify those pages, makedumpfile reads
  the vmcoreinfo data, which contains only the minimum debugging
  information needed for dump filtering.

Signed-off-by: Atsushi Kumagai 
---
 kernel/kexec.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..b27efe4 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1490,6 +1490,8 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_OFFSET(page, _count);
 	VMCOREINFO_OFFSET(page, mapping);
 	VMCOREINFO_OFFSET(page, lru);
+	VMCOREINFO_OFFSET(page, _mapcount);
+	VMCOREINFO_OFFSET(page, private);
 	VMCOREINFO_OFFSET(pglist_data, node_zones);
 	VMCOREINFO_OFFSET(pglist_data, nr_zones);
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
@@ -1512,6 +1514,8 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_NUMBER(PG_lru);
 	VMCOREINFO_NUMBER(PG_private);
 	VMCOREINFO_NUMBER(PG_swapcache);
+	VMCOREINFO_NUMBER(PG_slab);
+	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
 
 	arch_crash_save_vmcoreinfo();
 	update_vmcoreinfo_note();
--
1.7.9.2

[PATCH] Add the values related to buddy system for filtering free pages

2012-11-21 Thread Atsushi Kumagai

This patch adds values related to the buddy system to the vmcoreinfo data
so that makedumpfile (the dump filtering command) can filter out all free
pages with new logic.
The new logic is faster than the current one because it can identify free
pages by analyzing the page structure in the same pass that filters other
unnecessary pages (e.g. anonymous pages).
The current logic, on the other hand, has to walk free_list to identify
free pages in addition to analyzing the page structure to filter out the
other unnecessary pages.

The new logic relies on the fact that a buddy page is marked by
_mapcount == PAGE_BUDDY_MAPCOUNT_VALUE. The values below are required to
identify such pages.

Required values:
  - OFFSET(page._mapcount)
  - OFFSET(page.private)
  - SIZE(pageflags)
  - NUMBER(PG_slab)
  - NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)
  
What's makedumpfile:
  makedumpfile creates a small dumpfile by excluding pages that are
  unnecessary for analysis. To identify those pages, makedumpfile reads
  the vmcoreinfo data, which contains only the minimum debugging
  information needed for dump filtering.

Signed-off-by: Atsushi Kumagai 
---
 include/linux/kexec.h | 3 +++
 kernel/kexec.c        | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d0b8458..a90b148 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -158,6 +158,9 @@ unsigned long paddr_vmcoreinfo_note(void);
 #define VMCOREINFO_STRUCT_SIZE(name) \
 	vmcoreinfo_append_str("SIZE(%s)=%lu\n", #name, \
 			      (unsigned long)sizeof(struct name))
+#define VMCOREINFO_ENUM_SIZE(name) \
+	vmcoreinfo_append_str("SIZE(%s)=%lu\n", #name, \
+			      (unsigned long)sizeof(enum name))
 #define VMCOREINFO_OFFSET(name, field) \
 	vmcoreinfo_append_str("OFFSET(%s.%s)=%lu\n", #name, #field, \
 			      (unsigned long)offsetof(struct name, field))
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..511151b 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1485,10 +1485,13 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_STRUCT_SIZE(zone);
 	VMCOREINFO_STRUCT_SIZE(free_area);
 	VMCOREINFO_STRUCT_SIZE(list_head);
+	VMCOREINFO_ENUM_SIZE(pageflags);
 	VMCOREINFO_SIZE(nodemask_t);
 	VMCOREINFO_OFFSET(page, flags);
 	VMCOREINFO_OFFSET(page, _count);
 	VMCOREINFO_OFFSET(page, mapping);
+	VMCOREINFO_OFFSET(page, _mapcount);
+	VMCOREINFO_OFFSET(page, private);
 	VMCOREINFO_OFFSET(page, lru);
 	VMCOREINFO_OFFSET(pglist_data, node_zones);
 	VMCOREINFO_OFFSET(pglist_data, nr_zones);
@@ -1512,6 +1515,8 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_NUMBER(PG_lru);
 	VMCOREINFO_NUMBER(PG_private);
 	VMCOREINFO_NUMBER(PG_swapcache);
+	VMCOREINFO_NUMBER(PG_slab);
+	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
 
 	arch_crash_save_vmcoreinfo();
 	update_vmcoreinfo_note();
--
1.7.11
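
To show the kind of text these macros add to the vmcoreinfo note, here is a
small user-space imitation. It is only a sketch: demo_append() is a
hypothetical stand-in for vmcoreinfo_append_str(), struct demo_page and
enum demo_flags are made-up types, and the NUMBER() format string is an
assumption modeled on the SIZE()/OFFSET() formats shown in the patch.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static char demo_buf[256];
static size_t demo_len;

/* Hypothetical stand-in for vmcoreinfo_append_str(): append one record. */
static void demo_append(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(demo_buf + demo_len, sizeof(demo_buf) - demo_len, fmt, ap);
	va_end(ap);

	/* Keep the record only if it fit completely; otherwise drop it. */
	if (n > 0 && (size_t)n < sizeof(demo_buf) - demo_len)
		demo_len += (size_t)n;
	else
		demo_buf[demo_len] = '\0';
}

#define DEMO_ENUM_SIZE(name) \
	demo_append("SIZE(%s)=%lu\n", #name, (unsigned long)sizeof(enum name))
#define DEMO_OFFSET(name, field) \
	demo_append("OFFSET(%s.%s)=%lu\n", #name, #field, \
		    (unsigned long)offsetof(struct name, field))
/* Assumed format, not taken from the patch above. */
#define DEMO_NUMBER(name) \
	demo_append("NUMBER(%s)=%ld\n", #name, (long)name)

/* Made-up types, used only to have something to describe. */
enum demo_flags { DEMO_PG_locked, DEMO_PG_slab = 7 };
struct demo_page { unsigned long flags; int _count; int _mapcount; };

/* Build a tiny vmcoreinfo-like text and return it. */
static const char *demo_emit(void)
{
	memset(demo_buf, 0, sizeof(demo_buf));
	demo_len = 0;
	DEMO_ENUM_SIZE(demo_flags);
	DEMO_OFFSET(demo_page, flags);
	DEMO_OFFSET(demo_page, _mapcount);
	DEMO_NUMBER(DEMO_PG_slab);
	return demo_buf;
}
```

Each record is one KEY=value line, so a reader such as makedumpfile can look
values up by name without needing the full debugging information.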

Re: [PATCH 1/2 v2] makedumpfile: Add a default action to exclude hwpoison page from vmcore

2012-11-04 Thread Atsushi Kumagai

Hello Tanino-san,

On Wed, 31 Oct 2012 23:05:01 +0900
Mitsuhiro Tanino  wrote:

> This patch introduces a function which excludes hwpoison pages
> from vmcore as a default action for makedumpfile.
> 
> Signed-off-by: Mitsuhiro Tanino 

Thank you for your work. I think it's a good feature.

I will merge this patch into makedumpfile-1.5.2 with the small change below.
Of course, I will accept a --no-hwposion-filtering option if it's needed.


diff --git a/makedumpfile.c b/makedumpfile.c
index 30cf130..fcf42f6 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3864,8 +3864,8 @@ __exclude_unnecessary_pages(unsigned long mem_map,
 * Exclude the hwpoison page.
 */
else if (isHWPOISON(flags)) {
-   clear_bit_on_2nd_bitmap_for_kernel(pfn);
-   pfn_hwpoison++;
+   if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
+   pfn_hwpoison++;
}
}
return TRUE;
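
The intent of this change is that the counter only grows when a bit was
really cleared. A standalone sketch of that pattern follows. It is not the
makedumpfile code: clear_bit_demo() is a hypothetical stand-in for
clear_bit_on_2nd_bitmap_for_kernel(), and the assumption that the real
function returns nonzero on success is inferred from how the diff uses it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PFN 64

static unsigned char demo_bitmap[DEMO_MAX_PFN / 8];
static unsigned long long demo_pfn_hwpoison;

/* Mark every page as "to be dumped" and reset the counter. */
static void demo_reset(void)
{
	size_t i;

	for (i = 0; i < sizeof(demo_bitmap); i++)
		demo_bitmap[i] = 0xff;
	demo_pfn_hwpoison = 0;
}

/* Report whether pfn is still marked as "to be dumped". */
static bool demo_bit_is_set(unsigned long long pfn)
{
	assert(pfn < DEMO_MAX_PFN);
	return (demo_bitmap[pfn / 8] >> (pfn % 8)) & 1u;
}

/* Hypothetical helper: clear the bit for pfn; fail if pfn is out of range. */
static bool clear_bit_demo(unsigned long long pfn)
{
	if (pfn >= DEMO_MAX_PFN)
		return false;
	demo_bitmap[pfn / 8] &= (unsigned char)~(1u << (pfn % 8));
	return true;
}

/* Count a hwpoison page only when its bit was actually cleared. */
static void exclude_hwpoison_demo(unsigned long long pfn)
{
	if (clear_bit_demo(pfn))
		demo_pfn_hwpoison++;
}
```

With the unconditional increment, a failed clear would still be counted, and
the number of excluded pages reported would be too high.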


Thanks
Atsushi Kumagai


> diff -uprN a/makedumpfile.c b/makedumpfile.c
> --- a/makedumpfile.c  2012-10-01 15:26:54.510354074 +0900
> +++ b/makedumpfile.c  2012-10-29 22:32:24.913057535 +0900
> @@ -43,6 +43,7 @@ unsigned long long pfn_cache;
>  unsigned long long pfn_cache_private;
>  unsigned long long pfn_user;
>  unsigned long long pfn_free;
> +unsigned long long pfn_hwpoison;
>  
>  unsigned long long num_dumped;
>  
> @@ -969,6 +970,7 @@ get_structure_info(void)
>   ENUM_NUMBER_INIT(PG_lru, "PG_lru");
>   ENUM_NUMBER_INIT(PG_private, "PG_private");
>   ENUM_NUMBER_INIT(PG_swapcache, "PG_swapcache");
> + ENUM_NUMBER_INIT(PG_hwpoison, "PG_hwpoison");
>  
>   TYPEDEF_SIZE_INIT(nodemask_t, "nodemask_t");
>  
> @@ -1371,6 +1373,7 @@ write_vmcoreinfo_data(void)
>   WRITE_NUMBER("PG_lru", PG_lru);
>   WRITE_NUMBER("PG_private", PG_private);
>   WRITE_NUMBER("PG_swapcache", PG_swapcache);
> + WRITE_NUMBER("PG_hwpoison", PG_hwpoison);
>  
>   /*
>* write the source file of 1st kernel
> @@ -1659,6 +1662,7 @@ read_vmcoreinfo(void)
>   READ_NUMBER("PG_lru", PG_lru);
>   READ_NUMBER("PG_private", PG_private);
>   READ_NUMBER("PG_swapcache", PG_swapcache);
> + READ_NUMBER("PG_hwpoison", PG_hwpoison);
>  
>   READ_SRCFILE("pud_t", pud_t);
>  
> @@ -3856,6 +3860,13 @@ __exclude_unnecessary_pages(unsigned lon
>   if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
>   pfn_user++;
>   }
> + /*
> +  * Exclude the hwpoison page.
> +  */
> + else if (isHWPOISON(flags)) {
> + clear_bit_on_2nd_bitmap_for_kernel(pfn);
> + pfn_hwpoison++;
> + }
>   }
>   return TRUE;
>  }
> @@ -3914,11 +3925,13 @@ exclude_unnecessary_pages_cyclic(void)
>   return FALSE;
>  
>   /*
> -  * Exclude cache pages, cache private pages, user data pages, and free pages.
> +  * Exclude cache pages, cache private pages, user data pages,
> +  * free pages and hwpoison pages.
>*/
>   if (info->dump_level & DL_EXCLUDE_CACHE ||
>   info->dump_level & DL_EXCLUDE_CACHE_PRI ||
> - info->dump_level & DL_EXCLUDE_USER_DATA) {
> + info->dump_level & DL_EXCLUDE_USER_DATA ||
> + (NUMBER(PG_hwpoison) != NOT_FOUND_NUMBER)) {
>  
>   gettimeofday(&tv_start, NULL);
>  
> @@ -4018,11 +4031,13 @@ create_2nd_bitmap(void)
>   }
>  
>   /*
> -  * Exclude cache pages, cache private pages, user data pages.
> +  * Exclude cache pages, cache private pages, user data pages,
> +  * and hwpoison pages.
>*/
>   if (info->dump_level & DL_EXCLUDE_CACHE ||
>   info->dump_level & DL_EXCLUDE_CACHE_PRI ||
> - info->dump_level & DL_EXCLUDE_USER_DATA) {
> + info->dump_level & DL_EXCLUDE_USER_DATA ||
> + (NUMBER(PG_hwpoison) != NOT_FOUND_NUMBER)) {
>   if (!exclude_unnecessary_pages()) {
>   ERRMSG("Can't exclude unnecessary pages.\n");
>   return FALSE;
> @@ -5062,7 +5077,8 @@ write_elf_pages_cyclic(struct cache_data
>   /*
>* Reset counter for debug message.
>*/
> - pfn_zero =  pfn_cache = pfn_cache_private = pfn_user = pfn_free = 0;
> + pfn_zero = pfn_cache = pfn_cache_private = 0;
> + pfn_user = pfn_free = pfn_hwpoison =

Re: [PATCH] kdump: Append newline to the last line of vmcoreinfo note

2012-07-19 Thread Atsushi Kumagai

Hello Vivek,

On Thu, 19 Jul 2012 09:49:21 -0400
Vivek Goyal  wrote:

> On Wed, Jul 18, 2012 at 03:04:39PM -0700, Andrew Morton wrote:
> > On Tue, 17 Jul 2012 13:36:55 -0400
> > Vivek Goyal  wrote:
> > 
> > > > Last line of vmcoreinfo note does not end with \n. Parsing all the lines
> > > > in note becomes easier if all lines end with \n instead of trying to
> > > > special case the last line.
> > > > 
> > > > I know at least one tool, vmcore-dmesg in kexec-tools tree, which made the
> > > > assumption that all lines end with \n. I think it is a good idea to
> > > > fix it.
> > > 
> > > Signed-off-by: Vivek Goyal 
> > > ---
> > >  kernel/kexec.c |2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > Index: linux-2.6/kernel/kexec.c
> > > ===
> > > --- linux-2.6.orig/kernel/kexec.c 2012-07-17 19:26:38.844033784 -0400
> > > +++ linux-2.6/kernel/kexec.c  2012-07-17 23:51:33.311701781 -0400
> > > @@ -1424,7 +1424,7 @@ static void update_vmcoreinfo_note(void)
> > >  
> > >  void crash_save_vmcoreinfo(void)
> > >  {
> > > - vmcoreinfo_append_str("CRASHTIME=%ld", get_seconds());
> > > + vmcoreinfo_append_str("CRASHTIME=%ld\n", get_seconds());
> > >   update_vmcoreinfo_note();
> > >  }
> > 
> > huh, that was a screwup.  And now we have to make what must be
> > viewed as a non-back-compatible ABI change.
> > 
> > Ho hum, presumably there isn't a lot of code out there which is
> > dependent upon a non-newline-terminated CRASHTIME record.
> 
> I think so. AFAIK, makedumpfile (vmcore filtering utility) is only
> user of CRASHTIME=.
> 
> > 
> > Why did this work at all, anyway?  Is CRASHTIME always the last-emitted
> > record?
> 
> Yes, CRASHTIME= is always the last emitted line in vmcoreinfo note.
> 
> I had a quick look at makedumpfile code and looks like they read the whole
> note, dump it to a file and then do fgets() on the file in a loop. As it is
> last line in the file, fgets encounters EOF and reads the CRASHTIME= line
> successfully. So even after this change makedumpfile should remain
> unaffected. 
> 
> CCing makedumpfile maintainer, Atsushi Kumagai.

As you said, makedumpfile reads VMCOREINFO line by line with fgets(),
so it doesn't matter whether the last line ends with \n or EOF.

Therefore, this change doesn't affect makedumpfile.
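
The fgets() behaviour in question can be checked with a small standalone
program fragment. This is not makedumpfile's code: read_crashtime_demo() and
the sample note contents are made up for the illustration, and lines are
assumed to be shorter than the 128-byte buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write 'note' to a temporary file, read it back line by line with
 * fgets(), and copy the value of the CRASHTIME= record into 'out'.
 * Returns the number of lines read, or -1 on I/O error.
 */
static int read_crashtime_demo(const char *note, char *out, size_t outlen)
{
	char line[128];
	int lines = 0;
	FILE *fp;

	assert(note != NULL && out != NULL && outlen > 0);
	out[0] = '\0';

	fp = tmpfile();
	if (fp == NULL)
		return -1;
	if (fputs(note, fp) == EOF) {
		fclose(fp);
		return -1;
	}
	rewind(fp);

	while (fgets(line, sizeof(line), fp) != NULL) {
		/* Strip the trailing newline if there is one. */
		line[strcspn(line, "\n")] = '\0';
		lines++;
		if (strncmp(line, "CRASHTIME=", 10) == 0) {
			strncpy(out, line + 10, outlen - 1);
			out[outlen - 1] = '\0';
		}
	}
	fclose(fp);
	return lines;
}
```

Calling it with "OSRELEASE=demo\nCRASHTIME=1342700000" and with the same
text plus a final "\n" reads two lines in both cases and yields the same
CRASHTIME value, because fgets() returns the last line whether it is
terminated by a newline or by EOF.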


Thanks
Atsushi Kumagai