Re: [PATCH v1] proc/vmcore: fix clearing user buffer by properly using clear_user()

2021-11-11 Thread Baoquan He
On 11/11/21 at 08:18pm, David Hildenbrand wrote:
> To clear a user buffer we cannot simply use memset, we have to use
> clear_user(). Using a kernel config based on rawhide Fedora and a
> virtio-mem device that registers a vmcore_cb, I can easily trigger:
> 
> [   11.327580] systemd[1]: Starting Kdump Vmcore Save Service...
> [   11.339697] kdump[420]: Kdump is using the default log level(3).
> [   11.370964] kdump[453]: saving to 
> /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
> [   11.373997] kdump[458]: saving vmcore-dmesg.txt to 
> /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
> [   11.385357] kdump[465]: saving vmcore-dmesg.txt complete
> [   11.386722] kdump[467]: saving vmcore
> [   16.531275] BUG: unable to handle page fault for address: 7f2374e01000
> [   16.531705] #PF: supervisor write access in kernel mode
> [   16.532037] #PF: error_code(0x0003) - permissions violation
> [   16.532396] PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 
> 80007048f867
> [   16.532872] Oops: 0003 [#1] PREEMPT SMP NOPTI
> [   16.533154] CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
> [   16.533513] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
> [   16.534198] RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
> [   16.534552] Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 
> 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 
> 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
> [   16.535670] RSP: 0018:c973be08 EFLAGS: 00010212
> [   16.535998] RAX: 1000 RBX: 002fd000 RCX: 
> 7f2374e01000
> [   16.536441] RDX: 0001 RSI: dfff RDI: 
> 7f2374e01008
> [   16.536878] RBP: 1000 R08:  R09: 
> c973bc50
> [   16.537315] R10: c973bc48 R11: 829461a8 R12: 
> f000
> [   16.537755] R13: 7f2374e01000 R14:  R15: 
> 88807bd421e8
> [   16.538200] FS:  7f2374e12140() GS:88807f00() 
> knlGS:
> [   16.538696] CS:  0010 DS:  ES:  CR0: 80050033
> [   16.539055] CR2: 7f2374e01000 CR3: 7a4aa000 CR4: 
> 00350eb0
> [   16.539510] Call Trace:
> [   16.539679]  
> [   16.539828]  read_vmcore+0x236/0x2c0
> [   16.540063]  ? enqueue_hrtimer+0x2f/0x80
> [   16.540323]  ? inode_security+0x22/0x60
> [   16.540572]  proc_reg_read+0x55/0xa0
> [   16.540807]  vfs_read+0x95/0x190
> [   16.541022]  ksys_read+0x4f/0xc0
> [   16.541238]  do_syscall_64+0x3b/0x90
> [   16.541475]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> To fix, properly use clear_user() when required.

Looks a great fix to me, thanks for fixing this. 

Check the code, clear_user invokes access_ok to do check, then call
memset(). It's unclear to me how the bug is triggered, could you
please tell more so that I can learn? 


> 
> Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to 
> check for non-ram pages")
> Cc: Dave Young 
> Cc: Baoquan He 
> Cc: Vivek Goyal 
> Cc: Andrew Morton 
> Cc: Philipp Rudo 
> Cc: kexec@lists.infradead.org
> Cc: linux...@kvack.org
> Cc: linux-fsde...@vger.kernel.org
> Signed-off-by: David Hildenbrand 
> ---
>  fs/proc/vmcore.c | 20 
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 30a3b66f475a..509f85148fee 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>   nr_bytes = count;
>  
>   /* If pfn is not ram, return zeros for sparse dump files */
> - if (!pfn_is_ram(pfn))
> - memset(buf, 0, nr_bytes);
> - else {
> + if (!pfn_is_ram(pfn)) {
> + tmp = 0;
> + if (!userbuf)
> + memset(buf, 0, nr_bytes);
> + else if (clear_user(buf, nr_bytes))
> + tmp = -EFAULT;
> + } else {
>   if (encrypted)
>   tmp = copy_oldmem_page_encrypted(pfn, buf,
>nr_bytes,
> @@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>   else
>   tmp = copy_oldmem_page(pfn, buf, nr_bytes,
>  offset, userbuf);
> -
> - if (tmp < 0) {
> - up_read(_cb_rwsem);
> - return tmp;
> - }
>   }
> + if (tmp < 0) {
> + up_read(_cb_rwsem);
> + return tmp;
> + }
> +
>   *ppos += nr_bytes;
>   count -= nr_bytes;
>   buf += 

Re: [PATCH v1] proc/vmcore: don't fake reading zeroes on surprise vmcore_cb unregistration

2021-11-11 Thread Baoquan He
On 11/11/21 at 08:22pm, David Hildenbrand wrote:
> In commit cc5f2704c934 ("proc/vmcore: convert oldmem_pfn_is_ram callback
> to more generic vmcore callbacks"), we added detection of surprise
> vmcore_cb unregistration after the vmcore was already opened. Once
> detected, we warn the user and simulate reading zeroes from that point on
> when accessing the vmcore.
> 
> The basic reason was that unexpected unregistration, for example, by
> manually unbinding a driver from a device after opening the
> vmcore, is not supported and could result in reading oldmem the
> vmcore_cb would have actually prohibited while registered. However,
> something like that can similarly be trigger by a user that's really
> looking for trouble simply by unbinding the relevant driver before opening
> the vmcore -- or by disallowing loading the driver in the first place.
> So it's actually of limited help.

Yes, this is the change what I would like to see in the original patch
"proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore 
callbacks".
I am happy with this patch appended to commit cc5f2704c934.

> 
> Currently, unregistration can only be triggered via virtio-mem when
> manually unbinding the driver from the device inside the VM; there is no
> way to trigger it from the hypervisor, as hypervisors don't allow for
> unplugging virtio-mem devices -- ripping out system RAM from a VM without
> coordination with the guest is usually not a good idea.
> 
> The important part is that unbinding the driver and unregistering the
> vmcore_cb while concurrently reading the vmcore won't crash the system,
> and that is handled by the rwsem.
> 
> To make the mechanism more future proof, let's remove the "read zero"
> part, but leave the warning in place. For example, we could have a future
> driver (like virtio-balloon) that will contact the hypervisor to figure out
> if we already populated a page for a given PFN. Hotunplugging such a device
> and consequently unregistering the vmcore_cb could be triggered from the
> hypervisor without harming the system even while kdump is running. In that

I am a little confused, could you tell more about "contact the hypervisor to
figure out if we already populated a page for a given PFN."? I think
vmcore dumping relies on the eflcorehdr which is created beforehand, and
relies on vmcore_cb registered in advance too, virtio-balloon could take
another way to interact with kdump to make sure the dumpable ram?

Nevertheless, this patch looks good to me, thanks.

Acked-by: Baoquan He 

> case, we don't want to silently end up with a vmcore that contains wrong
> data, because the user inside the VM might be unaware of the hypervisor
> action and might easily miss the warning in the log.
> 
> Cc: Dave Young 
> Cc: Baoquan He 
> Cc: Vivek Goyal 
> Cc: Andrew Morton 
> Cc: Philipp Rudo 
> Cc: kexec@lists.infradead.org
> Cc: linux...@kvack.org
> Cc: linux-fsde...@vger.kernel.org
> Signed-off-by: David Hildenbrand 
> ---
>  fs/proc/vmcore.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 30a3b66f475a..948691cf4a1a 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -65,8 +65,6 @@ static size_t vmcoredd_orig_sz;
>  static DECLARE_RWSEM(vmcore_cb_rwsem);
>  /* List of registered vmcore callbacks. */
>  static LIST_HEAD(vmcore_cb_list);
> -/* Whether we had a surprise unregistration of a callback. */
> -static bool vmcore_cb_unstable;
>  /* Whether the vmcore has been opened once. */
>  static bool vmcore_opened;
>  
> @@ -94,10 +92,8 @@ void unregister_vmcore_cb(struct vmcore_cb *cb)
>* very unusual (e.g., forced driver removal), but we cannot stop
>* unregistering.
>*/
> - if (vmcore_opened) {
> + if (vmcore_opened)
>   pr_warn_once("Unexpected vmcore callback unregistration\n");
> - vmcore_cb_unstable = true;
> - }
>   up_write(_cb_rwsem);
>  }
>  EXPORT_SYMBOL_GPL(unregister_vmcore_cb);
> @@ -108,8 +104,6 @@ static bool pfn_is_ram(unsigned long pfn)
>   bool ret = true;
>  
>   lockdep_assert_held_read(_cb_rwsem);
> - if (unlikely(vmcore_cb_unstable))
> - return false;
>  
>   list_for_each_entry(cb, _cb_list, next) {
>   if (unlikely(!cb->pfn_is_ram))
> @@ -577,7 +571,7 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct 
> *vma,
>* looping over all pages without a reason.
>*/
>   down_read(_cb_rwsem);
> - if (!list_empty(_cb_list) || vmcore_cb_unstable)
> + if (!list_empty(_cb_list))
>   ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot);
>   else
>   ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot);
> 
> base-commit: debe436e77c72fcee804fb867f275e6d31aa999c
> -- 
> 2.31.1
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1] proc/vmcore: don't fake reading zeroes on surprise vmcore_cb unregistration

2021-11-11 Thread David Hildenbrand
In commit cc5f2704c934 ("proc/vmcore: convert oldmem_pfn_is_ram callback
to more generic vmcore callbacks"), we added detection of surprise
vmcore_cb unregistration after the vmcore was already opened. Once
detected, we warn the user and simulate reading zeroes from that point on
when accessing the vmcore.

The basic reason was that unexpected unregistration, for example, by
manually unbinding a driver from a device after opening the
vmcore, is not supported and could result in reading oldmem the
vmcore_cb would have actually prohibited while registered. However,
something like that can similarly be trigger by a user that's really
looking for trouble simply by unbinding the relevant driver before opening
the vmcore -- or by disallowing loading the driver in the first place.
So it's actually of limited help.

Currently, unregistration can only be triggered via virtio-mem when
manually unbinding the driver from the device inside the VM; there is no
way to trigger it from the hypervisor, as hypervisors don't allow for
unplugging virtio-mem devices -- ripping out system RAM from a VM without
coordination with the guest is usually not a good idea.

The important part is that unbinding the driver and unregistering the
vmcore_cb while concurrently reading the vmcore won't crash the system,
and that is handled by the rwsem.

To make the mechanism more future proof, let's remove the "read zero"
part, but leave the warning in place. For example, we could have a future
driver (like virtio-balloon) that will contact the hypervisor to figure out
if we already populated a page for a given PFN. Hotunplugging such a device
and consequently unregistering the vmcore_cb could be triggered from the
hypervisor without harming the system even while kdump is running. In that
case, we don't want to silently end up with a vmcore that contains wrong
data, because the user inside the VM might be unaware of the hypervisor
action and might easily miss the warning in the log.

Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Andrew Morton 
Cc: Philipp Rudo 
Cc: kexec@lists.infradead.org
Cc: linux...@kvack.org
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: David Hildenbrand 
---
 fs/proc/vmcore.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 30a3b66f475a..948691cf4a1a 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -65,8 +65,6 @@ static size_t vmcoredd_orig_sz;
 static DECLARE_RWSEM(vmcore_cb_rwsem);
 /* List of registered vmcore callbacks. */
 static LIST_HEAD(vmcore_cb_list);
-/* Whether we had a surprise unregistration of a callback. */
-static bool vmcore_cb_unstable;
 /* Whether the vmcore has been opened once. */
 static bool vmcore_opened;
 
@@ -94,10 +92,8 @@ void unregister_vmcore_cb(struct vmcore_cb *cb)
 * very unusual (e.g., forced driver removal), but we cannot stop
 * unregistering.
 */
-   if (vmcore_opened) {
+   if (vmcore_opened)
pr_warn_once("Unexpected vmcore callback unregistration\n");
-   vmcore_cb_unstable = true;
-   }
up_write(_cb_rwsem);
 }
 EXPORT_SYMBOL_GPL(unregister_vmcore_cb);
@@ -108,8 +104,6 @@ static bool pfn_is_ram(unsigned long pfn)
bool ret = true;
 
lockdep_assert_held_read(_cb_rwsem);
-   if (unlikely(vmcore_cb_unstable))
-   return false;
 
list_for_each_entry(cb, _cb_list, next) {
if (unlikely(!cb->pfn_is_ram))
@@ -577,7 +571,7 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct 
*vma,
 * looping over all pages without a reason.
 */
down_read(_cb_rwsem);
-   if (!list_empty(_cb_list) || vmcore_cb_unstable)
+   if (!list_empty(_cb_list))
ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot);
else
ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot);

base-commit: debe436e77c72fcee804fb867f275e6d31aa999c
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1] proc/vmcore: fix clearing user buffer by properly using clear_user()

2021-11-11 Thread David Hildenbrand
To clear a user buffer we cannot simply use memset, we have to use
clear_user(). Using a kernel config based on rawhide Fedora and a
virtio-mem device that registers a vmcore_cb, I can easily trigger:

[   11.327580] systemd[1]: Starting Kdump Vmcore Save Service...
[   11.339697] kdump[420]: Kdump is using the default log level(3).
[   11.370964] kdump[453]: saving to 
/sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
[   11.373997] kdump[458]: saving vmcore-dmesg.txt to 
/sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
[   11.385357] kdump[465]: saving vmcore-dmesg.txt complete
[   11.386722] kdump[467]: saving vmcore
[   16.531275] BUG: unable to handle page fault for address: 7f2374e01000
[   16.531705] #PF: supervisor write access in kernel mode
[   16.532037] #PF: error_code(0x0003) - permissions violation
[   16.532396] PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 
80007048f867
[   16.532872] Oops: 0003 [#1] PREEMPT SMP NOPTI
[   16.533154] CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
[   16.533513] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
[   16.534198] RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
[   16.534552] Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 
38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 
00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
[   16.535670] RSP: 0018:c973be08 EFLAGS: 00010212
[   16.535998] RAX: 1000 RBX: 002fd000 RCX: 7f2374e01000
[   16.536441] RDX: 0001 RSI: dfff RDI: 7f2374e01008
[   16.536878] RBP: 1000 R08:  R09: c973bc50
[   16.537315] R10: c973bc48 R11: 829461a8 R12: f000
[   16.537755] R13: 7f2374e01000 R14:  R15: 88807bd421e8
[   16.538200] FS:  7f2374e12140() GS:88807f00() 
knlGS:
[   16.538696] CS:  0010 DS:  ES:  CR0: 80050033
[   16.539055] CR2: 7f2374e01000 CR3: 7a4aa000 CR4: 00350eb0
[   16.539510] Call Trace:
[   16.539679]  
[   16.539828]  read_vmcore+0x236/0x2c0
[   16.540063]  ? enqueue_hrtimer+0x2f/0x80
[   16.540323]  ? inode_security+0x22/0x60
[   16.540572]  proc_reg_read+0x55/0xa0
[   16.540807]  vfs_read+0x95/0x190
[   16.541022]  ksys_read+0x4f/0xc0
[   16.541238]  do_syscall_64+0x3b/0x90
[   16.541475]  entry_SYSCALL_64_after_hwframe+0x44/0xae

To fix, properly use clear_user() when required.

Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check 
for non-ram pages")
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Andrew Morton 
Cc: Philipp Rudo 
Cc: kexec@lists.infradead.org
Cc: linux...@kvack.org
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: David Hildenbrand 
---
 fs/proc/vmcore.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 30a3b66f475a..509f85148fee 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size_t count,
nr_bytes = count;
 
/* If pfn is not ram, return zeros for sparse dump files */
-   if (!pfn_is_ram(pfn))
-   memset(buf, 0, nr_bytes);
-   else {
+   if (!pfn_is_ram(pfn)) {
+   tmp = 0;
+   if (!userbuf)
+   memset(buf, 0, nr_bytes);
+   else if (clear_user(buf, nr_bytes))
+   tmp = -EFAULT;
+   } else {
if (encrypted)
tmp = copy_oldmem_page_encrypted(pfn, buf,
 nr_bytes,
@@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size_t count,
else
tmp = copy_oldmem_page(pfn, buf, nr_bytes,
   offset, userbuf);
-
-   if (tmp < 0) {
-   up_read(_cb_rwsem);
-   return tmp;
-   }
}
+   if (tmp < 0) {
+   up_read(_cb_rwsem);
+   return tmp;
+   }
+
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;

base-commit: debe436e77c72fcee804fb867f275e6d31aa999c
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec