Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
On 12/6/23 12:51, Peter Zijlstra wrote: > On Wed, Dec 06, 2023 at 10:37:33AM -0600, Madhavan T. Venkataraman wrote: >> >> >> On 11/30/23 05:33, Peter Zijlstra wrote: >>> On Wed, Nov 29, 2023 at 03:07:15PM -0600, Madhavan T. Venkataraman wrote: >>> >>>> Kernel Lockdown >>>> --- >>>> >>>> But, we must provide at least some security in V2. Otherwise, it is >>>> useless. >>>> >>>> So, we have implemented what we call a kernel lockdown. At the end of >>>> kernel >>>> boot, Heki establishes permissions in the extended page table as mentioned >>>> before. Also, it adds an immutable attribute for kernel text and kernel RO >>>> data. >>>> Beyond that point, guest requests that attempt to modify permissions on >>>> any of >>>> the immutable pages will be denied. >>>> >>>> This means that features like FTrace and KProbes will not work on kernel >>>> text >>>> in V2. This is a temporary limitation. Once authentication is in place, the >>>> limitation will go away. >>> >>> So either you're saying your patch 17 / text_poke is broken (so why >>> include it ?!?) or your statement above is incorrect. Pick one. >>> >> >> It has been included so that people can be aware of the changes. >> >> I will remove the text_poke() changes from the patchset and send it later >> when >> I have some authentication in place. It will make sense then. > > If you know its broken then fucking say so in the Changelog instead of > wasting everybody's time.. OMG. It is not broken. It addresses one part of the problem. The other part is WIP. I am preparing a detailed response to your comments. I ask you to be patient until then. In fact, I would appreciate your input/suggestions on some problems we are trying to solve in this context. I will mention them in my response. Madhavan
Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
On 11/30/23 18:45, Edgecombe, Rick P wrote: > On Wed, 2023-11-29 at 15:07 -0600, Madhavan T. Venkataraman wrote: >> Threat Model >> >> >> In the threat model in Heki, the attacker is a user space attacker >> who exploits >> a kernel vulnerability to gain more privileges or bypass the kernel's >> access >> control and self-protection mechanisms. >> >> In the context of the guest page table, one of the things that the >> threat model translates >> to is a hacker gaining access to a guest page with RWX permissions. >> E.g., by adding execute >> permissions to a writable page or by adding write permissions to an >> executable page. >> >> Today, the permissions for a guest page in the extended page table >> are RWX by >> default. So, if a hacker manages to establish RWX for a page in the >> guest page >> table, then that is all he needs to do some damage. > > I had a few random comments from watching the plumbers talk online: > > Is there really a big difference between a page that is RWX, and a RW > page that is about to become RX? I realize that there is an addition of > timing, but when executable code is getting loaded it can be written to > then and later executed. I think that gap could be addressed in two > different ways, both pretty difficult: > 1. Verifying the loaded code before it gets marked > executable. This is difficult because the kernel does lots of > tweaks on the code it is loading (alternatives, etc). It can't > just check a signature. > 2. Loading the code in a protected environment. In this model the > (for example) module signature would be checked, then the code > would be loaded in some sort of protected environment. This way > integrity of the loaded code would be enforced. But extracting > module loading into a separate domain would be difficult. > Various scattered features all have their hands in the loading. > > Secondly, I wonder if another way to look at the memory parts of HEKI > could be that this is a way to protect certain page table bits from > stay writes. The RWX bits in the EPT are not directly writable, so more > steps are needed to change things than just a stray write (instead the > helpers involved in the operations need to be called). If that is a > fair way of looking at it, then I wonder how HEKI compares to a > solution like this security-wise: > https://lore.kernel.org/lkml/20210830235927.6443-1-rick.p.edgeco...@intel.com/ > > Functional-wise it had the benefit of working on bare metal and > supporting the normal kernel features. Thanks for the comments. I will think about what you have said and will respond soon. Madhavan
Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
On 11/30/23 05:33, Peter Zijlstra wrote: > On Wed, Nov 29, 2023 at 03:07:15PM -0600, Madhavan T. Venkataraman wrote: > >> Kernel Lockdown >> --- >> >> But, we must provide at least some security in V2. Otherwise, it is useless. >> >> So, we have implemented what we call a kernel lockdown. At the end of kernel >> boot, Heki establishes permissions in the extended page table as mentioned >> before. Also, it adds an immutable attribute for kernel text and kernel RO >> data. >> Beyond that point, guest requests that attempt to modify permissions on any >> of >> the immutable pages will be denied. >> >> This means that features like FTrace and KProbes will not work on kernel text >> in V2. This is a temporary limitation. Once authentication is in place, the >> limitation will go away. > > So either you're saying your patch 17 / text_poke is broken (so why > include it ?!?) or your statement above is incorrect. Pick one. > It has been included so that people can be aware of the changes. I will remove the text_poke() changes from the patchset and send it later when I have some authentication in place. It will make sense then. > >> __text_poke() >> This function is called by various features to patch text. >> This calls heki_text_poke_start() and heki_text_poke_end(). >> >> heki_text_poke_start() is called to add write permissions to the >> extended page table so that text can be patched. heki_text_poke_end() >> is called to revert write permissions in the extended page table. > > This, if text_poke works, then static_call / jump_label / ftrace and > everything else should work, they all rely on this. > > >> Peter mentioned the following: >> >> "if you want to mirror the native PTEs why don't you hook into the >> paravirt page-table muck and get all that for free?" >> >> We did consider using a shadow page table kind of approach so that guest >> page table >> modifications can be intercepted and reflected in the page table entry. We >> did not >> do this for two reasons: >> >> - there are bits in the page table entry that are not permission bits. We >> would like >> the guest kernel to be able to modify them directly. > > This statement makes no sense. > >> - we cannot tell a genuine request from an attack. > > Why not? How is an explicit call different from an explicit call in a > paravirt hook? > >>From a maintenance pov we already hate paravirt with a passion, but it > is ever so much better than sprinkling yet another pile of crap > (heki_*) around. I only said that the idea was considered. We can resume the discussion on this topic when I send the text_poke() changes in a later version of the Heki patchset. Madhavan
Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
On 11/27/23 14:08, Peter Zijlstra wrote: > On Mon, Nov 27, 2023 at 10:48:29AM -0600, Madhavan T. Venkataraman wrote: >> Apologies for the late reply. I was on vacation. Please see my response >> below: >> >> On 11/13/23 02:19, Peter Zijlstra wrote: >>> On Sun, Nov 12, 2023 at 09:23:24PM -0500, Mickaël Salaün wrote: >>>> From: Madhavan T. Venkataraman >>>> >>>> X86 uses a function called __text_poke() to modify executable code. This >>>> patching function is used by many features such as KProbes and FTrace. >>>> >>>> Update the permissions counters for the text page so that write >>>> permissions can be temporarily established in the EPT to modify the >>>> instructions in that page. >>>> >>>> Cc: Borislav Petkov >>>> Cc: Dave Hansen >>>> Cc: H. Peter Anvin >>>> Cc: Ingo Molnar >>>> Cc: Kees Cook >>>> Cc: Madhavan T. Venkataraman >>>> Cc: Mickaël Salaün >>>> Cc: Paolo Bonzini >>>> Cc: Sean Christopherson >>>> Cc: Thomas Gleixner >>>> Cc: Vitaly Kuznetsov >>>> Cc: Wanpeng Li >>>> Signed-off-by: Madhavan T. Venkataraman >>>> --- >>>> >>>> Changes since v1: >>>> * New patch >>>> --- >>>> arch/x86/kernel/alternative.c | 5 >>>> arch/x86/mm/heki.c| 49 +++ >>>> include/linux/heki.h | 14 ++ >>>> 3 files changed, 68 insertions(+) >>>> >>>> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c >>>> index 517ee01503be..64fd8757ba5c 100644 >>>> --- a/arch/x86/kernel/alternative.c >>>> +++ b/arch/x86/kernel/alternative.c >>>> @@ -18,6 +18,7 @@ >>>> #include >>>> #include >>>> #include >>>> +#include >>>> #include >>>> #include >>>> #include >>>> @@ -1801,6 +1802,7 @@ static void *__text_poke(text_poke_f func, void >>>> *addr, const void *src, size_t l >>>> */ >>>>pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL); >>>> >>>> + heki_text_poke_start(pages, cross_page_boundary ? 2 : 1, pgprot); >>>>/* >>>> * The lock is not really needed, but this allows to avoid open-coding. >>>> */ >>>> @@ -1865,7 +1867,10 @@ static void *__text_poke(text_poke_f func, void >>>> *addr, const void *src, size_t l >>>>} >>>> >>>>local_irq_restore(flags); >>>> + >>>>pte_unmap_unlock(ptep, ptl); >>>> + heki_text_poke_end(pages, cross_page_boundary ? 2 : 1, pgprot); >>>> + >>>>return addr; >>>> } >>> >>> This makes no sense, we already use a custom CR3 with userspace alias >>> for the actual pages to write to, why are you then frobbing permissions >>> on that *again* ? >> >> Today, the permissions for a guest page in the extended page table >> (EPT) are RWX (unless permissions are restricted for some specific >> reason like for shadow page table pages). In this Heki feature, we >> don't allow RWX by default in the EPT. We only allow those permissions >> in the EPT that the guest page actually needs. E.g., for a text page, >> it is R_X in both the guest page table and the EPT. > > To what end? If you always mirror what the guest does, you've not > actually gained anything. > >> For text patching, the above code establishes an alternate mapping in >> the guest page table that is RW_ so that the text can be patched. That >> needs to be reflected in the EPT so that the EPT permissions will >> change from R_X to RWX. In other words, RWX is allowed only as >> necessary. At the end of patching, the EPT permissions are restored to >> R_X. >> >> Does that address your comment? > > No, if you want to mirror the native PTEs why don't you hook into the > paravirt page-table muck and get all that for free? > > Also, this is the user range, are you saying you're also playing these > daft games with user maps? I think that we should have done a better job of communicating the threat model in Heki and how we are trying to address it. I will correct that here. I think this will help answer many questions. Some of these questions also came up in the LPC when we presented this. Apologies for the slightly long answe
Re: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor
On 11/27/23 14:03, Peter Zijlstra wrote: > On Mon, Nov 27, 2023 at 11:05:23AM -0600, Madhavan T. Venkataraman wrote: >> Apologies for the late reply. I was on vacation. Please see my response >> below: >> >> On 11/13/23 02:54, Peter Zijlstra wrote: >>> On Sun, Nov 12, 2023 at 09:23:25PM -0500, Mickaël Salaün wrote: >>>> From: Madhavan T. Venkataraman >>>> >>>> Implement a hypervisor function, kvm_protect_memory() that calls the >>>> KVM_HC_PROTECT_MEMORY hypercall to request the KVM hypervisor to >>>> set specified permissions on a list of guest pages. >>>> >>>> Using the protect_memory() function, set proper EPT permissions for all >>>> guest pages. >>>> >>>> Use the MEM_ATTR_IMMUTABLE property to protect the kernel static >>>> sections and the boot-time read-only sections. This enables to make sure >>>> a compromised guest will not be able to change its main physical memory >>>> page permissions. However, this also disable any feature that may change >>>> the kernel's text section (e.g., ftrace, Kprobes), but they can still be >>>> used on kernel modules. >>>> >>>> Module loading/unloading, and eBPF JIT is allowed without restrictions >>>> for now, but we'll need a way to authenticate these code changes to >>>> really improve the guests' security. We plan to use module signatures, >>>> but there is no solution yet to authenticate eBPF programs. >>>> >>>> Being able to use ftrace and Kprobes in a secure way is a challenge not >>>> solved yet. We're looking for ideas to make this work. >>>> >>>> Likewise, the JUMP_LABEL feature cannot work because the kernel's text >>>> section is read-only. >>> >>> What is the actual problem? As is the kernel text map is already RO and >>> never changed. >> >> For the JUMP_LABEL optimization, the text needs to be patched at some point. >> That patching requires a writable mapping of the text page at the time of >> patching. >> >> In this Heki feature, we currently lock down the kernel text at the end of >> kernel boot just before kicking off the init process. The lockdown is >> implemented by setting the permissions of a text page to R_X in the extended >> page table and not allowing write permissions in the EPT after that. So, >> jump label >> patching during kernel boot is not a problem. But doing it after kernel >> boot is a problem. > > But you see, that's exactly what the kernel already does with the normal > permissions. They get set to RX after init and are never changed. > > See the previous patch, we establish a read-write alias and write there. > > You seem to lack basic understanding of how the kernel works in this > regard, which makes me very nervous about you touching any of this. > > I must also say I really dislike your extra/random permssion calls all > over the place. They don't really get us anything afaict. Why can't you > plumb into the existing set_memory_*() family? I have responded to your comments on your other email. Please read my response there. Thanks. Madhavan
Re: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor
Apologies for the late reply. I was on vacation. Please see my response below: On 11/13/23 02:54, Peter Zijlstra wrote: > On Sun, Nov 12, 2023 at 09:23:25PM -0500, Mickaël Salaün wrote: >> From: Madhavan T. Venkataraman >> >> Implement a hypervisor function, kvm_protect_memory() that calls the >> KVM_HC_PROTECT_MEMORY hypercall to request the KVM hypervisor to >> set specified permissions on a list of guest pages. >> >> Using the protect_memory() function, set proper EPT permissions for all >> guest pages. >> >> Use the MEM_ATTR_IMMUTABLE property to protect the kernel static >> sections and the boot-time read-only sections. This enables to make sure >> a compromised guest will not be able to change its main physical memory >> page permissions. However, this also disable any feature that may change >> the kernel's text section (e.g., ftrace, Kprobes), but they can still be >> used on kernel modules. >> >> Module loading/unloading, and eBPF JIT is allowed without restrictions >> for now, but we'll need a way to authenticate these code changes to >> really improve the guests' security. We plan to use module signatures, >> but there is no solution yet to authenticate eBPF programs. >> >> Being able to use ftrace and Kprobes in a secure way is a challenge not >> solved yet. We're looking for ideas to make this work. >> >> Likewise, the JUMP_LABEL feature cannot work because the kernel's text >> section is read-only. > > What is the actual problem? As is the kernel text map is already RO and > never changed. For the JUMP_LABEL optimization, the text needs to be patched at some point. That patching requires a writable mapping of the text page at the time of patching. In this Heki feature, we currently lock down the kernel text at the end of kernel boot just before kicking off the init process. The lockdown is implemented by setting the permissions of a text page to R_X in the extended page table and not allowing write permissions in the EPT after that. So, jump label patching during kernel boot is not a problem. But doing it after kernel boot is a problem. The lockdown is just for the current Heki implementation. In the future, we plan to have a way of authenticating guest requests to change permissions on a text page. Once that is in place, permissions on text pages can be changed on the fly to support features that depend on text patching - FTrace, KProbes, etc. Madhavan
Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
Apologies for the late reply. I was on vacation. Please see my response below: On 11/13/23 02:19, Peter Zijlstra wrote: > On Sun, Nov 12, 2023 at 09:23:24PM -0500, Mickaël Salaün wrote: >> From: Madhavan T. Venkataraman >> >> X86 uses a function called __text_poke() to modify executable code. This >> patching function is used by many features such as KProbes and FTrace. >> >> Update the permissions counters for the text page so that write >> permissions can be temporarily established in the EPT to modify the >> instructions in that page. >> >> Cc: Borislav Petkov >> Cc: Dave Hansen >> Cc: H. Peter Anvin >> Cc: Ingo Molnar >> Cc: Kees Cook >> Cc: Madhavan T. Venkataraman >> Cc: Mickaël Salaün >> Cc: Paolo Bonzini >> Cc: Sean Christopherson >> Cc: Thomas Gleixner >> Cc: Vitaly Kuznetsov >> Cc: Wanpeng Li >> Signed-off-by: Madhavan T. Venkataraman >> --- >> >> Changes since v1: >> * New patch >> --- >> arch/x86/kernel/alternative.c | 5 >> arch/x86/mm/heki.c| 49 +++ >> include/linux/heki.h | 14 ++ >> 3 files changed, 68 insertions(+) >> >> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c >> index 517ee01503be..64fd8757ba5c 100644 >> --- a/arch/x86/kernel/alternative.c >> +++ b/arch/x86/kernel/alternative.c >> @@ -18,6 +18,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -1801,6 +1802,7 @@ static void *__text_poke(text_poke_f func, void *addr, >> const void *src, size_t l >> */ >> pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL); >> >> +heki_text_poke_start(pages, cross_page_boundary ? 2 : 1, pgprot); >> /* >> * The lock is not really needed, but this allows to avoid open-coding. >> */ >> @@ -1865,7 +1867,10 @@ static void *__text_poke(text_poke_f func, void >> *addr, const void *src, size_t l >> } >> >> local_irq_restore(flags); >> + >> pte_unmap_unlock(ptep, ptl); >> +heki_text_poke_end(pages, cross_page_boundary ? 2 : 1, pgprot); >> + >> return addr; >> } > > This makes no sense, we already use a custom CR3 with userspace alias > for the actual pages to write to, why are you then frobbing permissions > on that *again* ? Today, the permissions for a guest page in the extended page table (EPT) are RWX (unless permissions are restricted for some specific reason like for shadow page table pages). In this Heki feature, we don't allow RWX by default in the EPT. We only allow those permissions in the EPT that the guest page actually needs. E.g., for a text page, it is R_X in both the guest page table and the EPT. For text patching, the above code establishes an alternate mapping in the guest page table that is RW_ so that the text can be patched. That needs to be reflected in the EPT so that the EPT permissions will change from R_X to RWX. In other words, RWX is allowed only as necessary. At the end of patching, the EPT permissions are restored to R_X. Does that address your comment? Madhavan
Re: [PATCH v1 2/9] KVM: x86/mmu: Add support for prewrite page tracking
On 5/5/23 12:31, Sean Christopherson wrote: > On Fri, May 05, 2023, Micka�l Sala�n wrote: >> >> On 05/05/2023 18:28, Sean Christopherson wrote: >>> I have no doubt that we'll need to solve performance and scaling issues >>> with the >>> memory attributes implementation, e.g. to utilize xarray multi-range support >>> instead of storing information on a per-4KiB-page basis, but AFAICT, the >>> core >>> idea is sound. And a very big positive from a maintenance perspective is >>> that >>> any optimizations, fixes, etc. for one use case (CoCo vs. hardening) should >>> also >>> benefit the other use case. >>> >>> [1] https://lore.kernel.org/all/20230311002258.852397-22-sea...@google.com >>> [2] https://lore.kernel.org/all/y2wb48kd0j4vg...@google.com >>> [3] https://lore.kernel.org/all/y1a1i9vbj%2fpvm...@google.com >> >> I agree, I used this mechanism because it was easier at first to rely on a >> previous work, but while I was working on the MBEC support, I realized that >> it's not the optimal way to do it. >> >> I was thinking about using a new special EPT bit similar to >> EPT_SPTE_HOST_WRITABLE, but it may not be portable though. What do you >> think? > > On x86, SPTEs are even more ephemeral than memslots. E.g. for historical > reasons, > KVM zaps all SPTEs if _any_ memslot is deleted, which is problematic if the > guest > is moving around BARs, using option ROMs, etc. > > ARM's pKVM tracks metadata in its stage-2 PTEs, i.e. doesn't need an xarray to > otrack attributes, but that works only because pKVM is more privileged than > the > host kernel, and the shared vs. private memory attribute that pKVM cares about > is very, very restricted in how it can be used and changed. > > I tried shoehorning private vs. shared metadata into x86's SPTEs in the past, > and > it ended up being a constant battle with the kernel, e.g. page migration, and > with > KVM itself, e.g. the above memslot mess. Sorry for the delay in responding to this. I wanted to study the KVM code and fully understand your comment before responding. Yes, I quite agree with you. I will make an attempt to address this in the next version. I am working on it right now. Thanks. Madhavan
Re: [PATCH v1 3/9] virt: Implement Heki common code
Sorry for the delay. See inline... On 5/8/23 12:29, Wei Liu wrote: > On Fri, May 05, 2023 at 05:20:40PM +0200, Mickaël Salaün wrote: >> From: Madhavan T. Venkataraman >> >> Hypervisor Enforced Kernel Integrity (Heki) is a feature that will use >> the hypervisor to enhance guest virtual machine security. >> >> Configuration >> = >> >> Define the config variables for the feature. This feature depends on >> support from the architecture as well as the hypervisor. >> >> Enabling HEKI >> = >> >> Define a kernel command line parameter "heki" to turn the feature on or >> off. By default, Heki is on. > > For such a newfangled feature can we have it off by default? Especially > when there are unsolved issues around dynamically loaded code. > Yes. We can certainly do that. >> > [...] >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> index 3604074a878b..5cf5a7a97811 100644 >> --- a/arch/x86/Kconfig >> +++ b/arch/x86/Kconfig >> @@ -297,6 +297,7 @@ config X86 >> select FUNCTION_ALIGNMENT_4B >> imply IMA_SECURE_AND_OR_TRUSTED_BOOTif EFI >> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >> +select ARCH_SUPPORTS_HEKI if X86_64 > > Why is there a restriction on X86_64? > We want to get the PoC working and reviewed on X64 first. We have tested this only on X64 so far. >> >> config INSTRUCTION_DECODER >> def_bool y >> diff --git a/arch/x86/include/asm/sections.h >> b/arch/x86/include/asm/sections.h >> index a6e8373a5170..42ef1e33b8a5 100644 >> --- a/arch/x86/include/asm/sections.h >> +++ b/arch/x86/include/asm/sections.h > [...] >> >> +#ifdef CONFIG_HEKI >> + >> +/* >> + * Gather all of the statically defined sections so heki_late_init() can >> + * protect these sections in the host page table. >> + * >> + * The sections are defined under "SECTIONS" in vmlinux.lds.S >> + * Keep this array in sync with SECTIONS. >> + */ > > This seems a bit fragile, because it requires constant attention from > people who care about this functionality. Can this table be > automatically generated? > We realize that. But I don't know of a way this can be automatically generated. Also, the permissions for each section is specific to the use of that section. The developer who introduces a new section is the one who will know what the permissions should be. If any one has any ideas of how we can generate this table automatically or even just add a build time check of some sort, please let us know. Thanks. Madhavan > Thanks, > Wei. > >> +struct heki_va_range __initdata heki_va_ranges[] = { >> +{ >> +.va_start = _stext, >> +.va_end = _etext, >> +.attributes = HEKI_ATTR_MEM_NOWRITE | HEKI_ATTR_MEM_EXEC, >> +}, >> +{ >> +.va_start = __start_rodata, >> +.va_end = __end_rodata, >> +.attributes = HEKI_ATTR_MEM_NOWRITE, >> +}, >> +#ifdef CONFIG_UNWINDER_ORC >> +{ >> +.va_start = __start_orc_unwind_ip, >> +.va_end = __stop_orc_unwind_ip, >> +.attributes = HEKI_ATTR_MEM_NOWRITE, >> +}, >> +{ >> +.va_start = __start_orc_unwind, >> +.va_end = __stop_orc_unwind, >> +.attributes = HEKI_ATTR_MEM_NOWRITE, >> +}, >> +{ >> +.va_start = orc_lookup, >> +.va_end = orc_lookup_end, >> +.attributes = HEKI_ATTR_MEM_NOWRITE, >> +}, >> +#endif /* CONFIG_UNWINDER_ORC */ >> +}; >> +