Thank you very much for your detailed review, Dave! Responses inline below.
> On 5/7/25 04:14, Elena Reshetova wrote: > > In case an SGX vulnerability is discovered and TCB recovery > > for SGX is triggered, Intel specifies a process that must be > > followed for a given vulnerability. Steps to mitigate can vary > > based on vulnerability type, affected components, etc. > > In some cases, a vulnerability can be mitigated via a runtime > > recovery flow by shutting down all running SGX enclaves, > > clearing enclave page cache (EPC), applying a microcode patch > > that does not require a reboot (via late microcode loading) and > > restarting all SGX enclaves. > > Plain language and brevity have a lot of value in changelogs. There's a > substantial amount of irrelevant content here. > > > Even when the above-described runtime recovery flow to mitigate the > > SGX vulnerability is followed, the SGX attestation evidence will > > still reflect the security SVN version being equal to the previous > > state of security SVN (containing vulnerability) that created > > and managed the enclave until the runtime recovery event. This > > limitation currently can be only overcome via a platform reboot, > > which negates all the benefits from the rebootless late microcode > > loading and not required in this case for functional or security > > purposes. > > Can this please be trimmed down? Sure, I can work on trimming the text. > > Actually, I think I wrote changelogs for this once upon a time. Could > you please go dig those up and use them? Could you please suggest where can I find them? Was it for the previous submission done by Cathy? > > > SGX architecture introduced a new instruction called ENCLS[EUPDATESVN] > > to Ice Lake [1]. > > Is it really on "Ice Lake" parts? Like, does it show up on > INTEL_ICELAKE? If not, this is only confusing and mostly irrelevant > information. Ok, will remove. > > > It allows updating security SVN version, given that EPC > > is completely empty. The latter is required for security reasons > > in order to reason that enclave security posture is as secure as the > > security SVN version of the TCB that created it. > > > > Additionally it is important to ensure that while ENCLS[EUPDATESVN] > > runs, no concurrent page creation happens in EPC, because it might > > result in #GP delivered to the creator. Legacy SW might not be prepared > > to handle such unexpected #GPs and therefore this patch introduces > > a locking mechanism in sgx_(vepc_)open to ensure no concurrent EPC > > allocations can happen. > > > > Implement ENCLS[EUPDATESVN] and enable kernel to opportunistically > > call it during sgx_(vepc_)open(). > > > > [1] > > https://cdrdv2.intel.com/v1/dl/getContent/648682?explicitVersion=true > > These become stale almost immediately. Please also cite the document title. Sure, I can add the title, but I dont have another link. > > > arch/x86/include/asm/sgx.h | 42 ++++++++++++------- > > arch/x86/kernel/cpu/sgx/driver.c | 4 ++ > > arch/x86/kernel/cpu/sgx/encl.c | 1 + > > arch/x86/kernel/cpu/sgx/encls.h | 5 +++ > > arch/x86/kernel/cpu/sgx/main.c | 70 > ++++++++++++++++++++++++++++++++ > > arch/x86/kernel/cpu/sgx/sgx.h | 3 ++ > > arch/x86/kernel/cpu/sgx/virt.c | 6 +++ > > 7 files changed, 116 insertions(+), 15 deletions(-) > > Gah, how did this get squished back down to a single patch? It was > multiple patches before. There are multiple logical things going on here > and they need to be broken out. Yes, but the code actually did get smaller, so it didn’t feel like it is worth breaking this into different patches. But I will follow your suggestion. > > > diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h > > index 6a0069761508..8ac026ef6365 100644 > > --- a/arch/x86/include/asm/sgx.h > > +++ b/arch/x86/include/asm/sgx.h > > @@ -27,22 +27,26 @@ > > /* The bitmask for the EPC section type. */ > > #define SGX_CPUID_EPC_MASK GENMASK(3, 0) > > > > +/* EUPDATESVN support in CPUID.0x12.0.EAX */ > > +#define SGX_CPUID_EUPDATESVN BIT(10) > > + > > enum sgx_encls_function { > > - ECREATE = 0x00, > > - EADD = 0x01, > > - EINIT = 0x02, > > - EREMOVE = 0x03, > > - EDGBRD = 0x04, > > - EDGBWR = 0x05, > > - EEXTEND = 0x06, > > - ELDU = 0x08, > > - EBLOCK = 0x09, > > - EPA = 0x0A, > > - EWB = 0x0B, > > - ETRACK = 0x0C, > > - EAUG = 0x0D, > > - EMODPR = 0x0E, > > - EMODT = 0x0F, > > + ECREATE = 0x00, > > + EADD = 0x01, > > + EINIT = 0x02, > > + EREMOVE = 0x03, > > + EDGBRD = 0x04, > > + EDGBWR = 0x05, > > + EEXTEND = 0x06, > > + ELDU = 0x08, > > + EBLOCK = 0x09, > > + EPA = 0x0A, > > + EWB = 0x0B, > > + ETRACK = 0x0C, > > + EAUG = 0x0D, > > + EMODPR = 0x0E, > > + EMODT = 0x0F, > > + EUPDATESVN = 0x18, > > }; > > > > /** > > @@ -73,6 +77,11 @@ enum sgx_encls_function { > > * public key does not match > IA32_SGXLEPUBKEYHASH. > > * %SGX_PAGE_NOT_MODIFIABLE: The EPC page cannot be modified > because it > > * is in the PENDING or MODIFIED state. > > + * %SGX_INSUFFICIENT_ENTROPY: Insufficient entropy in RNG. > > + * %SGX_EPC_NOT_READY: EPC is not ready for SVN > update. > > + * %SGX_NO_UPDATE: EUPDATESVN was successful, but > CPUSVN was not > > + * updated because current SVN was not newer > than > > + * CPUSVN. > > * %SGX_UNMASKED_EVENT: An unmasked event, e.g. INTR, was > received > > */ > > enum sgx_return_code { > > @@ -81,6 +90,9 @@ enum sgx_return_code { > > SGX_CHILD_PRESENT = 13, > > SGX_INVALID_EINITTOKEN = 16, > > SGX_PAGE_NOT_MODIFIABLE = 20, > > + SGX_INSUFFICIENT_ENTROPY = 29, > > + SGX_EPC_NOT_READY = 30, > > + SGX_NO_UPDATE = 31, > > SGX_UNMASKED_EVENT = 128, > > }; > > I'd *much* rather that these mechanical constant introductions and > mindless refactoring (like reindenting) not be mixed with actual logic code. Understood. > > > > > diff --git a/arch/x86/kernel/cpu/sgx/driver.c > b/arch/x86/kernel/cpu/sgx/driver.c > > index 7f8d1e11dbee..669e44d61f9f 100644 > > --- a/arch/x86/kernel/cpu/sgx/driver.c > > +++ b/arch/x86/kernel/cpu/sgx/driver.c > > @@ -19,6 +19,10 @@ static int sgx_open(struct inode *inode, struct file > *file) > > struct sgx_encl *encl; > > int ret; > > > > + ret = sgx_inc_usage_count(); > > + if (ret) > > + return ret; > > + > > encl = kzalloc(sizeof(*encl), GFP_KERNEL); > > if (!encl) > > return -ENOMEM; > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c > > index 279148e72459..3b54889ae4a4 100644 > > --- a/arch/x86/kernel/cpu/sgx/encl.c > > +++ b/arch/x86/kernel/cpu/sgx/encl.c > > @@ -765,6 +765,7 @@ void sgx_encl_release(struct kref *ref) > > WARN_ON_ONCE(encl->secs.epc_page); > > > > kfree(encl); > > + sgx_dec_usage_count(); > > } > > > > /* > > diff --git a/arch/x86/kernel/cpu/sgx/encls.h > b/arch/x86/kernel/cpu/sgx/encls.h > > index 99004b02e2ed..788acf8ec563 100644 > > --- a/arch/x86/kernel/cpu/sgx/encls.h > > +++ b/arch/x86/kernel/cpu/sgx/encls.h > > @@ -233,4 +233,9 @@ static inline int __eaug(struct sgx_pageinfo *pginfo, > void *addr) > > return __encls_2(EAUG, pginfo, addr); > > } > > > > +/* Update CPUSVN at runtime. */ > > +static inline int __eupdatesvn(void) > > +{ > > + return __encls_ret_1(EUPDATESVN, ""); > > +} > > #endif /* _X86_ENCLS_H */ > > diff --git a/arch/x86/kernel/cpu/sgx/main.c > b/arch/x86/kernel/cpu/sgx/main.c > > index 8ce352fc72ac..2872567cd22b 100644 > > --- a/arch/x86/kernel/cpu/sgx/main.c > > +++ b/arch/x86/kernel/cpu/sgx/main.c > > @@ -15,6 +15,7 @@ > > #include <linux/sysfs.h> > > #include <linux/vmalloc.h> > > #include <asm/sgx.h> > > +#include <asm/archrandom.h> > > #include "driver.h" > > #include "encl.h" > > #include "encls.h" > > @@ -914,6 +915,73 @@ int sgx_set_attribute(unsigned long > *allowed_attributes, > > } > > EXPORT_SYMBOL_GPL(sgx_set_attribute); > > > > +static bool sgx_has_eupdatesvn; > > We have CPUID "caches" of sorts. Why open code this? You mean X86_FEATURE_*? SGX CPUID is only defined in SGX code currently (btw, I am not sure why they are made special) so it doesn’t support this. Is this a past decision that you want to change? > > > +static atomic_t sgx_usage_count; > > Is 32 bits enough for sgx_usage_count? What goes boom when it overflows? I can increase the value easily, but even with current number we are talking about 2^32 enclaves, i dont think this is realistic to happen in practice. > > > +static DEFINE_MUTEX(sgx_svn_lock); > > What does this lock protect? Will add description. > > > +/** > > + * sgx_updatesvn() - Issue ENCLS[EUPDATESVN] > > + * If EPC is empty, this instruction will update CPUSVN to the currently > > + * loaded microcode update SVN and generate new cryptographic assets. > > This is *NOT* EUPDATESVN. "this instruction" is not what's happening here. > > sgx_updatesvn() _tries_ to update the SVN. Most of the time, there will > be no update and that's OK. This should only be called with EPC is empty. Will fix description. > > > + * Return: > > + * 0: Success or not supported > > + * errno on error > > I'm not a big fan of filling thing out just because. > > What value is there in saying "errno on error"? Will remove. > > > + */ > > +static int sgx_update_svn(void) > > +{ > > + int retry = RDRAND_RETRY_LOOPS; > > + int ret; > > + > > + if (!sgx_has_eupdatesvn) > > + return 0; > > This looks goofy. Why is it OK to just silently ignore an update > request? (I know the answer, but it needs to be obvious) Will add the explicit comment. > > > + do { > > + ret = __eupdatesvn(); > > + } while (ret == SGX_INSUFFICIENT_ENTROPY && --retry); > > > for (i = 0; i < RDRAND_RETRY_LOOPS; i++) { > ret = __eupdatesvn(); > > /* Stop on success or unexpected errors: */ > if (ret != SGX_INSUFFICIENT_ENTROPY) > break; > } Will change. > > > + if (!ret || ret == SGX_NO_UPDATE) { > > + /* > > + * SVN successfully updated, or it was already up-to-date. > > + * Let users know when the update was successful. > > + */ > > + if (!ret) > > + pr_info("SVN updated successfully\n"); > > + return 0; > > + } > > Isn't this a lot simpler? > > if (ret == SGX_NO_UPDATE) > return 0; > > if (!ret) { > pr_info("SVN updated successfully\n"); > return 0; > } Sure, will change. > > > + /* > > + * EUPDATESVN was called when EPC is empty, all other error > > + * codes are unexcepted except running out of entropy. > > ^ unexpected > > Spell check, please. Will fix. > > > + */ > > + if (ret != SGX_INSUFFICIENT_ENTROPY) > > + ENCLS_WARN(ret, "EUPDATESVN"); > > + return ret; > > +} > > The indentation here is backwards. The error paths should be indented > and the success path at the lowest indent whenever possible. This, for > example: Will fix. > > if (ret == SGX_NO_UPDATE) > return 0; > > if (ret) { > ENCLS_WARN(ret, "EUPDATESVN"); > return ret; > } > > pr_info("SVN updated successfully\n"); > return 0; > > Oh, and do we expect SGX_INSUFFICIENT_ENTROPY all the time? I thought it > was supposed to be horribly rare. Shouldn't we warn on it? The entropy comes from RDSEED in this case, not RDRAND. This is not a horribly rare but very realistic failure. > > > +int sgx_inc_usage_count(void) > > +{ > > No comments, eh? > > What does success _mean_? What does failure mean? Will add. > > > + int ret; > > + > > + if (atomic_inc_not_zero(&sgx_usage_count)) > > + return 0; > > + > > + guard(mutex)(&sgx_svn_lock); > > + > > + if (atomic_inc_not_zero(&sgx_usage_count)) > > + return 0; > > + > > + ret = sgx_update_svn(); > > + if (!ret) > > + atomic_inc(&sgx_usage_count); > > + return ret; > > +} > > Gah, this is 100% *NOT* obvious what's going on. Yet there are zero > comments on it. The lock is uncommented. The atomic is uncommented. > > What does any of this mean? What do the components do? Will fix. > > > + > > +void sgx_dec_usage_count(void) > > +{ > > + atomic_dec(&sgx_usage_count); > > +} > > > > > static int __init sgx_init(void) > > { > > int ret; > > @@ -947,6 +1015,8 @@ static int __init sgx_init(void) > > if (sgx_vepc_init() && ret) > > goto err_provision; > > > > + sgx_has_eupdatesvn = (cpuid_eax(SGX_CPUID) & > SGX_CPUID_EUPDATESVN); > > + > > return 0; > > > > err_provision: > > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h > > index d2dad21259a8..f5940393d9bd 100644 > > --- a/arch/x86/kernel/cpu/sgx/sgx.h > > +++ b/arch/x86/kernel/cpu/sgx/sgx.h > > @@ -102,6 +102,9 @@ static inline int __init sgx_vepc_init(void) > > } > > #endif > > > > +int sgx_inc_usage_count(void); > > +void sgx_dec_usage_count(void); > > + > > void sgx_update_lepubkeyhash(u64 *lepubkeyhash); > > > > #endif /* _X86_SGX_H */ > > diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c > > index 7aaa3652e31d..e548de340c2e 100644 > > --- a/arch/x86/kernel/cpu/sgx/virt.c > > +++ b/arch/x86/kernel/cpu/sgx/virt.c > > @@ -255,12 +255,18 @@ static int sgx_vepc_release(struct inode *inode, > struct file *file) > > xa_destroy(&vepc->page_array); > > kfree(vepc); > > > > + sgx_dec_usage_count(); > > return 0; > > } > > > > static int sgx_vepc_open(struct inode *inode, struct file *file) > > { > > struct sgx_vepc *vepc; > > + int ret; > > + > > + ret = sgx_inc_usage_count(); > > + if (ret) > > + return ret; > > > > vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL); > > if (!vepc) > > I think I'd do this in at least 4 patches: > > 1. Introduce the usage count tracking: the atomic and the open/release > "hooks", maybe without error handling on the open() side > 2. Introduce the EUPDATESVN mechanical bits: the CPUID bit, the > enumeration, the bool, the new error enum values > 3. Introduce the mechanical eupdatesvn function. The retry loop and the > "no entropy" handling > 4. Plumb #3 into #1 > > #4 is your place to argue if EUPDATESVN failures should cascade to open(). I will use this breakout for the next version, thank you! Best Regards, Elena.