On Tue, Sep 03, 2019 at 05:26:39PM +0300, Jarkko Sakkinen wrote:
> From: Sean Christopherson <sean.j.christopher...@intel.com>
> 
> Enumerate Enclave Page Cache (EPC) sections via CPUID and add the data
> structures necessary to track EPC pages so that they can be allocated,
> freed and managed. As a system may have multiple EPC sections, invoke
> CPUID on SGX sub-leafs until an invalid leaf is encountered.
> 
> On NUMA systems, a node can have at most one bank. A bank can be at

Is that a DRAM bank or what exactly is a "bank" here?

> most part of two nodes. SGX supports both nodes with a single memory
> controller and also sub-cluster nodes with severals memory controllers

s/severals/several/

> on a single die.
> 
> For simplicity, support a maximum of eight EPC sections. Exisiting

s/Exisiting/Existing/g

Please introduce a spellchecker into your patch creation workflow and
run all your text through it.
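
checkpatch can do that for you too, for example (assuming codespell is
installed):

    ./scripts/checkpatch.pl --codespell <your-patch>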

> client hardware supports only a single section, while upcoming server
> hardware will support at most eight sections. Bounding the number of
> sections also allows the section ID to be embedded along with a page's
> offset in a single unsigned long, enabling easy retrieval of both the
> VA and PA for a given page.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopher...@intel.com>
> Co-developed-by: Jarkko Sakkinen <jarkko.sakki...@linux.intel.com>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakki...@linux.intel.com>
> Co-developed-by: Suresh Siddha <suresh.b.sid...@intel.com>
> Signed-off-by: Suresh Siddha <suresh.b.sid...@intel.com>
> Co-developed-by: Serge Ayoun <serge.ay...@intel.com>
> Signed-off-by: Serge Ayoun <serge.ay...@intel.com>

As before, your SOB needs to come last since you're the one handling the
patch now, but you know that already. :)
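
I.e., something like this (the exact order of the others doesn't matter
that much, the point is that yours is the last one):

    Signed-off-by: Sean Christopherson <sean.j.christopher...@intel.com>
    Co-developed-by: Suresh Siddha <suresh.b.sid...@intel.com>
    Signed-off-by: Suresh Siddha <suresh.b.sid...@intel.com>
    Co-developed-by: Serge Ayoun <serge.ay...@intel.com>
    Signed-off-by: Serge Ayoun <serge.ay...@intel.com>
    Co-developed-by: Jarkko Sakkinen <jarkko.sakki...@linux.intel.com>
    Signed-off-by: Jarkko Sakkinen <jarkko.sakki...@linux.intel.com>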

> ---
>  arch/x86/Kconfig                  |  14 +++
>  arch/x86/kernel/cpu/Makefile      |   1 +
>  arch/x86/kernel/cpu/sgx/Makefile  |   2 +-
>  arch/x86/kernel/cpu/sgx/main.c    | 158 ++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/reclaim.c |  84 ++++++++++++++++
>  arch/x86/kernel/cpu/sgx/sgx.h     |  67 +++++++++++++
>  6 files changed, 325 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/kernel/cpu/sgx/main.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 222855cc0158..2a8988aaa074 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1934,6 +1934,20 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
>  
>         If unsure, say y.
>  
> +config INTEL_SGX
> +     bool "Intel SGX core functionality"

This sounds like there's other functionality which will get separate
config option(s)?

It is not in this patchset though...

> +     depends on X86_64 && CPU_SUP_INTEL
> +     select SRCU
> +     select MMU_NOTIFIER
> +     help
> +       Intel(R) SGX is a set of CPU instructions that can be used by
> +       applications to set aside private regions of code and data, referred
> +       to as enclaves. An enclave's private memory can only be accessed by
> +       code running within the enclave. Accesses from outside the enclave,
> +       including other enclaves, are disallowed by hardware.
> +
> +       If unsure, say N.
> +
>  config EFI
>       bool "EFI runtime service support"
>       depends on ACPI
> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
> index d7a1e5a9331c..97deac5108df 100644
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -45,6 +45,7 @@ obj-$(CONFIG_X86_MCE)                       += mce/
>  obj-$(CONFIG_MTRR)                   += mtrr/
>  obj-$(CONFIG_MICROCODE)                      += microcode/
>  obj-$(CONFIG_X86_CPU_RESCTRL)                += resctrl/
> +obj-$(CONFIG_INTEL_SGX)                      += sgx/
>  
>  obj-$(CONFIG_X86_LOCAL_APIC)         += perfctr-watchdog.o
>  
> diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
> index 4432d935894e..fa930e292110 100644
> --- a/arch/x86/kernel/cpu/sgx/Makefile
> +++ b/arch/x86/kernel/cpu/sgx/Makefile
> @@ -1 +1 @@
> -obj-y += encls.o
> +obj-y += encls.o main.o reclaim.o
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> new file mode 100644
> index 000000000000..e2317f6e4374
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -0,0 +1,158 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +// Copyright(c) 2016-17 Intel Corporation.
> +
> +#include <linux/freezer.h>
> +#include <linux/highmem.h>
> +#include <linux/kthread.h>
> +#include <linux/pagemap.h>
> +#include <linux/ratelimit.h>
> +#include <linux/sched/signal.h>
> +#include <linux/slab.h>
> +#include "arch.h"
> +#include "sgx.h"
> +
> +struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
> +EXPORT_SYMBOL_GPL(sgx_epc_sections);

This export gets removed again in patch 12. Please audit your whole
patchset for stuff that gets added and later removed, and kill it so
that the diffstat shrinks.

> +
> +int sgx_nr_epc_sections;
> +
> +static __init void sgx_free_epc_section(struct sgx_epc_section *section)
> +{
> +     struct sgx_epc_page *page;
> +
> +     while (!list_empty(&section->page_list)) {
> +             page = list_first_entry(&section->page_list,
> +                                     struct sgx_epc_page, list);
> +             list_del(&page->list);
> +             kfree(page);
> +     }
> +
> +     while (!list_empty(&section->unsanitized_page_list)) {
> +             page = list_first_entry(&section->unsanitized_page_list,
> +                                     struct sgx_epc_page, list);
> +             list_del(&page->list);
> +             kfree(page);
> +     }
> +
> +     memunmap(section->va);
> +}
> +
> +static __init int sgx_init_epc_section(u64 addr, u64 size, unsigned long index,
> +                                    struct sgx_epc_section *section)

If the "free" function above is the counterpart of this, then this
should be called sgx_alloc_epc_section() or so.
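
I.e.:

    static __init int sgx_alloc_epc_section(u64 addr, u64 size, unsigned long index,
                                            struct sgx_epc_section *section)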

> +{
> +     unsigned long nr_pages = size >> PAGE_SHIFT;
> +     struct sgx_epc_page *page;
> +     unsigned long i;
> +
> +     section->va = memremap(addr, size, MEMREMAP_WB);
> +     if (!section->va)
> +             return -ENOMEM;
> +
> +     section->pa = addr;
> +     spin_lock_init(&section->lock);
> +     INIT_LIST_HEAD(&section->page_list);
> +     INIT_LIST_HEAD(&section->unsanitized_page_list);
> +
> +     for (i = 0; i < nr_pages; i++) {
> +             page = kzalloc(sizeof(*page), GFP_KERNEL);
> +             if (!page)
> +                     goto out;

<---- newline here.

> +             page->desc = (addr + (i << PAGE_SHIFT)) | index;
> +             list_add_tail(&page->list, &section->unsanitized_page_list);
> +             section->free_cnt++;
> +     }
> +
> +     return 0;
> +out:
> +     sgx_free_epc_section(section);
> +     return -ENOMEM;
> +}
> +
> +static __init void sgx_page_cache_teardown(void)
> +{
> +     int i;
> +
> +     for (i = 0; i < sgx_nr_epc_sections; i++)
> +             sgx_free_epc_section(&sgx_epc_sections[i]);
> +}
> +
> +/**
> + * A section metric is concatenated in a way that @low bits 12-31 define the
> + * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
> + * metric.
> + */
> +static inline u64 sgx_calc_section_metric(u64 low, u64 high)
> +{
> +     return (low & GENMASK_ULL(31, 12)) +
> +            ((high & GENMASK_ULL(19, 0)) << 32);
> +}
> +
> +static __init int sgx_page_cache_init(void)
> +{
> +     u32 eax, ebx, ecx, edx, type;
> +     u64 pa, size;
> +     int ret;
> +     int i;
> +
> +     BUILD_BUG_ON(SGX_MAX_EPC_SECTIONS > (SGX_EPC_SECTION_MASK + 1));
> +
> +     for (i = 0; i < (SGX_MAX_EPC_SECTIONS + 1); i++) {

                        ^                        ^ - what are those brackets for?

> +             cpuid_count(SGX_CPUID, i + SGX_CPUID_FIRST_VARIABLE_SUB_LEAF,
> +                         &eax, &ebx, &ecx, &edx);
> +
> +             type = eax & SGX_CPUID_SUB_LEAF_TYPE_MASK;
> +             if (type == SGX_CPUID_SUB_LEAF_INVALID)
> +                     break;

<---- newline here.

> +             if (type != SGX_CPUID_SUB_LEAF_EPC_SECTION) {
> +                     pr_err_once("sgx: Unknown sub-leaf type: %u\n", type);
                                     ^^^^

That's done with:

#undef pr_fmt
#define pr_fmt(fmt)     "sgx: " fmt

for the whole compilation unit or you can simply raise it into sgx.h for
the whole sgx pile.
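
With that in place the call sites become simply

    pr_err_once("Unknown sub-leaf type: %u\n", type);

and the prefix gets prepended automatically.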


> +                     return -ENODEV;
> +             }

<---- newline here.

Yeah, let's space those out a bit, for better readability.

> +             if (i == SGX_MAX_EPC_SECTIONS) {
> +                     pr_warn("sgx: More than "
> +                             __stringify(SGX_MAX_EPC_SECTIONS)
> +                             " EPC sections\n");

Huh, what's wrong with using "%d" like a normal printk does?
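
I.e., simply:

    pr_warn("sgx: More than %d EPC sections\n", SGX_MAX_EPC_SECTIONS);

(and the "sgx: " prefix goes away too, once you do the pr_fmt thing
above).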

> +                     break;
> +             }
> +
> +             pa = sgx_calc_section_metric(eax, ebx);
> +             size = sgx_calc_section_metric(ecx, edx);

This size comes from CPUID but it might be prudent to sanity-check it
nevertheless, before doing the memremap().
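
Something along those lines, for example (just a sketch, tune the checks
as you see fit):

    if (!size || pa + size < pa) {
            pr_err("sgx: EPC section %d: bogus geometry 0x%llx+0x%llx\n",
                   i, pa, size);
            return -ENODEV;
    }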

> +             pr_info("sgx: EPC section 0x%llx-0x%llx\n", pa, pa + size - 1);
> +
> +             ret = sgx_init_epc_section(pa, size, i, &sgx_epc_sections[i]);
> +             if (ret) {
> +                     sgx_page_cache_teardown();

So even if one section fails to allocate, we tear down the whole thing?
I.e., can't we run with only 7 or so? IOW, do we absolutely have to fail
here or can we fail more gracefully?
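
I.e., something like this instead (a sketch only), so that the sections
which did initialize properly remain usable:

    ret = sgx_init_epc_section(pa, size, i, &sgx_epc_sections[i]);
    if (ret) {
            pr_warn("sgx: EPC section 0x%llx-0x%llx could not be added\n",
                    pa, pa + size - 1);
            break;
    }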

> +                     return ret;
> +             }
> +
> +             sgx_nr_epc_sections++;
> +     }
> +
> +     if (!sgx_nr_epc_sections) {
> +             pr_err("sgx: There are zero EPC sections.\n");
> +             return -ENODEV;
> +     }
> +
> +     return 0;
> +}
> +
> +static __init int sgx_init(void)
> +{
> +     int ret;
> +
> +     if (!boot_cpu_has(X86_FEATURE_SGX))
> +             return false;
> +
> +     ret = sgx_page_cache_init();
> +     if (ret)
> +             return ret;
> +
> +     ret = sgx_page_reclaimer_init();
> +     if (ret) {
> +             sgx_page_cache_teardown();
> +             return ret;
> +     }
> +
> +     return 0;
> +}
> +
> +arch_initcall(sgx_init);

Why does this have to be an arch initcall and can't it run after
detect_sgx() in init_intel()? You'd need to run it only once but that's
easy.
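
Not even compile-tested, but just to illustrate the "run it only once"
part - something like this in init_intel(), after the SGX detection:

    /* only the boot CPU does the EPC setup */
    if (c == &boot_cpu_data)
            sgx_init();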

> diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
> new file mode 100644
> index 000000000000..042769f03be9
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/sgx/reclaim.c
> @@ -0,0 +1,84 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +// Copyright(c) 2016-19 Intel Corporation.
> +
> +#include <linux/freezer.h>
> +#include <linux/highmem.h>
> +#include <linux/kthread.h>
> +#include <linux/pagemap.h>
> +#include <linux/ratelimit.h>
> +#include <linux/slab.h>
> +#include <linux/sched/mm.h>
> +#include <linux/sched/signal.h>
> +#include "encls.h"
> +#include "sgx.h"
> +
> +static struct task_struct *ksgxswapd_tsk;
> +
> +static void sgx_sanitize_section(struct sgx_epc_section *section)
> +{
> +     struct sgx_epc_page *page, *tmp;
> +     LIST_HEAD(secs_list);
> +     int ret;
> +
> +     while (!list_empty(&section->unsanitized_page_list)) {
> +             if (kthread_should_stop())
> +                     return;
> +
> +             spin_lock(&section->lock);
> +
> +             page = list_first_entry(&section->unsanitized_page_list,
> +                                     struct sgx_epc_page, list);
> +
> +             ret = __eremove(sgx_epc_addr(page));
> +             if (!ret)
> +                     list_move(&page->list, &section->page_list);
> +             else
> +                     list_move_tail(&page->list, &secs_list);
> +
> +             spin_unlock(&section->lock);
> +
> +             cond_resched();
> +     }
> +
> +     list_for_each_entry_safe(page, tmp, &secs_list, list) {
> +             if (kthread_should_stop())
> +                     return;
> +
> +             ret = __eremove(sgx_epc_addr(page));
> +             if (!WARN_ON_ONCE(ret)) {
> +                     spin_lock(&section->lock);
> +                     list_move(&page->list, &section->page_list);
> +                     spin_unlock(&section->lock);
> +             } else {
> +                     list_del(&page->list);
> +                     kfree(page);
> +             }
> +
> +             cond_resched();
> +     }
> +}

I could use a sentence or two above this function explaining the idea
behind those page lists: why pages are being moved to and from them,
what the unsanitized_page_list is, how it is being used, etc. That has
probably come up already, so pointing me to the text would suffice too.
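
Or a blurb like this above the function would already do - the wording
is guessed from the code, so correct me where I'm off:

    /*
     * EPC pages in an unknown state - e.g. left over from a previous
     * kernel - sit on unsanitized_page_list. EREMOVE each of them and
     * move it to the section's free page_list. Pages for which EREMOVE
     * fails on the first pass (SECS pages which still have children)
     * are parked on secs_list and retried after the rest are gone;
     * anything which still fails then is dropped.
     */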

> +
> +static int ksgxswapd(void *p)
> +{
> +     int i;
> +
> +     set_freezable();
> +
> +     for (i = 0; i < sgx_nr_epc_sections; i++)
> +             sgx_sanitize_section(&sgx_epc_sections[i]);
> +
> +     return 0;
> +}
> +
> +int sgx_page_reclaimer_init(void)
> +{
> +     struct task_struct *tsk;
> +
> +     tsk = kthread_run(ksgxswapd, NULL, "ksgxswapd");
> +     if (IS_ERR(tsk))
> +             return PTR_ERR(tsk);
> +
> +     ksgxswapd_tsk = tsk;
> +
> +     return 0;
> +}
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> new file mode 100644
> index 000000000000..3009ec816339
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -0,0 +1,67 @@
> +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> +#ifndef _X86_SGX_H
> +#define _X86_SGX_H
> +
> +#include <linux/bitops.h>
> +#include <linux/err.h>
> +#include <linux/io.h>
> +#include <linux/rwsem.h>
> +#include <linux/types.h>
> +#include <asm/asm.h>
> +#include <uapi/asm/sgx_errno.h>
> +
> +struct sgx_epc_page {
> +     unsigned long desc;
> +     struct list_head list;
> +};
> +
> +/**
> + * struct sgx_epc_section
> + *
> + * The firmware can define multiple chunks of EPC to the different areas of the

My usual question: what if fw doesn't? Can we define our own chunks or
do we need special firmware support for the whole EPC thing to even
exist?

> + * physical memory e.g. for memory areas of the each node. This structure is
> + * used to store EPC pages for one EPC section and virtual memory area where
> + * the pages have been mapped.
> + */

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
