On 12/16/20 5:50 AM, Jarkko Sakkinen wrote:
> Create a pointer array for each NUMA node with the references to the
> contained EPC sections. Use this in __sgx_alloc_epc_page() to knock the
> current NUMA node before the others.

It makes it harder to comment when I'm not on cc.

Hint, hint... ;)

We need a bit more information here as well.  What's the relationship
between NUMA nodes and sections?  How does the BIOS tell us which NUMA
nodes a section is in?  Is it the same or different from normal RAM and
PMEM?

> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index c519fc5f6948..0da510763c47 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -13,6 +13,13 @@
>  #include "encl.h"
>  #include "encls.h"
>  
> +struct sgx_numa_node {
> +     struct sgx_epc_section *sections[SGX_MAX_EPC_SECTIONS];
> +     int nr_sections;
> +};

So, we have a 'NUMA node' structure already: pg_data_t.  Why don't we
just hang the epc sections off there?

> +static struct sgx_numa_node sgx_numa_nodes[MAX_NUMNODES];

Hmm...  Time to see if I can still do math.

#define SGX_MAX_EPC_SECTIONS            8

(sizeof(struct sgx_epc_section *) + sizeof(int)) * 8 * MAX_NUMNODES

CONFIG_NODES_SHIFT=10 (on Fedora)
#define MAX_NUMNODES (1 << NODES_SHIFT)

12*8*1024 = ~100k.  Yikes.  For *EVERY* system that enables SGX,
regardless if they are NUMA or not.'

Trivial is great, this may be too trivial.

Adding a list_head to pg_data_t and sgx_epc_section would add
SGX_MAX_EPC_SECTIONS*sizeof(list_head)=192 bytes plus 16 bytes per
*present* NUMA node.

>  /**
>   * __sgx_alloc_epc_page() - Allocate an EPC page
>   *
> @@ -485,14 +511,19 @@ static struct sgx_epc_page 
> *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
>   */
>  struct sgx_epc_page *__sgx_alloc_epc_page(void)
>  {
> -     struct sgx_epc_section *section;
>       struct sgx_epc_page *page;
> +     int nid = numa_node_id();
>       int i;
>  
> -     for (i = 0; i < sgx_nr_epc_sections; i++) {
> -             section = &sgx_epc_sections[i];
> +     page = __sgx_alloc_epc_page_from_node(nid);
> +     if (page)
> +             return page;
>  
> -             page = __sgx_alloc_epc_page_from_section(section);
> +     for (i = 0; i < sgx_nr_numa_nodes; i++) {
> +             if (i == nid)
> +                     continue;

Yikes.  That's a horribly inefficient loop.  Consider if nodes 0 and
1023 were the only ones with EPC.  What would this loop do?  I think
it's a much better idea to keep a nodemask_t of nodes that have EPC.
Then, just do bitmap searches.

> +             page = __sgx_alloc_epc_page_from_node(i);
>               if (page)
>                       return page;
>       }
> @@ -661,11 +692,28 @@ static inline u64 __init sgx_calc_section_metric(u64 
> low, u64 high)
>              ((high & GENMASK_ULL(19, 0)) << 32);
>  }
>  
> +static int __init sgx_pfn_to_nid(unsigned long pfn)
> +{
> +     pg_data_t *pgdat;
> +     int nid;
> +
> +     for (nid = 0; nid < nr_node_ids; nid++) {
> +             pgdat = NODE_DATA(nid);
> +
> +             if (pfn >= pgdat->node_start_pfn &&
> +                 pfn < (pgdat->node_start_pfn + pgdat->node_spanned_pages))
> +                     return nid;
> +     }
> +
> +     return 0;
> +}

I'm not positive this works.  I *thought* these ->node_start_pfn and
->node_spanned_pages are really only guaranteed to cover memory which is
managed by the kernel and has 'struct page' for it.

EPC doesn't have a 'struct page', so won't necessarily be covered by the
pgdat-> and zone-> ranges.  I *think* you may have to go all the way
back to the ACPI SRAT for this.

It would also be *possible* to have an SRAT constructed like this:

0->1GB System RAM - Node 0
1->2GB Reserved   - Node 1
2->3GB System RAM - Node 0

Where the 1->2GB is EPC.  The Node 0 pg_data_t would be:

        pgdat->node_start_pfn = 0
        pgdat->node_spanned_pages = 3GB


Reply via email to