> On May 14, 2019, at 1:25 AM, Alexandre Chartre <alexandre.char...@oracle.com> 
> wrote:
> 
> 
>> On 5/14/19 9:09 AM, Peter Zijlstra wrote:
>>> On Mon, May 13, 2019 at 11:18:41AM -0700, Andy Lutomirski wrote:
>>> On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
>>> <alexandre.char...@oracle.com> wrote:
>>>> 
>>>> pcpu_base_addr is already mapped into the KVM address space, but it
>>>> only covers the first percpu chunk. To access a per-cpu buffer not
>>>> allocated in the first chunk, add a function which maps, for each
>>>> cpu, the buffer instance corresponding to that per-cpu variable.
>>>> 
>>>> Also add a function to clear the page table entries for a percpu buffer.
>>>> 
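For concreteness, such helpers presumably look something like the sketch
below; kvm_copy_ptes() and kvm_clear_ptes() are stand-ins for whatever
primitive actually edits the KVM page table, and only the
kvm_copy_percpu_mapping()/kvm_clear_percpu_mapping() names come from the
patch:

	/* Sketch only; assumes <linux/percpu.h> and hypothetical
	 * kvm_copy_ptes()/kvm_clear_ptes() primitives. */
	void kvm_clear_percpu_mapping(void __percpu *percpu_ptr, size_t size)
	{
		int cpu;

		/* Drop every CPU's instance from the KVM page table. */
		for_each_possible_cpu(cpu)
			kvm_clear_ptes(per_cpu_ptr(percpu_ptr, cpu), size);
	}

	int kvm_copy_percpu_mapping(void __percpu *percpu_ptr, size_t size)
	{
		int cpu, err;

		/* Map the buffer's instance for every possible CPU, not
		 * just the current one. */
		for_each_possible_cpu(cpu) {
			err = kvm_copy_ptes(per_cpu_ptr(percpu_ptr, cpu), size);
			if (err) {
				kvm_clear_percpu_mapping(percpu_ptr, size);
				return err;
			}
		}
		return 0;
	}
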
>>> 
>>> This needs some kind of clarification so that readers can tell whether
>>> you're trying to map all percpu memory or just map a specific
>>> variable.  In either case, you're making a dubious assumption that
>>> percpu memory contains no secrets.
>> I'm thinking the per-cpu random pool is a secret. IOW, it demonstrably
>> does contain secrets, invalidating that premise.
> 
> The current code unconditionally maps the entire first percpu chunk
> (pcpu_base_addr), so it assumes the chunk doesn't contain any secrets.
> That is mainly a simplification for the POC, because a lot of the core
> information we need, for example just to switch mm, is stored there
> (like cpu_tlbstate, current_task...).
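
(I.e., the POC presumably just does something like the following at
initialization, where kvm_copy_ptes() and first_chunk_size are
stand-ins for the actual primitive and size computation:)

	/* Blanket-map the entire first percpu chunk; pcpu_base_addr is
	 * the real mm/percpu.c symbol, the rest is illustrative. */
	kvm_copy_ptes(pcpu_base_addr, first_chunk_size);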

I don’t think you should need any of this.

> 
> If the entire first percpu chunk effectively holds secrets, then we
> will need to individually map only the buffers we need. The
> kvm_copy_percpu_mapping() function is added to copy the mapping for a
> specified percpu buffer, so it is used to map percpu buffers which are
> not in the first percpu chunk.
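
Hypothetical usage, for a dynamically allocated per-cpu buffer that can
land outside the first chunk (struct vcpu_stats is made up):

	struct vcpu_stats __percpu *st;
	int err;

	st = alloc_percpu(struct vcpu_stats);
	if (!st)
		return -ENOMEM;

	err = kvm_copy_percpu_mapping(st, sizeof(struct vcpu_stats));
	if (err)
		goto free;

	/* ... use st from isolated mode ... */

	/* Unmap before freeing so no stale PTEs remain. */
	kvm_clear_percpu_mapping(st, sizeof(struct vcpu_stats));
free:
	free_percpu(st);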
> 
> Also note that mapping is constrained by PTE granularity (4K), so
> mapped buffers (percpu or not) which do not fill a whole set of pages
> can leak adjacent data stored on the same pages.
> 
> 
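One way to avoid that leak for a static percpu buffer is to give it
whole pages of its own; DEFINE_PER_CPU_PAGE_ALIGNED() is real, the
struct is made up:

	/* Page-aligned and page-sized: nothing else shares its 4K pages. */
	struct kvm_isol_buf {
		char data[PAGE_SIZE];
	};
	DEFINE_PER_CPU_PAGE_ALIGNED(struct kvm_isol_buf, kvm_isol_buf);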

I would take a different approach: figure out what you need and put it in its 
own dedicated area, kind of like cpu_entry_area.
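
Roughly: a page-aligned structure holding only what isolated mode
needs, mapped at a fixed location in both page tables. The layout below
is entirely hypothetical:

	/* Everything isolated mode needs and nothing else: no random
	 * pool, no secrets, just the state to switch mm and find the
	 * current task. */
	struct kvm_isolation_area {
		struct tlb_state tlbstate;
		struct task_struct *current_task;
	} __aligned(PAGE_SIZE);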

One nasty issue you’ll have is vmalloc: the kernel stack is in the vmap range, 
and, if you allow access to vmap memory at all, you’ll need some way to ensure 
that *unmap* gets propagated. I suspect the right choice is to see if you can 
avoid using the kernel stack at all in isolated mode. Maybe you could run on 
the IRQ stack instead.
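
(If vmap access does end up being allowed, the unmap side would need a
hook in the vunmap path, conceptually something like the below, with
kvm_clear_ptes() again a stand-in:)

	/* Hypothetical hook, called when a vmap range is torn down, so
	 * the isolated page table never keeps a stale translation. */
	void kvm_isolation_sync_vunmap(unsigned long addr, unsigned long size)
	{
		kvm_clear_ptes((void *)addr, size);
		flush_tlb_kernel_range(addr, addr + size);
	}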
