On Mon, Sep 25, 2017 at 9:36 AM, Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote:
>
> this issue was discussed at Plumbers and it seems there may be
> a solution in sight. The work on 'speculative page faults' will
> remove mm->mmap_sem in favor of srcu approach with sequence numbers
> and we will be able to do find_vma() and vma->vm_ops->access() from
> the non-sleepable context.
> From bpf program point of view it probably be a new helper
> bpf_probe_read_harder() ;) or something that will try normal
> pagefault_disabled read first and if it fails will try
> srcu_read_lock+vma->access approach.
>

Thank you Alexei for your reply, and sorry for the delay: I only found the time over the weekend to go over your message more deeply.

I applied the speculative page fault patch to my tree to better understand the implications of your comment. Indeed, this patch (way over my head!) seems a huge leap forward, because it allows us to look up a VMA without taking any lock, so we can do it in a non-sleepable context.

However, I am still missing how this could be a complete fix. Take, for example, the case I mentioned above: right after a fork(), all of the child's VMAs that refer to mapped files have no valid PTEs (even though the file is already in the page cache). In this case, there's little we can do besides grabbing the VMA and asking some vma->vm_ops method to give us the page corresponding to the address we're looking for. With the speculative fault work we can do that from a BPF helper too; however, some vm_ops methods are not ready to be called in a non-sleepable context. For example, for filemap:

- fault() is not safe because it consistently ends up in a might_sleep() invocation [1][2]
- map_pages() seems safe (but is it also safe for other VMA implementations?)
- access() is not defined

So, which ones would this BPF helper call in order to guarantee usefulness while not causing blocking? Just calling vm_ops->access() wouldn't help in this case since it's not defined. Looking at the code for __access_remote_vm(), it seems it does a mix of get_user_pages() (which in turn calls vm_ops->fault() and/or vm_ops->map_pages()) and, as a fallback, it uses vm_ops->access(), but of course that one can sleep.

Perhaps the solution is much simpler and I just didn't grasp all the implications of this work? (Sorry again, it's the first time I dabble in this subsystem.)

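To make the question a bit more concrete, here is a rough user-space sketch of the decision such a helper would have to make. It is not kernel code: struct model_vm_ops, pick_atomic_op() and the two example tables are hypothetical names made up for illustration, and the "atomic" flags only encode my tentative reading above (map_pages() looks safe for filemap, fault() is not, access() is missing).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, NOT the kernel's struct vm_operations_struct. */
enum op_kind { OP_NONE, OP_MAP_PAGES, OP_ACCESS };

struct model_vm_ops {
    bool has_fault;        /* fault() defined; assumed to possibly sleep */
    bool has_map_pages;    /* map_pages() defined */
    bool map_pages_atomic; /* assumption: this implementation never sleeps */
    bool has_access;       /* access() defined */
    bool access_atomic;    /* assumption: this implementation never sleeps */
};

/*
 * Pick a method that could be called from a non-sleepable context.
 * fault() is never picked, because it is assumed to be able to sleep.
 */
enum op_kind pick_atomic_op(const struct model_vm_ops *ops)
{
    assert(ops != NULL);
    if (ops->has_map_pages && ops->map_pages_atomic)
        return OP_MAP_PAGES;
    if (ops->has_access && ops->access_atomic)
        return OP_ACCESS;
    return OP_NONE;
}

/* filemap as I read it: fault() sleeps, map_pages() seems safe, no access(). */
const struct model_vm_ops model_filemap_ops = {
    .has_fault = true,
    .has_map_pages = true,
    .map_pages_atomic = true,
    .has_access = false,
    .access_atomic = false,
};

/* A made-up VMA type that only offers a sleeping access(). */
const struct model_vm_ops model_access_only_ops = {
    .has_fault = false,
    .has_map_pages = false,
    .map_pages_atomic = false,
    .has_access = true,
    .access_atomic = false,
};
```

In this model a helper that only tries access() gets nothing for filemap, while one that also considers map_pages() does; for the second table there is no usable method at all, which is the case I don't know how to handle.
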
Thanks

[1] https://github.com/torvalds/linux/blob/v4.13/mm/filemap.c#L2372
[2] https://github.com/torvalds/linux/blob/v4.13/include/linux/pagemap.h#L496
_______________________________________________
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev