Thanks for the review. Please see my comments inline below.

On 03/04/2014 06:56 AM, Oleg Nesterov wrote:
On 03/03, Khalid Aziz wrote:
  kernel/sched/preempt_delay.c                    |  39 ++++++

Why? This can go into proc/ as well.


Sure. No strong reason to keep these functions in a separate file. They can go into fs/proc/base.c.
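For instance, the file_operations and the /proc entry could then sit next to the other per-tid files there. A rough sketch (the fops name, mode and exact placement in tid_base_stuff[] are placeholders, not necessarily what the patch will end up using):

static const struct file_operations proc_tid_preempt_delay_operations = {
	.mmap	= tid_preempt_delay_mmap,
};

/* and in tid_base_stuff[]: */
REG("sched_preempt_delay", S_IRUSR | S_IWUSR,
    proc_tid_preempt_delay_operations),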

+static void
+close_preempt_delay_vmops(struct vm_area_struct *vma)
+{
+       struct preemp_delay_mmap_state *state;
+
+       state = (struct preemp_delay_mmap_state *) vma->vm_private_data;
+       BUG_ON(!state || !state->task);
+
+       state->page->mapping = NULL;
+       /* point delay request flag pointer back to old flag in task_struct */
+       state->task->sched_preempt_delay.delay_req =
+                       &state->task->sched_preempt_delay.delay_flag;
+       state->task->sched_preempt_delay.mmap_state = NULL;
+       vfree(state->kaddr);
+       kfree(state);
+       vma->vm_private_data = NULL;
+}

Suppose that state->task != current. Then this can race with do_exit(),
which cleans up ->mmap_state too. OTOH do_exit() unmaps this region; it
is not clear why it can't rely on vm_ops->close().

Hmm. In fact I think do_exit() should crash after munmap? ->mmap_state
should be NULL ?? Perhaps I misread this patch completely...

do_exit() unmaps mmap_state->uaddr and frees mmap_state->kaddr and mmap_state itself, so mmap_state should not be NULL after the unmap. vfree() and kfree() are tolerant of pointers that have already been freed. On the other hand, mmap_state can be NULL in do_exit() if do_exit() and close_preempt_delay_vmops() race, since close_preempt_delay_vmops() sets mmap_state to NULL just before freeing it. Could they indeed race, because the thread happens to be killed just as it has called munmap()? I can protect against that with a refcount in mmap_state. Do you feel that is necessary/helpful?
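For reference, a minimal sketch of what that refcount could look like, using a kref (the release-function name is made up, kref_init() would go in tid_preempt_delay_mmap(), and only the fields visible in the patch are shown):

struct preemp_delay_mmap_state {
	struct kref		ref;
	struct task_struct	*task;
	struct page		*page;
	void			*kaddr;
	void			*uaddr;
};

static void preempt_delay_state_release(struct kref *ref)
{
	struct preemp_delay_mmap_state *state =
		container_of(ref, struct preemp_delay_mmap_state, ref);

	/* runs only when the last user drops its reference */
	vfree(state->kaddr);
	kfree(state);
}

/*
 * close_preempt_delay_vmops() and the do_exit() path would then both
 * end with kref_put(&state->ref, preempt_delay_state_release) instead
 * of freeing directly.
 */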


+static int
+tid_preempt_delay_mmap(struct file *file, struct vm_area_struct *vma)
+{
+       int retval = 0;
+       void *kaddr = NULL;
+       struct preemp_delay_mmap_state *state = NULL;
+       struct inode *inode = file_inode(file);
+       struct task_struct *task;
+       struct page *page;
+
+       /*
+        * Validate args:
+        * - Only offset 0 support for now
+        * - size should be PAGE_SIZE
+        */
+       if (vma->vm_pgoff != 0 || (vma->vm_end - vma->vm_start) != PAGE_SIZE) {
+               retval = -EINVAL;
+               goto error;
+       }
+
+       /*
+        * Only one mmap allowed at a time
+        */
+       if (current->sched_preempt_delay.mmap_state != NULL) {
+               retval = -EEXIST;
+               goto error;

This assumes that we are going to set up current->sched_preempt_delay.mmap_state,
but what if the task opens /proc/random_tid/sched_preempt_delay?

Good point. A thread should not be allowed to request preemption delay for another thread. I would recommend leaving this code alone and adding the following check before it:

if (get_proc_task(inode) != current) {
        retval = -EPERM;
        goto error;
}

Sounds reasonable?
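Strictly speaking, get_proc_task() takes a reference and can return NULL, so the check would probably need to look more like this (still just a sketch, reusing the task variable already declared in tid_preempt_delay_mmap()):

	task = get_proc_task(inode);
	if (!task) {
		retval = -ESRCH;
		goto error;
	}
	if (task != current) {
		put_task_struct(task);
		retval = -EPERM;
		goto error;
	}
	put_task_struct(task);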


+       state = kzalloc(sizeof(struct preemp_delay_mmap_state), GFP_KERNEL);
+       kaddr = vmalloc_user(PAGE_SIZE);

Why vmalloc() ? We only need a single page?


Makes sense. I will switch to get_zeroed_page().
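Roughly along these lines (just a sketch; it assumes the page is still inserted into the vma with vm_insert_page() and that the error paths are adjusted to match):

	kaddr = (void *)get_zeroed_page(GFP_KERNEL);
	if (!kaddr) {
		retval = -ENOMEM;
		goto error;
	}
	page = virt_to_page(kaddr);

	/* ... and the vfree() calls in the close/exit paths become */
	free_page((unsigned long)state->kaddr);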

+       task = get_proc_task(inode);

And it seems that nobody does put_task_struct(state->task);

Good catch. I had caught the other two instances of get_proc_task() but missed this one.


+       state->page = page;
+       state->kaddr = kaddr;
+       state->uaddr = (void *)vma->vm_start;

This is used by do_exit(). But ->vm_start can be changed by mremap() ?


Hmm. And mremap() can do vm_ops->close() too. But the new vma will
have the same vm_ops/vm_private_data, so exit_mmap() will try to do
this again... Perhaps I missed something, but I bet this all can't be
right.

Would you say the sys_munmap() of mmap_state->uaddr is not even needed, since exit_mm() will do this anyway further down in do_exit()? If I were to remove that sys_munmap(), it could simplify the race issues as well.
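The do_exit() hook would then shrink to something like this, or possibly go away entirely if ->close() restoring delay_req is enough (helper name is hypothetical; the unmap and the freeing are left to exit_mmap() invoking vm_ops->close()):

void sched_preempt_delay_cleanup(struct task_struct *task)
{
	/*
	 * Only detach the delay-request pointer from the mmap'd page;
	 * exit_mmap() tears down the vma later and
	 * close_preempt_delay_vmops() frees kaddr and the state.
	 */
	task->sched_preempt_delay.delay_req =
			&task->sched_preempt_delay.delay_flag;
	task->sched_preempt_delay.delay_flag = 0;
}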


+       state->task = task;
+
+       /* Clear the current delay request flag */
+       task->sched_preempt_delay.delay_flag = 0;
+
+       /* Point delay request flag pointer to the newly allocated memory */
+       task->sched_preempt_delay.delay_req = (unsigned char *)kaddr;
+
+       task->sched_preempt_delay.mmap_state = state;
+       vma->vm_private_data = state;
+       vma->vm_ops = &preempt_delay_vmops;
+       vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_SHARED | VM_WRITE;

This probably also needs VM_IO, to protect from madvise(MADV_DOFORK).

Yes, you are right. I will add that.

VM_SHARED/VM_WRITE doesn't look right.

VM_SHARED is wrong, but VM_WRITE is needed I think, since the thread will write to the mmap'd page to request a preemption delay.
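So the flag update would end up something like:

	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_IO | VM_WRITE;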


Oleg.


I appreciate your taking the time to review this code. Thank you very much.

--
Khalid