Re: [PATCH 1/2] New system call, unshare

2005-08-22 Thread Al Viro
On Mon, Aug 08, 2005 at 03:46:06PM +0100, Alan Cox wrote:
> On Mon, 2005-08-08 at 09:33 -0400, Janak Desai wrote:
> > 
> > [PATCH 1/2] unshare system call: System Call handler function sys_unshare
> 
> 
> Given the complexity of the kernel code involved and the obscurity of
> the functionality, why not just do another clone() in userspace to
> unshare the things you want to unshare and then _exit the parent?

Because you want to keep children?  Because you don't want to deal with
the implications for sessions/groups/etc.?

FWIW, the syscall makes sense.  It is a valid primitive, and the only reason
to keep it out of clone() (i.e. not making it just another flag to clone())
is that clone() is already cluttered _and_ uses bad calling conventions
for that stuff ("I want to retain <X>" rather than "I want private <X>").
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] New system call, unshare

2005-08-08 Thread Avi Kivity

Alan Cox wrote:

> On Mon, 2005-08-08 at 09:33 -0400, Janak Desai wrote:
>
>> [PATCH 1/2] unshare system call: System Call handler function sys_unshare
>
> Given the complexity of the kernel code involved and the obscurity of
> the functionality, why not just do another clone() in userspace to
> unshare the things you want to unshare and then _exit the parent?

Suppose somebody wait()s for the parent? You've turned a synchronous
operation into an asynchronous one.



Re: [PATCH 1/2] New system call, unshare

2005-08-08 Thread serge
Quoting Alan Cox ([EMAIL PROTECTED]):
> On Mon, 2005-08-08 at 09:33 -0400, Janak Desai wrote:
> > 
> > [PATCH 1/2] unshare system call: System Call handler function sys_unshare
> 
> 
> Given the complexity of the kernel code involved and the obscurity of
> the functionality, why not just do another clone() in userspace to
> unshare the things you want to unshare and then _exit the parent?

The problem I had when I tried using just clone() was that it's
not possible to have a PAM library clone() and have the process
being authenticated end up with the new namespace.  At least not
that I could figure out.  It seemed possible that cloning, exiting the
original thread, and returning from the new thread could work, but
it didn't seem to work when I tried it.

thanks,
-serge


Re: [PATCH 1/2] New system call, unshare

2005-08-08 Thread Alan Cox
On Mon, 2005-08-08 at 09:33 -0400, Janak Desai wrote:
> 
> [PATCH 1/2] unshare system call: System Call handler function sys_unshare


Given the complexity of the kernel code involved and the obscurity of
the functionality, why not just do another clone() in userspace to
unshare the things you want to unshare and then _exit the parent?



[PATCH 1/2] New system call, unshare

2005-08-08 Thread Janak Desai


[PATCH 1/2] unshare system call: System Call handler function sys_unshare

Signed-off-by: Janak Desai



 fork.c |  202 +++--
 1 files changed, 196 insertions(+), 6 deletions(-)



diff -Naurp 2.6.13-rc5-mm1/kernel/fork.c 2.6.13-rc5-mm1+unshare/kernel/fork.c
--- 2.6.13-rc5-mm1/kernel/fork.c	2005-08-07 15:33:45.0 +
+++ 2.6.13-rc5-mm1+unshare/kernel/fork.c	2005-08-07 19:03:49.0 +
@@ -57,6 +57,17 @@ int nr_threads;  /* The idle threads do
 
 int max_threads;   /* tunable limit on nr_threads */
 
+/*
+ * copy_mm gets called from the clone or unshare system calls. When called
+ * from clone, the mm_struct may be shared, depending on the clone flags
+ * argument; when called from the unshare system call, a private copy of
+ * the mm_struct is made.
+ */
+enum mm_copy_share {
+	MAY_SHARE,
+	UNSHARE,
+};
+
 DEFINE_PER_CPU(unsigned long, process_counts) = 0;
 
 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
@@ -447,16 +458,26 @@ void mm_release(struct task_struct *tsk,
 	}
 }
 
-static int copy_mm(unsigned long clone_flags, struct task_struct * tsk)
+static int copy_mm(unsigned long clone_flags, struct task_struct * tsk,
+		enum mm_copy_share copy_share_action)
 {
 	struct mm_struct * mm, *oldmm;
 	int retval;
 
-	tsk->min_flt = tsk->maj_flt = 0;
-	tsk->nvcsw = tsk->nivcsw = 0;
+	/*
+	 * If the process memory is being duplicated as part of the
+	 * unshare system call, we are working with the current process
+	 * and not a newly allocated task structure, and should not
+	 * zero out fault info, context switch counts, mm and active_mm
+	 * fields.
+	 */
+	if (copy_share_action == MAY_SHARE) {
+		tsk->min_flt = tsk->maj_flt = 0;
+		tsk->nvcsw = tsk->nivcsw = 0;
 
-	tsk->mm = NULL;
-	tsk->active_mm = NULL;
+		tsk->mm = NULL;
+		tsk->active_mm = NULL;
+	}
 
 	/*
 	 * Are we cloning a kernel thread?
@@ -973,7 +994,7 @@ static task_t *copy_process(unsigned lon
 		goto bad_fork_cleanup_fs;
 	if ((retval = copy_signal(clone_flags, p)))
 		goto bad_fork_cleanup_sighand;
-	if ((retval = copy_mm(clone_flags, p)))
+	if ((retval = copy_mm(clone_flags, p, MAY_SHARE)))
 		goto bad_fork_cleanup_signal;
 	if ((retval = copy_keys(clone_flags, p)))
 		goto bad_fork_cleanup_mm;
@@ -1288,3 +1309,172 @@ void __init proc_caches_init(void)
 			sizeof(struct mm_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL);
 }
+
+/*
+ * unshare_mm is called from the unshare system call handler function to
+ * make a private copy of the mm_struct structure. It calls copy_mm with
+ * the CLONE_VM flag cleared, to ensure that a private copy of mm_struct is
+ * made, and with the mm_copy_share enum set to UNSHARE, to ensure that
+ * copy_mm does not clear the fault info, context switch counts, mm and
+ * active_mm fields of the task_struct.
+ */
+static int unshare_mm(unsigned long unshare_flags, struct task_struct *tsk)
+{
+	int retval = 0;
+	struct mm_struct *mm = tsk->mm;
+
+	/*
+	 * If the virtual memory is being shared, make a private
+	 * copy and disassociate the process from the shared virtual
+	 * memory.
+	 */
+	if (atomic_read(&mm->mm_users) > 1) {
+		retval = copy_mm((unshare_flags & ~CLONE_VM), tsk, UNSHARE);
+
+		/*
+		 * If copy_mm was successful, decrement the number of users
+		 * on the original, shared, mm_struct.
+		 */
+		if (!retval)
+			atomic_dec(&mm->mm_users);
+	}
+	return retval;
+}
+
+/*
+ * unshare_sighand is called from the unshare system call handler function to
+ * make a private copy of the sighand_struct structure. It calls copy_sighand
+ * with CLONE_SIGHAND cleared to ensure that a new signal handler structure
+ * is cloned from the current shared one.
+ */
+static int unshare_sighand(unsigned long unshare_flags, struct task_struct *tsk)
+{
+	int retval = 0;
+	struct sighand_struct *sighand = tsk->sighand;
+
+	/*
+	 * If the signal handlers are being shared, make a private
+	 * copy and disassociate the process from the shared signal
+	 * handlers.
+	 */
+	if (atomic_read(&sighand->count) > 1) {
+		retval = copy_sighand((unshare_flags & ~CLONE_SIGHAND), tsk);
+
+		/*
+		 * If copy_sighand was successful, decrement the use count
+		 * on the original, shared, sighand_struct.
+		 */
+		if (!retval)
+			atomic_dec(&sighand->count);
+	}
+	return retval;
+}
+
+/*
+ * unshare_namespa