Ray Fucillo wrote:
Nick Piggin wrote:

fork() can be changed so as not to set up page tables for
MAP_SHARED mappings. I think that has other tradeoffs like
initially causing several unavoidable faults reading
libraries and program text.

What kind of application are you using?


The application is a database system called Caché. We allocate a large shared memory segment for database cache, which in a large production environment may realistically be 1+GB on 32-bit platforms and much larger on 64-bit. At these sizes fork() is taking hundreds of miliseconds, which can become a noticeable bottleneck for us. This performance characteristic seems to be unique to Linux vs other Unix implementations.



As Andi said, hugepages might be a very nice feature for you guys
to look into and might potentially give a performance increase with
reduced TLB pressure, not only your immediate fork problem.

Anyway, the attached patch is something you could try testing. If
you do so, then I would be very interested to see performance results.

Thanks,
Nick

--
SUSE Labs, Novell Inc.

Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c        2005-08-04 15:24:36.000000000 +1000
+++ linux-2.6/kernel/fork.c     2005-08-26 00:20:50.000000000 +1000
@@ -256,7 +256,6 @@ static inline int dup_mmap(struct mm_str
                 * Note that, exceptionally, here the vma is inserted
                 * without holding mm->mmap_sem.
                 */
-               spin_lock(&mm->page_table_lock);
                *pprev = tmp;
                pprev = &tmp->vm_next;
 
@@ -265,8 +264,11 @@ static inline int dup_mmap(struct mm_str
                rb_parent = &tmp->vm_rb;
 
                mm->map_count++;
-               retval = copy_page_range(mm, current->mm, tmp);
-               spin_unlock(&mm->page_table_lock);
+               if (!(file && (tmp->vm_flags & VM_SHARED))) {
+                       spin_lock(&mm->page_table_lock);
+                       retval = copy_page_range(mm, current->mm, tmp);
+                       spin_unlock(&mm->page_table_lock);
+               }
 
                if (tmp->vm_ops && tmp->vm_ops->open)
                        tmp->vm_ops->open(tmp);

Reply via email to