Gilles Chanteperdrix wrote:
> On 8/7/07, Jan Kiszka <[EMAIL PROTECTED]> wrote:
>> Gilles Chanteperdrix wrote:
>>> On 8/7/07, Jan Kiszka <[EMAIL PROTECTED]> wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> we are getting a lot of
>>>>>>
>>>>>> BUG: sleeping function called from invalid context at 
>>>>>> mm/page_alloc.c:1225
>>>>>> in_atomic():1, irqs_disabled():0
>>>>>>  [<c010305d>] show_trace_log_lvl+0x1a/0x2f
>>>>>>  [<c0103156>] show_trace+0x12/0x14
>>>>>>  [<c0103915>] dump_stack+0x16/0x18
>>>>>>  [<c010c4ab>] __might_sleep+0xcd/0xd3
>>>>>>  [<c0149488>] __alloc_pages+0x32/0x281
>>>>>>  [<c014fdd2>] copy_page_range+0x221/0x41e
>>>>>>  [<c010ec18>] copy_process+0x9e1/0xfe2
>>>>>>  [<c010f415>] do_fork+0x99/0x176
>>>>>>  [<c0100e75>] sys_clone+0x33/0x39
>>>>>>  [<c0102aaf>] syscall_call+0x7/0xb
>>>>>>  =======================
>>>>>>
>>>>>> here due to a Xenomai program issuing system() calls.
>>>>>>
>>>>>> After once again dissecting the "nice" mm code (sigh...), the reason
>>>>>> turned out to be plain simple:
>>>>>>
>>>>>> copy_pte_range(...);
>>>>>>   spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
>>>>>>   copy_one_pte(...);
>>>>>>     if (is_cow_mapping(vm_flags))
>>>>>>       alloc_page_vma(GFP_HIGHUSER, ...);
>>>>>>         __alloc_pages(...)
>>>>>>        might_sleep_if(gfp_mask & __GFP_WAIT);
>>>>>>
>>>>>> And this is true due to #define GFP_HIGHUSER (__GFP_WAIT | ...
>>>>>>
>>>>>> So the bad news is that the COW code in likely all i-pipe versions is
>>>>>> broken. But the good new is that this might be easily fixable by
>>>>>> providing the right gfp_mask. GFP_ATOMIC?
>>>>> It does not look like a good solution, you are going to empty the
>>>>> GFP_ATOMIC pools. The proper solution would rather be to look at the
>>>>> real mm code (I mean not the one I wrote) and see how they cope with
>>>>> this issue.
>>>> Mmpf. What are the chances for a quick fix within the next days? We have
>>>> to consider alternatives right now here because the whole system is
>>>> meant for production purpose next week (C-ELROB '07).
>>>>
>>>> OK, I'm already finding myself inside the code :-/. What about this
>>>> approach: We try to alloc with GFP_ATOMIC. Once this fails, we break
>>>> out, drop all locks (just like it happens in case of need_resched()),
>>>> try to fill up the pool, and restart then. What would reliably make
>>>> Linux refill its atomic pool?
>>>>
>>>> Alternative approach: preallocate the required pages before entering the
>>>> loop in copy_pte_range. But that may require more code changes.
>>> I would say the real fix is to drop momentarily the spinlock(s?) for 
>>> allocating.
>>>
>> Are you sure it's safe to drop locks in the (logical) middle of
>> copy_one_pte()? I can't tell yet from the few glances I took. It's just
>> my feeling that says "no" so far.
> 
> There is certainly something possible, since the vanilla kernel
> actually works without these warning.

Vanilla doesn't allocate pages from within copy_one_pte.


So, here comes proposal #1 for a solution, also addressing the broken
ENOMEM-error path of the current patch. Only compile tested yet. Might
this work?

Jan


---
 mm/memory.c |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Index: linux-2.6.20.15/mm/memory.c
===================================================================
--- linux-2.6.20.15.orig/mm/memory.c
+++ linux-2.6.20.15/mm/memory.c
@@ -498,7 +498,7 @@ copy_one_pte(struct mm_struct *dst_mm, s
 #ifdef CONFIG_IPIPE
                if (((vm_flags|src_mm->def_flags) & (VM_LOCKED|VM_PINNED)) == 
(VM_LOCKED|VM_PINNED)) {
                        struct page *old_page = vm_normal_page(vma, addr, pte);
-                       page = alloc_page_vma(GFP_HIGHUSER, vma, addr);
+                       page = alloc_page_vma(GFP_ATOMIC, vma, addr);
                        if (!page)
                                return -ENOMEM;
 
@@ -546,6 +546,8 @@ static int copy_pte_range(struct mm_stru
        spinlock_t *src_ptl, *dst_ptl;
        int progress = 0;
        int rss[2];
+       int nomem_retries = 10; /* Arbitrary limit */
+       int err = 0;
 
 again:
        rss[1] = rss[0] = 0;
@@ -574,8 +576,12 @@ again:
                        continue;
                }
                if (copy_one_pte(dst_mm, src_mm, dst_pte,
-                                src_pte, vma, addr, rss))
-                       return -ENOMEM;
+                                src_pte, vma, addr, rss)) {
+                       if (--nomem_retries == 0)
+                               err = -ENOMEM;
+                       break;
+               }
+               nomem_retries = 10;
                progress += 8;
        } while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
 
@@ -585,9 +591,9 @@ again:
        add_mm_rss(dst_mm, rss[0], rss[1]);
        pte_unmap_unlock(dst_pte - 1, dst_ptl);
        cond_resched();
-       if (addr != end)
+       if (addr != end && !err)
                goto again;
-       return 0;
+       return err;
 }
 
 static inline int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct 
*src_mm,


Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Adeos-main mailing list
[email protected]
https://mail.gna.org/listinfo/adeos-main

Reply via email to