Re: [Adeos-main] [COW-BUG] __alloc_pages called from atomic context

Gilles Chanteperdrix Tue, 07 Aug 2007 08:00:49 -0700

On 8/7/07, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> Gilles Chanteperdrix wrote:
> > On 8/7/07, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> >> Gilles Chanteperdrix wrote:
> >>> On 8/7/07, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> >>>> Gilles Chanteperdrix wrote:
> >>>>> Jan Kiszka wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> we are getting a lot of
> >>>>>>
> >>>>>> BUG: sleeping function called from invalid context at mm/page_alloc.c:1225
> >>>>>> in_atomic():1, irqs_disabled():0
> >>>>>>  [<c010305d>] show_trace_log_lvl+0x1a/0x2f
> >>>>>>  [<c0103156>] show_trace+0x12/0x14
> >>>>>>  [<c0103915>] dump_stack+0x16/0x18
> >>>>>>  [<c010c4ab>] __might_sleep+0xcd/0xd3
> >>>>>>  [<c0149488>] __alloc_pages+0x32/0x281
> >>>>>>  [<c014fdd2>] copy_page_range+0x221/0x41e
> >>>>>>  [<c010ec18>] copy_process+0x9e1/0xfe2
> >>>>>>  [<c010f415>] do_fork+0x99/0x176
> >>>>>>  [<c0100e75>] sys_clone+0x33/0x39
> >>>>>>  [<c0102aaf>] syscall_call+0x7/0xb
> >>>>>>  =======================
> >>>>>>
> >>>>>> here due to a Xenomai program issuing system() calls.
> >>>>>>
> >>>>>> After once again dissecting the "nice" mm code (sigh...), the reason
> >>>>>> turned out to be plain simple:
> >>>>>>
> >>>>>> copy_pte_range(...);
> >>>>>>   spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
> >>>>>>   copy_one_pte(...);
> >>>>>>     if (is_cow_mapping(vm_flags))
> >>>>>>       alloc_page_vma(GFP_HIGHUSER, ...);
> >>>>>>         __alloc_pages(...)
> >>>>>>           might_sleep_if(gfp_mask & __GFP_WAIT);
> >>>>>>
> >>>>>> And this is true due to #define GFP_HIGHUSER (__GFP_WAIT | ...
> >>>>>>
> >>>>>> So the bad news is that the COW code in likely all i-pipe versions is
> >>>>>> broken. But the good news is that this might be easily fixable by
> >>>>>> providing the right gfp_mask. GFP_ATOMIC?
> >>>>> It does not look like a good solution, you are going to empty the
> >>>>> GFP_ATOMIC pools. The proper solution would rather be to look at the
> >>>>> real mm code (I mean not the one I wrote) and see how they cope with
> >>>>> this issue.
> >>>> Mmpf. What are the chances of a quick fix within the next few days? We
> >>>> have to consider alternatives right now because the whole system is
> >>>> meant to go into production next week (C-ELROB '07).
> >>>>
> >>>> OK, I'm already finding myself inside the code :-/. What about this
> >>>> approach: We try to alloc with GFP_ATOMIC. Once this fails, we break
> >>>> out, drop all locks (just like it happens in case of need_resched()),
> >>>> try to fill up the pool, and restart then. What would reliably make
> >>>> Linux refill its atomic pool?
> >>>>
> >>>> Alternative approach: preallocate the required pages before entering the
> >>>> loop in copy_pte_range. But that may require more code changes.
> >>> I would say the real fix is to momentarily drop the spinlock(s?) while
> >>> allocating.
> >>>
> >> Are you sure it's safe to drop locks in the (logical) middle of
> >> copy_one_pte()? I can't tell yet from the few glances I took. It's just
> >> my feeling that says "no" so far.
> >
> > Something is certainly possible, since the vanilla kernel
> > actually works without these warnings.
>
> Vanilla doesn't allocate pages from within copy_one_pte.


The fact that you are in a hurry should not be an excuse to propose a
fix which is much worse than the bug itself.
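
For illustration only, the "drop the lock while allocating" idea discussed
above can be sketched in userspace C. This is a sketch under stated
assumptions, not the I-pipe or kernel code: a pthread mutex stands in for the
page-table spinlock, malloc() stands in for the page allocator, and struct
table, SLOT_SIZE and copy_slot() are hypothetical names. The key point is
that the state is re-checked after the lock is retaken, because it may have
changed while the lock was dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define SLOT_SIZE 64            /* hypothetical stand-in for a page */

struct table {                  /* hypothetical; not a kernel structure */
    pthread_mutex_t lock;       /* stand-in for the page-table spinlock */
    void *slot;                 /* NULL until populated */
};

/*
 * Populate t->slot with a copy of src without ever calling the
 * (possibly blocking) allocator while the lock is held.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int copy_slot(struct table *t, const void *src)
{
    void *fresh = NULL;

    assert(t != NULL && src != NULL);

    pthread_mutex_lock(&t->lock);
    while (t->slot == NULL) {
        if (fresh != NULL) {
            /* Buffer is ready and the slot is still empty: install it. */
            memcpy(fresh, src, SLOT_SIZE);
            t->slot = fresh;
            fresh = NULL;
            break;
        }
        /* Drop the lock before calling the allocator. */
        pthread_mutex_unlock(&t->lock);
        fresh = malloc(SLOT_SIZE);
        if (fresh == NULL)
            return -1;
        /* Retake the lock; the loop condition re-checks the slot. */
        pthread_mutex_lock(&t->lock);
    }
    pthread_mutex_unlock(&t->lock);

    /* Someone else filled the slot meanwhile; free(NULL) is a no-op. */
    free(fresh);
    return 0;
}
```

Whether dropping the lock at the equivalent point in copy_one_pte() is
actually safe is exactly the open question in this thread; the sketch only
shows the general shape of the pattern.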

-- 
                                               Gilles Chanteperdrix

_______________________________________________
Adeos-main mailing list
[email protected]
https://mail.gna.org/listinfo/adeos-main
