Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Wed, May 11, 2005 at 05:53:36PM -0500, Timur Tabi wrote:
> Andrea Arcangeli wrote:
> > If the problem appears again even after the last fix for the COW I did
> > last year, then it means we have yet another bug to fix.
>
> All of my memory pinning test cases pass when I use get_user_pages() with
> kernels 2.6.7 and later.

Well then your problem was the COW bug, which was corrupting userland with
O_DIRECT too...
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrea Arcangeli wrote:
> If the problem appears again even after the last fix for the COW I did
> last year, then it means we have yet another bug to fix.

All of my memory pinning test cases pass when I use get_user_pages() with
kernels 2.6.7 and later.

--
Timur Tabi
Staff Software Engineer
[EMAIL PROTECTED]

One thing a Southern boy will never say is, "I don't think duct tape will
fix it." -- Ed Smylie, NASA engineer for Apollo 13
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Wed, May 11, 2005 at 09:42:24PM +0100, Hugh Dickins wrote:
> proposed patches) there is no such migration of pages; that we'd prefer
> to implement migration in such a way that mlock does not inhibit it
> (though there might prove to be strong arguments defeating that);
> and that get_user_pages _must_ prevent migration (and if there
> were already such migration, I'd be saying it _does_ prevent it).

Indeed, mlock is a virtual pin, and as such it is not guaranteed to always
prevent migration, while get_user_pages is a physical pin on the physical
page, so it has to prevent migration. I think the physical pin is better for
his case, since I guess IB would otherwise break (at least unless you have
some method hotplug can call to stop IB, adjust the IB DMA tracking, and
restart IB). For the short term, using only get_user_pages sounds simpler
IMHO.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Wed, May 11, 2005 at 04:12:41PM -0400, William Jordan wrote:
> If I am reading you correctly, you are saying that mlock currently
> prevents pages from migrating around to unfragment memory, but
> get_user_pages does not prevent this? If this is the case, this could

This is not the case. In fact, get_user_pages is a stronger pin than mlock.
But if you call it by hand and you plan to write to the page, you have to
use the "write=1" flag; this is fundamental if you want to write to the
physical page from userland while it's being tracked by IB DMA.

In short, you should not use mlock; you should use only
get_user_pages(write=1). If the problem appears again even after the last
fix for the COW I did last year, then it means we have yet another bug to
fix.

Using mlock for this is unnecessary. mlock is a "virtual" pin, and it
provides weaker guarantees than what you need. You need a _physical_ pin,
and get_user_pages(write=1) is the only thing that will give it to you.
write=0 is OK too if you're never, ever going to write to the page with the
CPU from userland.

In the old days there was the concept that get_user_pages wasn't a
"pte-pin", but that was in fact broken in the way COW was working with
threads. This is fixed now: it really is a "pte-pin" again (like in 2.2,
which never had the COW corruption bug!), even though the pte may
temporarily be set to swapcache or null. In current 2.6 you're guaranteed
that, even though the pte may temporarily be set to not-present, the next
minor fault will bring into memory the very same physical page that was
there before. At least unless you map the thing write-protected (i.e.
write=0) and you write to it from userland... ;)
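For reference, a sketch of the driver-side call pattern Andrea is
recommending, using the eight-argument get_user_pages() signature of this
2.6 era. This is a non-compilable fragment, not a complete module; `uaddr`,
`npages`, and `NPAGES` are hypothetical driver variables:

```c
/* Hedged sketch only: pin user pages for long-term DMA, write=1 so COW is
 * broken up front and the final physical page is the one pinned. */
struct page *pages[NPAGES];
int i, ret;

down_read(&current->mm->mmap_sem);
ret = get_user_pages(current, current->mm,
                     uaddr,      /* page-aligned userspace start */
                     npages,     /* number of pages to pin */
                     1,          /* write=1 */
                     0,          /* force=0 */
                     pages, NULL);
up_read(&current->mm->mmap_sem);

/* ... program the pinned pages into the HCA's DMA translation tables ... */

/* On teardown, drop each physical page reference. */
for (i = 0; i < ret; i++)
    page_cache_release(pages[i]);   /* the 2.6-era name for put_page() */
```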
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Wed, 11 May 2005, William Jordan wrote:
> On 5/7/05, Hugh Dickins <[EMAIL PROTECTED]> wrote:
> > > My understanding is that mlock() could in theory allow the page to be
> > > moved, but that currently nothing in the kernel would actually move
> > > it. However, that could change in the future to allow hot-swapping of
> > > RAM.
> >
> > That's my understanding too, that nothing currently does so. Aside from
> > hot-swapping RAM, there's also a need to be able to migrate pages around
> > RAM, either to unfragment memory allowing higher-order allocations to
> > succeed more often, or to get around extreme dmamem/normal-mem/highmem
> > imbalances without dedicating huge reserves. Those would more often
> > succeed if uninhibited by mlock.
>
> If I am reading you correctly, you are saying that mlock currently
> prevents pages from migrating around to unfragment memory, but
> get_user_pages does not prevent this?

No, not what I meant at all. I'm saying that currently (aside from proposed
patches) there is no such migration of pages; that we'd prefer to implement
migration in such a way that mlock does not inhibit it (though there might
prove to be strong arguments defeating that); and that get_user_pages _must_
prevent migration (and if there were already such migration, I'd be saying
it _does_ prevent it).

Hugh
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 5/7/05, Hugh Dickins <[EMAIL PROTECTED]> wrote:
> > My understanding is that mlock() could in theory allow the page to be
> > moved, but that currently nothing in the kernel would actually move it.
> > However, that could change in the future to allow hot-swapping of RAM.
>
> That's my understanding too, that nothing currently does so. Aside from
> hot-swapping RAM, there's also a need to be able to migrate pages around
> RAM, either to unfragment memory allowing higher-order allocations to
> succeed more often, or to get around extreme dmamem/normal-mem/highmem
> imbalances without dedicating huge reserves. Those would more often
> succeed if uninhibited by mlock.

Hugh,

If I am reading you correctly, you are saying that mlock currently prevents
pages from migrating around to unfragment memory, but get_user_pages does
not prevent this? If this is the case, this could very easily be the problem
Timur was experiencing. He is using get_user_pages to lock pages long term
(for the life of the process, beyond the bounds of a single system call). If
it is possible for a page to be migrated in physical memory during extreme
virtual memory pressure while the reference count is held with
get_user_pages, that would cause the problem where the hardware is no longer
mapped to the same page as the application.

BTW: In earlier kernels, I experienced the same issues in our IB drivers
when trying to pin pages using only get_user_pages.

--
Bill Jordan
InfiniCon Systems
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Sat, 7 May 2005, Timur Tabi wrote:
> > Oh, well, maybe, but what is the real problem?
> > Are you sure that copy-on-write doesn't come into it?
>
> No, but I do know that my test case doesn't call fork(), so it's
> reproducible without involving COW. Of course, I'm sure someone's going to
> tell me now that COW comes into effect even without fork(). If so, please
> explain.

I'll try. COW comes into effect whenever you're sharing a page and then need
to make private changes to it. Fork is one way of sharing (with ancestor and
descendant processes). Using the empty zero page is another way of sharing
(with all other processes, and with parts of your own address space, via a
readonly page full of zeroes). Using a file page from the page cache is
another way of sharing.

None of those is actually your case, but our test for whether a page is
shared has been inadequate: oversimplifying, if page_count is more than 1
then we have to assume it is shared and do the copy-on-write (if the
modifications are to be private). But there are various places where the
page_count is temporarily raised (e.g. while paging out), which we cannot
distinguish, so occasionally we'll copy on write even when it's not
necessary, because we lack the information to tell us so. In particular, of
course, get_user_pages raises page_count to pin the page: so making a page
appear shared when it's not shared at all.

> The short answer: under "extreme" memory pressure, the data inside a page
> pinned by get_user_pages() is swapped out, moved, or deleted (I'm not sure
> which). Some other data is placed into that physical location.
>
> By extreme memory pressure, I mean having the process allocate and touch
> as much memory as possible. Something like this:
>
> num_bytes = get_amount_of_physical_ram();
> char *p = malloc(num_bytes);
> for (i = 0; i < num_bytes; i++)
>     p[i] = 0;
>
> The above over-simplified code fails on earlier 2.6 kernels (or earlier
> versions of glibc that accompany most distros that use the earlier 2.6
> kernels). Either malloc() returns NULL, or the p[i]=0 part causes a
> segfault. I haven't bothered to trace down why. But when it does work, the
> page pinned by get_user_pages() changes.

Which has to be a bug with get_user_pages, which has no other purpose than
to pin the pages. I cannot criticize you for working around it to get your
app working on lots of releases, but what _we_ have to do is fix
get_user_pages - and it appears that Andrea did so a year ago.

I'm surprised if it's as simple as you describe (you do say over-simplified,
maybe the critical points have fallen out), since GUP users would have
complained long ago if it wasn't doing the job in normal cases of memory
pressure. Andrea's case does involve the process independently trying to
touch a page it has pinned for I/O with get_user_pages. Or (and I've only
just thought of this, and suspect it might be exactly your case) not touch,
but apply get_user_pages again to a page already so pinned (while memory
pressure has caused try_to_unmap_one temporarily to detach it from the user
address space - the aspect of the problem that Andrea's fix attacks).

> My understanding is that mlock() could in theory allow the page to be
> moved, but that currently nothing in the kernel would actually move it.
> However, that could change in the future to allow hot-swapping of RAM.

That's my understanding too, that nothing currently does so. Aside from
hot-swapping RAM, there's also a need to be able to migrate pages around
RAM, either to unfragment memory allowing higher-order allocations to
succeed more often, or to get around extreme dmamem/normal-mem/highmem
imbalances without dedicating huge reserves. Those would more often succeed
if uninhibited by mlock.

> So I need to take into account distro vendors that use an earlier kernel,
> like 2.6.5, and back-port the patch from 2.6.7. The distro vendor will
> keep the 2.6.5 version number, which is why I can't rely on it.
>
> I need to know exactly what the fix is, so that when I scan mm/rmap.c, I
> know what to look for. Currently, I look for this regex:
>
> try_to_unmap_one.*vm_area_struct
>
> which seems to work. However, now I think it's just a coincidence.

Perhaps any release based on 2.6.7 or above, or any release which mentions
"get_user_pages" in its mm/rmap.c or mm/objrmap.c?

> > By the way, please don't be worried when soon the try_to_unmap_one
> > comment and code that you identified above disappear. When I'm
> > back in patch submission mode, I'll be sending Andrew a patch which
> > removes it, instead reworking can_share_swap_page to rely on the
> > page_mapcount instead of page_count, which avoids the ironical
> > behaviour my comment refers to, and allows an awkward page migration
> > case to proceed (once unpinned). Andrea and I now both prefer this
> > page_mapcount approach.
>
> Ugh, that means my regex is probably going to break. Not only that, but I
> don't understand what you're saying either. Trying to understand the VM is
> really hard.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Hugh Dickins wrote:
> Oh, well, maybe, but what is the real problem?
> Are you sure that copy-on-write doesn't come into it?

No, but I do know that my test case doesn't call fork(), so it's
reproducible without involving COW. Of course, I'm sure someone's going to
tell me now that COW comes into effect even without fork(). If so, please
explain.

> I haven't reread through the whole thread, but my recollection is that
> you never quite said what the real problem is: you'd found some time ago
> that get_user_pages sometimes failed to pin the pages for your complex
> app, so were forced to mlock too; but couldn't provide any simple test
> case for the failure (which can indeed be a lot of work to devise), so we
> were all in the dark as to what went wrong.

The short answer: under "extreme" memory pressure, the data inside a page
pinned by get_user_pages() is swapped out, moved, or deleted (I'm not sure
which). Some other data is placed into that physical location.

By extreme memory pressure, I mean having the process allocate and touch as
much memory as possible. Something like this:

num_bytes = get_amount_of_physical_ram();
char *p = malloc(num_bytes);
for (i = 0; i < num_bytes; i++)
    p[i] = 0;

The above over-simplified code fails on earlier 2.6 kernels (or earlier
versions of glibc that accompany most distros that use the earlier 2.6
kernels). Either malloc() returns NULL, or the p[i]=0 part causes a
segfault. I haven't bothered to trace down why. But when it does work, the
page pinned by get_user_pages() changes.

> But you've now found that 2.6.7 and later kernels allow your app to work
> correctly without mlock, good. get_user_pages is certainly the right tool
> to use for such pinning. (On the question of whether mlock guarantees
> that user virtual addresses map to the same physical addresses, I prefer
> Arjan's view that it does not; but accept that there might prove to be
> difficulties in holding that position.)

My understanding is that mlock() could in theory allow the page to be
moved, but that currently nothing in the kernel would actually move it.
However, that could change in the future to allow hot-swapping of RAM.

> So, it works now, you've exonerated today's get_user_pages, and you've
> identified at least one get_user_pages fix which went in at that time: do
> we really need to chase this further?

My driver needs to support all 2.4 and 2.6 kernel versions. My makefile
scans the kernel source tree with 'grep' to identify various
characteristics, and I use #ifdefs to conditionally compile code depending
on what features are present in the kernel. I can't use the kernel version
number, because that's not reliable - distros will incorporate patches from
future kernels without changing the version ID. So I need to take into
account distro vendors that use an earlier kernel, like 2.6.5, and back-port
the patch from 2.6.7. The distro vendor will keep the 2.6.5 version number,
which is why I can't rely on it.

I need to know exactly what the fix is, so that when I scan mm/rmap.c, I
know what to look for. Currently, I look for this regex:

try_to_unmap_one.*vm_area_struct

which seems to work. However, now I think it's just a coincidence.

> By the way, please don't be worried when soon the try_to_unmap_one
> comment and code that you identified above disappear. When I'm back in
> patch submission mode, I'll be sending Andrew a patch which removes it,
> instead reworking can_share_swap_page to rely on the page_mapcount
> instead of page_count, which avoids the ironical behaviour my comment
> refers to, and allows an awkward page migration case to proceed (once
> unpinned). Andrea and I now both prefer this page_mapcount approach.

Ugh, that means my regex is probably going to break. Not only that, but I
don't understand what you're saying either. Trying to understand the VM is
really hard. I guess in this specific case, it doesn't really matter,
because calling mlock() when I should be calling get_user_pages() is not a
bad thing.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Sorry for not replying earlier (indeed, sorry for not joining in the wider
RDMA pinning discussion); I'm concentrating on other stuff at present.

On Fri, 6 May 2005, Timur Tabi wrote:
> Timur Tabi wrote:
> > I haven't gotten a reply to this question, but I've done my own
> > research, and I think I found the answer. Using my own test of
> > get_user_pages(), it appears that the fix was placed in 2.6.7. However,
> > I would like to know specifically what the fix is. Unfortunately,
> > tracking this stuff down is beyond my understanding of the Linux VM.
>
> I'm also still waiting for a reply to this question. Anyone?
>
> Upon doing some more research, I think the fix might be this code instead:

I believe you're right this time - I was rather puzzled by your earlier
choice, then unhelpfully forgot to reply and point you a few lines further
down to this comment, which does shout "get_user_pages fix" quite loudly.

> /*
>  * Don't pull an anonymous page out from under get_user_pages.
>  * GUP carefully breaks COW and raises page count (while holding
>  * page_table_lock, as we have here) to make sure that the page
>  * cannot be freed. If we unmap that page here, a user write
>  * access to the virtual address will bring back the page, but
>  * its raised count will (ironically) be taken to mean it's not
>  * an exclusive swap page, do_wp_page will replace it by a copy
>  * page, and the user never get to see the data GUP was holding
>  * the original page for.
>  */
> if (PageSwapCache(page) &&
>     page_count(page) != page->mapcount + 2) {
>     ret = SWAP_FAIL;
>     goto out_unmap;
> }
>
> Both this change and the other one I mentioned are new to 2.6.7. I
> suppose I could try applying these patches to the 2.6.6 kernel and see if
> anything improves, but that won't help me understand what's really going
> on.

There's a lot of change in the rmap area between 2.6.6 and 2.6.7, but you're
right that this is an isolated fix, which could in principle be applied to
earlier releases. Though I don't see it's worth doing now.

> The above comment almost sounds like it's a fix,

Almost? Sorry if my comment doesn't make it obvious it's a fix for a
get_user_pages issue - I rewrote Andrea Arcangeli's original comment. The
analysis and fix are his.

> but it talks about copy-on-write, which has nothing to do with the real
> problem.

Oh, well, maybe, but what is the real problem? Are you sure that
copy-on-write doesn't come into it?

I haven't reread through the whole thread, but my recollection is that you
never quite said what the real problem is: you'd found some time ago that
get_user_pages sometimes failed to pin the pages for your complex app, so
were forced to mlock too; but couldn't provide any simple test case for the
failure (which can indeed be a lot of work to devise), so we were all in the
dark as to what went wrong.

But you've now found that 2.6.7 and later kernels allow your app to work
correctly without mlock, good. get_user_pages is certainly the right tool to
use for such pinning. (On the question of whether mlock guarantees that user
virtual addresses map to the same physical addresses, I prefer Arjan's view
that it does not; but accept that there might prove to be difficulties in
holding that position.)

So, it works now, you've exonerated today's get_user_pages, and you've
identified at least one get_user_pages fix which went in at that time: do we
really need to chase this further?

Oh, in writing of copy-on-write, I've just remembered another fix for
get_user_pages which I made in 2.6.7 (though I've not heard of anyone seeing
the problem it fixed): the call to do_wp_page in do_swap_page.
get_user_pages assumes that the write fault it generates will break
copy-on-write, i.e. will make a private copy page when necessary, before
returning to the caller; but that wasn't happening in the do_swap_page case.

By the way, please don't be worried when soon the try_to_unmap_one comment
and code that you identified above disappear. When I'm back in patch
submission mode, I'll be sending Andrew a patch which removes it, instead
reworking can_share_swap_page to rely on the page_mapcount instead of
page_count, which avoids the ironical behaviour my comment refers to, and
allows an awkward page migration case to proceed (once unpinned). Andrea and
I now both prefer this page_mapcount approach.

Hugh
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur Tabi wrote:
> I haven't gotten a reply to this question, but I've done my own research,
> and I think I found the answer. Using my own test of get_user_pages(), it
> appears that the fix was placed in 2.6.7. However, I would like to know
> specifically what the fix is. Unfortunately, tracking this stuff down is
> beyond my understanding of the Linux VM.

I'm also still waiting for a reply to this question. Anyone?

Upon doing some more research, I think the fix might be this code instead:

/*
 * Don't pull an anonymous page out from under get_user_pages.
 * GUP carefully breaks COW and raises page count (while holding
 * page_table_lock, as we have here) to make sure that the page
 * cannot be freed. If we unmap that page here, a user write
 * access to the virtual address will bring back the page, but
 * its raised count will (ironically) be taken to mean it's not
 * an exclusive swap page, do_wp_page will replace it by a copy
 * page, and the user never get to see the data GUP was holding
 * the original page for.
 */
if (PageSwapCache(page) &&
    page_count(page) != page->mapcount + 2) {
    ret = SWAP_FAIL;
    goto out_unmap;
}

Both this change and the other one I mentioned are new to 2.6.7. I suppose I
could try applying these patches to the 2.6.6 kernel and see if anything
improves, but that won't help me understand what's really going on.

The above comment almost sounds like it's a fix, but it talks about
copy-on-write, which has nothing to do with the real problem.

--
Timur Tabi
Staff Software Engineer
[EMAIL PROTECTED]
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Wed, May 04, 2005 at 01:27:54PM -0500, Timur Tabi wrote:
> Libor Michalek wrote:
> > The program opens the character device file descriptor, pins the pages,
> > and waits for a signal (sent to the process after running some other
> > program which exercises the VM) before checking the pages. On older
> > kernels the check fails; on my 2.6.11 kernel the check succeeds. So
> > mlock is not needed on top of get_user_pages() as it was before.
>
> When you say "older", what exactly do you mean? I have a different test
> that normally fails with just get_user_pages(), but it works with 2.6.9
> and above. I haven't been able to get any kernel earlier than 2.6.9 to
> compile or boot properly, so I'm having a hard time narrowing down the
> actual point when get_user_pages() started working.

The older kernel I tried was one of the 2.4.21 RHEL 3 kernels. I hadn't
spent much time investigating the issue since this was a new kernel, so it
was a natural one for me to try.

-Libor
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur Tabi wrote:
> When you say "older", what exactly do you mean? I have a different test
> that normally fails with just get_user_pages(), but it works with 2.6.9
> and above. I haven't been able to get any kernel earlier than 2.6.9 to
> compile or boot properly, so I'm having a hard time narrowing down the
> actual point when get_user_pages() started working.

I haven't gotten a reply to this question, but I've done my own research,
and I think I found the answer. Using my own test of get_user_pages(), it
appears that the fix was placed in 2.6.7. However, I would like to know
specifically what the fix is. Unfortunately, tracking this stuff down is
beyond my understanding of the Linux VM.

Assuming that the fix is in try_to_unmap_one(), the only significant change
I see between 2.6.6 and 2.6.7 is the addition of this code:

pgd = pgd_offset(mm, address);
if (!pgd_present(*pgd))
    goto out_unlock;

pmd = pmd_offset(pgd, address);
if (!pmd_present(*pmd))
    goto out_unlock;

pte = pte_offset_map(pmd, address);
if (!pte_present(*pte))
    goto out_unmap;

if (page_to_pfn(page) != pte_pfn(*pte))
    goto out_unmap;

Can anyone tell me if this is the actual fix, or at least a major part of
the actual fix?

--
Timur Tabi
Staff Software Engineer
[EMAIL PROTECTED]
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Wed, May 04, 2005 at 09:27:21PM -0400, Rik van Riel wrote:
> On Wed, 4 May 2005, William Jordan wrote:
> > On 5/3/05, Andy Isaacson <[EMAIL PROTECTED]> wrote:
> > > Rather than replacing the fully-registered pages with pages of zeros,
> > > you could simply unmap them.
> >
> > I don't like this option. It is nearly free to map all of the pages to
> > the zero-page. You never have to allocate a page if the user never
> > writes to it.
>
> Unmapping should work fine, as long as the VMA flags are
> set appropriately. The page fault handler can take care
> of the rest...

I think there may be a difference in terminology here. What I originally
proposed (and what I think Bill was reacting to) is the equivalent of
sys_munmap() on the range of registered pages. That has the downsides that
he mentioned; an address that was valid in the parent will now result in
SIGSEGV or SIGBUS in the child, and it's explicitly endorsed by the userland
APIs (such as MPI2) that it's valid to register stack addresses (for
example).

What I think you're proposing, Rik, is that the VMA get destroyed (or split,
if only part of it had been registered) and replaced with an anonymous one.
That's a very low-overhead way of going about it, I think. Then, as you say,
the page fault handler will automatically give a zero page to the process
when it faults on those addresses. Did I understand your suggestion
correctly?

I think I agree with Bill that having the child fault on pages which
happened to have been registered by the parent would be a bad thing.

This would, if I understand correctly, be visible in /proc/$$/maps. Which is
OK, if a little bit surprising; but the alternatives are worse.

-andy
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Wed, 4 May 2005, William Jordan wrote:
> On 5/3/05, Andy Isaacson <[EMAIL PROTECTED]> wrote:
> > Rather than replacing the fully-registered pages with pages of zeros,
> > you could simply unmap them.
>
> I don't like this option. It is nearly free to map all of the pages to
> the zero-page. You never have to allocate a page if the user never
> writes to it.

Unmapping should work fine, as long as the VMA flags are set appropriately.
The page fault handler can take care of the rest...

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it." - Brian W. Kernighan
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 5/3/05, Andy Isaacson <[EMAIL PROTECTED]> wrote:
> Rather than replacing the fully-registered pages with pages of zeros,
> you could simply unmap them.

I don't like this option. It is nearly free to map all of the pages to the
zero-page. You never have to allocate a page if the user never writes to it.
But if you unmap the page, there could be issues. The memory region could be
on the stack, or malloc'ed. In these cases, the child should be able to
return from the function, or free the memory, without setting a timebomb.

--
Bill Jordan
InfiniCon Systems
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Libor Michalek wrote:
> The program opens the character device file descriptor, pins the pages,
> and waits for a signal (sent to the process after running some other
> program which exercises the VM) before checking the pages. On older
> kernels the check fails; on my 2.6.11 kernel the check succeeds. So mlock
> is not needed on top of get_user_pages() as it was before.

Libor,

When you say "older", what exactly do you mean? I have a different test that
normally fails with just get_user_pages(), but it works with 2.6.9 and
above. I haven't been able to get any kernel earlier than 2.6.9 to compile
or boot properly, so I'm having a hard time narrowing down the actual point
when get_user_pages() started working.

--
Timur Tabi
Staff Software Engineer
[EMAIL PROTECTED]
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Tue, May 03, 2005 at 11:43:25AM -0700, Andy Isaacson wrote:
> [1] You might want to allow the child to start a completely new RDMA
> context, but I don't see that as necessary.

Some people use a hybrid OpenMP+MPI model, so some MPI implementations want
to fork and have the child open a new communications context. This is fairly
rare, but unfortunately it is a checklist item for most MPI implementations,
and this will get worse over time.

-- greg
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 5/3/05, Andy Isaacson <[EMAIL PROTECTED]> wrote: > > A consistent statement would be > > After fork(2), any regions which were registered are UNDEFINED. > Region boundaries are byte-accurate; a registration can cover just > part of a page, in which case the non-registered part of the page > has normal fork COW semantics. > That is a reasonable approach. > > Obviously, calling *any* RDMA-userland-stuff in the child is completely > undefined [1]. One place where I can see a potential problem is in > atexit()-type handlers registered by the RDMA library. Since those > aren't performance-critical they can and should do sanity checks with > getpid() and/or checking with the kernel driver. > That is also reasonable. None of the RDMA libraries I have worked on bothered to use an atexit()-type hook because the user was theoretically *required* to close the RNIC, and driver code was already required to clean up in case of a total process failure. Adding an intermediate safety net for applications that exited cleanly but forgot to close just didn't seem worthwhile. If the application wants the cleanup performed optimally then it can close the RNIC; otherwise it can't complain about forcing the RNIC vendor to clean up in the driver code. > [1] You might want to allow the child to start a completely new RDMA > context, but I don't see that as necessary. > That should be allowed. It is actually more normal to use the parent as a dispatcher and to actually manage the connection in a child process.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Fri, Apr 29, 2005 at 05:31:44PM -0700, Caitlin Bestler wrote: > Attempting to provide *any* support for applications that fork children > after doing RDMA registrations is a ratshole best avoided. The general > rule that application developers should follow is to do RDMA *only* > in the child processes. I think it's unreasonable to *prohibit* fork-after-registration; for one thing, there's lots of code that forks under the covers. Setuid helpers like getpty just assume that they're going to be able to fork. Even stuff like get*by*(3) can potentially fork. And with site-configured stuff like PAM, you end up with things that work on the developer's system but break in deployment. I think it's exceedingly reasonable to say "RDMA doesn't work in children". But the child should get a sane memory image: at least zeros in fully-registered pages, and preferably copies of partially-registered pages. Differentiating between fully-registered and partially-registered pages avoids (I think) the pathological case of having to copy a GB of data just to system("/bin/ls > /tmp/tmpfile"). You can still go pathological if you've partially-registered gigabytes of address space (for example a linked list where each node is allocated with malloc and then registered) but that's a case of "Well, don't do that then". Rather than replacing the fully-registered pages with pages of zeros, you could simply unmap them. A consistent statement would be After fork(2), any regions which were registered are UNDEFINED. Region boundaries are byte-accurate; a registration can cover just part of a page, in which case the non-registered part of the page has normal fork COW semantics. Probably the most sane solution is to simply unmap the fully-registered pages at fork time, and copy any partially-registered pages. But the statement above does not require this. > Keep in mind that it is not only the memory regions that must be dealt > with, but control data invisible to the user (the QP context, etc.). 
This > data frequently is interlinked between kernel resident and user resident > data (such as a QP context has the PD ID somewhere on-chip or in > kernel, which the Send Queue ring needs to be in user memory). Having > two different user processes that both think they have the user half of > this type of split data structure is just asking for trouble, even if you > manage to get the copy on write bit timing problems all solved. Obviously, calling *any* RDMA-userland-stuff in the child is completely undefined [1]. One place where I can see a potential problem is in atexit()-type handlers registered by the RDMA library. Since those aren't performance-critical they can and should do sanity checks with getpid() and/or checking with the kernel driver. [1] You might want to allow the child to start a completely new RDMA context, but I don't see that as necessary. -andy
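Andy's byte-accurate boundary rule means fork handling has to distinguish pages a registration covers entirely from pages it only partially covers. A minimal sketch of that classification (the helper name is invented for illustration; 4 KB pages assumed):

```python
PAGE_SIZE = 4096

def classify_pages(start, length):
    """Split a byte-accurate registration [start, start + length) into
    page frame numbers it covers fully vs. only partially.

    Fully-covered pages could be unmapped (or zero-filled) in the child;
    partially-covered pages keep normal fork COW semantics for their
    non-registered bytes, so they must be copied."""
    end = start + length  # exclusive
    full, partial = [], []
    for pfn in range(start // PAGE_SIZE, (end - 1) // PAGE_SIZE + 1):
        page_start = pfn * PAGE_SIZE
        if start <= page_start and end >= page_start + PAGE_SIZE:
            full.append(pfn)
        else:
            partial.append(pfn)
    return full, partial

# A malloc'ed buffer from 0x1800 to 0x2fff: page 1 is shared with other
# data (partial), page 2 belongs entirely to the registration (full).
print(classify_pages(0x1800, 0x1800))  # -> ([2], [1])
```

The pathological copy case Andy mentions is simply a registration set whose partial list is large.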
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 4/29/05, Libor Michalek <[EMAIL PROTECTED]> wrote: > > However, you have a potential problem with registered buffers that > do not begin or end on a page boundary, which is common with malloc. > If the buffer resides on a portion of a page, and you mark the vma > which contains that entire page VM_DONTCOPY, to ensure that the parent > has access to the exact physical page after the fork, the child will > not be able to access anything on that entire page. So if the child > expects to access data on the same page that happens to contain the > registered buffer it will get a segment violation. > > The four situations we've discussed are: > > 1) Physical page does not get used for anything else. > 2) Process's virtual to physical mapping remains fixed. > 3) Same virtual to physical mapping after forking a child. > 4) Forked child has access to all non-registered memory of > the parent. > > The first two are now taken care of with get_user_pages (we used to > use VM_LOCKED for the second case), the third case is handled by setting > the vma to VM_DONTCOPY, and on the fourth case we've always punted, > but the real answer is to break partial pages into separate vmas and > mark them ALWAYS_COPY. > > -Libor > > Attempting to provide *any* support for applications that fork children after doing RDMA registrations is a ratshole best avoided. The general rule that application developers should follow is to do RDMA *only* in the child processes. Keep in mind that it is not only the memory regions that must be dealt with, but control data invisible to the user (the QP context, etc.). This data frequently is interlinked between kernel resident and user resident data (such as a QP context has the PD ID somewhere on-chip or in kernel, which the Send Queue ring needs to be in user memory). 
Having two different user processes that both think they have the user half of this type of split data structure is just asking for trouble, even if you manage to get the copy on write bit timing problems all solved. All of this can be avoided by a simple rule: don't fork after opening an RDMA device.
Re: RDMA memory registration (was: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation)
On 4/29/05, Roland Dreier <[EMAIL PROTECTED]> wrote: > b) (maybe someday?) Add a VM_ALWAYSCOPY flag and extend mprotect() > with PROT_ALWAYSCOPY so processes can mark pages to be > pre-copied into child processes, to handle the case where only > half a page is registered. Are you suggesting making the partial pages their own VMA, or marking the entire buffer with this flag? I originally thought the entire buffer should be copy-on-fork (instead of copy-on-write), and I believe this is the path Mellanox was pursuing with the VM_NO_COW flag. However, if applications are registering gigabytes of RAM, it would be very bad to have the entire area copied on fork. On the other hand, I've always wondered about the choice to leave holes in the child process's address space. I would have chosen to map the zero page instead. -- Bill Jordan InfiniCon Systems
Re: RDMA memory registration (was: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation)
On Fri, Apr 29, 2005 at 09:45:50AM -0700, Roland Dreier wrote: > Is there anything wrong with the following plan? > > 1) For memory registration, use get_user_pages() in the kernel. Use >locked_vm and RLIMIT_MEMLOCK to limit the amount of memory pinned >by a given process. One disadvantage of this is that the >accounting will overestimate the amount of pinned memory if a >process pins the same page twice, but this doesn't seem that bad to >me -- it errs on the side of safety. I think the overestimate will be fine in practice. If a process is locking a lot of memory it will most likely be in big chunks, so not much page overlap there. If the process is locking lots of tiny buffers with lots of page overlap, the total locked amount will most likely be small. Although it is odd that you could end up with a total locked amount larger than the number of physical pages in the system... > 2) For fork() support: > >a) Extend mprotect() with PROT_DONTCOPY so processes can avoid > copy-on-write problems. > >b) (maybe someday?) Add a VM_ALWAYSCOPY flag and extend mprotect() > with PROT_ALWAYSCOPY so processes can mark pages to be > pre-copied into child processes, to handle the case where only > half a page is registered. > > I believe this puts the code that must be trusted into the kernel and > gives userspace primitives that let apps handle the rest. I'm assuming that for libibverbs memory registration you plan on hiding the mprotect in the library? Without reference counting at the kernel level this could yield unexpected results in a perfectly legitimate app. For example, the app may be managing a buffer it will pass to another device but also want to move data in/out with RDMA hardware: the user marks the buffer DONTCOPY themselves, registers it with libibverbs, performs I/O, and unregisters with libibverbs. At this point the user expects the buffer to still have DONTCOPY set, but it does not, because of the unregister... Not that it's likely, but it's a valid thing to do. 
However, since I don't have a better suggestion, I'm in favour of using mprotect as you outlined. -Libor
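The surprise Libor describes is easy to reproduce in a toy model where the library toggles a DONTCOPY bit with no reference counting (a sketch only; the names are invented and this is not the actual libibverbs API):

```python
class Vma:
    """Stand-in for a process VMA carrying a single DONTCOPY flag."""
    def __init__(self):
        self.dontcopy = False

def register_mr(vma):
    vma.dontcopy = True     # library hides the mprotect(PROT_DONTCOPY)

def unregister_mr(vma):
    vma.dontcopy = False    # ...and blindly clears it on unregister

vma = Vma()
vma.dontcopy = True         # the user set DONTCOPY themselves first
register_mr(vma)
unregister_mr(vma)
print(vma.dontcopy)         # -> False: the user's own setting was lost
```

With kernel-level reference counting, the unregister would decrement a count and only clear the flag when it reached zero.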
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Fri, Apr 29, 2005 at 08:56:20AM -0700, Caitlin Bestler wrote: > On 4/29/05, Bill Jordan <[EMAIL PROTECTED]> wrote: > > On 4/26/05, Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > Our point is that contemporary microprocessors cannot electrically > > > do what you want them to do! > > > > > > Now, conceivably the kernel could keep track of the state of the > > > pages down to the byte level, and could keep track of all COWed pages and > > > could look at faulting addresses at the byte level and could copy sub-page > > > ranges by hand from one process's address space into another process's > > > after I/O completion. I don't think we want to do that. > > > > > > Methinks your specification is busted. > > > > I agree in principle. However, I expect this issue will come up with > > more and more new specifications, and if it isn't addressed once in > > the linux kernel, it will be kludged and broken many times in many > > drivers. > > > > I believe we need a kernel-level interface that will pin user pages, > > and lock the user vma in a single step. The interface should be used > > by drivers when the hardware mappings are done. If the process is > > split into a user operation to lock the memory, and a driver operation > > to map the hardware, there will always be opportunity for abuse. > > > > Reference counting needs to be done by this interface to allow > > different hardware to interoperate. > > > > The interface can't overload the VM_LOCKED flag, or rely on any other > > attributes that the user can tinker with via any other interface. > > > > And as much as I hate to admit it, I think on a fork, we will need to > > copy parts of pages at the beginning or end of user I/O buffers. > > > > I agree with all but the last part, in my opinion there is no need to deal > with fork issues as long as solutions do not result in failures. There is > *no* basis for a child process to expect that it will inherit RDMA resources. 
> A child process that uses such resources will get undefined results, nothing > further needs to be stated, and no heroic efforts are required to avoid them. However, you have a potential problem with registered buffers that do not begin or end on a page boundary, which is common with malloc. If the buffer resides on a portion of a page, and you mark the vma which contains that entire page VM_DONTCOPY, to ensure that the parent has access to the exact physical page after the fork, the child will not be able to access anything on that entire page. So if the child expects to access data on the same page that happens to contain the registered buffer it will get a segment violation. The four situations we've discussed are: 1) Physical page does not get used for anything else. 2) Process's virtual to physical mapping remains fixed. 3) Same virtual to physical mapping after forking a child. 4) Forked child has access to all non-registered memory of the parent. The first two are now taken care of with get_user_pages (we used to use VM_LOCKED for the second case), the third case is handled by setting the vma to VM_DONTCOPY, and on the fourth case we've always punted, but the real answer is to break partial pages into separate vmas and mark them ALWAYS_COPY. -Libor
RDMA memory registration (was: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation)
Is there anything wrong with the following plan? 1) For memory registration, use get_user_pages() in the kernel. Use locked_vm and RLIMIT_MEMLOCK to limit the amount of memory pinned by a given process. One disadvantage of this is that the accounting will overestimate the amount of pinned memory if a process pins the same page twice, but this doesn't seem that bad to me -- it errs on the side of safety. 2) For fork() support: a) Extend mprotect() with PROT_DONTCOPY so processes can avoid copy-on-write problems. b) (maybe someday?) Add a VM_ALWAYSCOPY flag and extend mprotect() with PROT_ALWAYSCOPY so processes can mark pages to be pre-copied into child processes, to handle the case where only half a page is registered. I believe this puts the code that must be trusted into the kernel and gives userspace primitives that let apps handle the rest. - R.
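The double-count overestimate in step 1 can be seen with a toy model of the accounting (a sketch only; the class name is made up, and a page-denominated RLIMIT_MEMLOCK with 4 KB pages is assumed):

```python
PAGE_SIZE = 4096

class PinAccounting:
    """Toy model of locked_vm accounting against RLIMIT_MEMLOCK: every
    registration is charged for each page it touches, so pinning the
    same page twice is charged twice -- erring on the side of safety."""
    def __init__(self, rlimit_pages):
        self.rlimit_pages = rlimit_pages
        self.locked_vm = 0

    def register(self, start, length):
        first = start // PAGE_SIZE
        last = (start + length - 1) // PAGE_SIZE
        npages = last - first + 1
        if self.locked_vm + npages > self.rlimit_pages:
            raise MemoryError("would exceed RLIMIT_MEMLOCK")
        self.locked_vm += npages
        return npages

    def unregister(self, npages):
        self.locked_vm -= npages

acct = PinAccounting(rlimit_pages=4)
acct.register(0x0000, PAGE_SIZE)   # pins page 0
acct.register(0x0000, PAGE_SIZE)   # same physical page, charged again
print(acct.locked_vm)              # -> 2, though only 1 page is pinned
```

This is how the total "locked" amount can exceed physical memory, as Libor points out in his reply.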
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 4/29/05, Bill Jordan <[EMAIL PROTECTED]> wrote: > On 4/26/05, Andrew Morton <[EMAIL PROTECTED]> wrote: > > > Our point is that contemporary microprocessors cannot electrically do what > > you want them to do! > > > > Now, conceivably the kernel could keep track of the state of the > > pages down to the byte level, and could keep track of all COWed pages and > > could look at faulting addresses at the byte level and could copy sub-page > > ranges by hand from one process's address space into another process's > > after I/O completion. I don't think we want to do that. > > > > Methinks your specification is busted. > > I agree in principle. However, I expect this issue will come up with > more and more new specifications, and if it isn't addressed once in > the linux kernel, it will be kludged and broken many times in many > drivers. > > I believe we need a kernel-level interface that will pin user pages, > and lock the user vma in a single step. The interface should be used > by drivers when the hardware mappings are done. If the process is > split into a user operation to lock the memory, and a driver operation > to map the hardware, there will always be opportunity for abuse. > > Reference counting needs to be done by this interface to allow > different hardware to interoperate. > > The interface can't overload the VM_LOCKED flag, or rely on any other > attributes that the user can tinker with via any other interface. > > And as much as I hate to admit it, I think on a fork, we will need to > copy parts of pages at the beginning or end of user I/O buffers. > I agree with all but the last part, in my opinion there is no need to deal with fork issues as long as solutions do not result in failures. There is *no* basis for a child process to expect that it will inherit RDMA resources. A child process that uses such resources will get undefined results, nothing further needs to be stated, and no heroic efforts are required to avoid them. 
What is definitely needed is kernel counting of locks on user pages. Finer granularity is not expected; it is the RDMA hardware that works at finer granularity. All it needs is to know what bus address a given virtual page maps to -- and it needs to know that said mapping will not change without advance notice. Further, any revocation of an existing mapping (to deal with hot page swapping or whatever) cannot expect the RDMA hardware to respond any faster than it would to invalidating a memory region. The RDMA hardware has an inherent need to cache translations. That is why it cannot guarantee that it will cease updating a memory region the nanosecond that a request is made to invalidate an STag. Instead it is allowed to block on such a request and only guarantees to have ceased access when the invalidate request completes. The same need for a delay exists for any interface that moves memory around, or requests to reclaim memory from the application. This also applies on process death. The hardware cannot stop on a dime. The best it can do is stop promptly, and give an unambiguous indication to the OS as to when it has stopped.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 4/26/05, Andrew Morton <[EMAIL PROTECTED]> wrote: > Our point is that contemporary microprocessors cannot electrically do what > you want them to do! > > Now, conceivably the kernel could keep track of the state of the > pages down to the byte level, and could keep track of all COWed pages and > could look at faulting addresses at the byte level and could copy sub-page > ranges by hand from one process's address space into another process's > after I/O completion. I don't think we want to do that. > > Methinks your specification is busted. I agree in principle. However, I expect this issue will come up with more and more new specifications, and if it isn't addressed once in the linux kernel, it will be kludged and broken many times in many drivers. I believe we need a kernel-level interface that will pin user pages, and lock the user vma in a single step. The interface should be used by drivers when the hardware mappings are done. If the process is split into a user operation to lock the memory, and a driver operation to map the hardware, there will always be opportunity for abuse. Reference counting needs to be done by this interface to allow different hardware to interoperate. The interface can't overload the VM_LOCKED flag, or rely on any other attributes that the user can tinker with via any other interface. And as much as I hate to admit it, I think on a fork, we will need to copy parts of pages at the beginning or end of user I/O buffers. -- Bill Jordan InfiniCon Systems
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Arjan van de Ven <[EMAIL PROTECTED]> wrote: > > > Why do you call mlock() and get_user_pages()? In our code, we only call > > mlock(), and the > > memory is pinned. > > this is a myth; linux is free to move the page about in physical memory > even if it's mlock()ed!! eh? I guess the kernel _is_ free to move the page about, but it doesn't. We might do so at some time in the future for memory hotplug, I guess.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew> The kernel can simply register and unregister ranges for Andrew> RDMA. So effectively a particular page is in either the Andrew> registered or unregistered state. Kernel accounting Andrew> counts the number of registered pages and compares this Andrew> with rlimits. Andrew> On top of all that, your userspace library needs to keep Andrew> track of when pages should really be registered and Andrew> unregistered with the kernel. Using overlap logic and Andrew> per-page refcounting or whatever. This is OK as long as userspace is trusted. However I don't see how this works when we don't trust userspace. The problem is that for an RDMA device (IB HCA or iWARP RNIC), a process can create many memory regions, each of which is a separate virtual-to-physical translation map. For example, an app can do: a) register 0x through 0x and get memory handle 1 b) register 0x through 0x and get memory handle 2 c) use memory handle 1 for communication with remote app A d) use memory handle 2 for communication with remote app B Even though memory handles 1 and 2 both refer to exactly the same memory, they may have different lifetimes, might be attached to different connections, and so on. Clearly the memory at 0x must stay pinned as long as the RDMA device thinks either memory handle 1 or memory handle 2 is valid. Furthermore, the kernel must be the one keeping track of how many regions refer to a given page because we can't allow userspace to be able to tell a device to go DMA to memory it doesn't own any more. Creation and destruction of these memory handles will always go through the kernel driver, so this isn't so bad. And get_user_pages() is almost exactly what we need: it stacks perfectly, since it operates on the page_count rather than just setting a bit in vm_flags. The main problem is that it doesn't check against RLIMIT_MEMLOCK. 
The most reasonable thing to do would seem to be having the IB kernel memory region code update current->mm->locked_vm and check it against RLIMIT_MEMLOCK. I guess it would be good to figure out an appropriate abstraction to export rather than monkeying with current->mm directly. We could also put this directly in get_user_pages(), but I'd be worried about messing with current users. I just don't see a way to make VM_KERNEL_LOCKED work. It would also be nice to have a way for apps to set VM_DONTCOPY appropriately. Christoph's suggestion of extending mmap() and mprotect() with PROT_DONTCOPY seems good to me, especially since it means we don't have to export do_mlock() functionality to modules. - R.
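Roland's two-handle example can be modeled with per-page reference counts, in the style of get_user_pages() stacking on page_count (an illustrative sketch; the class name and addresses are invented):

```python
from collections import Counter

PAGE_SIZE = 4096

class PinnedPages:
    """Per-page pin counts: a page stays pinned while *any* memory
    handle still covers it, mirroring how get_user_pages() stacks by
    raising page_count instead of setting a vm_flags bit."""
    def __init__(self):
        self.refs = Counter()

    def register(self, start, length):
        pages = list(range(start // PAGE_SIZE,
                           (start + length - 1) // PAGE_SIZE + 1))
        for pfn in pages:
            self.refs[pfn] += 1      # get_user_pages()-style get
        return pages

    def unregister(self, pages):
        for pfn in pages:
            self.refs[pfn] -= 1      # put_page()-style put

    def pinned(self, pfn):
        return self.refs[pfn] > 0

pins = PinnedPages()
h1 = pins.register(0x0000, 0x3000)   # memory handle 1, pages 0-2
h2 = pins.register(0x1000, 0x1000)   # memory handle 2, page 1 only
pins.unregister(h1)                  # destroy handle 1
print(pins.pinned(1))                # -> True: handle 2 still covers it
```

Only once the last handle covering a page is destroyed does its count drop to zero and the page become unpinnable, which is exactly the property a VM_KERNEL_LOCKED flag cannot express.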
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 4/26/05, Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > However I don't see how to make it work if I put the reference > > counting for overlapping regions in userspace but when I want mlock() > > accounting in the kernel. If a buggy/malicious app does: > > > > a) register from 0x to 0x2fff > > b) register from 0x1000 to 0x1fff > > c) unregister from 0x to 0x2fff > > As far as the kernel is concerned, step b) should be a no-op. (The kernel > might choose to split the vma, but that's not significant). > If "register" and "unregister" are meant in the RDMA sense then the above sequence is totally reasonable. The b) registration could be for a different protection domain that did not require access to all of the larger region. Unless a full counting lock is available from the kernel, the responsibility of the collective RDMA components would be to a) pin 0x to 0x2fff, b) nothing, c) unpin 0x000 to 0x0fff and 0x2000 to 0x2fff
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 4/26/05, Andrew Morton <[EMAIL PROTECTED]> wrote: > Roland Dreier <[EMAIL PROTECTED]> wrote: > > > > Libor> Do you mean that the set/clear parameters to do_mlock() > > Libor> are the actual flags which are set/cleared by the caller? > > Libor> Also, the issue remains that the flags are not reference > > Libor> counted which is a problem if you are dealing with > > Libor> overlapping memory region, or even if one region ends and > > Libor> another begins on the same page. Since the desire is to be > > Libor> able to pin any memory that a user can malloc this is a > > Libor> likely scenario. > > > > Good point... we need to figure out how to handle: > > > > a) app registers 0x through 0x17ff > > b) app registers 0x1800 through 0x2fff > > c) app unregisters 0x through 0x17ff > > d) the page at 0x1000 must stay pinned > > The userspace library should be able to track the tree and the overlaps, > etc. Things might become interesting when the memory is MAP_SHARED > pagecache and multiple independent processes are involved, although I guess > that'd work OK. > > But afaict the problem wherein part of a page needs VM_DONTCOPY and the > other part does not cannot be solved. > Which portion of the userspace library? HCA-dependent code, or common code? The HCA-dependent code would fail to count when the same memory was registered to different HCAs (for example to the internal network device and the external network device). The vendor-independent code *could* do it, but only by maintaining a complete list of all registrations that had been issued but not cancelled. That data would be redundant with data kept at the verb layer, and by the kernel. It *would* work, but maintaining highly redundant data at multiple layers is something that I generally try to avoid.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Roland Dreier <[EMAIL PROTECTED]> wrote: > > Andrew> Well I was vaguely proposing that the userspace library > Andrew> keep track of the byteranges and the underlying page > Andrew> states. So in the above scenario userspace would leave > Andrew> the page at 0x1000 registered until all registrations > Andrew> against that page have been undone. > > OK, I already have code in userspace that keeps reference counts for > overlapping regions, etc. However I'm not sure how to tie this in > with reliable accounting of pinned memory -- we don't want malicious > userspace code to be able fool the accounting, right? > > So I'm still trying to puzzle out what to do. I don't want to keep a > complicated data structure in the kernel keeping track of what memory > has been registered. Right now, I just keep a list of structs, one > for each region, and when a process dies, I just go through region by > region and do a put_page() to balance off the get_user_pages(). > > However I don't see how to make it work if I put the reference > counting for overlapping regions in userspace but when I want mlock() > accounting in the kernel. If a buggy/malicious app does: > > a) register from 0x to 0x2fff > b) register from 0x1000 to 0x1fff > c) unregister from 0x to 0x2fff As far as the kernel is concerned, step b) should be a no-op. (The kernel might choose to split the vma, but that's not significant). > then it seems the kernel is screwed unless it counts how many times a > vma has been pinned. And adding a pin_count member to vm_struct seems > like a pretty damn major step. > > We definitely have to make sure that userspace is never able to either > unpin a page that is still registered with RDMA hardware, because that > can lead to DMA into memory that someone else owns. 
On the other > hand, we don't want userspace to be able to defeat resource accounting > by tricking the kernel into keeping page_count elevated after it > credits the memory back to a process's limit on locked pages. The kernel can simply register and unregister ranges for RDMA. So effectively a particular page is in either the registered or unregistered state. Kernel accounting counts the number of registered pages and compares this with rlimits. On top of all that, your userspace library needs to keep track of when pages should really be registered and unregistered with the kernel. Using overlap logic and per-page refcounting or whatever. No?
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew> Well I was vaguely proposing that the userspace library Andrew> keep track of the byteranges and the underlying page Andrew> states. So in the above scenario userspace would leave Andrew> the page at 0x1000 registered until all registrations Andrew> against that page have been undone. OK, I already have code in userspace that keeps reference counts for overlapping regions, etc. However I'm not sure how to tie this in with reliable accounting of pinned memory -- we don't want malicious userspace code to be able fool the accounting, right? So I'm still trying to puzzle out what to do. I don't want to keep a complicated data structure in the kernel keeping track of what memory has been registered. Right now, I just keep a list of structs, one for each region, and when a process dies, I just go through region by region and do a put_page() to balance off the get_user_pages(). However I don't see how to make it work if I put the reference counting for overlapping regions in userspace but when I want mlock() accounting in the kernel. If a buggy/malicious app does: a) register from 0x to 0x2fff b) register from 0x1000 to 0x1fff c) unregister from 0x to 0x2fff then it seems the kernel is screwed unless it counts how many times a vma has been pinned. And adding a pin_count member to vm_struct seems like a pretty damn major step. We definitely have to make sure that userspace is never able to either unpin a page that is still registered with RDMA hardware, because that can lead to DMA into memory that someone else owns. On the other hand, we don't want userspace to be able to defeat resource accounting by tricking the kernel into keeping page_count elevated after it credits the memory back to a process's limit on locked pages. The limit on the number of locked pages seems like a natural thing to check against, but perhaps we need a different limit for the number of pages pinned for use by RDMA hardware. 
Sort of the same way that there's a separate limit on the number of in-flight aios. - R. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
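The userspace reference counting Roland describes can be modeled at page granularity: keep a per-page pin count, and only cross into the kernel (get_user_pages()/put_page()) on the transitions between zero and nonzero. This is a minimal illustrative sketch, not code from any posted patch; the function name, the 16-page table, and the example addresses are all invented for the example:

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define NPAGES     16

/* Hypothetical per-page pin counts covering a small address range. */
static unsigned int pin_count[NPAGES];

/*
 * Adjust the pin count for every page touched by [start, start+len).
 * Returns the number of pages whose pin state actually changed;
 * only those would be forwarded to get_user_pages()/put_page().
 */
static int adjust_range(unsigned long start, unsigned long len, int pin)
{
    unsigned long first = start >> PAGE_SHIFT;
    unsigned long last  = (start + len - 1) >> PAGE_SHIFT;
    int transitions = 0;
    unsigned long p;

    for (p = first; p <= last; p++) {
        if (pin) {
            if (pin_count[p]++ == 0)
                transitions++;          /* unpinned -> pinned */
        } else {
            assert(pin_count[p] > 0);   /* unbalanced unregister */
            if (--pin_count[p] == 0)
                transitions++;          /* pinned -> unpinned */
        }
    }
    return transitions;
}
```

With this bookkeeping, registering 0x0000-0x17ff, then 0x1800-0x2fff, then unregistering the first range leaves the shared page at 0x1000 pinned, which is exactly the overlap case from the a)/b)/c)/d) scenario in this thread.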
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur Tabi <[EMAIL PROTECTED]> wrote: > > Roland Dreier wrote: > > > Yes, I agree. If an app wants to register half a page and pass the > > other half to a child process, I think the only answer is "don't do > > that then." > > How can the app know that, though? It would have to allocate I/O buffers > with knowledge > of page boundaries. Today, the apps just malloc() a bunch of memory and pay > no attention > to whether the beginning or the end of the buffer shares a page with some > other, unrelated > object. We may as well tell the app that it needs to page-align all I/O > buffers. > > My point is that we can't just simply say, "Don't do that". Some entity > (the kernel, > libraries, whatever) should be able to tell the app that its usage of memory > is going to > break in some unpredictable way. Our point is that contemporary microprocessors cannot electrically do what you want them to do! Now, conceivably the kernel could keep track of the state of the pages down to the byte level, and could keep track of all COWed pages and could look at faulting addresses at the byte level and could copy sub-page ranges by hand from one process's address space into another process's after I/O completion. I don't think we want to do that. Methinks your specification is busted. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Roland Dreier <[EMAIL PROTECTED]> wrote: > > Roland> a) app registers 0x through 0x17ff > Roland> b) app registers 0x1800 through 0x2fff > Roland> c) app unregisters 0x through 0x17ff > Roland> d) the page at 0x1000 must stay pinned > > Andrew> The userspace library should be able to track the tree and > Andrew> the overlaps, etc. Things might become interesting when > Andrew> the memory is MAP_SHARED pagecache and multiple > Andrew> independent processes are involved, although I guess > Andrew> that'd work OK. > > I used to think I knew how to handle this, but in your scheme where > the kernel is doing accounting for pinned memory by marking vmas with > VM_KERNEL_LOCKED, at step c), I don't see why the kernel won't unlock > vmas covering 0x through 0x1fff and credit 8K back to the > process's pinning count. > > Sorry to be so dense but can you spell out what you think should > happen at steps a), b) and c) above? Well I was vaguely proposing that the userspace library keep track of the byteranges and the underlying page states. So in the above scenario userspace would leave the page at 0x1000 registered until all registrations against that page have been undone. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Roland Dreier wrote: Yes, I agree. If an app wants to register half a page and pass the other half to a child process, I think the only answer is "don't do that then." How can the app know that, though? It would have to allocate I/O buffers with knowledge of page boundaries. Today, the apps just malloc() a bunch of memory and pay no attention to whether the beginning or the end of the buffer shares a page with some other, unrelated object. We may as well tell the app that it needs to page-align all I/O buffers. My point is that we can't just simply say, "Don't do that". Some entity (the kernel, libraries, whatever) should be able to tell the app that its usage of memory is going to break in some unpredictable way. -- Timur Tabi Staff Software Engineer [EMAIL PROTECTED] One thing a Southern boy will never say is, "I don't think duct tape will fix it." -- Ed Smylie, NASA engineer for Apollo 13 ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Roland> a) app registers 0x through 0x17ff Roland> b) app registers 0x1800 through 0x2fff Roland> c) app unregisters 0x through 0x17ff Roland> d) the page at 0x1000 must stay pinned Andrew> The userspace library should be able to track the tree and Andrew> the overlaps, etc. Things might become interesting when Andrew> the memory is MAP_SHARED pagecache and multiple Andrew> independent processes are involved, although I guess Andrew> that'd work OK. I used to think I knew how to handle this, but in your scheme where the kernel is doing accounting for pinned memory by marking vmas with VM_KERNEL_LOCKED, at step c), I don't see why the kernel won't unlock vmas covering 0x through 0x1fff and credit 8K back to the process's pinning count. Sorry to be so dense but can you spell out what you think should happen at steps a), b) and c) above? Andrew> But afaict the problem wherein part of a page needs Andrew> VM_DONTCOPY and the other part does not cannot be solved. Yes, I agree. If an app wants to register half a page and pass the other half to a child process, I think the only answer is "don't do that then." - R. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 4/26/05, Andrew Morton <[EMAIL PROTECTED]> wrote: > But afaict the problem wherein part of a page needs VM_DONTCOPY and the > other part does not cannot be solved. There may be an opportunity to create a solution where we can mark the page as "copy on fork" so the child has a page with a copy of the contents (at the time of the fork) instead of marking the page copy-on-write. -- Bill Jordan InfiniCon Systems ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Roland Dreier <[EMAIL PROTECTED]> wrote: > > Libor> Do you mean that the set/clear parameters to do_mlock() > Libor> are the actual flags which are set/cleared by the caller? > Libor> Also, the issue remains that the flags are not reference > Libor> counted which is a problem if you are dealing with > Libor> overlapping memory region, or even if one region ends and > Libor> another begins on the same page. Since the desire is to be > Libor> able to pin any memory that a user can malloc this is a > Libor> likely scenario. > > Good point... we need to figure out how to handle: > > a) app registers 0x through 0x17ff > b) app registers 0x1800 through 0x2fff > c) app unregisters 0x through 0x17ff > d) the page at 0x1000 must stay pinned The userspace library should be able to track the tree and the overlaps, etc. Things might become interesting when the memory is MAP_SHARED pagecache and multiple independent processes are involved, although I guess that'd work OK. But afaict the problem wherein part of a page needs VM_DONTCOPY and the other part does not cannot be solved. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Libor> Do you mean that the set/clear parameters to do_mlock() Libor> are the actual flags which are set/cleared by the caller? Libor> Also, the issue remains that the flags are not reference Libor> counted which is a problem if you are dealing with Libor> overlapping memory region, or even if one region ends and Libor> another begins on the same page. Since the desire is to be Libor> able to pin any memory that a user can malloc this is a Libor> likely scenario. Good point... we need to figure out how to handle: a) app registers 0x through 0x17ff b) app registers 0x1800 through 0x2fff c) app unregisters 0x through 0x17ff d) the page at 0x1000 must stay pinned hmm... - R. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Tue, Apr 26, 2005 at 08:31:32AM -0700, Roland Dreier wrote: > Andrew> umm, how about we > > Andrew> - force the special pages into a separate vma > > Andrew> - run get_user_pages() against it all > > Andrew> - use RLIMIT_MEMLOCK accounting to check whether the user > Andrew> is allowed to do this thing > > Andrew> - undo the RLIMIT_MEMLOCK accounting in ->release > > Andrew> This will all interact with user-initiated mlock/munlock > Andrew> in messy ways. Maybe a new kernel-internal vma->vm_flag > Andrew> which works like VM_LOCKED but is unaffected by > Andrew> mlock/munlock activity is needed. > > Andrew> A bit of generalisation in do_mlock() should suit? > > Yes, it seems that modifying do_mlock() to something like > > int do_mlock(unsigned long start, size_t len, >unsigned int set, unsigned int clear) > > and then exporting a function along the lines of > > int do_mem_pin(unsigned long start, size_t len, int on) > > that sets/clears (VM_LOCKED_KERNEL | VM_DONTCOPY) according to the on > flag. Do you mean that the set/clear parameters to do_mlock() are the actual flags which are set/cleared by the caller? Also, the issue remains that the flags are not reference counted, which is a problem if you are dealing with overlapping memory regions, or even if one region ends and another begins on the same page. Since the desire is to be able to pin any memory that a user can malloc this is a likely scenario. -Libor ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew> umm, how about we Andrew> - force the special pages into a separate vma Andrew> - run get_user_pages() against it all Andrew> - use RLIMIT_MEMLOCK accounting to check whether the user Andrew> is allowed to do this thing Andrew> - undo the RLIMIT_MEMLOCK accounting in ->release Andrew> This will all interact with user-initiated mlock/munlock Andrew> in messy ways. Maybe a new kernel-internal vma->vm_flag Andrew> which works like VM_LOCKED but is unaffected by Andrew> mlock/munlock activity is needed. Andrew> A bit of generalisation in do_mlock() should suit? Yes, it seems that modifying do_mlock() to something like int do_mlock(unsigned long start, size_t len, unsigned int set, unsigned int clear) and then exporting a function along the lines of int do_mem_pin(unsigned long start, size_t len, int on) that sets/clears (VM_LOCKED_KERNEL | VM_DONTCOPY) according to the on flag. Seem reasonable? If so I can code this up. - R. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
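The set/clear generalization Roland proposes can be modeled in userspace as plain bit-mask arithmetic over a vma's flag word. This is only an illustrative sketch of the semantics: VM_LOCKED and VM_DONTCOPY are real vm_flags, but the bit values below and the VM_LOCKED_KERNEL flag itself are assumptions from this thread, not merged kernel code:

```c
#include <assert.h>

/* Flag bits; the numeric values are invented for illustration. */
#define VM_LOCKED        0x0001UL   /* user-visible mlock() bit */
#define VM_LOCKED_KERNEL 0x0002UL   /* proposed kernel-only lock bit */
#define VM_DONTCOPY      0x0004UL   /* don't duplicate vma on fork */

/* Core of the generalized do_mlock(): apply set and clear masks. */
static unsigned long apply_vm_flags(unsigned long vm_flags,
                                    unsigned long set, unsigned long clear)
{
    return (vm_flags & ~clear) | set;
}

/*
 * do_mem_pin() analogue: pinning sets the kernel-lock bits, unpinning
 * clears them, and neither touches the user-controlled VM_LOCKED bit,
 * so a malicious munlock() cannot undo a kernel-side pin.
 */
static unsigned long mem_pin(unsigned long vm_flags, int on)
{
    unsigned long mask = VM_LOCKED_KERNEL | VM_DONTCOPY;

    return on ? apply_vm_flags(vm_flags, mask, 0)
              : apply_vm_flags(vm_flags, 0, mask);
}
```

The point of the separate mask is visible in the model: mlock()/munlock() would only ever pass VM_LOCKED in set/clear, while the driver path passes VM_LOCKED_KERNEL | VM_DONTCOPY, so the two users cannot clobber each other's state.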
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Christoph Hellwig wrote: What doesn't work with that design are the braindead designed-by-committee APIs in the RDMA world - but I don't think we should care about them too much. I think you should. The whole point behind RDMA is that these APIs exist and are being used by real-world applications. You can't just ignore them because they're inconvenient. If you're not willing to cater to these APIs' needs, then you may as well tell all the RDMA developers to forget about Linux and port everything to Windows instead. The APIs are here to stay, and the whole point behind this thread is to discuss how Linux can support them. -- Timur Tabi Staff Software Engineer [EMAIL PROTECTED] One thing a Southern boy will never say is, "I don't think duct tape will fix it." -- Ed Smylie, NASA engineer for Apollo 13 ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew Morton wrote: Precisely. That's why I suggested that we have an alternative vma->vm_flag bit which behaves in a similar manner to VM_LOCKED (say, VM_LOCKED_KERNEL), only userspace cannot alter it. How about calling it VM_PINNED? That way, we can define Locked - won't be swapped to disk, but can be moved around in memory Pinned - won't be swapped to disk or moved around in memory ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 4/25/05, Christoph Hellwig <[EMAIL PROTECTED]> wrote: > On Mon, Apr 25, 2005 at 05:02:36PM -0700, Roland Dreier wrote: > > The idea is that applications manage the lifetime of pinned memory > > regions. They can do things like post multiple I/O operations without > > any page-walking overhead, or pass a buffer descriptor to a remote > > host who will send data at some indeterminate time in the future. In > > addition, InfiniBand has the notion of atomic operations, so a cluster > > application may be using some memory region to implement a global lock. > > > > This might not be the most kernel-friendly design but it is pretty > > deeply ingrained in the design of RDMA transports like InfiniBand and > > iWARP (RDMA over IP). > > Actually, no it isn't. All these transports would work just fine with > the mmap a character device to hand out memory from the kernel approach > I told you to use multiple times and Andrew mentioned in this thread as well. > What doesn't work with that design are the braindead designed-by-committee > APIs in the RDMA world - but I don't think we should care about them too > much. > RDMA registers and uses the memory the user specifies. That is why byte granularity and multiple redundant registrations are explicitly specified. The mechanism by which this requirement is implemented is of course OS dependent. But the requirements are that the application specifies what portion of their memory they want registered (or what set of physical pages if they have sufficient privilege) and that request is either honored or refused by a resource manager (one preferably as integrated with general OS resource management as possible). The other aspect is that remotely enabled memory regions and memory windows must be enabled for hardware access for the duration of the region or window -- indefinitely until process death or explicit termination by the application layer.
Theoretically there is nothing in the wire protocols that requires source buffers to be pinned indefinitely, but that is the only way any RDMA interface has ever worked -- so "brain death" must be pretty widespread. The fact that this problem must be solved for remotely accessible buffers, and that for cluster applications like MPI there is no distinction between buffers used for inbound messages and outbound messages, might have something to do with this. User verbs needs to deal with these actual Memory Registration requirements, including the very real application need for Memory Windows. The solution should map to existing OS controls as much as possible. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Mon, Apr 25, 2005 at 05:02:36PM -0700, Roland Dreier wrote: > The idea is that applications manage the lifetime of pinned memory > regions. They can do things like post multiple I/O operations without > any page-walking overhead, or pass a buffer descriptor to a remote > host who will send data at some indeterminate time in the future. In > addition, InfiniBand has the notion of atomic operations, so a cluster > application may be using some memory region to implement a global lock. > > This might not be the most kernel-friendly design but it is pretty > deeply ingrained in the design of RDMA transports like InfiniBand and > iWARP (RDMA over IP). Actually, no it isn't. All these transports would work just fine with the mmap a character device to hand out memory from the kernel approach I told you to use multiple times and Andrew mentioned in this thread as well. What doesn't work with that design are the braindead designed-by-committee APIs in the RDMA world - but I don't think we should care about them too much. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur Tabi <[EMAIL PROTECTED]> wrote: > > Andrew Morton wrote: > > > I'm referring to an application which uses your syscalls to obtain pinned > > memory and uses munlock() so that it may then use your syscalls to obtain > > even more pinned memory. With the objective of taking the machine down. > > I'm in favor of having drivers call do_mlock() and do_munlock() on behalf of > the > application. All we need to do is export those functions, and my driver can > call them. > However, that still doesn't prevent an app from calling munlock(). Precisely. That's why I suggested that we have an alternative vma->vm_flag bit which behaves in a similar manner to VM_LOCKED (say, VM_LOCKED_KERNEL), only userspace cannot alter it. > But I don't understand the distinction between having the driver call > do_mlock() vs. the > application calling mlock(). Won't we still have the same problems? A > malicious app can > just call our driver instead of calling mlock() or munlock(). The driver > won't know the > difference between an authorized app and an unauthorized one. The driver will set VM_LOCKED_KERNEL, not VM_LOCKED. > Besides, isn't the whole point behind RLIMIT_MEMLOCK to limit how much one > process can lock? Sure. The internal setting of VM_LOCKED_KERNEL should still use RLIMIT_MEMLOCK accounting. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
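The RLIMIT_MEMLOCK accounting being discussed boils down to a charge/uncharge check against the per-process limit. The sketch below is a userspace model of that bookkeeping, not the real mm/mlock.c code; the struct and function names are invented for the example:

```c
#include <assert.h>

#define PAGE_SHIFT 12

/* Hypothetical per-process accounting state. */
struct pin_account {
    unsigned long locked_pages;   /* pages currently charged */
    unsigned long rlimit_bytes;   /* RLIMIT_MEMLOCK current value */
};

/*
 * Charge npages against the limit when the driver pins memory with
 * VM_LOCKED_KERNEL semantics. Returns 0 on success, -1 (standing in
 * for an errno like -ENOMEM) when the request would exceed the limit.
 */
static int charge_locked(struct pin_account *acct, unsigned long npages)
{
    unsigned long limit_pages = acct->rlimit_bytes >> PAGE_SHIFT;

    if (acct->locked_pages + npages > limit_pages)
        return -1;
    acct->locked_pages += npages;
    return 0;
}

/* Undone in ->release, so an unclean exit still returns the charge. */
static void uncharge_locked(struct pin_account *acct, unsigned long npages)
{
    acct->locked_pages -= npages;
}
```

Because user-initiated munlock() would never call uncharge_locked() for VM_LOCKED_KERNEL pages in this scheme, the hostile register/munlock/register-again pattern Andrew describes stays bounded by the limit.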
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Mon, Apr 25, 2005 at 04:24:05PM -0700, Andrew Morton wrote: > Libor Michalek <[EMAIL PROTECTED]> wrote: > > On Mon, Apr 25, 2005 at 03:35:42PM -0700, Andrew Morton wrote: > > > > > Yes, we expect that all the pages which get_user_pages() pinned > > > will become unpinned within the context of the syscall which pinned > > > the pages. Or shortly after, in the case of async I/O. > > > > When a network protocol is making use of async I/O the amount of time > > between posting the read request and getting the completion for that > > request is unbounded since it depends on the other half of the connection > > sending some data. In this case the buffer that was pinned during the > > io_submit() may be pinned, and holding the pages, for a long time. > > Sure. > > > During > > this time the process might fork, at this point any data received will be > > placed into the wrong spot. > > Well the data is placed in _a_ spot. That's only the "wrong" spot because > you've defined it to be wrong! > > IOW: what behaviour are you actually looking for here, and why, and does it > matter? For example a network server app has an open connection on which it uses async IO to submit two buffers for a read operation. Both buffers are pinned using get_user_pages() and the connection waits for data to arrive. The connection receives data, it is written into the first buffer, the app is notified using async IO, and it retrieves the async IO completion. The app reads the buffer which happens to contain a command to spawn a child, the app forks a child. Now there is still a buffer posted for read and if more data arrives on the connection that data is copied to the pages which were saved when the buffer was pinned. The app is notified, retrieves the async IO completion, but when it goes to read that buffer it will not have the new data.
> > > This is because there is no file descriptor or anything else associated > > > with the pages which permits the kernel to clean stuff up on unclean > > > application exit. Also there are the obvious issues with permitting > > > pinning of unbounded amounts of memory. > > > > Correct, the driver must be able to determine that the process has died > > and clean up after it, so the pinned region in most implementations is > > associated with an open file descriptor. > > How is that association created? The kernel module which pinned the memory is responsible for unpinning it if the file descriptor, which was used to deliver the command that resulted in the pinning, is closed. -Libor ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
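The cleanup-on-close association Libor describes can be sketched as a per-open-file list of pinned regions torn down in the fops ->release path. This is a userspace model of that lifetime rule, not driver code from the thread; the struct and function names are invented, and the counter decrement stands in for the put_page() loop:

```c
#include <assert.h>
#include <stdlib.h>

/* One pinned region, as in Roland's per-region list of structs. */
struct region {
    struct region *next;
    unsigned long npages;    /* pages pinned via get_user_pages() */
};

/* Per-open-file state; in a driver this would be file->private_data. */
struct pin_file {
    struct region *regions;
    unsigned long total_pinned;
};

static void pin_region(struct pin_file *pf, unsigned long npages)
{
    struct region *r = malloc(sizeof(*r));

    if (!r)
        abort();
    r->npages = npages;
    r->next = pf->regions;
    pf->regions = r;
    pf->total_pinned += npages;
}

/*
 * ->release analogue: when the descriptor that delivered the pinning
 * commands is closed (including on unclean process death), walk the
 * list and balance every get_user_pages() with put_page().
 */
static void pin_file_release(struct pin_file *pf)
{
    while (pf->regions) {
        struct region *r = pf->regions;

        pf->regions = r->next;
        pf->total_pinned -= r->npages;  /* stands in for put_page() calls */
        free(r);
    }
}
```

Tying the pins to the file descriptor is what makes the accounting robust against applications that exit without unregistering: close() or process exit always runs ->release.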
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew Morton wrote: I'm referring to an application which uses your syscalls to obtain pinned memory and uses munlock() so that it may then use your syscalls to obtain evem more pinned memory. With the objective of taking the machine down. I'm in favor of having drivers call do_mlock() and do_munlock() on behalf of the application. All we need to do is export those functions, and my driver can call them. However, that still doesn't prevent an app from calling munlock(). But I don't understand the distinction between having the driver call do_mlock() vs. the application calling mlock(). Won't we still have the same problems? A malicious app can just call our driver instead of calling mlock() or munlock(). The driver won't know the difference between an authorized app and an unauthorized one. Besides, isn't the whole point behind RLIMIT_MEMLOCK to limit how much one process can lock? I haven't even thought about memory hotswap. Surely it'll fail if the pages are pinned by get_user_pages()? Any memory registered for RDMA devices obviously can't be swapped out. Technically, the driver could detect that, and reject any attempt to transfer data to those regions until everything is remapped to other RAM. But that's opening a whole new can of worms. I don't know how the memory hotswap mechanism works, so I can't guess what recovery mechanisms can be implemented in the driver. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Tue, Apr 12, 2005 at 06:04:47PM -0700, Libor Michalek wrote: > On Mon, Apr 11, 2005 at 05:13:47PM -0700, Andrew Morton wrote: > > Roland Dreier <[EMAIL PROTECTED]> wrote: > > > > > > Troy> Do we even need the mlock in userspace then? > > > > > > Yes, because the kernel may go through and unmap pages from userspace > > > while trying to swap. Since we have the page locked in the kernel, > > > the physical page won't go anywhere, but userspace might end up with a > > > different page mapped at the same virtual address. > > With the last few kernels I haven't had a chance to retest the problem > that pushed us in the direction of using mlock. I will go back and do > so with the latest kernel. Below I've given a quick description of the > issue. > > > That shouldn't happen. If get_user_pages() has elevated the refcount on a > > page then the following can happen: > > > > - The VM may decide to add the page to swapcache (if it's not mmapped > > from a file). > > > > - Once the page is backed by either swapcache or a (mmapped) file, the VM > > may decide to unmap the application's pte's. A later minor fault by the > > app will cause the same physical page to be remapped. > > The driver did use get_user_pages() to elevate the refcount on all the > pages it was going to use for IO, as well as call set_page_dirty() since > the pages were going to have data written to them from the device. > > The problem we were seeing is that the minor fault by the app resulted > in a new physical page getting mapped for the application. The page that > had the elevated refcount was still waiting for the data to be written > to by the driver at the time that the app accessed the page causing the > minor fault. Obviously since the app had a new mapping the data written > by the driver was lost. > > It looks like code was added to try_to_unmap_one() to address this, so > hopefully it's no longer an issue...
I wrote a quick test module and program to confirm that the problem we saw in older kernels with get_user_pages() no longer exists. The module creates a character device with three different ioctl commands: - Pin the pages of a buffer using get_user_pages() - Check the pages by calling get_user_pages() a second time and comparing the new and original page list. - Release the pages using put_page() The program opens the character device file descriptor, pins the pages, and waits for a signal (sent to the process after running some other program that exercises the VM) before checking the pages. On older kernels the check fails, on my 2.6.11 kernel the check succeeds. So mlock is not needed on top of get_user_pages() as it was before. Thanks for the heads up. Module and program attached. -Libor /* * Copyright (c) 2005 Topspin Communications. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * * $Id: $ */ #include #include #include #include #include #include #include #include #include #include #include #include MODULE_AUTHOR("Libor Michalek"); MODULE_DESCRIPTION("Get pages test"); MODULE_LICENSE("GPL"); enum { TEST_MAJOR = 232, TEST_MINOR = 255 }; #define TEST_DEV MKDEV(TEST_MAJOR, TEST_MINOR) enum { TEST_CMD_REGISTER = 1, TEST_CMD_UNREGISTER = 2, TEST_CMD_CHECK = 3 }; struct ioctl_arg { __u64 addr; __u64 size; }; struct region_root { struct semaphore mutex; struct list_head regions; /* list of pending events. */ struct file *filp; int nr_region; }; struct test_region { unsigned long user; unsigned long addr; unsigned long size;
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur Tabi <[EMAIL PROTECTED]> wrote: > > Andrew Morton wrote: > > > RLIMIT_MEMLOCK sounds like the appropriate mechanism. We cannot rely upon > > userspace running mlock(), so perhaps it is appropriate to run sys_mlock() > > in-kernel because that gives us the appropriate RLIMIT_MEMLOCK checking. > > I don't see what's wrong with relying on userspace to call mlock(). First > of all, all RDMA > apps call a third-party API, like DAPL or MPI, to register memory. The > memory needs to be > registered in order for the driver and adapter to know where it is. During > this > registration, the memory is also pinned. That's when we call mlock(). All the above refers to well-behaved applications. Now think about how the syscalls which you provide may be used by applications which are *designed* to cripple or to compromise the machine. > > > > However an hostile app can just go and run munlock() and then allocate > > some more pinned-by-get_user_pages() memory. > > Isn't mlock() on a per-process basis anyway? How can one process call > munlock() on > another process' memory? I'm referring to an application which uses your syscalls to obtain pinned memory and uses munlock() so that it may then use your syscalls to obtain even more pinned memory. With the objective of taking the machine down. > > umm, how about we > > > > - force the special pages into a separate vma > > > > - run get_user_pages() against it all > > > > - use RLIMIT_MEMLOCK accounting to check whether the user is allowed to > > do this thing > > > > - undo the RLIMIT_MEMLOCK accounting in ->release > > Isn't this kinda what mlock() does already? Create a new VMA and then > VM_LOCK it? kinda. But applications can undo the mlock which the kernel did. > > This will all interact with user-initiated mlock/munlock in messy ways. > > Maybe a new kernel-internal vma->vm_flag which works like VM_LOCKED but is > > unaffected by mlock/munlock activity is needed. > > > > A bit of generalisation in do_mlock() should suit?
> > Yes, but do_mlock() needs to prevent pages from being moved during memory > hotswap. I haven't even thought about memory hotswap. Surely it'll fail if the pages are pinned by get_user_pages()? ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
I don't think that we should jump to the conclusion that in the long term HPC users cannot benefit from support of mechanisms such as hot-removal of memory or other forms of page migration in physical memory. In an earlier exchange on the openib-general list Mike Krause sent the message quoted below on very much the same topic. On the other hand I am willing to accept that there is practical value to implementations which are not (yet) sophisticated enough to support the migration functions. Steve Langdon Michael Krause wrote: At 05:35 PM 3/14/2005, Caitlin Bestler wrote: > -Original Message- > From: Troy Benjegerdes [ mailto:[EMAIL PROTECTED] > Sent: Monday, March 14, 2005 5:06 PM > To: Caitlin Bestler > Cc: openib-general@openib.org > Subject: Re: [openib-general] Getting rid of pinned memory requirement > > > > > The key is that the entire operation either has to be fast > > enough so that no connection or application session layer > > time-outs occur, or an end-to-end agreement to suspend the > > connection is a requirement. The first option seems more > > plausible to me, the second essentially > > requires extending the CM protocol. That's a tall order even for > > InfiniBand, and it's even worse for iWARP where the CM > > functionality typically ends when the connection is established. > > I'll buy the good network design argument. I and others designed InfiniBand RNR (Receiver not ready) operations to allow one to adjust V-to-P mappings (not change the address that was advertised) in order to allow an OS to safely play some games with memory and not drop a connection. The time values associated with RNR allow a solution to tolerate up to an infinite amount of time to perform such operations, but the envisioned goal was to do this on the order of a handful of milliseconds in the worst case.
For iWARP, there was no support for defining RNR functionality, as indeed many people claimed one could just drop in-bound segments and allow the retransmission protocol to deal with the delay (even if this has performance implications due to back-off algorithms, though some claim SACK would minimize this to a large extent). Again, the idea was to minimize the worst case to milliseconds of down time. BTW, all of this assumed that the OS would not perform these types of changes that often, so the long-term impact on an application would be minimal. > > I suppose if the kernel wants to revoke a card's pinned > memory, we should be able to guarantee that it gets new > pinned memory within a bounded time. What sort of timing do > we need? Milliseconds? > Microseconds? > > In the case of iWarp, isn't this just TCP underneath? If so, > can't we just drop any packets in the pipe on the floor and > let them get retransmitted? (I suppose the same argument goes > for infiniband.. > what sort of a time window do we have for retransmission?) > > What are the limits on end-to-end flow control in IB and iWarp? > >From the RDMA Provider's perspective, the short answer is "quick enough so that I don't have to do anything heroic to keep the connection alive." It should not require anything heroic. What it does require is a local method to suspend the local QP(s) so that it cannot place or read memory in the affected area. That can take some time depending upon the implementation. There is then the time to overwrite the mappings, which again, depending upon the implementation and the number of mappings, could be milliseconds in length. With TCP you also have to add "and healthy". If you've ever had a long download that got effectively stalled by a burst of noise and you just hit the 'reload' button on your browser then you know what I'm talking about. 
But in transport neutral terms I would think that one RTT is definitely safe -- that much data could have been dropped by one switch failure or one nasty spike in inbound noise. > > > > Yes, there are limits on how much memory you can mlock, or even > > allocate. Applications are required to register memory precisely > > because the required guarantees are not there by default. > Eliminating > > those guarantees *is* effectively rewriting every RDMA application > > without even letting them know. > > Some of this argument is a policy issue, which I would argue > shouldn't be hard-coded in the code or in the network hardware. > > At least in my view, the guarantees are only there to make > applications go fast. We are getting low latency and high > performance with infiniband by making memory registration go > really really slow. If, to make big HPC simulation > applications work, we wind up doing memcpy() to put the data > into a registered buffer because we can't register half of > physical memory, the application isn't going very fast. > What you are looking for is a distinction between registering memory to *enable* the RNIC to optimize local access and registering memory to enable its being advertised to the remote end. Early im
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew Morton wrote: RLIMIT_MEMLOCK sounds like the appropriate mechanism. We cannot rely upon userspace running mlock(), so perhaps it is appropriate to run sys_mlock() in-kernel because that gives us the appropriate RLIMIT_MEMLOCK checking. I don't see what's wrong with relying on userspace to call mlock(). First of all, all RDMA apps call a third-party API, like DAPL or MPI, to register memory. The memory needs to be registered in order for the driver and adapter to know where it is. During this registration, the memory is also pinned. That's when we call mlock(). However a hostile app can just go and run munlock() and then allocate some more pinned-by-get_user_pages() memory. Isn't mlock() on a per-process basis anyway? How can one process call munlock() on another process' memory? umm, how about we - force the special pages into a separate vma - run get_user_pages() against it all - use RLIMIT_MEMLOCK accounting to check whether the user is allowed to do this thing - undo the RLIMIT_MEMLOCK accounting in ->release Isn't this kinda what mlock() does already? Create a new VMA and then VM_LOCK it? This will all interact with user-initiated mlock/munlock in messy ways. Maybe a new kernel-internal vma->vm_flag which works like VM_LOCKED but is unaffected by mlock/munlock activity is needed. A bit of generalisation in do_mlock() should suit? Yes, but do_mlock() needs to prevent pages from being moved during memory hotswap.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
IWAMOTO Toshihiro wrote: If such memory were allocated by a driver, the memory could be placed in non-hotremovable areas to avoid the above problems. How can the driver allocate 3GB of pinned memory on a system with 3.5GB of RAM? Can vmalloc() or get_free_pages() allocate that much memory?
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
At Mon, 25 Apr 2005 16:58:03 -0700, Roland Dreier wrote: > Andrew> It would be better to obtain this memory via a mmap() of > Andrew> some special device node, so we can perform appropriate > Andrew> permission checking and clean everything up on unclean > Andrew> application exit. > > This seems to interact poorly with how applications want to use RDMA, > ie typically through a library interface such as MPI. People doing > HPC don't want to recode their apps to use a new allocator, they just > want to link to a new MPI library and have the app go fast. Such HPC users cannot use the memory hotremoval feature, and something needs to be implemented so that the NUMA migration can handle such memory properly, but I see your point. If such memory were allocated by a driver, the memory could be placed in non-hotremovable areas to avoid the above problems. -- IWAMOTO Toshihiro
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Roland Dreier <[EMAIL PROTECTED]> wrote: > > Andrew> How does the driver detect process exit? > > I already answered earlier but just to be clear: registration goes > through a character device, and all regions are cleaned up in the > ->release() of that device. yup. > I don't currently have any code accounting against RLIMIT_MEMLOCK or > testing CAP_FOO, but I have no problem adding whatever is thought > appropriate. Userspace also has control over the permissions and > owner/group of the /dev node. I guess device node permissions won't be appropriate here, if only because it sounds like everyone will go and set them to 0666. RLIMIT_MEMLOCK sounds like the appropriate mechanism. We cannot rely upon userspace running mlock(), so perhaps it is appropriate to run sys_mlock() in-kernel because that gives us the appropriate RLIMIT_MEMLOCK checking. However a hostile app can just go and run munlock() and then allocate some more pinned-by-get_user_pages() memory. umm, how about we - force the special pages into a separate vma - run get_user_pages() against it all - use RLIMIT_MEMLOCK accounting to check whether the user is allowed to do this thing - undo the RLIMIT_MEMLOCK accounting in ->release This will all interact with user-initiated mlock/munlock in messy ways. Maybe a new kernel-internal vma->vm_flag which works like VM_LOCKED but is unaffected by mlock/munlock activity is needed. A bit of generalisation in do_mlock() should suit?
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew> How does the driver detect process exit? I already answered earlier but just to be clear: registration goes through a character device, and all regions are cleaned up in the ->release() of that device. I don't currently have any code accounting against RLIMIT_MEMLOCK or testing CAP_FOO, but I have no problem adding whatever is thought appropriate. Userspace also has control over the permissions and owner/group of the /dev node. - R.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Roland Dreier <[EMAIL PROTECTED]> wrote: > > Andrew> ug. What stops the memory from leaking if the process > Andrew> exits? > > Andrew> I hope this is a privileged operation? > > I don't think it has to be privileged. In my implementation, the > driver keeps a per-process list of registered memory regions and > unpins/cleans up on process exit. How does the driver detect process exit? > Andrew> It would be better to obtain this memory via a mmap() of > Andrew> some special device node, so we can perform appropriate > Andrew> permission checking and clean everything up on unclean > Andrew> application exit. > > This seems to interact poorly with how applications want to use RDMA, > ie typically through a library interface such as MPI. People doing > HPC don't want to recode their apps to use a new allocator, they just > want to link to a new MPI library and have the app go fast. Fair enough.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Caitlin Bestler <[EMAIL PROTECTED]> wrote: > > > > > > > This is because there is no file descriptor or anything else associated > > > > with the pages which permits the kernel to clean stuff up on unclean > > > > application exit. Also there are the obvious issues with permitting > > > > pinning of unbounded amounts of memory. > > > > > > Correct, the driver must be able to determine that the process has died > > > and clean up after it, so the pinned region in most implementations is > > > associated with an open file descriptor. > > > > How is that association created? > > > There is not a file descriptor, but there is an rnic handle. Both DAPL > and IT-API require that process death will result in the handle and all > of its dependent objects being released. What's an "rnic handle", in Linux terms? > The rnic handle can always be declared to be a "file descriptor" if > that makes it follow normal OS conventions more precisely. Does that mean that the code has not yet been implemented? Yes, a Linux fd is appropriate. But we don't have any sane way right now of saying "you need to run put_page() against all these pages in the ->release() handler". That'll need to be coded by yourselves. > There is also a need for some form of resource manager to approve > creation of Memory Regions. Obviously you cannot have multiple > applications claiming half of physical memory. The kernel already has considerable resource management capabilities. Please consider using/extending/generalising those before inventing anything new. RLIMIT_MEMLOCK would be a starting point. > But if you merely require the user to have root privileges in order > to create a Memory Region, and then take a first-come first-served > attitude, I don't think you end up with something that is truly a > general purpose capability. We don't want code in the kernel which will permit hostile unprivileged users to trivially cause the box to lock up. 
RLIMIT_MEMLOCK and, if necessary, CAP_IPC_LOCK sound appropriate here. > A general purpose RDMA capability requires the ability to indefinitely > pin large portions of user memory. It makes sense to integrate that > with OS policy control over resource utilization and to integrate it with > memory suspend/resume capabilities so that hotplug memory can > be supported. What you can't do is downgrade a Memory Region so > that it is no longer a memory region. Doing that means that you are > not truly supporting RDMA.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur> If you look at the Infiniband code that was recently Timur> submitted, I think you'll see it does exactly that: after Timur> calling mlock(), the driver calls get_user_pages(), and it Timur> stores the page mappings for future use. Andrew> Where? The code isn't merged yet. I sent a version to lkml for review -- in fact it was this very thread that we're in now. The code in question is in http://lkml.org/lkml/2005/4/4/266 This implements a "userspace verbs" character device that memory registration goes through. This means the kernel has a device node that will be closed when a process dies, and so the memory can be cleaned up. - R.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Caitlin> Every RDMA related interface specification that I know of Caitlin> specifically excludes support of RDMA resources being Caitlin> inherited by child processes, with the warning that Caitlin> excellent implementations will give the child process an Caitlin> error for attempting to use the parent's RDMA resources. Caitlin> More streamlined implementations will simply be Caitlin> unpredictable. Caitlin> As for forking while the parent has a pending read: since Caitlin> the parent has not reaped the completion at the time of Caitlin> the fork the buffers in question are undefined. The Caitlin> child's buffers will be consistent, that is they are Caitlin> undefined. I think you've missed the point: unless a process sets VM_DONTCOPY on its RDMA memory regions, then incorrect memory mappings may be used if the app does something as simple as calling system("ls"). - R.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew> Whoa, hang on. Andrew> The way we expect get_user_pages() to be used is that the Andrew> kernel will use get_user_pages() once per application I/O Andrew> request. Andrew> Are you saying that RDMA clients will semi-permanently own Andrew> pages which were pinned by get_user_pages()? That those Andrew> pages will be used for multiple separate I/O operations? Andrew> If so, then that's a significant design departure and it Andrew> would be good to hear why it is necessary. The idea is that applications manage the lifetime of pinned memory regions. They can do things like post multiple I/O operations without any page-walking overhead, or pass a buffer descriptor to a remote host who will send data at some indeterminate time in the future. In addition, InfiniBand has the notion of atomic operations, so a cluster application may be using some memory region to implement a global lock. This might not be the most kernel-friendly design but it is pretty deeply ingrained in the design of RDMA transports like InfiniBand and iWARP (RDMA over IP). I'm also not opposed to implementing some other mechanism to make this work, but the combination of get_user_pages() in the kernel and extending mprotect() to allow setting VM_DONTCOPY seems to work fine. - R.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew> ug. What stops the memory from leaking if the process Andrew> exits? Andrew> I hope this is a privileged operation? I don't think it has to be privileged. In my implementation, the driver keeps a per-process list of registered memory regions and unpins/cleans up on process exit. Andrew> It would be better to obtain this memory via a mmap() of Andrew> some special device node, so we can perform appropriate Andrew> permission checking and clean everything up on unclean Andrew> application exit. This seems to interact poorly with how applications want to use RDMA, ie typically through a library interface such as MPI. People doing HPC don't want to recode their apps to use a new allocator, they just want to link to a new MPI library and have the app go fast. - R.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 4/25/05, Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > This is because there is no file descriptor or anything else associated > > > with the pages which permits the kernel to clean stuff up on unclean > > > application exit. Also there are the obvious issues with permitting > > > pinning of unbounded amounts of memory. > > > > Correct, the driver must be able to determine that the process has died > > and clean up after it, so the pinned region in most implementations is > > associated with an open file descriptor. > > How is that association created? There is not a file descriptor, but there is an rnic handle. Both DAPL and IT-API require that process death will result in the handle and all of its dependent objects being released. The rnic handle can always be declared to be a "file descriptor" if that makes it follow normal OS conventions more precisely. There is also a need for some form of resource manager to approve creation of Memory Regions. Obviously you cannot have multiple applications claiming half of physical memory. But if you merely require the user to have root privileges in order to create a Memory Region, and then take a first-come first-served attitude, I don't think you end up with something that is truly a general purpose capability. A general purpose RDMA capability requires the ability to indefinitely pin large portions of user memory. It makes sense to integrate that with OS policy control over resource utilization and to integrate it with memory suspend/resume capabilities so that hotplug memory can be supported. What you can't do is downgrade a Memory Region so that it is no longer a memory region. Doing that means that you are not truly supporting RDMA.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur Tabi <[EMAIL PROTECTED]> wrote: > > FYI, our driver detects the process termination and cleans up everything > itself. How is this implemented?
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Libor Michalek <[EMAIL PROTECTED]> wrote: > > On Mon, Apr 25, 2005 at 03:35:42PM -0700, Andrew Morton wrote: > > Timur Tabi <[EMAIL PROTECTED]> wrote: > > > > > > Andrew Morton wrote: > > > > > > > The way we expect get_user_pages() to be used is that the kernel will > > > > use > > > > get_user_pages() once per application I/O request. > > > > > > Are you saying that the mapping obtained by get_user_pages() is valid > > > only within the > > > context of the IOCtl call? That once the driver returns from the IOCtl, > > > the mapping > > > should no longer be used? > > > > Yes, we expect that all the pages which get_user_pages() pinned will become > > unpinned within the context of the syscall which pinned the pages. Or > > shortly after, in the case of async I/O. > > When a network protocol is making use of async I/O the amount of time > between posting the read request and getting the completion for that > request is unbounded since it depends on the other half of the connection > sending some data. In this case the buffer that was pinned during the > io_submit() may be pinned, and holding the pages, for a long time. Sure. > During > this time the process might fork, at this point any data received will be > placed into the wrong spot. Well the data is placed in _a_ spot. That's only the "wrong" spot because you've defined it to be wrong! IOW: what behaviour are you actually looking for here, and why, and does it matter? > > This is because there is no file descriptor or anything else associated > > with the pages which permits the kernel to clean stuff up on unclean > > application exit. Also there are the obvious issues with permitting > > pinning of unbounded amounts of memory. > > Correct, the driver must be able to determine that the process has died > and clean up after it, so the pinned region in most implementations is > associated with an open file descriptor. How is that association created? 
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew Morton wrote: They are permanent until someone runs put_page() against all the pages. What I'm saying is that all current callers of get_user_pages() _do_ run put_page() within the same syscall or upon I/O termination. Oh, okay then. I guess I'll get back to work! Actually, with RDMA, "I/O termination" technically doesn't happen until the memory is deregistered. When the memory is registered, all that means is that it should be pinned and the virtual-to-physical mapping should be stored. No actual I/O occurs at that point. If you look at the Infiniband code that was recently submitted, I think you'll see it does exactly that: after calling mlock(), the driver calls get_user_pages(), and it stores the page mappings for future use. > Where? I was talking about the code that Roland mentioned in the first message of this thread - the user-space verbs support. He said the code calls mlock() and get_user_pages(). FYI, our driver detects the process termination and cleans up everything itself. -- Timur Tabi Staff Software Engineer [EMAIL PROTECTED] One thing a Southern boy will never say is, "I don't think duct tape will fix it." -- Ed Smylie, NASA engineer for Apollo 13
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Mon, Apr 25, 2005 at 03:35:42PM -0700, Andrew Morton wrote: > Timur Tabi <[EMAIL PROTECTED]> wrote: > > > > Andrew Morton wrote: > > > > > The way we expect get_user_pages() to be used is that the kernel will use > > > get_user_pages() once per application I/O request. > > > > Are you saying that the mapping obtained by get_user_pages() is valid only > > within the > > context of the IOCtl call? That once the driver returns from the IOCtl, > > the mapping > > should no longer be used? > > Yes, we expect that all the pages which get_user_pages() pinned will become > unpinned within the context of the syscall which pinned the pages. Or > shortly after, in the case of async I/O. When a network protocol is making use of async I/O the amount of time between posting the read request and getting the completion for that request is unbounded since it depends on the other half of the connection sending some data. In this case the buffer that was pinned during the io_submit() may be pinned, and holding the pages, for a long time. During this time the process might fork, at this point any data received will be placed into the wrong spot. > This is because there is no file descriptor or anything else associated > with the pages which permits the kernel to clean stuff up on unclean > application exit. Also there are the obvious issues with permitting > pinning of unbounded amounts of memory. Correct, the driver must be able to determine that the process has died and clean up after it, so the pinned region in most implementations is associated with an open file descriptor. -Libor
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur Tabi <[EMAIL PROTECTED]> wrote: > > Andrew Morton wrote: > > > This is because there is no file descriptor or anything else associated > > with the pages which permits the kernel to clean stuff up on unclean > > application exit. Also there are the obvious issues with permitting > > pinning of unbounded amounts of memory. > > Then that might explain the "bug" that we're seeing with get_user_pages(). > We've been > assuming that get_user_pages() mappings are permanent. They are permanent until someone runs put_page() against all the pages. What I'm saying is that all current callers of get_user_pages() _do_ run put_page() within the same syscall or upon I/O termination. > Well, I was just about to re-implement get_user_pages() support in our driver > to > demonstrate the bug. I guess I'll hold off on that. > > If you look at the Infiniband code that was recently submitted, I think > you'll see it does > exactly that: after calling mlock(), the driver calls get_user_pages(), and > it stores the > page mappings for future use. Where? bix:/usr/src/linux-2.6.12-rc3> grep -rl get_user_pages . ./arch/i386/lib/usercopy.c ./arch/sparc64/kernel/ptrace.c ./drivers/video/pvr2fb.c ./drivers/media/video/video-buf.c ./drivers/scsi/sg.c ./drivers/scsi/st.c ./include/asm-ia64/pgtable.h ./include/linux/mm.h ./include/asm-um/archparam-i386.h ./include/asm-i386/fixmap.h ./fs/nfs/direct.c ./fs/aio.c ./fs/binfmt_elf.c ./fs/bio.c ./fs/direct-io.c ./kernel/futex.c ./kernel/ptrace.c ./mm/memory.c ./mm/nommu.c ./mm/rmap.c ./mm/mempolicy.c
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew Morton wrote: This is because there is no file descriptor or anything else associated with the pages which permits the kernel to clean stuff up on unclean application exit. Also there are the obvious issues with permitting pinning of unbounded amounts of memory. Then that might explain the "bug" that we're seeing with get_user_pages(). We've been assuming that get_user_pages() mappings are permanent. Well, I was just about to re-implement get_user_pages() support in our driver to demonstrate the bug. I guess I'll hold off on that. If you look at the Infiniband code that was recently submitted, I think you'll see it does exactly that: after calling mlock(), the driver calls get_user_pages(), and it stores the page mappings for future use.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur Tabi <[EMAIL PROTECTED]> wrote: > > Andrew Morton wrote: > > > The way we expect get_user_pages() to be used is that the kernel will use > > get_user_pages() once per application I/O request. > > Are you saying that the mapping obtained by get_user_pages() is valid only > within the > context of the IOCtl call? That once the driver returns from the IOCtl, the > mapping > should no longer be used? Yes, we expect that all the pages which get_user_pages() pinned will become unpinned within the context of the syscall which pinned the pages. Or shortly after, in the case of async I/O. This is because there is no file descriptor or anything else associated with the pages which permits the kernel to clean stuff up on unclean application exit. Also there are the obvious issues with permitting pinning of unbounded amounts of memory.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur Tabi <[EMAIL PROTECTED]> wrote: > > Andrew Morton wrote: > > > The way we expect get_user_pages() to be used is that the kernel will use > > get_user_pages() once per application I/O request. > > > > Are you saying that RDMA clients will semi-permanently own pages which were > > pinned by get_user_pages()? That those pages will be used for multiple > > separate I/O operations? > > Yes, absolutely! > > The memory buffer is allocated by the process (usually just via malloc) and > registered/pinned by the driver. It then stays pinned for the life of the > process (typically). ug. What stops the memory from leaking if the process exits? I hope this is a privileged operation? > > If so, then that's a significant design departure and it would be good to > > hear why it is necessary. > > That's just how RDMA works. Once the memory is pinned, if the app wants to > send data to > another node, it does two things: > > 1) Puts the data into its buffer > 2) Sends a "work request" to the driver with (among other things) the offset > and length of > the data. > > This is a time-critical operation. It must occur as fast as possible, which > means the > memory must have already been pinned. It would be better to obtain this memory via a mmap() of some special device node, so we can perform appropriate permission checking and clean everything up on unclean application exit.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew Morton wrote: The way we expect get_user_pages() to be used is that the kernel will use get_user_pages() once per application I/O request. Are you saying that the mapping obtained by get_user_pages() is valid only within the context of the IOCtl call? That once the driver returns from the IOCtl, the mapping should no longer be used?
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew Morton wrote: The way we expect get_user_pages() to be used is that the kernel will use get_user_pages() once per application I/O request. Are you saying that RDMA clients will semi-permanently own pages which were pinned by get_user_pages()? That those pages will be used for multiple separate I/O operations? Yes, absolutely! The memory buffer is allocated by the process (usually just via malloc) and registered/pinned by the driver. It then stays pinned for the life of the process (typically). If so, then that's a significant design departure and it would be good to hear why it is necessary. That's just how RDMA works. Once the memory is pinned, if the app wants to send data to another node, it does two things: 1) Puts the data into its buffer 2) Sends a "work request" to the driver with (among other things) the offset and length of the data. This is a time-critical operation. It must occur as fast as possible, which means the memory must have already been pinned.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Roland Dreier <[EMAIL PROTECTED]> wrote: > > Andrew> Do we care about that? A straightforward scenario under > Andrew> which this can happen is: > > Andrew> a) app starts some read I/O in an asynchronous manner > Andrew> b) app forks > Andrew> c) child writes to one of the pages which is still under read I/O > Andrew> d) the read I/O completes > Andrew> e) the child is left with the old data plus the child's > modification instead > Andrew> of the new data > > Andrew> which is a very silly application which is giving itself > Andrew> unpredictable memory contents anyway. > > Andrew> I assume there's a more sensible scenario? > > You're right, that is a silly scenario ;) In fact if we mark vmas > with VM_DONTCOPY, then the child just crashes with a seg fault. > > The type of thing I'm worried about is something like, for example: > > a) app registers memory region with RDMA hardware -- in other words, > loads the device's translation table for future I/O Whoa, hang on. The way we expect get_user_pages() to be used is that the kernel will use get_user_pages() once per application I/O request. Are you saying that RDMA clients will semi-permanently own pages which were pinned by get_user_pages()? That those pages will be used for multiple separate I/O operations? If so, then that's a significant design departure and it would be good to hear why it is necessary. > b) app forks > c) app writes to the registered memory region, and the kernel breaks > the COW for the (now read-only) page by mapping a new page > d) app starts an I/O that will do a DMA read from the region > e) device reads using the wrong, old mapping Sure. But such an app could be declared to be buggy... > This can be pretty insidious because for example fork() + immediate > exec() or just using system() still leaves the parent with PTEs marked > read-only. If an application does overlapping memory registrations so > get_user_pages() is called a lot, then as far as I can see > can_share_swap_page() will always return 0 and the COW will happen > even if the child process has thrown out its original vmas. > > Or if the counts are in the correct range, then there's a small window > between fork() and exec() where the parent process can screw itself > up, so most of the time the app works, until it doesn't. > > - R.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 4/25/05, Roland Dreier <[EMAIL PROTECTED]> wrote: > Andrew> Do we care about that? A straightforward scenario under > Andrew> which this can happen is: > > Andrew> a) app starts some read I/O in an asynchronous manner > Andrew> b) app forks > Andrew> c) child writes to one of the pages which is still under read I/O > Andrew> d) the read I/O completes > Andrew> e) the child is left with the old data plus the child's > modification instead > Andrew> of the new data > > Andrew> which is a very silly application which is giving itself > Andrew> unpredictable memory contents anyway. > > Andrew> I assume there's a more sensible scenario? > > You're right, that is a silly scenario ;) In fact if we mark vmas > with VM_DONTCOPY, then the child just crashes with a seg fault. > > The type of thing I'm worried about is something like, for example: > > a) app registers memory region with RDMA hardware -- in other words, > loads the device's translation table for future I/O > b) app forks > c) app writes to the registered memory region, and the kernel breaks > the COW for the (now read-only) page by mapping a new page > d) app starts an I/O that will do a DMA read from the region > e) device reads using the wrong, old mapping > > This can be pretty insidious because for example fork() + immediate > exec() or just using system() still leaves the parent with PTEs marked > read-only. If an application does overlapping memory registrations so > get_user_pages() is called a lot, then as far as I can see > can_share_swap_page() will always return 0 and the COW will happen > even if the child process has thrown out its original vmas. > > Or if the counts are in the correct range, then there's a small window > between fork() and exec() where the parent process can screw itself > up, so most of the time the app works, until it doesn't. > Every RDMA-related interface specification that I know of specifically excludes support of RDMA resources being inherited by child processes, with the warning that excellent implementations will give the child process an error for attempting to use the parent's RDMA resources. More streamlined implementations will simply be unpredictable. As for forking while the parent has a pending read: since the parent has not reaped the completion at the time of the fork, the buffers in question are undefined. The child's buffers will be consistent, that is, they are undefined.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Roland Dreier <[EMAIL PROTECTED]> wrote: > > Timur> With mlock(), we don't need to use get_user_pages() at all. > Timur> Arjan tells me the only time an mlocked page can move is > Timur> with hot (un)plug of memory, but that isn't supported on > Timur> the systems that we support. We actually prefer mlock() > Timur> over get_user_pages(), because if the process dies, the > Timur> locks automatically go away too. > > There actually is another way pages can move, with both > get_user_pages() and mlock(): copy-on-write after a fork(). If > userspace does a fork(), then all PTEs are marked read-only, and if > the original process touches the page after the fork(), a new page > will be allocated and mapped at the original virtual address. Do we care about that? A straightforward scenario under which this can happen is: a) app starts some read I/O in an asynchronous manner b) app forks c) child writes to one of the pages which is still under read I/O d) the read I/O completes e) the child is left with the old data plus the child's modification instead of the new data which is a very silly application which is giving itself unpredictable memory contents anyway. I assume there's a more sensible scenario? ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew> Do we care about that? A straightforward scenario under Andrew> which this can happen is: Andrew> a) app starts some read I/O in an asynchronous manner Andrew> b) app forks Andrew> c) child writes to one of the pages which is still under read I/O Andrew> d) the read I/O completes Andrew> e) the child is left with the old data plus the child's modification instead Andrew> of the new data Andrew> which is a very silly application which is giving itself Andrew> unpredictable memory contents anyway. Andrew> I assume there's a more sensible scenario? You're right, that is a silly scenario ;) In fact if we mark vmas with VM_DONTCOPY, then the child just crashes with a seg fault. The type of thing I'm worried about is something like, for example: a) app registers memory region with RDMA hardware -- in other words, loads the device's translation table for future I/O b) app forks c) app writes to the registered memory region, and the kernel breaks the COW for the (now read-only) page by mapping a new page d) app starts an I/O that will do a DMA read from the region e) device reads using the wrong, old mapping This can be pretty insidious because for example fork() + immediate exec() or just using system() still leaves the parent with PTEs marked read-only. If an application does overlapping memory registrations so get_user_pages() is called a lot, then as far as I can see can_share_swap_page() will always return 0 and the COW will happen even if the child process has thrown out its original vmas. Or if the counts are in the correct range, then there's a small window between fork() and exec() where the parent process can screw itself up, so most of the time the app works, until it doesn't. - R.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Sat, Apr 23, 2005 at 07:44:21PM -0700, Andrew Morton wrote: > Timur Tabi <[EMAIL PROTECTED]> wrote: > > As I said, the testcase only works with our hardware, and it's also > > very large. It's one small test that's part of a huge test suite. > > It takes a couple hours just to install the damn thing. > > > > We want to produce a simpler test case that demonstrates the problem in an > > easy-to-understand manner, but we don't have time to do that now. > > If your theory is correct then it should be able to demonstrate this > problem without any special hardware at all: pin some user memory, then > generate memory pressure then check the contents of those pinned pages. > > But if, for the DMA transfer, you're using the array of page*'s which were > originally obtained from get_user_pages() then it's rather hard to see how > the kernel could alter the page's contents. > > Then again, if mlock() fixes it then something's up. Very odd. Andrew, Libor Michalek posted a much more reasonable (to my limited understanding) bug description in <[EMAIL PROTECTED]>. (And I'd love to provide a URL, but damned if I can figure out how to find that message on gmane. Clue-bat applications gladly accepted.) Libor Michalek wrote: # The driver did use get_user_pages() to elevate the refcount on all the # pages it was going to use for IO, as well as call set_page_dirty() since # the pages were going to have data written to them from the device. # # The problem we were seeing is that the minor fault by the app resulted # in a new physical page getting mapped for the application. The page that # had the elevated refcount was still waiting for the data to be written # to by the driver at the time that the app accessed the page causing the # minor fault. Obviously since the app had a new mapping the data written # by the driver was lost. # # It looks like code was added to try_to_unmap_one() to address this, so # hopefully it's no longer an issue... Which makes me think that Timur's bug is just an insufficiently-understood version of Libor's. -andy
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
That leaves a problem when the same memory region is registered with different vendors. Verbs A marks the area, Verbs B sees that it is already marked, Verbs A unmarks the area when it is done, not knowing that B is relying on the memory staying pinned. I do not believe there is a solution to this problem when working at arm's length from Linux, other than documenting the problem and informing applications of workarounds required when using multiple vendors concurrently with the same memory (i.e., destroy the most recently created memory region first, or pin the memory yourself before creating the first memory region). The only other alternative is to make the pinning some sort of shared service that would apply across multiple vendors. That is doable, but might not be worthwhile given that a single process using multiple vendor devices concurrently is decidedly the exception. But those users deserve at least a warning. On 4/25/05, Roland Dreier <[EMAIL PROTECTED]> wrote: > Caitlin> Who is responsible for counting within a process, and > Caitlin> then between processes (in case shared memory is being > Caitlin> registered)? The application? Middleware? Driver? > > The verbs code doing the registration should do it as part of the > registration. Shared memory does not cause any additional issues > because it is mapped into the virtual memory map of each process and > must be marked VM_DONTCOPY in each process separately. > > - R.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Caitlin> Who is responsible for counting within a process, and Caitlin> then between processes (in case shared memory is being Caitlin> registered)? The application? Middleware? Driver? The verbs code doing the registration should do it as part of the registration. Shared memory does not cause any additional issues because it is mapped into the virtual memory map of each process and must be marked VM_DONTCOPY in each process separately. - R.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On 4/25/05, Roland Dreier <[EMAIL PROTECTED]> wrote: > Timur> With mlock(), we don't need to use get_user_pages() at all. > Timur> Arjan tells me the only time an mlocked page can move is > Timur> with hot (un)plug of memory, but that isn't supported on > Timur> the systems that we support. We actually prefer mlock() > Timur> over get_user_pages(), because if the process dies, the > Timur> locks automatically go away too. > > There actually is another way pages can move, with both > get_user_pages() and mlock(): copy-on-write after a fork(). If > userspace does a fork(), then all PTEs are marked read-only, and if > the original process touches the page after the fork(), a new page > will be allocated and mapped at the original virtual address. > > This is actually a pretty big pain, because the only good solution > seems to be for the kernel to mark these registered regions as > VM_DONTCOPY. Right now this means that driver code ends up monkeying > with vm_flags for user vmas. > > Does it seem reasonable to add a new system call to let userspace mark > memory it doesn't want copied into forked processes? Something like > > long sys_mark_nocopy(unsigned long addr, size_t len, int mark) > > which would set VM_DONTCOPY if mark != 0, and clear it if mark == 0. > A better name would be gratefully accepted... > > Then to register memory for RDMA, userspace would call > sys_mark_nocopy() (with appropriate accounting to handle possibly > overlapping regions) and the kernel would call get_user_pages(). The > get_user_pages() is of course required because the kernel can't trust > userspace to keep the pages locked. mlock() would no longer be > necessary. We can trust userspace to call sys_mark_nocopy() as > needed, because a process can only hurt itself and its children by > misusing the sys_mark_nocopy() call. > > If this seems reasonable then I can code a patch. 
> Who is responsible for counting within a process, and then between processes (in case shared memory is being registered)? The application? Middleware? Driver? My concern here is that the application layer may not be fully aware when middleware is registering memory, and middleware may not be fully aware when the memory it receives from the application is shared with another process.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Roland> Does it seem reasonable to add a new system call to let Roland> userspace mark memory it doesn't want copied into forked Roland> processes? Something like Roland> long sys_mark_nocopy(unsigned long addr, size_t len, int Roland> mark) Roland> which would set VM_DONTCOPY if mark != 0, and clear it if Roland> mark == 0. A better name would be gratefully accepted... Christoph> add a new MAP_DONTCOPY flag and accept it in mmap and Christoph> mprotect? That is much better, thanks. But I think it would need to be PROT_DONTCOPY to work with mprotect(), right? - R.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Sun, 2005-04-24 at 23:12 -0500, Timur Tabi wrote: > Greg KH wrote: > > I know of at least 1 x86-32 box from a three-letter-named company with > > this feature that has been shipping for a few _years_ now. That box is > > pretty much everywhere now, and I know that other versions of it are > > also quite popular (despite the high cost...) > > Hmm... Well, I think we were already planning on telling our customers that > we don't > support hot-swap RAM. Is there a CONFIG option for that feature? The driver to do the ACPI portion of both add and remove is in the kernel today, so it's certainly a feature that's coming relatively soon. There is a large variety of x86_64, ppc64 and ia64 hardware that will be doing memory hotplug. I believe that every POWER5 system is capable of supporting it, at least virtually. I don't think your concerns end with memory hotplug. The same approaches to moving memory around will be used for NUMA memory balancing and for memory defragmentation. Can you say that your cards will never be used on a system which has memory which becomes fragmented? -- Dave
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Mon, Apr 25, 2005 at 06:15:10AM -0700, Roland Dreier wrote: > Does it seem reasonable to add a new system call to let userspace mark > memory it doesn't want copied into forked processes? Something like > > long sys_mark_nocopy(unsigned long addr, size_t len, int mark) > > which would set VM_DONTCOPY if mark != 0, and clear it if mark == 0. > A better name would be gratefully accepted... add a new MAP_DONTCOPY flag and accept it in mmap and mprotect?
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur> With mlock(), we don't need to use get_user_pages() at all. Timur> Arjan tells me the only time an mlocked page can move is Timur> with hot (un)plug of memory, but that isn't supported on Timur> the systems that we support. We actually prefer mlock() Timur> over get_user_pages(), because if the process dies, the Timur> locks automatically go away too. There actually is another way pages can move, with both get_user_pages() and mlock(): copy-on-write after a fork(). If userspace does a fork(), then all PTEs are marked read-only, and if the original process touches the page after the fork(), a new page will be allocated and mapped at the original virtual address. This is actually a pretty big pain, because the only good solution seems to be for the kernel to mark these registered regions as VM_DONTCOPY. Right now this means that driver code ends up monkeying with vm_flags for user vmas. Does it seem reasonable to add a new system call to let userspace mark memory it doesn't want copied into forked processes? Something like long sys_mark_nocopy(unsigned long addr, size_t len, int mark) which would set VM_DONTCOPY if mark != 0, and clear it if mark == 0. A better name would be gratefully accepted... Then to register memory for RDMA, userspace would call sys_mark_nocopy() (with appropriate accounting to handle possibly overlapping regions) and the kernel would call get_user_pages(). The get_user_pages() is of course required because the kernel can't trust userspace to keep the pages locked. mlock() would no longer be necessary. We can trust userspace to call sys_mark_nocopy() as needed, because a process can only hurt itself and its children by misusing the sys_mark_nocopy() call. If this seems reasonable then I can code a patch. - R.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Greg KH wrote: I know of at least 1 x86-32 box from a three-letter-named company with this feature that has been shipping for a few _years_ now. That box is pretty much everywhere now, and I know that other versions of it are also quite popular (despite the high cost...) Hmm... Well, I think we were already planning on telling our customers that we don't support hot-swap RAM. Is there a CONFIG option for that feature? Your hardware is just a pci card, right? Why wouldn't it work on ppc64 and ia64 then? It's PCI-X, actually, and I don't think we've ever actually plugged it into a PPC box. Isn't Open Firmware support required for all PPC boxes, anyway? Our PCI card is not OF compatible, AFAIK. As for IA64, well, we could support it, but it's not a high enough priority. We do have some CPU-specific code in our driver that we would need to port to IA-64. Wait, what _is_ "your stuff"? The open-ib code? No, if anything, it's the competition to IB. It's called iWARP (RDMA over TCP/IP), and it's similar to IB except it uses gigabit ethernet instead of whatever hardware IB uses. Because we also support RDMA, we have the same problems as OpenIB; however, we would prefer that the kernel support OpenRDMA instead, since it's more generic. > Or some other, private fork? Any pointers to this stuff? http://ammasso.com/support.html The current version of the code calls sys_mlock() directly from the driver. We haven't yet released the version that calls mlock().
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Sun, Apr 24, 2005 at 04:52:31PM -0500, Timur Tabi wrote: > Greg KH wrote: > > >You don't "support" i386 or ia64 or x86-64 or ppc64 systems? What > >hardware do you support? > > I've never seen or heard of any x86-32 or x86-64 system that supports > hot-swap RAM. I know of at least 1 x86-32 box from a three-letter-named company with this feature that has been shipping for a few _years_ now. That box is pretty much everywhere now, and I know that other versions of it are also quite popular (despite the high cost...) > Our hardware does not support PPC, and our software doesn't support > ia-64. Your hardware is just a pci card, right? Why wouldn't it work on ppc64 and ia64 then? > > And what about the fact that you are aiming to > >get this code into mainline, right? If not, why are you asking here? > >:) > > Well, our primary concern is getting our stuff to work. Since > get_user_pages() doesn't work, but mlock() does, that's what we use. I > don't know how to fix get_user_pages(), and I don't have the time right now > to figure it out. I know that technically mlock() is not the right way to > do it, and so we're not going to be submitting our code for the mainline > until get_user_pages() works and our code uses it instead of mlock(). Wait, what _is_ "your stuff"? The open-ib code? Or some other, private fork? Any pointers to this stuff? thanks, greg k-h
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Greg KH wrote: You don't "support" i386 or ia64 or x86-64 or ppc64 systems? What hardware do you support? I've never seen or heard of any x86-32 or x86-64 system that supports hot-swap RAM. Our hardware does not support PPC, and our software doesn't support ia-64. > And what about the fact that you are aiming to get this code into mainline, right? If not, why are you asking here? :) Well, our primary concern is getting our stuff to work. Since get_user_pages() doesn't work, but mlock() does, that's what we use. I don't know how to fix get_user_pages(), and I don't have the time right now to figure it out. I know that technically mlock() is not the right way to do it, and so we're not going to be submitting our code for the mainline until get_user_pages() works and our code uses it instead of mlock().
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Sun, Apr 24, 2005 at 09:23:48AM -0500, Timur Tabi wrote: > Andrew Morton wrote: > > >If your theory is correct then it should be able to demonstrate this > >problem without any special hardware at all: pin some user memory, then > >generate memory pressure then check the contents of those pinned pages. > > I tried that, but I couldn't get it to fail. But that was a while ago, and > I've learned a few things since then, so I'll try again. > > >But if, for the DMA transfer, you're using the array of page*'s which were > >originally obtained from get_user_pages() then it's rather hard to see how > >the kernel could alter the page's contents. > > > >Then again, if mlock() fixes it then something's up. Very odd. > > With mlock(), we don't need to use get_user_pages() at all. Arjan tells me > the only time an mlocked page can move is with hot (un)plug of memory, but > that isn't supported on the systems that we support. You don't "support" i386 or ia64 or x86-64 or ppc64 systems? What hardware do you support? And what about the fact that you are aiming to get this code into mainline, right? If not, why are you asking here? :) thanks, greg k-h
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andrew Morton wrote: If your theory is correct then it should be able to demonstrate this problem without any special hardware at all: pin some user memory, then generate memory pressure then check the contents of those pinned pages. I tried that, but I couldn't get it to fail. But that was a while ago, and I've learned a few things since then, so I'll try again. But if, for the DMA transfer, you're using the array of page*'s which were originally obtained from get_user_pages() then it's rather hard to see how the kernel could alter the page's contents. Then again, if mlock() fixes it then something's up. Very odd. With mlock(), we don't need to use get_user_pages() at all. Arjan tells me the only time an mlocked page can move is with hot (un)plug of memory, but that isn't supported on the systems that we support. We actually prefer mlock() over get_user_pages(), because if the process dies, the locks automatically go away too.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Timur Tabi <[EMAIL PROTECTED]> wrote: > > Christoph Hellwig wrote: > > On Mon, Apr 18, 2005 at 11:22:29AM -0500, Timur Tabi wrote: > > > >>That's not what we're seeing. We have hardware that does DMA over the > >>network (much like the Infiniband stuff), and we have a testcase that fails > >>if get_user_pages() is used, but not if mlock() is used. > > > > > > If you don't share your testcase it's unlikely to be fixed. > > As I said, the testcase only works with our hardware, and it's also very > large. It's one > small test that's part of a huge test suite. It takes a couple hours just to > install the > damn thing. > > We want to produce a simpler test case that demonstrates the problem in an > easy-to-understand manner, but we don't have time to do that now. If your theory is correct then it should be able to demonstrate this problem without any special hardware at all: pin some user memory, then generate memory pressure then check the contents of those pinned pages. But if, for the DMA transfer, you're using the array of page*'s which were originally obtained from get_user_pages() then it's rather hard to see how the kernel could alter the page's contents. Then again, if mlock() fixes it then something's up. Very odd.
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Arjan van de Ven wrote: On Mon, 2005-04-18 at 11:09 -0500, Timur Tabi wrote: Roland Dreier wrote: Troy> How is memory pinning handled? (I haven't had time to read Troy> all the code, so please excuse my ignorance of something Troy> obvious). The userspace library calls mlock() and then the kernel does get_user_pages(). Why do you call mlock() and get_user_pages()? In our code, we only call mlock(), and the memory is pinned. this is a myth; linux is free to move the page about in physical memory even if it's mlock()ed!! Can you tell me when Linux actually does this? I know in theory it can happen, but I've never seen it. Does the code to implement moving of data from one physical page to another even exist in any version of Linux? Also, what would be the point? What reason would there be to move some data from one physical page to another, while keeping the same virtual address?
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Fri, 2005-04-22 at 12:55 -0500, Timur Tabi wrote:
> Arjan van de Ven wrote:
> > On Mon, 2005-04-18 at 11:09 -0500, Timur Tabi wrote:
> >
> >> Roland Dreier wrote:
> >>
> >>> Troy> How is memory pinning handled? (I haven't had time to read
> >>> Troy> all the code, so please excuse my ignorance of something
> >>> Troy> obvious).
> >>>
> >>> The userspace library calls mlock() and then the kernel does
> >>> get_user_pages().
> >>
> >> Why do you call mlock() and get_user_pages()? In our code, we only
> >> call mlock(), and the memory is pinned.
> >
> > this is a myth; linux is free to move the page about in physical
> > memory even if it's mlock()ed!!
>
> Can you tell me when Linux actually does this? I know in theory it can
> happen, but I've never seen it. Does the code to implement moving of
> data from one physical page to another even exist in any version of
> Linux?

hot(un)plug memory.

> Also, what would be the point? What reason would there be to move some
> data from one physical page to another, while keeping the same virtual
> address?

so that you can hot unplug the dimm in question. I guess that's a bit of a
high end thing though... so maybe you don't care about it.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andy Isaacson <[EMAIL PROTECTED]> wrote:
> On Wed, Apr 20, 2005 at 10:07:45PM -0500, Timur Tabi wrote:
>> I don't know if VM_REGISTERED is a good idea or not, but it should be
>> absolutely impossible for the kernel to reclaim "registered" (aka
>> pinned) memory, no matter what. For RDMA services (such as Infiniband,
>> iWARP, etc), it's normal for non-root processes to pin hundreds of
>> megabytes of memory, and that memory better be locked to those physical
>> pages until the application deregisters them.
>
> If you take the hardline position that "the app is the only thing that
> matters", your code is unlikely to get merged. Linux is a
> general-purpose OS.

All userspace hardware drivers with DMA will require pinned pages (and some
of them will require contiguous memory). Since this memory may be scheduled
to be accessed by DMA, reclaiming those pages may (read: will) result in
"random" memory corruption unless done by the driver itself. You can't even
set a time limit: the driver may have allocated all DMA memory to queued
transfers, and some media needs to get plugged in by the lazy robot. As
soon as the robot arrives - boom. (For the same reason, this memory MUST
NOT be freed if the application terminates abnormally, e.g. killed by OOM.)

In other words, you need to make this memory as inaccessible as the
framebuffer on a graphics card. If that would cause a lockup, you had
better prevent it at allocation time.

> In a Linux context, I doubt that fullblown SA is necessary or
> appropriate. Rather, I'd suggest two new signals, SIGMEMLOW and
> SIGMEMCRIT. The userland comms library registers handlers for both.
> When the kernel decides that it needs to reclaim some memory from the
> app, it sends SIGMEMLOW. The comms library then has the responsibility
> to un-reserve some memory in an orderly fashion. If a reasonable [1]
> time has expired since SIGMEMLOW and the kernel is still hungry, the
> kernel sends SIGMEMCRIT. At this point, the comms lib *must* unregister
> some memory [2] even if it has to drop state to do so; if it returns
> from the signal handler without having unregistered the memory, the
> kernel will SIGKILL.

Choosing data loss over a finitely stalled system may sometimes be a bad
decision. If I designed an application that might get a "gimme memory or
die", I'd reserve an extra bunch of memory with the sole purpose of its
being released in this situation. If the kernel had done that instead, this
part of memory could have been used e.g. as a read-only disk cache in the
meantime (of course, provided somebody cared to implement that).

> [2] Is there a way for the kernel to pass down to userspace how many
> pages it wants, maybe in the sigcontext?

Then you'd need only one signal. I think this interface is useful; it would
e.g. allow a picture viewer to cache as many decoded and scaled pictures as
the RAM permits, freeing them if the RAM gets full and the swap would have
to be used.

-- 
"When the pin is pulled, Mr. Grenade is not our friend."
-- U.S. Marine Corps
[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Hi!

> > Timur> Why do you call mlock() and get_user_pages()? In our code,
> > Timur> we only call mlock(), and the memory is pinned. We have a
> > Timur> test case that fails if only get_user_pages() is called,
> > Timur> but it passes if only mlock() is called.
> >
> > What if a buggy/malicious userspace program doesn't call mlock()?
>
> Our library calls mlock() when the app requests memory to be
> "registered". We then call munlock() when the app requests the
> memory to be unregistered. All apps talk to our library for all
> services. No apps talk to the driver directly.

That does not cover the "malicious" part.

Pavel
-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms
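The register/unregister flow described in the quoted text can be sketched as a pair of library wrappers. The names are illustrative (this is not the real verbs API), and as the thread notes, a real implementation would also have the kernel side take its own reference with get_user_pages() rather than trusting userspace to stay mlock()ed.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Illustrative library-side register/unregister pair: the app never
 * talks to the driver directly, it asks the library, and the library
 * pins the region with mlock(). */
int mem_register(void *addr, size_t len)
{
    /* A real driver would additionally get_user_pages() this range
     * and program the HCA's translation tables. */
    return mlock(addr, len);
}

int mem_unregister(void *addr, size_t len)
{
    return munlock(addr, len);
}
```

This is exactly the scheme the reply objects to: nothing in the kernel stops a buggy or malicious app from bypassing the library and skipping the mlock().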
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Thu, Apr 21, 2005 at 03:07:42PM -0500, Timur Tabi wrote:
> > *You* need to come up with a solution that looks good to *the
> > community* if you want it merged.
>
> True, but I'm not going to waste my time adding this support if the
> consensus I get from the kernel developers is that they don't want Linux
> to behave this way.

I think we have been giving you that consensus from the very beginning :)

The very fact that you tried to trot out the "enterprise" card should have
raised a huge flag...

thanks,

greg k-h
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
On Thu, 2005-04-21 at 13:25 -0700, Chris Wright wrote:
> * Timur Tabi ([EMAIL PROTECTED]) wrote:
> > It works with every kernel I've tried. I'm sure there are plenty of
> > kernel configuration options that will break our driver. But as long
> > as all the distros our customers use work, as well as
> > reasonably-configured custom kernels, we're happy.
>
> Hey, if you're happy (and, as you said, you don't intend to merge that
> bit), I'm happy ;-)

yeah... drivers giving unprivileged processes more privs belong on bugtraq
though, not in the core kernel :)
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
* Timur Tabi ([EMAIL PROTECTED]) wrote:
> It works with every kernel I've tried. I'm sure there are plenty of
> kernel configuration options that will break our driver. But as long as
> all the distros our customers use work, as well as reasonably-configured
> custom kernels, we're happy.

Hey, if you're happy (and, as you said, you don't intend to merge that
bit), I'm happy ;-)

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Chris Wright wrote:
> FYI, that will not work on all 2.6 kernels. Specifically anything that's
> not using capabilities.

It works with every kernel I've tried. I'm sure there are plenty of kernel
configuration options that will break our driver. But as long as all the
distros our customers use work, as well as reasonably-configured custom
kernels, we're happy.
Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
* Timur Tabi ([EMAIL PROTECTED]) wrote:
> Andy Isaacson wrote:
> > Do you guys simply raise RLIMIT_MEMLOCK to allow apps to lock their
> > pages? Or are you doing something more nasty?
>
> A little more nasty. I raise RLIMIT_MEMLOCK in the driver to "unlimited"
> and also set cap_raise(IPC_LOCK). I do this because I need to support
> all 2.4 and 2.6 kernel versions with the same driver, but only 2.6.10
> and later have any support for non-root mlock().

FYI, that will not work on all 2.6 kernels. Specifically anything that's
not using capabilities.

thanks,
-chris
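On 2.6.10 and later, the unprivileged route is to rely on RLIMIT_MEMLOCK rather than having the driver raise capabilities behind the admin's back. A minimal sketch of the userspace check (the function name is illustrative):

```c
#include <stddef.h>
#include <sys/resource.h>

/* Sketch: before pinning, ask whether RLIMIT_MEMLOCK permits locking
 * `need` bytes.  On 2.6.10+ unprivileged mlock() is allowed up to the
 * soft limit, so no driver-side cap_raise() is required. */
int can_pin(size_t need)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return 0;
    return rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= need;
}
```

The admin can then grant a larger limit per user via setrlimit() or pam_limits instead of the driver granting it to every process.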