Re: Ummunotify: progress at last!

2010-04-07 Thread Jason Gunthorpe
On Wed, Apr 07, 2010 at 12:37:03PM -0700, Roland Dreier wrote:
   No, there is no mmap. Like this:
   
   u64 my_counter = 0;
   
   ibv_set_mmu_counter(verbs, my_counter);
   [..]
   while (my_counter != last_my_counter) {
   last_my_counter = my_counter;
   ibv_get_mmu_notifications(verbs, ...);   // - I am a memory barrier 
 as well
   }
   
   The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to
   the 8 byte counter you specified without having to the mmap thing. As
   I understand it this is what perfevents does.
 
 I was trying to look at how perf events handles this, and AFAICT it
 looks like kernel/perf_event.c just supports mmap().  Can you expand on
 what you meant here?
 
 (I was trying to figure out how one would handle the case where
 userspace gives us a counter in highmem -- doing kmap_atomic() seems to
 be to only option but then I'm not sure if I want to deal with that...)

I think I was mistaken here, disregard..

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-24 Thread Roland Dreier
   I wonder, is that a win?  I guess you don't even have to pin it, just do
   copy_to_user() to update the counter, but mmap doesn't seem so bad.
  
  Not sure you can call copy_to_user from a mmu_notifier callback? What
  if it faults?

Good point... we could defer it to a context where we could block but...

  I think the idea would be that by letting user space select the
  counter location it could place it in a sensible cachline/etc and
  probably avoid a pointer indirection.

OK, I have to look at the perf stuff.  That makes sense but OTOH
userspace could (unknowingly) pick a highmen page and then we end up
having to do kmap_atomic or something from an mmu notifier callback.
-- 
Roland Dreier  rola...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-24 Thread Jason Gunthorpe
On Tue, Mar 23, 2010 at 10:59:42PM -0700, Roland Dreier wrote:

 That is all definitely doable.  I wonder if it's better to get rid of
 the dedicated fd though.  After all, having the fd means a fancy app can
 do poll() or sigio or whatever internally.  Being able to integrate into
 an fd-driven event loop seems like a pretty big thing to give up.

Not sure, adding the fd is only a small amount of work. But on the
other hand integrating with a completion channel might bet better
suited to verbs's API, and even less work..

 Also, having a uverbs syscall that is exactly like read() seems a bit
 of a stretch, even within the ugliness that is the uverbs interface.
 (We love all our children, but sometimes we have to be realistic)

I agree, but if the FD is eliminated then that seems like the best bet
- it wouldn't be exactly like read, it would be a multiplexed uverb
command executed over the uverbs fd.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jason Gunthorpe
On Tue, Mar 23, 2010 at 12:06:50PM -0400, Jeff Squyres wrote:
 IBM has found a resource that they think will be able to progress Roland's 
 ummunotify work.
 
 After a few discussions in Sonoma last week and some off-list emails, here's 
 what we decided:
 
 1. Take Roland's last code drop (Roland: can you re-send the last copy of 
 your code?).
 
 2. Do not convert it to the perf events kernel framework as the Linux kernel 
 community requested.  Instead, migrate the functionality into the ibv code 
 base.  Roland thinks that most of the code should be adaptable without too 
 many changes.  Here's the highlights of the new functionality:
 
a. Add a new flag to ibv_reg_mr() that does the same function as 
 UMMUNOTIFY_REGISTER_REGION
b. ibv_dereg_mr() always performs the equivalent of
UMMUNOTIFY_UNREGISTER_REGION (if necessary)

c. Make a new device somewhere (under /dev/infiniband?) that
performs the same functions as /dev/ummunotify (open it to mmap
the counter into user space, and read events when something
interesting happens)

I would prefer to do this by adding a new verbs call that returns a fd
directly. Ie use ib_uverbs_alloc_event_file and act like
ibv_create_comp_channel.

The main reason for the new FD is so it can be polled on..

You can also avoid the mmap scheme by doing what perf events does,
pass in a pointer from userspace and have the kernel pin that page it
is on.

So, I'd suggest

fd = ibv_create_mmu_monitor(verbs, counter);
[..]
poll(fd);
[..]
if (counter != last_counter)
  [..]
close(fd);

Refuse to create more than one mmu_monitor for each verbs for now.

I looked at this for a little while at Sonoma and I think it is quite
straightforward, I'm happy to look over anything.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jeff Squyres
On Mar 23, 2010, at 12:59 PM, Jason Gunthorpe wrote:

 The main reason for the new FD is so it can be polled on..

What do you poll on the fd for?

With ummunotify, you only read() from the fd when (counter != last_counter).  
Were you thinking that the poll() would be for something else?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jason Gunthorpe
On Tue, Mar 23, 2010 at 01:17:40PM -0400, Jeff Squyres wrote:
 On Mar 23, 2010, at 12:59 PM, Jason Gunthorpe wrote:
 
  The main reason for the new FD is so it can be polled on..
 
 What do you poll on the fd for?
 
 With ummunotify, you only read() from the fd when (counter !=
 last_counter).  Were you thinking that the poll() would be for
 something else?

poll() is for apps that want to get the notifications without
spinning on the counter. If you don't think that is worth doing it
does simplify things alot, just add two new verbs calls:

ibv_set_mmu_counter(verbs, my_counter);
ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jeff Squyres
On Mar 23, 2010, at 1:29 PM, Jason Gunthorpe wrote:

  What do you poll on the fd for?
 
 poll() is for apps that want to get the notifications without
 spinning on the counter.

Ah, ok.

I think even with the ummunotify interface, that would work, too.  Meaning: 
since you have to read() from the fd to get event details, poll() would *also* 
tell you if there was something to read (in addition to checking if 
(last_counter != counter)).  The counter is a fast way of checking -- e.g., if 
you need to check in your fast path (which MPI's likely will).  poll() could be 
used if you don't care if the check is slow.

 If you don't think that is worth doing it
 does simplify things alot, just add two new verbs calls:
 
 ibv_set_mmu_counter(verbs, my_counter);
 ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));

I have no real opinion on whether the mmap/read should be hidden by the above 
ibv calls or not.  Either is fine with me.  I would *assume* that 
ibv_get_mmu_notifications() is non-blocking, right?  E.g., if you ask for N and 
only M are available (where M  N), then the call returns with only M items 
filled (and M could be 0).  Perhaps you need another parameter to indicate how 
many items in my_list were actually filled?  Or is that the return value?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jason Gunthorpe
On Tue, Mar 23, 2010 at 03:17:12PM -0400, Jeff Squyres wrote:

  If you don't think that is worth doing it
  does simplify things alot, just add two new verbs calls:
  
  ibv_set_mmu_counter(verbs, my_counter);
  ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));
 
 I have no real opinion on whether the mmap/read should be hidden by
 the above ibv calls or not.  Either is fine with me.  I would
 *assume* that ibv_get_mmu_notifications() is non-blocking, right?
 E.g., if you ask for N and only M are available (where M  N), then
 the call returns with only M items filled (and M could be 0).
 Perhaps you need another parameter to indicate how many items in
 my_list were actually filled?  Or is that the return value?
 
Right, non-blocking, return value indicates length, like read().

These are not hiding mmap/read, they are new uverbs 'syscalls' that
get the kernel to perform that operation.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jeff Squyres
On Mar 23, 2010, at 3:52 PM, Jason Gunthorpe wrote:

   ibv_set_mmu_counter(verbs, my_counter);
   ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));
 
 These are not hiding mmap/read, they are new uverbs 'syscalls' that
 get the kernel to perform that operation.

Oh -- so there's 2 mechanisms to get the counter info (for example):

1. the above uverb
2. mmap

Right?

I don't really have an opinion here -- I'm not really an owner of the ibv 
API.  As long as there is a fast/mmap way for me to get the counter without an 
extra function call, I'm happy.  :-)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jason Gunthorpe
On Tue, Mar 23, 2010 at 04:01:21PM -0400, Jeff Squyres wrote:
 On Mar 23, 2010, at 3:52 PM, Jason Gunthorpe wrote:
 
ibv_set_mmu_counter(verbs, my_counter);
ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));
  
  These are not hiding mmap/read, they are new uverbs 'syscalls' that
  get the kernel to perform that operation.
 
 Oh -- so there's 2 mechanisms to get the counter info (for example):
 
 1. the above uverb
 2. mmap
 
 Right?

No, there is no mmap. Like this:

u64 my_counter = 0;

ibv_set_mmu_counter(verbs, my_counter);
[..]
while (my_counter != last_my_counter) {
last_my_counter = my_counter;
ibv_get_mmu_notifications(verbs, ...);   // - I am a memory barrier as well
}

The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to
the 8 byte counter you specified without having to the mmap thing. As
I understand it this is what perfevents does.

Integrating with the verbs api avoids the need for another device
file. That is good. Eliminating poll() from the API can remove the
dedicated fd entirely. Within verbs I guess we could replace poll()
with something like a completion channel (??) if anyone cares.

So the user space visible functionality boils down to 2 new 'syscalls'
and the new flag to ibv_reg_mr.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Roland Dreier
  What do you poll on the fd for?

  With ummunotify, you only read() from the fd when (counter !=
  last_counter).  Were you thinking that the poll() would be for
  something else?

The ummunotify fd supported poll() (and SIGIO etc).  You don't have to
use it if you don't want to, but I made it as much of a normal fd as
possible.

 - R.
-- 
Roland Dreier  rola...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jason Gunthorpe
On Tue, Mar 23, 2010 at 10:55:08PM -0700, Roland Dreier wrote:
   I would prefer to do this by adding a new verbs call that returns a fd
   directly. Ie use ib_uverbs_alloc_event_file and act like
   ibv_create_comp_channel.
   
   The main reason for the new FD is so it can be polled on..
 
 Agree, we don't want a new device node I don't think -- too hard to
 associate an fd you get from a separate open() with a uverbs context.

Right, that would suck..
 
   You can also avoid the mmap scheme by doing what perf events does,
   pass in a pointer from userspace and have the kernel pin that page it
   is on.
 
 I wonder, is that a win?  I guess you don't even have to pin it, just do
 copy_to_user() to update the counter, but mmap doesn't seem so bad.

Not sure you can call copy_to_user from a mmu_notifier callback? What
if it faults?

I think the idea would be that by letting user space select the
counter location it could place it in a sensible cachline/etc and
probably avoid a pointer indirection.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Roland Dreier
  No, there is no mmap. Like this:
  
  u64 my_counter = 0;
  
  ibv_set_mmu_counter(verbs, my_counter);
  [..]
  while (my_counter != last_my_counter) {
  last_my_counter = my_counter;
  ibv_get_mmu_notifications(verbs, ...);   // - I am a memory barrier as 
  well
  }
  
  The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to
  the 8 byte counter you specified without having to the mmap thing. As
  I understand it this is what perfevents does.
  
  Integrating with the verbs api avoids the need for another device
  file. That is good. Eliminating poll() from the API can remove the
  dedicated fd entirely. Within verbs I guess we could replace poll()
  with something like a completion channel (??) if anyone cares.

That is all definitely doable.  I wonder if it's better to get rid of
the dedicated fd though.  After all, having the fd means a fancy app can
do poll() or sigio or whatever internally.  Being able to integrate into
an fd-driven event loop seems like a pretty big thing to give up.

Also, having a uverbs syscall that is exactly like read() seems a bit
of a stretch, even within the ugliness that is the uverbs interface.
(We love all our children, but sometimes we have to be realistic)

 - R.
-- 
Roland Dreier  rola...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html