Re: Ummunotify: progress at last!
On Wed, Apr 07, 2010 at 12:37:03PM -0700, Roland Dreier wrote: > > No, there is no mmap. Like this: > > > > u64 my_counter = 0; > > > > ibv_set_mmu_counter(verbs, &my_counter); > > [..] > > while (my_counter != last_my_counter) { > > last_my_counter = my_counter; > > ibv_get_mmu_notifications(verbs, ...); // <- I am a memory barrier > as well > > } > > > > The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to > > the 8 byte counter you specified without having to the mmap thing. As > > I understand it this is what perfevents does. > > I was trying to look at how perf events handles this, and AFAICT it > looks like kernel/perf_event.c just supports mmap(). Can you expand on > what you meant here? > > (I was trying to figure out how one would handle the case where > userspace gives us a counter in highmem -- doing kmap_atomic() seems to > be to only option but then I'm not sure if I want to deal with that...) I think I was mistaken here, disregard.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
> No, there is no mmap. Like this: > > u64 my_counter = 0; > > ibv_set_mmu_counter(verbs, &my_counter); > [..] > while (my_counter != last_my_counter) { > last_my_counter = my_counter; > ibv_get_mmu_notifications(verbs, ...); // <- I am a memory barrier as > well > } > > The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to > the 8 byte counter you specified without having to the mmap thing. As > I understand it this is what perfevents does. I was trying to look at how perf events handles this, and AFAICT it looks like kernel/perf_event.c just supports mmap(). Can you expand on what you meant here? (I was trying to figure out how one would handle the case where userspace gives us a counter in highmem -- doing kmap_atomic() seems to be to only option but then I'm not sure if I want to deal with that...) -- Roland Dreier || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
On Tue, Mar 23, 2010 at 10:59:42PM -0700, Roland Dreier wrote: > That is all definitely doable. I wonder if it's better to get rid of > the dedicated fd though. After all, having the fd means a fancy app can > do poll() or sigio or whatever internally. Being able to integrate into > an fd-driven event loop seems like a pretty big thing to give up. Not sure, adding the fd is only a small amount of work. But on the other hand integrating with a completion channel might bet better suited to verbs's API, and even less work.. > Also, having a uverbs "syscall" that is exactly like read() seems a bit > of a stretch, even within the ugliness that is the uverbs interface. > (We love all our children, but sometimes we have to be realistic) I agree, but if the FD is eliminated then that seems like the best bet - it wouldn't be exactly like read, it would be a multiplexed uverb command executed over the uverbs fd. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
> > I wonder, is that a win? I guess you don't even have to pin it, just do > > copy_to_user() to update the counter, but mmap doesn't seem so bad. > > Not sure you can call copy_to_user from a mmu_notifier callback? What > if it faults? Good point... we could defer it to a context where we could block but... > I think the idea would be that by letting user space select the > counter location it could place it in a sensible cachline/etc and > probably avoid a pointer indirection. OK, I have to look at the perf stuff. That makes sense but OTOH userspace could (unknowingly) pick a highmen page and then we end up having to do kmap_atomic or something from an mmu notifier callback. -- Roland Dreier For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
> No, there is no mmap. Like this: > > u64 my_counter = 0; > > ibv_set_mmu_counter(verbs, &my_counter); > [..] > while (my_counter != last_my_counter) { > last_my_counter = my_counter; > ibv_get_mmu_notifications(verbs, ...); // <- I am a memory barrier as > well > } > > The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to > the 8 byte counter you specified without having to the mmap thing. As > I understand it this is what perfevents does. > > Integrating with the verbs api avoids the need for another device > file. That is good. Eliminating poll() from the API can remove the > dedicated fd entirely. Within verbs I guess we could replace poll() > with something like a completion channel (??) if anyone cares. That is all definitely doable. I wonder if it's better to get rid of the dedicated fd though. After all, having the fd means a fancy app can do poll() or sigio or whatever internally. Being able to integrate into an fd-driven event loop seems like a pretty big thing to give up. Also, having a uverbs "syscall" that is exactly like read() seems a bit of a stretch, even within the ugliness that is the uverbs interface. (We love all our children, but sometimes we have to be realistic) - R. -- Roland Dreier For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
On Tue, Mar 23, 2010 at 10:55:08PM -0700, Roland Dreier wrote: > > I would prefer to do this by adding a new verbs call that returns a fd > > directly. Ie use ib_uverbs_alloc_event_file and act like > > ibv_create_comp_channel. > > > > The main reason for the new FD is so it can be polled on.. > > Agree, we don't want a new device node I don't think -- too hard to > associate an fd you get from a separate open() with a uverbs context. Right, that would suck.. > > You can also avoid the mmap scheme by doing what perf events does, > > pass in a pointer from userspace and have the kernel pin that page it > > is on. > > I wonder, is that a win? I guess you don't even have to pin it, just do > copy_to_user() to update the counter, but mmap doesn't seem so bad. Not sure you can call copy_to_user from a mmu_notifier callback? What if it faults? I think the idea would be that by letting user space select the counter location it could place it in a sensible cachline/etc and probably avoid a pointer indirection. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
> I would prefer to do this by adding a new verbs call that returns a fd > directly. Ie use ib_uverbs_alloc_event_file and act like > ibv_create_comp_channel. > > The main reason for the new FD is so it can be polled on.. Agree, we don't want a new device node I don't think -- too hard to associate an fd you get from a separate open() with a uverbs context. > You can also avoid the mmap scheme by doing what perf events does, > pass in a pointer from userspace and have the kernel pin that page it > is on. I wonder, is that a win? I guess you don't even have to pin it, just do copy_to_user() to update the counter, but mmap doesn't seem so bad. I'll have to look at the perf code. -- Roland Dreier For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
> What do you poll on the fd for? > With ummunotify, you only read() from the fd when (counter != > last_counter). Were you thinking that the poll() would be for > something else? The ummunotify fd supported poll() (and SIGIO etc). You don't have to use it if you don't want to, but I made it as much of a normal fd as possible. - R. -- Roland Dreier For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
On Tue, Mar 23, 2010 at 04:01:21PM -0400, Jeff Squyres wrote: > On Mar 23, 2010, at 3:52 PM, Jason Gunthorpe wrote: > > > > > ibv_set_mmu_counter(verbs, &my_counter); > > > > ibv_get_mmu_notifications(verbs, &my_list, sizeof(my_list)); > > > > These are not hiding mmap/read, they are new uverbs 'syscalls' that > > get the kernel to perform that operation. > > Oh -- so there's 2 mechanisms to get the counter info (for example): > > 1. the above uverb > 2. mmap > > Right? No, there is no mmap. Like this: u64 my_counter = 0; ibv_set_mmu_counter(verbs, &my_counter); [..] while (my_counter != last_my_counter) { last_my_counter = my_counter; ibv_get_mmu_notifications(verbs, ...); // <- I am a memory barrier as well } The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to the 8 byte counter you specified without having to the mmap thing. As I understand it this is what perfevents does. Integrating with the verbs api avoids the need for another device file. That is good. Eliminating poll() from the API can remove the dedicated fd entirely. Within verbs I guess we could replace poll() with something like a completion channel (??) if anyone cares. So the user space visible functionality boils down to 2 new 'syscalls' and the new flag to ibv_reg_mr. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
On Mar 23, 2010, at 3:52 PM, Jason Gunthorpe wrote: > > > ibv_set_mmu_counter(verbs, &my_counter); > > > ibv_get_mmu_notifications(verbs, &my_list, sizeof(my_list)); > > These are not hiding mmap/read, they are new uverbs 'syscalls' that > get the kernel to perform that operation. Oh -- so there's 2 mechanisms to get the counter info (for example): 1. the above uverb 2. mmap Right? I don't really have an opinion here -- I'm not really an "owner" of the ibv API. As long as there is a fast/mmap way for me to get the counter without an extra function call, I'm happy. :-) -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/ -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
On Tue, Mar 23, 2010 at 03:17:12PM -0400, Jeff Squyres wrote: > > If you don't think that is worth doing it > > does simplify things alot, just add two new verbs calls: > > > > ibv_set_mmu_counter(verbs, &my_counter); > > ibv_get_mmu_notifications(verbs, &my_list, sizeof(my_list)); > > I have no real opinion on whether the mmap/read should be hidden by > the above ibv calls or not. Either is fine with me. I would > *assume* that ibv_get_mmu_notifications() is non-blocking, right? > E.g., if you ask for N and only M are available (where M < N), then > the call returns with only M items filled (and M could be 0). > Perhaps you need another parameter to indicate how many items in > my_list were actually filled? Or is that the return value? Right, non-blocking, return value indicates length, like read(). These are not hiding mmap/read, they are new uverbs 'syscalls' that get the kernel to perform that operation. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
On Mar 23, 2010, at 1:29 PM, Jason Gunthorpe wrote: > > What do you poll on the fd for? > > poll() is for apps that want to get the notifications without > spinning on the counter. Ah, ok. I think even with the ummunotify interface, that would work, too. Meaning: since you have to read() from the fd to get event details, poll() would *also* tell you if there was something to read (in addition to checking if (last_counter != counter)). The counter is a fast way of checking -- e.g., if you need to check in your fast path (which MPI's likely will). poll() could be used if you don't care if the check is slow. > If you don't think that is worth doing it > does simplify things alot, just add two new verbs calls: > > ibv_set_mmu_counter(verbs, &my_counter); > ibv_get_mmu_notifications(verbs, &my_list, sizeof(my_list)); I have no real opinion on whether the mmap/read should be hidden by the above ibv calls or not. Either is fine with me. I would *assume* that ibv_get_mmu_notifications() is non-blocking, right? E.g., if you ask for N and only M are available (where M < N), then the call returns with only M items filled (and M could be 0). Perhaps you need another parameter to indicate how many items in my_list were actually filled? Or is that the return value? -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/ -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
On Tue, Mar 23, 2010 at 01:17:40PM -0400, Jeff Squyres wrote: > On Mar 23, 2010, at 12:59 PM, Jason Gunthorpe wrote: > > > The main reason for the new FD is so it can be polled on.. > > What do you poll on the fd for? > > With ummunotify, you only read() from the fd when (counter != > last_counter). Were you thinking that the poll() would be for > something else? poll() is for apps that want to get the notifications without spinning on the counter. If you don't think that is worth doing it does simplify things alot, just add two new verbs calls: ibv_set_mmu_counter(verbs, &my_counter); ibv_get_mmu_notifications(verbs, &my_list, sizeof(my_list)); Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
On Mar 23, 2010, at 12:59 PM, Jason Gunthorpe wrote: > The main reason for the new FD is so it can be polled on.. What do you poll on the fd for? With ummunotify, you only read() from the fd when (counter != last_counter). Were you thinking that the poll() would be for something else? -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/ -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ummunotify: progress at last!
On Tue, Mar 23, 2010 at 12:06:50PM -0400, Jeff Squyres wrote: > IBM has found a resource that they think will be able to progress Roland's > ummunotify work. > > After a few discussions in Sonoma last week and some off-list emails, here's > what we decided: > > 1. Take Roland's last code drop (Roland: can you re-send the last copy of > your code?). > > 2. Do not convert it to the perf events kernel framework as the Linux kernel > community requested. Instead, migrate the functionality into the ibv code > base. Roland thinks that most of the code should be adaptable without too > many changes. Here's the highlights of the new functionality: > >a. Add a new flag to ibv_reg_mr() that does the same function as > UMMUNOTIFY_REGISTER_REGION >b. ibv_dereg_mr() always performs the equivalent of >UMMUNOTIFY_UNREGISTER_REGION (if necessary) >c. Make a new device somewhere (under /dev/infiniband?) that >performs the same functions as /dev/ummunotify (open it to mmap >the counter into user space, and read events when something >"interesting" happens) I would prefer to do this by adding a new verbs call that returns a fd directly. Ie use ib_uverbs_alloc_event_file and act like ibv_create_comp_channel. The main reason for the new FD is so it can be polled on.. You can also avoid the mmap scheme by doing what perf events does, pass in a pointer from userspace and have the kernel pin that page it is on. So, I'd suggest fd = ibv_create_mmu_monitor(verbs, &counter); [..] poll(fd); [..] if (counter != last_counter) [..] close(fd); Refuse to create more than one mmu_monitor for each verbs for now. I looked at this for a little while at Sonoma and I think it is quite straightforward, I'm happy to look over anything. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html