Re: [openib-general] Re: uCM create connection ID
Libor Michalek wrote:
> > I see. This appears to come from a difference between the event
> > reporting model used by the kernel CM versus the usermode CM
> > (callback versus calldown).
>
> Do you block the destroy on a lock while a callback for that cm_id is
> active? I wouldn't say that the difference is attributed to callback
> vs. calldown; in both cases it's a matter of serializing the destroy
> with the event.

Yes - the destroy call in the kernel blocks while there's a callback in
progress. After destroy returns, the CM guarantees that no additional
callbacks will be received by the user. The blocking destroy call was the
reason for letting the user destroy the cm_id by returning a non-zero
value from the callback.

> > Maybe there's a way to assist the user here. Can we report a
> > destruction event, or require a second call to indicate that an event
> > has been processed?
>
> A destruction event could work, but with some limits which might make it
> impractical. The user would have to be really careful not to do
> _anything_ with the object after calling destroy, and only clean up in
> the same thread that is used to get the destroy completion event. The
> destroy completion event could be retrieved and processed before the
> original destroy call returns. Also, the user would need to make sure
> that they are getting events in a _single_ thread, since multiple event
> get threads could pose the same problem as before.

I agree that the destroy event could occur before destroy returns, so the
user would need to be careful there.

Arlin mentioned that there's a put event call that needs to be invoked
after getting an event. If so, then the CM can track the number of
outstanding events that are in process. It could then either delay the
destroy call while an event is outstanding, or delay reporting the destroy
event until all events have been processed. This should handle the
multi-thread issues.

It may also make sense to have the uCM serialize all events to a single
cm_id anyway. I ended up doing this in the kernel, which simplified the
application's event handling. Otherwise, the events can end up being
processed out of order. E.g. a REJ is reported, followed by a MRA to a REQ.

> We could build the serialization table for the API consumer, and have
> all cm_id calls and events go through a level of indirection in a table
> locked against multiple threads. This was the way we ended up doing it
> in our old code for the userCM that we used for uDAPL. I had left this
> out since it seems reasonable that not all apps would want/need this
> guarantee from the API, and that they could implement it themselves if
> they did want it... I could be wrong.

I need to spend more time looking at the uCM API/implementation to see if
there's a way to help protect against reporting/processing events.

- Sean
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
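The second option Sean describes above (delay reporting the destroy event until every outstanding event has been put back) could look roughly like the sketch below. It is only an illustration of the counting idea, not the uCM implementation: `struct tracked_id` and the `tracked_*` functions are hypothetical names, and the real get/put event calls may look quite different.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-cm_id bookkeeping kept by the CM (not a real uCM type). */
struct tracked_id {
    pthread_mutex_t lock;
    unsigned        outstanding;  /* events handed out but not yet put back */
    bool            destroying;   /* destroy has been requested */
};

void tracked_id_init(struct tracked_id *id)
{
    pthread_mutex_init(&id->lock, NULL);
    id->outstanding = 0;
    id->destroying = false;
}

/* Called when an event for this id is about to be handed to the user.
 * Returns false if destroy was already requested, so the event is dropped. */
bool tracked_event_get(struct tracked_id *id)
{
    bool delivered;

    pthread_mutex_lock(&id->lock);
    delivered = !id->destroying;
    if (delivered)
        id->outstanding++;
    pthread_mutex_unlock(&id->lock);
    return delivered;
}

/* Called when the user is done with an event. Returns true if this was the
 * last outstanding event of an id being destroyed, i.e. the destroy event
 * may be reported now. */
bool tracked_event_put(struct tracked_id *id)
{
    bool report_destroy;

    pthread_mutex_lock(&id->lock);
    assert(id->outstanding > 0);
    id->outstanding--;
    report_destroy = id->destroying && id->outstanding == 0;
    pthread_mutex_unlock(&id->lock);
    return report_destroy;
}

/* Request destruction without blocking. Returns true if no events are
 * outstanding and the destroy event may be reported immediately. */
bool tracked_destroy(struct tracked_id *id)
{
    bool report_destroy;

    pthread_mutex_lock(&id->lock);
    assert(!id->destroying);
    id->destroying = true;
    report_destroy = id->outstanding == 0;
    pthread_mutex_unlock(&id->lock);
    return report_destroy;
}
```

Because `tracked_destroy()` never waits, it cannot deadlock a caller that holds its own lock around the object. The blocking variant would instead wait on a condition variable until `outstanding` reaches zero.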
Re: [openib-general] Re: uCM create connection ID
On Thu, Jun 30, 2005 at 09:13:28AM -0700, Sean Hefty wrote:
> Libor Michalek wrote:
> > Assume that the userspace 'struct ib_cm_event' contains the cm_id as
> > well as a new 'u64 context' which is inherited from the cm_id, and is
> > set at the time of the cm_id creation. This is what I'm assuming that
> > Arlin would like to see.
> >
> > In the case of two threads accessing the CM at once there's a race
> > condition if you are going to use the 'context' variable as a pointer
> > to memory:
> >
> >   Thread 1                              Thread 2
> >   --------                              --------
> >   cm_object = malloc(sizeof(*cm_object))
> >   ib_cm_create_id(&cm_object->cm_id,
> >                   (u64)cm_object)
> >
> >                                         ib_cm_event_get(&event)
> >   ib_cm_destroy_id(cm_object->cm_id)
> >   free(cm_object);
> >                                         process_event((void *)event->context);
>
> I see. This appears to come from a difference between the event reporting
> model used by the kernel CM versus the usermode CM (callback versus
> calldown).

  Do you block the destroy on a lock while a callback for that cm_id
is active? I wouldn't say that the difference is attributed to callback
vs. calldown; in both cases it's a matter of serializing the destroy
with the event.

> Maybe there's a way to assist the user here. Can we report a
> destruction event, or require a second call to indicate that an event has
> been processed?

  A destruction event could work, but with some limits which might make
it impractical. The user would have to be really careful not to do
_anything_ with the object after calling destroy, and only clean up in
the same thread that is used to get the destroy completion event. The
destroy completion event could be retrieved and processed before the
original destroy call returns. Also, the user would need to make sure
that they are getting events in a _single_ thread, since multiple event
get threads could pose the same problem as before.

  Blocking on the destroy seems like it could be error prone, in that you
could easily deadlock the user, who probably has a lock around the object
which contains the cm_id...

  We could build the serialization table for the API consumer, and have
all cm_id calls and events go through a level of indirection in a table
locked against multiple threads. This was the way we ended up doing it in
our old code for the userCM that we used for uDAPL. I had left this out
since it seems reasonable that not all apps would want/need this
guarantee from the API, and that they could implement it themselves if
they did want it... I could be wrong.

-Libor
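The discipline Libor describes for a destruction event (touch nothing after calling destroy, and free only in the one thread that retrieves events) can be sketched as follows. Everything here is a stand-in: the in-process queue plays the role of the CM's event channel, and `struct conn`, `conn_destroy()`, `handle_event()` and the event types are hypothetical names, not part of the uCM API.

```c
#include <assert.h>
#include <stdlib.h>

enum ev_type { EV_MESSAGE, EV_DESTROYED };  /* hypothetical event types */

struct event {
    enum ev_type type;
    void        *context;  /* user pointer associated with the id */
};

#define QUEUE_LEN 16
static struct event queue[QUEUE_LEN];
static unsigned q_head, q_tail;  /* stand-in for the CM's event queue */

static void queue_push(enum ev_type type, void *context)
{
    assert(q_tail - q_head < QUEUE_LEN);
    queue[q_tail++ % QUEUE_LEN] = (struct event){ type, context };
}

static int queue_pop(struct event *ev)
{
    if (q_head == q_tail)
        return 0;
    *ev = queue[q_head++ % QUEUE_LEN];
    return 1;
}

struct conn {
    int messages_seen;
};

static int conns_freed;  /* counts cleanups, for illustration only */

/* Request destruction. The caller must not touch c after this returns;
 * the simulated CM queues EV_DESTROYED behind any pending events. */
void conn_destroy(struct conn *c)
{
    queue_push(EV_DESTROYED, c);
}

/* Event thread only: the single place where a conn is freed. */
void handle_event(const struct event *ev)
{
    struct conn *c = ev->context;

    switch (ev->type) {
    case EV_MESSAGE:
        c->messages_seen++;  /* safe: EV_DESTROYED has not been seen yet */
        break;
    case EV_DESTROYED:
        free(c);             /* last event for this conn */
        conns_freed++;
        break;
    }
}

/* Drain the queue from one thread; returns the number of events handled. */
int run_event_loop(void)
{
    struct event ev;
    int handled = 0;

    while (queue_pop(&ev)) {
        handle_event(&ev);
        handled++;
    }
    return handled;
}
```

The sketch only holds together because a single thread calls `run_event_loop()`. With several event threads, one could still be processing an `EV_MESSAGE` while another frees the object on `EV_DESTROYED`, which is the limitation raised above.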
Re: [openib-general] Re: uCM create connection ID
Libor Michalek wrote:
> Assume that the userspace 'struct ib_cm_event' contains the cm_id as
> well as a new 'u64 context' which is inherited from the cm_id, and is
> set at the time of the cm_id creation. This is what I'm assuming that
> Arlin would like to see.
>
> In the case of two threads accessing the CM at once there's a race
> condition if you are going to use the 'context' variable as a pointer
> to memory:
>
>   Thread 1                              Thread 2
>   --------                              --------
>   cm_object = malloc(sizeof(*cm_object))
>   ib_cm_create_id(&cm_object->cm_id,
>                   (u64)cm_object)
>
>                                         ib_cm_event_get(&event)
>   ib_cm_destroy_id(cm_object->cm_id)
>   free(cm_object);
>                                         process_event((void *)event->context);

I see. This appears to come from a difference between the event reporting
model used by the kernel CM versus the usermode CM (callback versus
calldown).

Maybe there's a way to assist the user here. Can we report a destruction
event, or require a second call to indicate that an event has been
processed? In the latter case, destruction could block while the event is
being processed. I'm not sure if either of these would help if the user
processed events using multiple threads, but I think with additional
serialization in the CM it might be able to work.

- Sean
Re: [openib-general] Re: uCM create connection ID
On Wed, Jun 29, 2005 at 11:16:19AM -0700, Sean Hefty wrote:
> Libor Michalek wrote:
> >>Is it possible for a consumer of uCM to provide a context with the
> >>create_id that could be returned with the event? I will have some scale
> >>up issues if I have to walk a list looking for a uCM provided connection
> >>ID instead of a context that could point directly to the appropriate
> >>uDAPL CM object.
> >
> > It would be easy to add in a context variable. I had left it out on
> > purpose, since it's easy to get into a situation where using the context
> > as a pointer you can end up referencing deallocated memory. However, I
> > suppose it should be there for flexibility.
>
> Can you explain the situation where the application could reference
> deallocated memory? I would think that the uCM could take steps that would
> make it impossible for a well-written app to do this.

  Assume that the userspace 'struct ib_cm_event' contains the cm_id as
well as a new 'u64 context' which is inherited from the cm_id, and is
set at the time of the cm_id creation. This is what I'm assuming that
Arlin would like to see.

  In the case of two threads accessing the CM at once there's a race
condition if you are going to use the 'context' variable as a pointer
to memory:

  Thread 1                              Thread 2
  --------                              --------
  cm_object = malloc(sizeof(*cm_object))
  ib_cm_create_id(&cm_object->cm_id,
                  (u64)cm_object)

                                        ib_cm_event_get(&event)
  ib_cm_destroy_id(cm_object->cm_id)
  free(cm_object);
                                        process_event((void *)event->context);

  If you're going to insist on using threads, the context variable should
provide a reference into some table which contains the cm_object, and
access to that table (e.g. lookup, add, remove) needs to be sync'd with a
thread lock.

-Libor
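A table of the kind Libor suggests might be sketched like this: the 64-bit context carries a slot index plus a generation counter instead of a raw pointer, so a stale context from an already-destroyed cm_id looks up as NULL rather than pointing at freed memory. The names (`table_add`, `table_lookup`, `table_remove`), the fixed capacity and the generation counter are assumptions made for the example, not something from the uCM or uDAPL code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SLOTS 64  /* hypothetical fixed capacity for the sketch */

struct slot {
    void    *object;      /* NULL when the slot is free */
    uint32_t generation;  /* bumped on every remove to invalidate old handles */
};

static struct slot table[TABLE_SLOTS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Store object and return a handle usable as the context; 0 if full. */
uint64_t table_add(void *object)
{
    uint64_t handle = 0;

    assert(object != NULL);
    pthread_mutex_lock(&table_lock);
    for (uint32_t i = 0; i < TABLE_SLOTS; i++) {
        if (table[i].object == NULL) {
            table[i].object = object;
            /* index + 1 keeps 0 reserved as "no handle" */
            handle = ((uint64_t)table[i].generation << 32) | (i + 1);
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return handle;
}

/* Resolve a handle to its slot; the caller must hold table_lock. */
static struct slot *slot_for(uint64_t handle)
{
    uint32_t index = (uint32_t)(handle & 0xffffffffu);
    uint32_t generation = (uint32_t)(handle >> 32);

    if (index == 0 || index > TABLE_SLOTS)
        return NULL;
    struct slot *s = &table[index - 1];
    if (s->object == NULL || s->generation != generation)
        return NULL;
    return s;
}

/* Return the object for a live handle, or NULL if it was removed. */
void *table_lookup(uint64_t handle)
{
    pthread_mutex_lock(&table_lock);
    struct slot *s = slot_for(handle);
    void *object = s ? s->object : NULL;
    pthread_mutex_unlock(&table_lock);
    return object;
}

/* Detach and return the object; later lookups of this handle give NULL. */
void *table_remove(uint64_t handle)
{
    void *object = NULL;

    pthread_mutex_lock(&table_lock);
    struct slot *s = slot_for(handle);
    if (s) {
        object = s->object;
        s->object = NULL;
        s->generation++;
    }
    pthread_mutex_unlock(&table_lock);
    return object;
}
```

In the race shown above, Thread 1 would call `table_remove()` before `free()`, and Thread 2 would call `table_lookup()` on the event's context and skip the event when it gets NULL. The table does not by itself protect an object that was looked up just before the remove. The consumer still needs a per-object reference count, or has to process the event while holding a lock that the destroying thread also takes.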
Re: [openib-general] Re: uCM create connection ID
Libor Michalek wrote:
> > Is it possible for a consumer of uCM to provide a context with the
> > create_id that could be returned with the event? I will have some scale
> > up issues if I have to walk a list looking for a uCM provided connection
> > ID instead of a context that could point directly to the appropriate
> > uDAPL CM object.
>
> It would be easy to add in a context variable. I had left it out on
> purpose, since it's easy to get into a situation where using the context
> as a pointer you can end up referencing deallocated memory. However, I
> suppose it should be there for flexibility.

Can you explain the situation where the application could reference
deallocated memory? I would think that the uCM could take steps that would
make it impossible for a well-written app to do this.

- Sean