Re: [Mesa-dev] [PATCH 1/2] clover: fix event handling of buffer operations
Grigori Goronzy writes:

> On 2015-06-09 22:52, Francisco Jerez wrote:
>>> +
>>> +   if (blocking)
>>> +      hev().wait();
>>> +
>>
>> hard_event::wait() may fail, so this should probably be done before the
>> ret_object() call to avoid leaks.
>
> Alright... C++ exceptions are a minefield. :)
>

So is virtually any approach to error handling :P. It's not really more
difficult to write exception-safe code than it is to write equivalent
returned-error-code-safe code; in fact, in my experience it's less
difficult, as long as you stick to RAII (which is a healthy practice on
its own).

>> Is there any reason you didn't make
>> the same change in clEnqueueReadBuffer() and clEnqueueWriteBuffer()?
>
> Must be an oversight. I think I did that, or at least I intended to do
> so.
>
>> Same comment as above. Also note that this is being more strict than
>> the spec requires (which I believe is what Tom was referring to). From
>> the CL 1.2 spec:
>>
>> | If blocking_write is CL_TRUE, the OpenCL implementation copies the data
>> | referred to by ptr and enqueues the write operation in the
>> | command-queue. The memory pointed to by ptr can be reused by the
>> | application after the clEnqueueWriteBufferRect call returns.
>>
>> The spec is giving you no guarantee that the write to the actual memory
>> object will be complete by the time the clEnqueueWriteBufferRect call
>> returns -- only that your data will have been buffered somewhere and the
>> memory pointed to by the argument can be reused immediately by the
>> application. The reason why I was reluctant to make this change last
>> time it came up was that it's likely to hurt performance unnecessarily,
>> because the wait() call blocks until *all* previous commands in the same
>> queue have completed execution, even though in the most common case the
>> copy is performed synchronously using soft_copy_op(), so the wait() call
>> is redundant even for blocking copies.
>>
>
> OK, maybe we could drop the wait completely for all of the "write"
> calls.
>

I think those should also call event::wait_signalled() just to make sure
that the event action has been executed already -- it may not have
executed immediately if there were any user events in the dependency
graph.

>> The case with blocking reads is similar, the copy is handled
>> synchronously using soft_copy_op() when no user events are present in
>> the list of dependencies, so calling wait() on the event is unnecessary
>> to guarantee that the execution of the read has completed, and will
>> cause a pipe_context flush and wait until the most recent fence is
>> signalled.
>
> I think it's reasonable to expect that the event is ready for profile
> queries after a blocking read has finished. That was the initial
> motivation for this patch. Other implementations behave like that. I
> didn't expect wait() to completely flush everything. Won't that cause a
> lot of needless flushing with event wait lists?
>

hard_event::wait() flushes the command queue, which in turn attaches a
fence to the event object marking the end of the execution of the last
batch of commands, which arguably contains whatever operations were
performed by the event action. This assumption breaks in the case of
soft_copy_op(), because it doesn't submit any actual commands to the
GPU, so calling hard_event::wait() is sub-optimal (it will wait for
commands which are completely unrelated to the copy operation). This can
be fixed by using the weaker version of wait() that doesn't care about
the GPU being already done with the work.

Thanks.

>> Ideally we would have a weaker variant of event::wait()
>> (e.g. wait_signalled()) that doesn't flush and just waits for the
>> associated action call-back to have been executed without giving any
>> guarantees about the corresponding GPU command.
>> The event interface
>> doesn't expose such a functionality right now, I'm attaching two
>> (completely untested) patches implementing it, you should be able to use
>> them as starting point to fix blocking transfers.
>
> Thanks, I'll look into that later when I get some free time.
>
> Grigori

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] clover: fix event handling of buffer operations
On 2015-06-09 22:52, Francisco Jerez wrote:
>> +
>> +   if (blocking)
>> +      hev().wait();
>> +
>
> hard_event::wait() may fail, so this should probably be done before the
> ret_object() call to avoid leaks.

Alright... C++ exceptions are a minefield. :)

> Is there any reason you didn't make the same change in
> clEnqueueReadBuffer() and clEnqueueWriteBuffer()?

Must be an oversight. I think I did that, or at least I intended to do
so.

> Same comment as above. Also note that this is being more strict than
> the spec requires (which I believe is what Tom was referring to). From
> the CL 1.2 spec:
>
> | If blocking_write is CL_TRUE, the OpenCL implementation copies the data
> | referred to by ptr and enqueues the write operation in the
> | command-queue. The memory pointed to by ptr can be reused by the
> | application after the clEnqueueWriteBufferRect call returns.
>
> The spec is giving you no guarantee that the write to the actual memory
> object will be complete by the time the clEnqueueWriteBufferRect call
> returns -- only that your data will have been buffered somewhere and the
> memory pointed to by the argument can be reused immediately by the
> application. The reason why I was reluctant to make this change last
> time it came up was that it's likely to hurt performance unnecessarily
> because the wait() call blocks until *all* previous commands in the same
> queue have completed execution, even though in the most common case the
> copy is performed synchronously using soft_copy_op(), so the wait() call
> is redundant even for blocking copies.

OK, maybe we could drop the wait completely for all of the "write"
calls.

> The case with blocking reads is similar, the copy is handled
> synchronously using soft_copy_op() when no user events are present in
> the list of dependencies, so calling wait() on the event is unnecessary
> to guarantee that the execution of the read has completed, and will
> cause a pipe_context flush and wait until the most recent fence is
> signalled.
I think it's reasonable to expect that the event is ready for profile
queries after a blocking read has finished. That was the initial
motivation for this patch. Other implementations behave like that. I
didn't expect wait() to completely flush everything. Won't that cause a
lot of needless flushing with event wait lists?

> Ideally we would have a weaker variant of event::wait()
> (e.g. wait_signalled()) that doesn't flush and just waits for the
> associated action call-back to have been executed without giving any
> guarantees about the corresponding GPU command. The event interface
> doesn't expose such a functionality right now, I'm attaching two
> (completely untested) patches implementing it, you should be able to use
> them as starting point to fix blocking transfers.

Thanks, I'll look into that later when I get some free time.

Grigori
Re: [Mesa-dev] [PATCH 1/2] clover: fix event handling of buffer operations
Hi Grigori,

Grigori Goronzy writes:

> Wrap MapBuffer and MapImage as hard_event actions, like other
> operations. This enables correct profiling. Also make sure to wait
> for events to finish when blocking is requested by the caller.
> ---
>  src/gallium/state_trackers/clover/api/transfer.cpp | 50 --
>  1 file changed, 46 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/api/transfer.cpp b/src/gallium/state_trackers/clover/api/transfer.cpp
> index fdb9405..4986f53 100644
> --- a/src/gallium/state_trackers/clover/api/transfer.cpp
> +++ b/src/gallium/state_trackers/clover/api/transfer.cpp
> @@ -270,6 +270,18 @@ namespace {
>                           src_obj->resource(q), src_orig);
>        };
>     }
> +
> +   ///
> +   /// Resource mapping
> +   ///
> +   template<typename T>
> +   std::function<void (event &)>
> +   map_resource_op(command_queue &q, T obj, cl_map_flags flags, bool blocking,
> +                   const vector_t &origin, const vector_t &region, void **map) {
> +      return [=, &q](event &) {
> +         *map = obj->resource(q).add_map(q, flags, blocking, origin, region);
> +      };
> +   }
>  }
>
>  CLOVER_API cl_int
> @@ -363,6 +375,10 @@ clEnqueueReadBufferRect(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +

hard_event::wait() may fail, so this should probably be done before the
ret_object() call to avoid leaks. Is there any reason you didn't make
the same change in clEnqueueReadBuffer() and clEnqueueWriteBuffer()?

>     return CL_SUCCESS;
>
>  } catch (error &e) {
> @@ -400,6 +416,10 @@ clEnqueueWriteBufferRect(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +

Same comment as above. Also note that this is being more strict than
the spec requires (which I believe is what Tom was referring to).
From the CL 1.2 spec:

| If blocking_write is CL_TRUE, the OpenCL implementation copies the data
| referred to by ptr and enqueues the write operation in the
| command-queue. The memory pointed to by ptr can be reused by the
| application after the clEnqueueWriteBufferRect call returns.

The spec is giving you no guarantee that the write to the actual memory
object will be complete by the time the clEnqueueWriteBufferRect call
returns -- only that your data will have been buffered somewhere and the
memory pointed to by the argument can be reused immediately by the
application. The reason why I was reluctant to make this change last
time it came up was that it's likely to hurt performance unnecessarily,
because the wait() call blocks until *all* previous commands in the same
queue have completed execution, even though in the most common case the
copy is performed synchronously using soft_copy_op(), so the wait() call
is redundant even for blocking copies.

The case with blocking reads is similar: the copy is handled
synchronously using soft_copy_op() when no user events are present in
the list of dependencies, so calling wait() on the event is unnecessary
to guarantee that the execution of the read has completed, and will
cause a pipe_context flush and a wait until the most recent fence is
signalled.

Ideally we would have a weaker variant of event::wait()
(e.g. wait_signalled()) that doesn't flush and just waits for the
associated action call-back to have been executed, without giving any
guarantees about the corresponding GPU command. The event interface
doesn't expose such functionality right now. I'm attaching two
(completely untested) patches implementing it; you should be able to
use them as a starting point to fix blocking transfers.
>     return CL_SUCCESS;
>
>  } catch (error &e) {
> @@ -505,6 +525,10 @@ clEnqueueReadImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return CL_SUCCESS;
>
>  } catch (error &e) {
> @@ -539,6 +563,10 @@ clEnqueueWriteImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +

Same comment as above for image reads and writes.

>     return CL_SUCCESS;
>
>  } catch (error &e) {
> @@ -665,10 +693,17 @@ clEnqueueMapBuffer(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>     validate_object(q, mem, obj_origin, obj_pitch, region);
>     validate_map_flags(mem, flags);
>
> -   void *map = mem.resource(q).add_map(q, flags, blocking, obj_origin, region);
> +   void *map = nullptr;
> +   auto hev = create<hard_event>(
> +      q, CL_COMMAND_MAP_BUFFER, deps,
> +      map_resource_op(q, &mem, flags, blocking, obj_origin, region, &map));
>
> -   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_BUFFER, deps));
> +
Re: [Mesa-dev] [PATCH 1/2] clover: fix event handling of buffer operations
cc Francisco

On Thu, May 28, 2015 at 10:10:19AM +0200, Grigori Goronzy wrote:
> Wrap MapBuffer and MapImage as hard_event actions, like other
> operations. This enables correct profiling. Also make sure to wait
> for events to finish when blocking is requested by the caller.
> ---
>  src/gallium/state_trackers/clover/api/transfer.cpp | 50 --
>  1 file changed, 46 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/api/transfer.cpp b/src/gallium/state_trackers/clover/api/transfer.cpp
> index fdb9405..4986f53 100644
> --- a/src/gallium/state_trackers/clover/api/transfer.cpp
> +++ b/src/gallium/state_trackers/clover/api/transfer.cpp
> @@ -270,6 +270,18 @@ namespace {
>                           src_obj->resource(q), src_orig);
>        };
>     }
> +
> +   ///
> +   /// Resource mapping
> +   ///
> +   template<typename T>
> +   std::function<void (event &)>
> +   map_resource_op(command_queue &q, T obj, cl_map_flags flags, bool blocking,
> +                   const vector_t &origin, const vector_t &region, void **map) {
> +      return [=, &q](event &) {
> +         *map = obj->resource(q).add_map(q, flags, blocking, origin, region);
> +      };
> +   }
>  }
>
>  CLOVER_API cl_int
> @@ -363,6 +375,10 @@ clEnqueueReadBufferRect(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +

I think a few other people have tried to make this change and it was
concluded that it is not necessary. Francisco can correct me if I'm
wrong about this and probably provide more details as to why.
>     return CL_SUCCESS;
>
>  } catch (error &e) {
> @@ -400,6 +416,10 @@ clEnqueueWriteBufferRect(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return CL_SUCCESS;
>
>  } catch (error &e) {
> @@ -505,6 +525,10 @@ clEnqueueReadImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return CL_SUCCESS;
>
>  } catch (error &e) {
> @@ -539,6 +563,10 @@ clEnqueueWriteImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return CL_SUCCESS;
>
>  } catch (error &e) {
> @@ -665,10 +693,17 @@ clEnqueueMapBuffer(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>     validate_object(q, mem, obj_origin, obj_pitch, region);
>     validate_map_flags(mem, flags);
>
> -   void *map = mem.resource(q).add_map(q, flags, blocking, obj_origin, region);
> +   void *map = nullptr;
> +   auto hev = create<hard_event>(
> +      q, CL_COMMAND_MAP_BUFFER, deps,
> +      map_resource_op(q, &mem, flags, blocking, obj_origin, region, &map));
>
> -   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_BUFFER, deps));
> +   ret_object(rd_ev, hev);
>     ret_error(r_errcode, CL_SUCCESS);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return map;
>
>  } catch (error &e) {
> @@ -693,10 +728,17 @@ clEnqueueMapImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>     validate_object(q, img, origin, region);
>     validate_map_flags(img, flags);
>
> -   void *map = img.resource(q).add_map(q, flags, blocking, origin, region);
> +   void *map = nullptr;
> +   auto hev = create<hard_event>(
> +      q, CL_COMMAND_MAP_IMAGE, deps,
> +      map_resource_op(q, &img, flags, blocking, origin, region, &map));
>
> -   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_IMAGE, deps));
> +   ret_object(rd_ev, hev);
>     ret_error(r_errcode, CL_SUCCESS);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return map;
>
>  } catch (error &e)
> {
> --
> 1.9.1
Re: [Mesa-dev] [PATCH 1/2] clover: fix event handling of buffer operations
On 28.05.2015 10:10, Grigori Goronzy wrote:
> Wrap MapBuffer and MapImage as hard_event actions, like other
> operations. This enables correct profiling. Also make sure to wait
> for events to finish when blocking is requested by the caller.
> ---

Ping?

>  src/gallium/state_trackers/clover/api/transfer.cpp | 50 --
>  1 file changed, 46 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/api/transfer.cpp b/src/gallium/state_trackers/clover/api/transfer.cpp
> index fdb9405..4986f53 100644
> --- a/src/gallium/state_trackers/clover/api/transfer.cpp
> +++ b/src/gallium/state_trackers/clover/api/transfer.cpp
> @@ -270,6 +270,18 @@ namespace {
>                           src_obj->resource(q), src_orig);
>        };
>     }
> +
> +   ///
> +   /// Resource mapping
> +   ///
> +   template<typename T>
> +   std::function<void (event &)>
> +   map_resource_op(command_queue &q, T obj, cl_map_flags flags, bool blocking,
> +                   const vector_t &origin, const vector_t &region, void **map) {
> +      return [=, &q](event &) {
> +         *map = obj->resource(q).add_map(q, flags, blocking, origin, region);
> +      };
> +   }
>  }
>
>  CLOVER_API cl_int
> @@ -363,6 +375,10 @@ clEnqueueReadBufferRect(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return CL_SUCCESS;
>
>  } catch (error &e) {
> @@ -400,6 +416,10 @@ clEnqueueWriteBufferRect(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return CL_SUCCESS;
>
>  } catch (error &e) {
> @@ -505,6 +525,10 @@ clEnqueueReadImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return CL_SUCCESS;
>
>  } catch (error &e) {
> @@ -539,6 +563,10 @@ clEnqueueWriteImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>                           region));
>
>     ret_object(rd_ev, hev);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return CL_SUCCESS;
>
>  } catch
> (error &e) {
> @@ -665,10 +693,17 @@ clEnqueueMapBuffer(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>     validate_object(q, mem, obj_origin, obj_pitch, region);
>     validate_map_flags(mem, flags);
>
> -   void *map = mem.resource(q).add_map(q, flags, blocking, obj_origin, region);
> +   void *map = nullptr;
> +   auto hev = create<hard_event>(
> +      q, CL_COMMAND_MAP_BUFFER, deps,
> +      map_resource_op(q, &mem, flags, blocking, obj_origin, region, &map));
>
> -   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_BUFFER, deps));
> +   ret_object(rd_ev, hev);
>     ret_error(r_errcode, CL_SUCCESS);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return map;
>
>  } catch (error &e) {
> @@ -693,10 +728,17 @@ clEnqueueMapImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
>     validate_object(q, img, origin, region);
>     validate_map_flags(img, flags);
>
> -   void *map = img.resource(q).add_map(q, flags, blocking, origin, region);
> +   void *map = nullptr;
> +   auto hev = create<hard_event>(
> +      q, CL_COMMAND_MAP_IMAGE, deps,
> +      map_resource_op(q, &img, flags, blocking, origin, region, &map));
>
> -   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_IMAGE, deps));
> +   ret_object(rd_ev, hev);
>     ret_error(r_errcode, CL_SUCCESS);
> +
> +   if (blocking)
> +      hev().wait();
> +
>     return map;
>
>  } catch (error &e) {
[Mesa-dev] [PATCH 1/2] clover: fix event handling of buffer operations
Wrap MapBuffer and MapImage as hard_event actions, like other
operations. This enables correct profiling. Also make sure to wait
for events to finish when blocking is requested by the caller.
---
 src/gallium/state_trackers/clover/api/transfer.cpp | 50 --
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/src/gallium/state_trackers/clover/api/transfer.cpp b/src/gallium/state_trackers/clover/api/transfer.cpp
index fdb9405..4986f53 100644
--- a/src/gallium/state_trackers/clover/api/transfer.cpp
+++ b/src/gallium/state_trackers/clover/api/transfer.cpp
@@ -270,6 +270,18 @@ namespace {
                          src_obj->resource(q), src_orig);
       };
    }
+
+   ///
+   /// Resource mapping
+   ///
+   template<typename T>
+   std::function<void (event &)>
+   map_resource_op(command_queue &q, T obj, cl_map_flags flags, bool blocking,
+                   const vector_t &origin, const vector_t &region, void **map) {
+      return [=, &q](event &) {
+         *map = obj->resource(q).add_map(q, flags, blocking, origin, region);
+      };
+   }
 }

 CLOVER_API cl_int
@@ -363,6 +375,10 @@ clEnqueueReadBufferRect(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
                          region));

    ret_object(rd_ev, hev);
+
+   if (blocking)
+      hev().wait();
+
    return CL_SUCCESS;

 } catch (error &e) {
@@ -400,6 +416,10 @@ clEnqueueWriteBufferRect(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
                          region));

    ret_object(rd_ev, hev);
+
+   if (blocking)
+      hev().wait();
+
    return CL_SUCCESS;

 } catch (error &e) {
@@ -505,6 +525,10 @@ clEnqueueReadImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
                          region));

    ret_object(rd_ev, hev);
+
+   if (blocking)
+      hev().wait();
+
    return CL_SUCCESS;

 } catch (error &e) {
@@ -539,6 +563,10 @@ clEnqueueWriteImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
                          region));

    ret_object(rd_ev, hev);
+
+   if (blocking)
+      hev().wait();
+
    return CL_SUCCESS;

 } catch (error &e) {
@@ -665,10 +693,17 @@ clEnqueueMapBuffer(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
    validate_object(q, mem, obj_origin, obj_pitch, region);
    validate_map_flags(mem, flags);

-   void *map =
mem.resource(q).add_map(q, flags, blocking, obj_origin, region);
+   void *map = nullptr;
+   auto hev = create<hard_event>(
+      q, CL_COMMAND_MAP_BUFFER, deps,
+      map_resource_op(q, &mem, flags, blocking, obj_origin, region, &map));

-   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_BUFFER, deps));
+   ret_object(rd_ev, hev);
    ret_error(r_errcode, CL_SUCCESS);
+
+   if (blocking)
+      hev().wait();
+
    return map;

 } catch (error &e) {
@@ -693,10 +728,17 @@ clEnqueueMapImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
    validate_object(q, img, origin, region);
    validate_map_flags(img, flags);

-   void *map = img.resource(q).add_map(q, flags, blocking, origin, region);
+   void *map = nullptr;
+   auto hev = create<hard_event>(
+      q, CL_COMMAND_MAP_IMAGE, deps,
+      map_resource_op(q, &img, flags, blocking, origin, region, &map));

-   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_IMAGE, deps));
+   ret_object(rd_ev, hev);
    ret_error(r_errcode, CL_SUCCESS);
+
+   if (blocking)
+      hev().wait();
+
    return map;

 } catch (error &e) {
--
1.9.1