On 08/03/2018 05:37 PM, Cesar Philippidis wrote:
>> But I still see no rationale why blocks is used here, and I wonder
>> whether something like num_gangs = grids * 64 would give similar results.
> My original intent was to keep the load proportional to the block size.
> So, in the case where a block size is limited by shared-memory or the
> register file capacity, the runtime wouldn't excessively over assign
> gangs to the multiprocessor units if their state is going to be swapped
> out even more than necessary.

So, that's your rationale. Please add a comment describing this.

Thanks,
- Tom
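
For illustration only, a comment of the kind requested might look like the sketch below. The function names, the warp size of 32, and the grids * (blocks / warp size) formula are assumptions made for this example and are not taken from the patch under review; only the grids * 64 alternative appears in the thread.

```c
/* Hypothetical helpers: names and formulas are illustrative assumptions,
   not the code under review.  */

#define WARP_SIZE 32  /* Assumed warp size for this sketch.  */

/* Keep the number of gangs proportional to the block size.  When shared
   memory or register file capacity limits the block size, this avoids
   assigning more gangs to the multiprocessor units than they can keep
   resident, which would swap their state out more than necessary.  */
int
gangs_proportional_to_blocks (int grids, int blocks)
{
  return grids * (blocks / WARP_SIZE);
}

/* The alternative raised in review: a fixed factor per grid, independent
   of the block size.  */
int
gangs_fixed_factor (int grids)
{
  return grids * 64;
}
```

With these assumed formulas, halving the block size halves the first result, while the fixed-factor variant stays the same regardless of how small the block has become.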