On Wed, 3 Feb 2016, Nathan Sidwell wrote:
> You can only override at runtime those dimensions that you said you'd override
> at runtime when you compiled your program.

Ah, I see.  That wasn't obvious to me, so perhaps the added documentation
could be expanded to explain it?  (I now see that the plugin silently drops
user-provided dimensions when a value recorded at compile time is present;
I'm not sure whether that warrants a runtime diagnostic, as it could be very
noisy.)
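
To check my understanding, the merging behaviour amounts to something like
the sketch below.  The names (merge_launch_dims, GANG/WORKER/VECTOR) are
hypothetical, not the plugin's, and I'm assuming a compile-time entry of 0
means "left to runtime": any nonzero compile-time value wins and the user's
value for that dimension is ignored without a diagnostic.

```c
/* Hypothetical dimension indices (OpenACC gang/worker/vector).  */
enum { GANG, WORKER, VECTOR, NUM_DIMS };

/* Combine launch dimensions recorded at compile time with user-provided
   runtime values.  Assumption: COMPILED[i] == 0 means the dimension was
   left for runtime; otherwise the compile-time value is used and USER[i]
   is silently dropped.  */
static void
merge_launch_dims (const int compiled[NUM_DIMS], const int user[NUM_DIMS],
                   int out[NUM_DIMS])
{
  for (int i = 0; i < NUM_DIMS; i++)
    out[i] = compiled[i] != 0 ? compiled[i] : user[i];
}
```

E.g. with compiled = {0, 4, 32} and user = {64, 8, 64}, the result is
{64, 4, 32}: only the gang count is taken from the user.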
 
> > I don't see why you say that because cuDeviceGetAttribute provides
> > CU_DEVICE_ATTRIBUTE_WARP_SIZE, CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK,
> > CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X (which is not too useful for this case)
> > and cuFuncGetAttribute, which gives a per-function thread limit.
> > There's a patch on gomp-nvptx branch that adds querying some of those to
> > the plugin.
> 
> thanks.  There doesn't appear to be one for number of physical CTAs though,
> right?

Sorry, I don't understand the question: a CTA is a logical entity.  One could
derive the limit on concurrently resident CTAs from the number of SMs
(CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT) multiplied by how many CTAs fit on
one multiprocessor.  The latter figure can be a rough worst-case value, a
semi-intelligent per-kernel estimate based on register limits (there's code
on the gomp-nvptx branch that does this), or a precise per-kernel figure
obtained from the driver through the cuOcc* API.
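
For illustration, a rough per-kernel estimate of that kind could look like
the sketch below.  This is a simplified model written for this mail, not
the gomp-nvptx code: it considers only the per-SM thread limit, the per-SM
register file and a per-SM CTA cap (an assumed input, since it depends on
compute capability), and ignores shared memory and register allocation
granularity.  The inputs would come from cuDeviceGetAttribute
(CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR,
CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR) and cuFuncGetAttribute
(CU_FUNC_ATTRIBUTE_NUM_REGS); the function names are hypothetical.

```c
static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* How many CTAs of THREADS_PER_CTA threads, each thread using
   REGS_PER_THREAD registers, fit on one SM under this simplified model.
   REGS_PER_THREAD == 0 means "unknown" and imposes no register limit.  */
static int
estimate_ctas_per_sm (int threads_per_cta, int regs_per_thread,
                      int max_threads_per_sm, int regs_per_sm,
                      int max_ctas_per_sm)
{
  int by_threads = max_threads_per_sm / threads_per_cta;
  int by_regs = max_ctas_per_sm;
  if (regs_per_thread > 0)
    by_regs = regs_per_sm / (regs_per_thread * threads_per_cta);
  return min_int (max_ctas_per_sm, min_int (by_threads, by_regs));
}

/* Number of SMs times the per-SM estimate above.  */
static int
estimate_max_concurrent_ctas (int sm_count, int threads_per_cta,
                              int regs_per_thread, int max_threads_per_sm,
                              int regs_per_sm, int max_ctas_per_sm)
{
  return sm_count * estimate_ctas_per_sm (threads_per_cta, regs_per_thread,
                                          max_threads_per_sm, regs_per_sm,
                                          max_ctas_per_sm);
}
```

With, say, 20 SMs, 2048 threads and 65536 registers per SM, a cap of 32 CTAs
per SM, and a 128-thread kernel using 64 registers per thread, registers are
the binding limit: 8 CTAs per SM, 160 in total.  For an exact figure,
cuOccupancyMaxActiveBlocksPerMultiprocessor replaces estimate_ctas_per_sm.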

Alexander
