viiccwen opened a new issue, #967:
URL: https://github.com/apache/mahout/issues/967

   ### Description
   
   Reference: [comment](https://github.com/apache/mahout/pull/918#discussion_r2724248036)
   
   The `launch_l2_norm_batch` function can attempt a CUDA kernel launch with an invalid grid size when `num_samples` exceeds `CUDA_MAX_GRID_DIM_1D` (65535).
   
   ### Root Cause
   
   The grid size is computed as `gridSize = num_samples * blocks_per_sample`, so it can never be smaller than `num_samples`. When `num_samples > 65535`, even `blocks_per_sample = 1` gives `gridSize = num_samples`, which still exceeds the CUDA 1D grid dimension limit (65535) and makes the kernel launch invalid.
   
   The existing code attempts to reduce `blocks_per_sample` when `gridSize > max_grid`:
   
   
https://github.com/apache/mahout/blob/ef00f92eb236414d2ae15c01f4a32944f8d4eb2a/qdp/qdp-kernels/src/amplitude.cu#L613-L620
   
   However, the reduction cannot go below `blocks_per_sample = 1`. When `num_samples > max_grid`, the resulting `gridSize = num_samples` still exceeds the limit, and the launch fails with a CUDA error.
   

