viiccwen opened a new pull request, #1108: URL: https://github.com/apache/mahout/pull/1108
### Related Issues

Closes #1107

### Changes

- [x] Bug fix
- [ ] New feature
- [ ] Refactoring
- [ ] Documentation
- [ ] Test
- [ ] CI/CD pipeline
- [ ] Other

### Why

This PR fixes misaligned vector loads in the batch amplitude and batched norm CUDA kernels.

When batch samples have an odd length, the base address of later samples is not guaranteed to be aligned for `double2` / `float2` loads. The existing kernels could therefore trigger misaligned memory accesses and surface `CUDA_ERROR_MISALIGNED_ADDRESS`.

### How

- Updated `amplitude_encode_batch_kernel` to use vectorized `double2` loads only when the sample base is aligned
- Added scalar fallback for misaligned sample bases and odd tails in the batch amplitude kernel
- Updated `l2_norm_batch_kernel` with the same alignment-aware load logic
- Updated `l2_norm_batch_kernel_f32` with the same alignment-aware load logic
- Refreshed kernel comments to reflect the new aligned fast path plus scalar fallback behavior

## Tests

- Added a regression test for odd-length batched amplitude encoding
- Added a regression test for odd-length batched L2 norm reduction (f64)
- Added a regression test for odd-length batched L2 norm reduction (f32)

## Checklist

- [x] Added or updated unit tests for all changes
- [ ] Added or updated documentation for all changes
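To illustrate the alignment problem and the "aligned fast path plus scalar fallback" described under Why and How, here is a minimal host-side C++ sketch. It is not the code from this PR: `Double2`, `is_aligned16`, `sample_norm_sq`, and `batch_norms_sq` are hypothetical names, and `Double2` only stands in for CUDA's `double2`. In a real kernel the fast path would presumably issue actual `double2` loads; this sketch only mirrors the control flow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CUDA's double2: a 16-byte-aligned pair of doubles.
struct alignas(16) Double2 {
    double x, y;
};

// True when p could be read as a Double2 without a misaligned access.
inline bool is_aligned16(const double* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(Double2) == 0;
}

// Sum of squares of one sample: paired reads when the sample base is
// 16-byte aligned, scalar reads otherwise and for any odd tail element.
inline double sample_norm_sq(const double* sample, std::size_t len) {
    double acc = 0.0;
    std::size_t i = 0;
    if (is_aligned16(sample)) {
        // Fast path: consume elements two at a time, as a double2 load would.
        for (; i + 1 < len; i += 2) {
            Double2 v{sample[i], sample[i + 1]};
            acc += v.x * v.x + v.y * v.y;
        }
    }
    // Scalar fallback: the whole sample if the base is misaligned,
    // otherwise only the leftover element of an odd-length sample.
    for (; i < len; ++i) {
        acc += sample[i] * sample[i];
    }
    return acc;
}

// Squared L2 norm of each sample in a contiguous batch (sample-major layout).
inline std::vector<double> batch_norms_sq(const double* batch,
                                          std::size_t num_samples,
                                          std::size_t sample_len) {
    std::vector<double> out(num_samples);
    for (std::size_t s = 0; s < num_samples; ++s) {
        out[s] = sample_norm_sq(batch + s * sample_len, sample_len);
    }
    return out;
}
```

With a 16-byte-aligned batch and a sample length of 3, sample 0 starts at byte offset 0 and takes the paired path, while sample 1 starts at byte offset 24, which is not a multiple of 16, and takes the scalar path. This is the odd-length case the regression tests target.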
