aloha1357 opened a new pull request, #1386: URL: https://github.com/apache/mahout/pull/1386
### Related Issues

Related #1385

### Changes

- [ ] Bug fix
- [ ] New feature
- [x] Refactoring
- [ ] Documentation
- [ ] Test
- [ ] CI/CD pipeline
- [ ] Other

### Why

The original phase encoding and IQP encoding kernels were exposed to GPU thread divergence from conditional branching (`if (val != 0.0)` or `if ((x >> i) & 1U)`). In addition, the normalization factor (`norm_factor`) was recomputed inside the GPU kernel even though it does not change during a launch, which cost extra cycles. Removing both inefficiencies should improve the kernels' execution speed on the GPU.

### How

- **Replaced Conditional Branching:** In both `phase.cu` and `iqp.cu`, the `if` conditions that checked bit states were replaced by casting the extracted bit to `double` and multiplying by it (e.g., `phases[bit] * (double)((idx >> bit) & 1U)`). All threads in a warp now follow the same instruction path, which eliminates warp divergence at these points.
- **Host-side Pre-calculation:** Moved the `norm_factor` calculation in `phase.cu` to the host (CPU), where it runs before the kernel launch; the result is passed to the kernel as an immutable parameter.
- **Added Explanatory Comments:** Added inline comments next to the bitwise arithmetic to help reviewers follow the optimizations.

## Checklist

- [x] Added or updated unit tests for all changes (Verified passing against existing CI test suite)
- [x] Added or updated documentation for all changes (Added explanatory inline comments for PR)
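
For reviewers, the two changes described under **How** can be sketched on the host in plain C++. This is an illustration only, not the code in `phase.cu` or `iqp.cu`: the function names are hypothetical, the loop over `idx` stands in for one GPU thread per index, and `norm_factor = 1/sqrt(2^n)` is an assumed normalization for a phase-encoded state with uniform magnitudes.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Branching form (hypothetical helper): add phases[bit] only when the bit is set.
double phase_sum_branching(const std::vector<double>& phases, std::size_t idx) {
    double total = 0.0;
    for (std::size_t bit = 0; bit < phases.size(); ++bit) {
        if ((idx >> bit) & 1U) {
            total += phases[bit];
        }
    }
    return total;
}

// Branchless form: multiply by the bit cast to double (0.0 or 1.0),
// so every index executes the same instruction sequence.
double phase_sum_branchless(const std::vector<double>& phases, std::size_t idx) {
    double total = 0.0;
    for (std::size_t bit = 0; bit < phases.size(); ++bit) {
        total += phases[bit] * static_cast<double>((idx >> bit) & 1U);
    }
    return total;
}

// Host-side stand-in for the kernel: norm_factor is computed once, outside
// the per-index loop, mirroring "compute on the host, pass as a parameter".
// ASSUMPTION: norm_factor = 1 / sqrt(2^n) for n phase values.
std::vector<std::complex<double>> encode_phase(const std::vector<double>& phases) {
    // Guard the shift below against overflow.
    assert(phases.size() < 8 * sizeof(std::size_t));
    const std::size_t dim = std::size_t{1} << phases.size();
    const double norm_factor = 1.0 / std::sqrt(static_cast<double>(dim));

    std::vector<std::complex<double>> state(dim);
    for (std::size_t idx = 0; idx < dim; ++idx) {  // one "thread" per idx
        const double theta = phase_sum_branchless(phases, idx);
        state[idx] = std::polar(norm_factor, theta);  // norm_factor * e^{i*theta}
    }
    return state;
}
```

The two summation helpers agree for finite phase values. They can differ when a phase is infinite or NaN, because `0.0 * inf` is NaN under IEEE 754, whereas the branching form would skip that term entirely.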
