aloha1357 opened a new pull request, #1389:
URL: https://github.com/apache/mahout/pull/1389

   ### Related Issues
   
   Related to #1385
   
   ### Changes
   
   - [ ] Bug fix
   - [x] New feature
   - [ ] Refactoring
   - [ ] Documentation
   - [ ] Test
   - [ ] CI/CD pipeline
   - [ ] Other
   
   ### Why
   
   PR1–PR5 delivered the specialized `ImplicitHadamardOzakiEngine` 
(matrix-free, ±1 perfect quantization, Kronecker-blocked FWT for IQP). 
However, the full AdaptiveGEMM research engine (`AdaptiveOzakiEngine`), which 
provides mixed-precision graded-ring GEMM (Ozaki + CRT over 7 primes, hybrid 
FP64/INT8 TC, Phase26 persistent kernels, general A @ B for *arbitrary* 
matrices), existed only in the final research snapshot and the standalone 
pybind module (`adaptive_gemm_py`).
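
   As a rough illustration of the "Ozaki + CRT" idea mentioned above, the 
sketch below computes an integer dot product exactly by reducing it modulo 
several small primes and recombining the residues with the Chinese Remainder 
Theorem. It is not taken from `AdaptiveOzaki.cu`: the seven moduli, the 
function names, and the scalar CPU arithmetic are assumptions chosen for 
readability, and the real engine presumably runs the per-modulus products on 
INT8 Tensor Cores.

```rust
/// ASSUMPTION: seven pairwise-coprime primes picked for illustration only;
/// the PR does not list the moduli the engine actually uses.
const PRIMES: [u64; 7] = [101, 103, 107, 109, 113, 127, 131];

/// Modular inverse of `a` modulo a prime `p`, via Fermat: a^(p-2) mod p.
fn mod_inv(a: u64, p: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a % p, p - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % p;
        }
        base = base * base % p;
        exp >>= 1;
    }
    acc
}

/// Dot product of two integer vectors, reduced modulo `p` at every step so
/// that intermediate values stay small.
fn dot_mod(a: &[i64], b: &[i64], p: u64) -> u64 {
    let p = p as i64;
    let mut acc = 0i64;
    for (x, y) in a.iter().zip(b) {
        acc = (acc + x.rem_euclid(p) * y.rem_euclid(p)) % p;
    }
    acc as u64
}

/// Recombine one residue per prime into a signed integer in (-M/2, M/2],
/// where M is the product of all moduli.
fn crt_reconstruct(residues: &[u64; 7]) -> i128 {
    let m: u128 = PRIMES.iter().map(|&p| p as u128).product();
    let mut x: u128 = 0;
    for (&r, &p) in residues.iter().zip(PRIMES.iter()) {
        let m_i = m / p as u128; // product of the other moduli
        let inv = mod_inv((m_i % p as u128) as u64, p) as u128;
        x = (x + r as u128 * m_i % m * inv) % m;
    }
    if x > m / 2 {
        x as i128 - m as i128
    } else {
        x as i128
    }
}

/// Exact dot product, valid while the true result lies within (-M/2, M/2].
fn exact_dot(a: &[i64], b: &[i64]) -> i128 {
    let residues: [u64; 7] = std::array::from_fn(|i| dot_mod(a, b, PRIMES[i]));
    crt_reconstruct(&residues)
}
```

   Each residue computation is independent of the others, which is why this 
kind of decomposition maps naturally onto several low-precision GEMMs followed 
by one reconstruction pass.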
   
   To finalize the pipeline, we must hook up the general engine for 
"non-Hadamard logic", i.e., any case where the second operand is not the 
specially structured Hadamard matrix. This enables future general Tensor 
Core-accelerated linear algebra inside QDP (beyond pure IQP FWT) while reusing 
the same Ozaki INT8 TC machinery.
   
   ### How
   
   - Transplanted `AdaptiveOzaki.cu` (full hybrid/persistent/general GEMM 
implementation) from `pr-final-version` into the clean PR chain via git 
archaeology.
   - Updated `qdp/qdp-kernels/build.rs` to compile `AdaptiveOzaki.cu` alongside 
the Implicit path.
   - Added the `launch_adaptive_ozaki_gemm` C FFI entry point (wraps 
`AdaptiveOzakiEngine::execute` with the default Phase26Hybrid config), plus a 
matching declaration and no-CUDA stub in `lib.rs`.
   - Added `// PR6:` inline English comments on all changed sites.
   - Verified end-to-end: `wsl -e bash -ic 'export 
PATH=/usr/local/cuda/bin:$PATH && cd .../qdp && cargo test --workspace 
--exclude qdp-python --lib'` passes with 0 failures (builds the new CUDA 
symbols successfully).
   
   The new public kernel symbol `launch_adaptive_ozaki_gemm` can now be called 
from Rust (qdp-core) or exposed to higher layers, providing the general 
non-Hadamard TC path that complements `launch_iqp_encode_tc` 
(Hadamard-specialized).
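
   The declaration-plus-stub pattern described above could look roughly like 
the following. This is only a sketch: the parameter list of 
`launch_adaptive_ozaki_gemm`, the `cuda` feature name, the `STATUS_NO_CUDA` 
code, the `GemmError` type, and the `adaptive_gemm` wrapper are all 
hypothetical, since the PR text does not show the real signature or the cfg 
switch used in `lib.rs`.

```rust
use std::os::raw::{c_int, c_void};

/// Hypothetical status code returned by the stub (not from the PR).
pub const STATUS_NO_CUDA: c_int = -1;

// Real symbol, resolved at link time when the CUDA object is built.
// ASSUMPTION: the parameter list and the `cuda` feature name are illustrative.
#[cfg(feature = "cuda")]
extern "C" {
    pub fn launch_adaptive_ozaki_gemm(
        a_dev: *const f64,
        b_dev: *const f64,
        c_dev: *mut f64,
        m: c_int,
        n: c_int,
        k: c_int,
        stream: *mut c_void,
    ) -> c_int;
}

/// No-CUDA stub with the same signature, so callers compile unchanged on
/// machines without the CUDA toolchain. It never touches its pointers.
#[cfg(not(feature = "cuda"))]
pub unsafe extern "C" fn launch_adaptive_ozaki_gemm(
    _a_dev: *const f64,
    _b_dev: *const f64,
    _c_dev: *mut f64,
    _m: c_int,
    _n: c_int,
    _k: c_int,
    _stream: *mut c_void,
) -> c_int {
    STATUS_NO_CUDA
}

#[derive(Debug, PartialEq)]
pub enum GemmError {
    /// A dimension was zero or did not fit in a C `int`.
    BadDims,
    /// The launcher returned a non-zero status.
    Launch(c_int),
}

/// Hypothetical wrapper: validates dimensions, then forwards to the launcher.
///
/// # Safety
/// In a CUDA build the pointers must be valid device buffers of sizes
/// m*k, k*n and m*n respectively.
pub unsafe fn adaptive_gemm(
    a_dev: *const f64,
    b_dev: *const f64,
    c_dev: *mut f64,
    m: usize,
    n: usize,
    k: usize,
) -> Result<(), GemmError> {
    let to_c = |d: usize| {
        if d == 0 || d > c_int::MAX as usize {
            None
        } else {
            Some(d as c_int)
        }
    };
    let (m, n, k) = match (to_c(m), to_c(n), to_c(k)) {
        (Some(m), Some(n), Some(k)) => (m, n, k),
        _ => return Err(GemmError::BadDims),
    };
    // ASSUMPTION: a null stream is treated as the default stream.
    let status = unsafe {
        launch_adaptive_ozaki_gemm(a_dev, b_dev, c_dev, m, n, k, std::ptr::null_mut())
    };
    if status == 0 {
        Ok(())
    } else {
        Err(GemmError::Launch(status))
    }
}
```

   Keeping the stub signature identical to the `extern "C"` declaration means 
the `--lib` tests can build and exercise the Rust wrappers without a GPU; in 
that configuration the wrapper simply reports the stub's status code.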
   
   ## Checklist
   
   - [x] Added or updated unit tests for all changes (verified passing against 
the existing CI test suite; the `--lib` tests exercise the build and the Rust 
wrappers, and GPU execution of the new path is covered by the existing bench 
harness)
   - [x] Added or updated documentation for all changes (added explanatory 
inline comments for this PR, plus this PR06 doc in internal-dev-notes)
   


