[PR] [QDP] Add zero-copy amplitude batch encoding from float32 GPU tensors [mahout]

via GitHub Sat, 07 Feb 2026 06:05:16 -0800


viiccwen opened a new pull request, #1029:
URL: https://github.com/apache/mahout/pull/1029


   ### Purpose of PR
   
   This PR adds float32 amplitude batch encoding to QDP at the core and kernel level:
   
   - **Core:** Amplitude batch encoding from GPU float32 pointers via `encode_batch_from_gpu_ptr_f32` / `encode_batch_from_gpu_ptr_f32_with_stream`.
   - **Kernels:** New `launch_amplitude_encode_batch_f32` and a batch f32 L2 norm path for amplitude batch encoding.
   - **Allocation:** `GpuStateVector::new_batch` is refactored to take a `precision` argument; all batch call sites (amplitude, angle, basis, iqp, encoding pipeline) pass the correct precision.
   - **Python:** float32 + amplitude batch encoding still returns a clear “not yet supported” error (support will come in a follow-up PR).
   - **Tests:** Core unit tests for f32 amplitude (success and error paths).
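   
   To illustrate the allocation change, here is a minimal host-side sketch of a precision-parameterized batch buffer. It is not the actual `GpuStateVector` code: `BatchBuffer`, its three-argument `new_batch`, and `size_in_bytes` are hypothetical stand-ins, the `Precision` enum is redefined locally, and the real function allocates on the GPU and takes other arguments that this description does not list.
   
   ```rust
   /// Local stand-in for the engine's precision setting (assumed shape).
   #[derive(Clone, Copy, Debug, PartialEq, Eq)]
   pub enum Precision {
       Float32,
       Float64,
   }
   
   /// Hypothetical host-side stand-in for a batch of complex state vectors.
   /// Each complex amplitude is stored as an interleaved (re, im) pair.
   #[derive(Debug)]
   pub enum BatchBuffer {
       F32(Vec<f32>),
       F64(Vec<f64>),
   }
   
   impl BatchBuffer {
       /// Allocates a zeroed buffer holding `num_samples` states of
       /// `2^num_qubits` complex amplitudes each, in the requested precision.
       /// Returns `None` if the element count overflows `usize`.
       pub fn new_batch(num_samples: usize, num_qubits: u32, precision: Precision) -> Option<Self> {
           let state_len = 1usize.checked_shl(num_qubits)?;
           // Two scalars (re, im) per complex amplitude.
           let scalars = num_samples.checked_mul(state_len)?.checked_mul(2)?;
           Some(match precision {
               Precision::Float32 => BatchBuffer::F32(vec![0.0; scalars]),
               Precision::Float64 => BatchBuffer::F64(vec![0.0; scalars]),
           })
       }
   
       /// Size of the backing storage in bytes.
       pub fn size_in_bytes(&self) -> usize {
           match self {
               BatchBuffer::F32(v) => v.len() * std::mem::size_of::<f32>(),
               BatchBuffer::F64(v) => v.len() * std::mem::size_of::<f64>(),
           }
       }
   }
   ```
   
   With this shape, a Float32 batch needs half the memory of a Float64 batch for the same `num_samples` and `num_qubits`, which is the point of passing the precision through every call site.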
   
   ### Changes
   
   #### qdp-kernels
   
   - **amplitude.cu**
     - Add `amplitude_encode_batch_kernel_f32` (float32 input → cuComplex state batch).
     - Add `launch_amplitude_encode_batch_f32` and wire it into the FFI.
   - **lib.rs**
     - Declare and link `launch_amplitude_encode_batch_f32`; add a stub for non-Linux / no-CUDA builds.
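   
   For reviewers who want a reference for the float32 → complex mapping, the following is a CPU sketch of what a batch amplitude-encoding kernel computes. It is not the CUDA kernel from this PR: the function name and signature are hypothetical, and it assumes real input amplitudes, zero padding up to the state length, and precomputed inverse L2 norms.
   
   ```rust
   /// CPU reference for batch amplitude encoding (hypothetical helper).
   /// Each loop iteration handles one (sample, state index) pair, in the way
   /// a single GPU thread would handle one global index.
   ///
   /// `input` is row-major with `sample_size` values per sample;
   /// `inv_norms[i]` is 1 / ||sample i||_2. The result stores one
   /// `[re, im]` pair per amplitude, `state_len` amplitudes per sample.
   pub fn amplitude_encode_batch_f32_ref(
       input: &[f32],
       inv_norms: &[f32],
       sample_size: usize,
       state_len: usize,
   ) -> Vec<[f32; 2]> {
       let num_samples = inv_norms.len();
       assert_eq!(input.len(), num_samples * sample_size);
       assert!(sample_size <= state_len);
   
       // Start from all zeros so padded entries and imaginary parts stay 0.
       let mut state = vec![[0.0f32, 0.0f32]; num_samples * state_len];
       for (global_idx, out) in state.iter_mut().enumerate() {
           let sample = global_idx / state_len;
           let elem = global_idx % state_len;
           if elem < sample_size {
               // Normalized real amplitude; the imaginary part remains 0.
               out[0] = input[sample * sample_size + elem] * inv_norms[sample];
           }
       }
       state
   }
   ```
   
   A reference like this can be compared element-wise against the GPU output copied back to the host, within a float32 tolerance.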
   
   #### qdp-core
   
   - **lib.rs**
     - Add `encode_batch_from_gpu_ptr_f32` / `encode_batch_from_gpu_ptr_f32_with_stream` (2D float32 amplitude). Both validate the input, allocate an f32 batch state, run `launch_l2_norm_batch_f32`, validate the norms, run `launch_amplitude_encode_batch_f32`, convert to the engine precision, and return the result via DLPack.
     - Add `QdpEngine::precision()` for callers that need the engine’s output precision.
   - **gpu/memory.rs**
     - Refactor `GpuStateVector::new_batch` to take a fourth argument, `precision: Precision`, and allocate a Float32 or Float64 batch buffer accordingly.
   - **gpu/encodings (amplitude, angle, basis, iqp)**
     - Update all `new_batch(...)` call sites to pass `Precision::Float64`.
   - **encoding/mod.rs**
     - Use `engine.precision()` and pass it to `GpuStateVector::new_batch` for the streaming pipeline.
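   
   The validation and norm steps of the f32 batch path can be sketched on the host as follows. This is an illustration only: `EncodeError` and `checked_inv_norms_f32` are hypothetical names, the order of the checks is an assumption, and the null-pointer check from the real GPU path has no equivalent here because the sketch takes a slice.
   
   ```rust
   /// Hypothetical error type covering the failure cases listed in this PR.
   #[derive(Debug, PartialEq)]
   pub enum EncodeError {
       EmptyBatch,
       EmptySample,
       SampleTooLarge { sample_size: usize, state_len: usize },
       LengthMismatch,
       InvalidNorm { sample: usize },
   }
   
   /// Validates the batch shape, then returns 1 / ||x||_2 for every sample.
   /// A zero or non-finite norm is rejected, since such a sample cannot be
   /// normalized into a valid quantum state.
   pub fn checked_inv_norms_f32(
       input: &[f32],
       num_samples: usize,
       sample_size: usize,
       state_len: usize,
   ) -> Result<Vec<f32>, EncodeError> {
       if num_samples == 0 {
           return Err(EncodeError::EmptyBatch);
       }
       if sample_size == 0 {
           return Err(EncodeError::EmptySample);
       }
       if sample_size > state_len {
           return Err(EncodeError::SampleTooLarge { sample_size, state_len });
       }
       if num_samples.checked_mul(sample_size) != Some(input.len()) {
           return Err(EncodeError::LengthMismatch);
       }
   
       input
           .chunks_exact(sample_size)
           .enumerate()
           .map(|(sample, row)| {
               let norm = row.iter().map(|x| x * x).sum::<f32>().sqrt();
               if norm.is_finite() && norm > 0.0 {
                   Ok(1.0 / norm)
               } else {
                   Err(EncodeError::InvalidNorm { sample })
               }
           })
           .collect()
   }
   ```
   
   Rejecting bad norms before the encode step means no sample is ever scaled by an infinite or NaN factor, and the error can name the offending sample index.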
   
   #### qdp-core tests
   
   - **gpu_ptr_encoding.rs**
     - 2D f32 batch: success paths (`encode_batch_from_gpu_ptr_f32`, `encode_batch_from_gpu_ptr_f32_with_stream`) and error paths (zero `num_samples`, zero `sample_size`, null pointer, `sample_size` exceeding the state length, zero-norm sample).
   - **dlpack.rs**
     - Update batch shape test to call `new_batch(..., Precision::Float64)`.
   
   ### Related Issues or PRs
   closes #1028
   
   ### Changes Made
   - [ ] Bug fix
   - [x] New feature
   - [x] Refactoring
   - [ ] Documentation
   - [x] Test
   - [ ] CI/CD pipeline
   - [ ] Other
   
   ### Breaking Changes
   - [ ] Yes
   - [x] No
   
   ### Checklist
   
   - [x] Added or updated unit tests for all changes
   - [ ] Added or updated documentation for all changes
   - [x] Successfully built and ran all unit tests or manual tests locally
   - [x] PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
   - [x] Code follows ASF guidelines
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

