ryankert01 commented on code in PR #1310:
URL: https://github.com/apache/mahout/pull/1310#discussion_r3332479661
##########
qdp/qdp-core/src/gpu/encodings/basis.rs:
##########
@@ -541,6 +541,29 @@ impl QuantumEncoder for BasisEncoder {
Ok(batch_state_vector)
}
+ #[cfg(target_os = "linux")]
+ unsafe fn encode_from_gpu_ptr_f32(
+ &self,
+ device: &Arc<CudaDevice>,
+ input_d: *const c_void,
+ input_len: usize,
+ num_qubits: usize,
+ stream: *mut c_void,
+ ) -> Result<GpuStateVector> {
+ // Delegate to the workhorse `_with_stream` fn (see angle.rs for
rationale).
+ // `input_len` is unused — basis is always one index per sample — but
kept on the
+ // signature to match the trait shape used by amplitude / angle.
+ let _ = input_len;
Review Comment:
The caller-supplied `input_len` is silently discarded here. A debug
assertion would catch misuse at zero release cost:
```rust
debug_assert_eq!(input_len, 1, "basis encoder takes exactly one index per
sample");
```
##########
qdp/qdp-core/src/gpu/encodings/mod.rs:
##########
@@ -135,6 +135,34 @@ pub trait QuantumEncoder: Send + Sync + 'static {
)))
}
+ /// Encode a single sample from an existing GPU pointer (zero-copy) using
an f32 input.
+ /// Default: not supported.
+ ///
+ /// This is the f32 counterpart of
[`encode_from_gpu_ptr`](Self::encode_from_gpu_ptr). The
+ /// sibling batch variant is
[`encode_batch_from_gpu_ptr_f32`](Self::encode_batch_from_gpu_ptr_f32).
+ /// Adding a new encoder with f32 zero-copy support should override
**both** this method and
+ /// the batch variant; the previous arrangement (single-sample as a
standalone `pub unsafe
+ /// fn` on each encoder type, batch on the trait) made the pattern
accidentally divergent.
+ ///
+ /// # Safety
+ /// Caller must ensure `input_d` points to valid GPU memory with at least
`input_len`
+ /// `f32` elements on the same device as `device`, and `stream` is either
null or a valid
+ /// CUDA stream associated with `device`.
+ #[cfg(target_os = "linux")]
+ unsafe fn encode_from_gpu_ptr_f32(
Review Comment:
The `# Safety` doc should note that this path skips
`validate_cuda_input_ptr` (the engine entry point calls it before delegating,
but callers going through the trait directly don't get that check). Worth one
sentence here so it's not a surprise:
```
/// The [] entry point validates against automatically;
/// callers invoking this method directly are responsible for that check.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]