shiavm006 commented on code in PR #881:
URL: https://github.com/apache/mahout/pull/881#discussion_r2708578026


##########
qdp/qdp-core/src/gpu/encodings/amplitude.rs:
##########
@@ -95,11 +95,15 @@ impl QuantumEncoder for AmplitudeEncoder {
 
                 // GPU-accelerated norm for medium+ inputs, CPU fallback for tiny payloads
                 let inv_norm = if host_data.len() >= GPU_NORM_THRESHOLD {
-                    Self::calculate_inv_norm_gpu(
-                        _device,
-                        *input_slice.device_ptr() as *const f64,
-                        host_data.len(),
-                    )?
+                    // SAFETY: input_slice was just allocated and copied from host_data,
+                    // so the pointer is valid and contains host_data.len() elements
+                    unsafe {

Review Comment:
   We’re calling `AmplitudeEncoder::calculate_inv_norm_gpu` twice here: once directly and once inside the `unsafe` block. Since the function is now `pub unsafe fn`, the first call should no longer compile, and even if it did, we’d be doing redundant work. We should keep only the `unsafe` call, together with the safety justification comment.
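   
   i.e. keep just this arm, with the safety comment attached (everything below is copied from the hunk above, nothing new):
   
   ```rust
   // SAFETY: input_slice was just allocated and copied from host_data,
   // so the pointer is valid and contains host_data.len() elements
   unsafe {
       Self::calculate_inv_norm_gpu(
           _device,
           *input_slice.device_ptr() as *const f64,
           host_data.len(),
       )?
   }
   ```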



##########
qdp/qdp-python/src/lib.rs:
##########
@@ -171,6 +171,145 @@ fn validate_tensor(tensor: &Bound<'_, PyAny>) -> PyResult<()> {
     Ok(())
 }
 
+/// Check if a PyTorch tensor is on a CUDA device
+fn is_cuda_tensor(tensor: &Bound<'_, PyAny>) -> PyResult<bool> {
+    let device = tensor.getattr("device")?;
+    let device_type: String = device.getattr("type")?.extract()?;
+    Ok(device_type == "cuda")
+}
+
+/// Get the CUDA device index from a PyTorch tensor
+fn get_tensor_device_id(tensor: &Bound<'_, PyAny>) -> PyResult<i32> {
+    let device = tensor.getattr("device")?;
+    let device_index: i32 = device.getattr("index")?.extract()?;
+    Ok(device_index)
+}
+
+/// Validate a CUDA tensor for direct GPU encoding
+/// Checks: dtype=float64, contiguous, non-empty, device_id matches engine
+fn validate_cuda_tensor_for_encoding(
+    tensor: &Bound<'_, PyAny>,
+    expected_device_id: usize,
+    encoding_method: &str,
+) -> PyResult<()> {
+    // Check encoding method support (currently only amplitude is supported for CUDA tensors)
+    if encoding_method != "amplitude" {
+        return Err(PyRuntimeError::new_err(format!(
+            "CUDA tensor encoding currently only supports 'amplitude' method, got '{}'. \
+             Use tensor.cpu() to convert to CPU tensor for other encoding methods.",
+            encoding_method
+        )));
+    }
+
+    // Check dtype is float64
+    let dtype = tensor.getattr("dtype")?;
+    let dtype_str: String = dtype.str()?.extract()?;
+    if !dtype_str.contains("float64") {
+        return Err(PyRuntimeError::new_err(format!(
+            "CUDA tensor must have dtype float64, got {}. Use tensor.to(torch.float64)",
+            dtype_str
+        )));
+    }
+
+    // Check contiguous
+    let is_contiguous: bool = tensor.call_method0("is_contiguous")?.extract()?;
+    if !is_contiguous {
+        return Err(PyRuntimeError::new_err(
+            "CUDA tensor must be contiguous. Use tensor.contiguous()",
+        ));
+    }
+
+    // Check non-empty
+    let numel: usize = tensor.call_method0("numel")?.extract()?;
+    if numel == 0 {
+        return Err(PyRuntimeError::new_err("CUDA tensor cannot be empty"));
+    }
+
+    // Check device matches engine
+    let tensor_device_id = get_tensor_device_id(tensor)?;
+    if tensor_device_id as usize != expected_device_id {
+        return Err(PyRuntimeError::new_err(format!(
+            "Device mismatch: tensor is on cuda:{}, but engine is on cuda:{}. \
+             Move tensor with tensor.to('cuda:{}')",
+            tensor_device_id, expected_device_id, expected_device_id
+        )));
+    }
+
+    Ok(())
+}
+
+/// DLPack tensor information extracted from a PyCapsule
+struct DLPackTensorInfo {
+    data_ptr: *const f64,
+    shape: Vec<i64>,
+    /// CUDA device ID from DLPack metadata.
+    /// Currently unused but kept for potential future device validation or multi-GPU support.
+    #[allow(dead_code)]
+    device_id: i32,
+}
+
+/// Extract GPU pointer from PyTorch tensor's __dlpack__() capsule
+///
+/// # Safety
+/// The returned `data_ptr` points to GPU memory owned by the source tensor.
+/// The caller must ensure the source tensor remains alive and unmodified
+/// for the entire duration that `data_ptr` is in use. Python's GIL ensures
+/// the tensor won't be garbage collected during `encode()`, but the caller
+/// must not deallocate or resize the tensor while encoding is in progress.
+fn extract_dlpack_tensor(_py: Python<'_>, tensor: &Bound<'_, PyAny>) -> PyResult<DLPackTensorInfo> {

Review Comment:
   Use `tensor.data_ptr()` + `tensor.shape` directly and avoid DLPack for this direction. We only need a borrowed device pointer here, and the plain tensor attributes give us that without any capsule ownership to manage.
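   
   A minimal sketch of that direction (function name hypothetical; the same lifetime caveat applies: the caller must keep the tensor alive while the pointer is in use):
   
   ```rust
   /// Hypothetical replacement for extract_dlpack_tensor: read the device
   /// pointer and shape straight off the already-validated CUDA tensor.
   fn extract_tensor_ptr_and_shape(tensor: &Bound<'_, PyAny>) -> PyResult<(*const f64, Vec<i64>)> {
       // torch.Tensor.data_ptr() returns the device address as a Python int
       let addr: usize = tensor.call_method0("data_ptr")?.extract()?;
       // torch.Size is a tuple, so it extracts directly as Vec<i64>
       let shape: Vec<i64> = tensor.getattr("shape")?.extract()?;
       Ok((addr as *const f64, shape))
   }
   ```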



Review Comment:
   Using `__dlpack__` to get the device pointer is convenient, but we’re currently not calling the `DLManagedTensor.deleter`, and we rename the capsule to `used_dltensor`, so the deleter won’t run at GC either. That likely means a small leak per call on PyTorch’s side.
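   
   If we do keep DLPack, the consumer side has to hold up the contract: rename the capsule to `used_dltensor` and then invoke the deleter itself once encoding is done. A sketch, assuming we already have an FFI binding for `DLManagedTensor` matching the dlpack.h layout (`deleter: Option<unsafe extern "C" fn(*mut DLManagedTensor)>`):
   
   ```rust
   /// # Safety
   /// `managed` must be the pointer taken from the capsule we renamed to
   /// `used_dltensor`, and it must not be touched again after this call.
   unsafe fn release_dlpack(managed: *mut DLManagedTensor) {
       if let Some(deleter) = (*managed).deleter {
           // The DLPack contract says the consumer calls the deleter exactly once;
           // this lets PyTorch free whatever bookkeeping it allocated for the export.
           deleter(managed);
       }
   }
   ```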



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
