rich7420 commented on code in PR #929:
URL: https://github.com/apache/mahout/pull/929#discussion_r2724290189


##########
qdp/qdp-python/src/lib.rs:
##########
@@ -65,6 +64,14 @@ impl QuantumTensor {
             return Err(PyRuntimeError::new_err("Invalid DLPack tensor 
pointer"));
         }
 
+        if let Some(stream) = stream {
+            if stream > 0 {
+                qdp_core::dlpack::synchronize_stream(stream as *mut 
std::ffi::c_void).map_err(
+                    |e| PyRuntimeError::new_err(format!("CUDA stream sync 
failed: {}", e)),
+                )?;
+            }
+        }
+

Review Comment:
   DLPack stream handling treats special CUDA stream values 1 (legacy default) 
and 2 (per-thread default) as raw pointers, which can lead to invalid 
cudaStream_t usage and undefined behavior. The array API spec explicitly 
defines these values as non-pointer sentinel values for CUDA streams. The 
current logic calls synchronize_stream for any stream > 0, passing 1/2 
directly. I think this should be mapped to cudaStreamLegacy/cudaStreamPerThread 
(or avoided) per spec.
   ref: 
https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to