rich7420 commented on code in PR #751:
URL: https://github.com/apache/mahout/pull/751#discussion_r2658926866
##########
qdp/qdp-core/src/lib.rs:
##########
@@ -221,38 +228,13 @@ impl QdpEngine {
"Sample size cannot be zero".into(),
));
}
- if sample_size > STAGE_SIZE_ELEMENTS {
-     return Err(MahoutError::InvalidInput(format!(
-         "Sample size {} exceeds staging buffer capacity {} (elements)",
-         sample_size, STAGE_SIZE_ELEMENTS
-     )));
- }
-
- // Reuse a single norm buffer across chunks to avoid per-chunk allocations.
- //
- // Important: the norm buffer must outlive the async kernels that consume it.
- // Per-chunk allocation + drop can lead to use-after-free when the next chunk
- // reuses the same device memory while the previous chunk is still running.
Review Comment:
I think there is a potential problem here. In the `encode_from_parquet()` function in `qdp/qdp-core/src/lib.rs`, there is a critical use-after-free bug in the lifetime management of `norm_buffer`. The code allocates `norm_buffer` inside the `BatchEncode` scope at lines 331-339 and asynchronously launches the `launch_l2_norm_batch` and `launch_amplitude_encode_batch` kernels at lines 343-375, which execute asynchronously on `ctx.stream_compute`. However, when the `BatchEncode` scope ends at line 376, `norm_buffer` is dropped immediately, and according to the comment in `pipeline.rs:336`, dropping a `CudaSlice` immediately calls `cudaFree` to release the GPU memory.

The problem is that `ctx.sync_copy_stream()` at line 378 only synchronizes the copy stream, not the compute stream, so `norm_buffer` can be freed while kernels on the compute stream are still executing, leaving those kernels reading GPU memory that has already been freed. Even though the kernels execute sequentially on the same stream, if their execution time is long, `norm_buffer` may still be freed before they complete. In a loop processing multiple chunks, the first chunk's `norm_buffer` is dropped before its kernel completes; the second chunk's kernel will wait for the first to finish, but the first kernel is already accessing freed memory. The old code's comment explicitly warned about this issue: the norm buffer must outlive the async kernels that consume it, and per-chunk allocation plus drop can lead to use-after-free. Please correct me if I'm wrong.
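
To illustrate the pattern I mean, here is a minimal, self-contained sketch. The names (`DeviceBuffer`, `Stream`, `alloc_norms`, the stream parameters) are hypothetical stand-ins for `CudaSlice`, the compute/copy streams, and the real launchers in `qdp-core`, not the actual API:

```rust
// Hypothetical stand-ins for the real CUDA wrapper types; Drop on a real
// device buffer would call cudaFree immediately.
struct DeviceBuffer;
struct Stream;

impl Stream {
    fn synchronize(&self) { /* stands in for cudaStreamSynchronize */ }
}

fn alloc_norms(_len: usize) -> DeviceBuffer { DeviceBuffer }
fn launch_l2_norm_batch(_s: &Stream, _norms: &DeviceBuffer) { /* async launch */ }
fn launch_amplitude_encode_batch(_s: &Stream, _norms: &DeviceBuffer) { /* async launch */ }

// Hazardous shape: the buffer is scoped per chunk, but the launches are
// asynchronous and only the copy stream is synchronized.
fn encode_chunks_hazard(chunks: &[Vec<f32>], compute: &Stream, copy: &Stream) {
    for chunk in chunks {
        let norm_buffer = alloc_norms(chunk.len());
        launch_l2_norm_batch(compute, &norm_buffer);
        launch_amplitude_encode_batch(compute, &norm_buffer);
        copy.synchronize(); // copy stream only; compute stream may still be busy
    } // norm_buffer dropped (freed) here: potential use-after-free on the device
}

// Shape the removed comment recommended: one buffer reused across chunks,
// so it outlives every asynchronous launch that reads it.
fn encode_chunks_safe(chunks: &[Vec<f32>], compute: &Stream, copy: &Stream, max_len: usize) {
    let norm_buffer = alloc_norms(max_len);
    for _chunk in chunks {
        launch_l2_norm_batch(compute, &norm_buffer);
        launch_amplitude_encode_batch(compute, &norm_buffer);
        copy.synchronize();
    }
    // Alternatively, keep per-chunk buffers but synchronize the compute
    // stream before each one is dropped.
    compute.synchronize();
}

fn main() {
    let chunks = vec![vec![0.5_f32; 8]; 3];
    let (compute, copy) = (Stream, Stream);
    encode_chunks_hazard(&chunks, &compute, &copy);
    encode_chunks_safe(&chunks, &compute, &copy, 8);
}
```

Either hoisting the buffer out of the per-chunk scope or synchronizing the compute stream before the drop would avoid the hazard; the sketch above is only meant to show the lifetime relationship, not the exact code in this PR.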
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]