andrewmusselman opened a new pull request, #1323:
URL: https://github.com/apache/mahout/pull/1323
Fixes issue #1320.
Adds `_select_torch_device(torch, device_id)` in `qumat_qdp.loader`. It:
- returns `"cpu"` when CUDA isn't available,
- raises `ValueError` on out-of-range `device_id` (preserves prior
contract),
- checks `torch.cuda.get_device_capability(device_id)` against
`torch.cuda.get_arch_list()` and falls back to `"cpu"` with a clear
`warnings.warn` when the device's `sm_NN` isn't in the list,
- otherwise returns `f"cuda:{device_id}"`.
Both `qumat_qdp.loader.QuantumDataLoader._create_pytorch_iterator` and
`qumat_qdp.api.QdpBenchmark._run_throughput_pytorch` use the new helper
(the latter previously duplicated the same incomplete selection logic).
`testing/qdp_python/test_torch_ref.py` gets a mirror helper
`_torch_cuda_usable()` used by the two `@skipif`s that previously only
checked `is_available()`.
After this change, on an incompatible GPU:
- 8 PyTorch-backend loader tests + 4 benchmark tests fall back to CPU
and pass (each emits one `UserWarning`),
- the 5 explicit GPU tests skip with `"CUDA not available or GPU
compute capability not supported by this PyTorch build"`.
Verified on Linux + GTX 1060 (sm_61) with a PyTorch wheel targeting
sm_70+: tests that previously errored with
`cudaErrorNoKernelImageForDevice` now pass or skip cleanly.