kevnzhao opened a new issue, #21190:
URL: https://github.com/apache/mxnet/issues/21190
## Description
CUDA Toolkit 12.x was released last month. Because it is a major version, it
includes breaking API changes.
Building MXNet against CUDA 12.1 fails. The error message is pasted in the
section below.
### Error Message
```
In file included from src/api/operator/numpy/../../../imperative/../executor/cuda_graphs.h:34:0,
                 from src/api/operator/numpy/../../../imperative/imperative_utils.h:29,
                 from src/api/operator/numpy/../utils.h:34,
                 from src/api/operator/numpy/np_tensordot_op.cc:24:
src/api/operator/numpy/../../../imperative/../executor/cuda_graphs.h: In member function 'void mxnet::cuda_graphs::CudaGraphsSubSegExec::Update(const std::vector<std::shared_ptr<mxnet::exec::OpExecutor> >&, const mxnet::RunContext&, bool, bool)':
src/api/operator/numpy/../../../imperative/../executor/cuda_graphs.h:197:62: error: cannot convert 'CUgraphNode_st**' to 'cudaGraphExecUpdateResultInfo* {aka cudaGraphExecUpdateResultInfo_st*}' for argument '3' to 'cudaError_t cudaGraphExecUpdate(cudaGraphExec_t, cudaGraph_t, cudaGraphExecUpdateResultInfo*)'
     &error_node, &update_result));
     ^
src/api/operator/numpy/../../../imperative/../executor/../common/cuda_utils.h:99:22: note: in definition of macro 'CUDA_CALL'
     cudaError_t e = (func); \
     ^~~~
```
## To Reproduce
### Steps to reproduce
1. Install CUDA Toolkit 12.1 on the build machine.
2. Build with the commands below.
```
export mxnet_variant=CU${CUDA_VERSION}
make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 \
  USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 ADD_CFLAGS=-I/usr/include/openblas \
  ADD_LDFLAGS=-L/usr/lib64/lib
```
## What have you tried to solve it?
This appears to be caused by the breaking API change shown below. Code changes
are needed to support CUDA 12.x.
<img width="1098" alt="image"
src="https://user-images.githubusercontent.com/120480682/227885222-3fc02d5e-1158-444c-a832-6dee6f6f429e.png">
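One possible direction, sketched here rather than tested against MXNet itself: wrap the call in a small helper that selects the signature by `CUDART_VERSION`. In CUDA 11.x `cudaGraphExecUpdate` takes separate `cudaGraphNode_t*` and `cudaGraphExecUpdateResult*` output arguments, while in CUDA 12.x it takes a single `cudaGraphExecUpdateResultInfo*`. The helper name `GraphExecUpdateCompat` and the `SKETCH_NO_CUDA_TOOLKIT` macro are hypothetical and not part of MXNet or CUDA. The `#else` branch at the top only provides stand-in declarations so the sketch compiles on a machine without the toolkit; it requires C++17 for `__has_include`.

```cpp
#include <cassert>

#if __has_include(<cuda_runtime_api.h>)
#include <cuda_runtime_api.h>
#else
// Stand-ins mirroring the CUDA 12 declarations, ONLY so this sketch compiles
// without the toolkit. They are assumptions for illustration, not real CUDA code.
#define SKETCH_NO_CUDA_TOOLKIT 1
#define CUDART_VERSION 12010
typedef int cudaError_t;
const cudaError_t cudaSuccess = 0;
typedef struct CUgraph_st* cudaGraph_t;
typedef struct CUgraphExec_st* cudaGraphExec_t;
typedef struct CUgraphNode_st* cudaGraphNode_t;
enum cudaGraphExecUpdateResult {
  cudaGraphExecUpdateSuccess = 0,
  cudaGraphExecUpdateError = 1
};
struct cudaGraphExecUpdateResultInfo {
  cudaGraphExecUpdateResult result;
  cudaGraphNode_t errorNode;
  cudaGraphNode_t errorFromNode;
};
// Fake implementation that always reports a successful update.
inline cudaError_t cudaGraphExecUpdate(cudaGraphExec_t, cudaGraph_t,
                                       cudaGraphExecUpdateResultInfo* info) {
  info->result = cudaGraphExecUpdateSuccess;
  info->errorNode = nullptr;
  info->errorFromNode = nullptr;
  return cudaSuccess;
}
#endif

// Hypothetical helper: keeps the CUDA 11 style out-parameters for callers and
// forwards to whichever cudaGraphExecUpdate signature the toolkit provides.
inline cudaError_t GraphExecUpdateCompat(cudaGraphExec_t exec,
                                         cudaGraph_t graph,
                                         cudaGraphNode_t* error_node,
                                         cudaGraphExecUpdateResult* update_result) {
  assert(error_node != nullptr && update_result != nullptr);
#if CUDART_VERSION >= 12000
  // CUDA 12.x: results come back in one struct.
  cudaGraphExecUpdateResultInfo info;
  cudaError_t err = cudaGraphExecUpdate(exec, graph, &info);
  *error_node = info.errorNode;
  *update_result = info.result;
  return err;
#else
  // CUDA 10.2 to 11.x: separate output arguments.
  return cudaGraphExecUpdate(exec, graph, error_node, update_result);
#endif
}
```

With a helper like this, the failing call in `cuda_graphs.h` could keep passing `&error_node, &update_result` and stay inside `CUDA_CALL(...)`, since the helper still returns a `cudaError_t`. Whether MXNet should also use the extra `errorFromNode` field is left open here.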
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]