Re: [apache/incubator-mxnet] [RFC] Use TVMOp with GPU & Build without libcuda.so in CI (#18716)

Leonard Lausen Wed, 15 Jul 2020 09:11:28 -0700

> Violates the effort of removing libcuda.so totally, (would be great if 
> someone can elaborate the motivation behind it).


Many customers use a single mxnet build that supports gpu features and deploy 
it to both gpu and cpu machines. Because of how cuda containers are designed, 
libcuda.so won't be present on the cpu machines. That's why it's better to 
dlopen libcuda.so only when it is actually needed. This affects not only tvmop 
but also the nvrtc feature in mxnet.
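
To illustrate the lazy loading idea, here is a minimal sketch in C. It is not 
mxnet code: the helper name cuda_driver_available is hypothetical, and it 
assumes the driver library is installed under the soname libcuda.so.1 and 
that cuInit returning 0 means CUDA_SUCCESS. The binary has no link-time 
dependency on libcuda.so, so it also starts on cpu machines.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed to match the driver API entry point: CUresult cuInit(unsigned int). */
typedef int (*cu_init_fn)(unsigned int);

/*
 * Hypothetical helper: probe for the CUDA driver on first use.
 * Returns 1 if libcuda.so.1 could be loaded and cuInit succeeded, 0 otherwise.
 * The result is cached; this sketch is not thread-safe.
 */
static int cuda_driver_available(void) {
    static int state = -1; /* -1: not probed yet, 0: unavailable, 1: available */
    if (state != -1) {
        return state;
    }
    state = 0;

    /* Load the driver at runtime instead of linking against it. */
    void *handle = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        return state; /* typical for a cpu machine: no driver library present */
    }

    cu_init_fn init = (cu_init_fn)dlsym(handle, "cuInit");
    if (init != NULL && init(0) == 0) {
        state = 1; /* keep the handle open for later driver calls */
    } else {
        dlclose(handle);
    }
    return state;
}
```

A gpu code path would call cuda_driver_available() before touching any driver 
function and fall back to the cpu path when it returns 0. On older glibc 
versions the program has to be linked with -ldl.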

Using the stubs is a workaround that avoids dlopen, but it requires users to 
modify LD_LIBRARY_PATH on their cpu machines. That's not always feasible, and 
with mxnet 1.6, which introduced nvrtc, users typically just disable the nvrtc 
feature to be able to deploy the same libmxnet.so to both cpu and gpu machines.

Why not fix the underlying problem and then enable the tvmop feature?

> Also, When setting -DUSE_TVM_OP=OFF the CI checks would be stuck. 

That doesn't make sense, as we have been running CI successfully with tvm op 
disabled for a couple of months. Maybe you ran into some unrelated flakiness 
and need to retrigger the run?

