jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-393196924

@reminisce @zheng-da We have addressed all of the review comments. Could you check whether you have any further comments on the change?

@marcoabreu @zheng-da We have been seeing frequent Jenkins failures after pushing new changes, and most of them occur in the CPP:GPU unit tests (see http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10433/51/pipeline/726/). The tests pass on our local GPU machine, and re-triggering Jenkins sometimes succeeds, but it is not stable; occasionally we need to re-trigger several times before it passes. Our change should not affect the CPP:GPU tests, so could you check whether this is a known issue with the Jenkins system or the MXNet code base? Or is there any way to debug the failure on Jenkins? Thanks. The failure log is copied below for reference:

```
[14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:133: Stopping: NaiveEngine
[14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:135: Stopped: NaiveEngine Starting...
[14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:137: Started: NaiveEngine Done...
[14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:133: Stopping: ThreadedEnginePooled
terminate called after throwing an instance of 'std::system_error'
  what():  Operation not permitted
/work/runtime_functions.sh: line 476:     7 Aborted                 (core dumped) build/tests/mxnet_unit_tests
build.py: 2018-05-30 14:18:38,174 Running of command in container failed (134): nvidia-docker run --rm -t --shm-size=500m -v /home/jenkins_slave/workspace/ut-cpp-gpu:/work/mxnet -v /home/jenkins_slave/workspace/ut-cpp-gpu/build:/work/build -u 1001:1001 mxnet/build.ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_gpu_cpp
build.py: 2018-05-30 14:18:38,175 You can try to get into the container by using the following command:
nvidia-docker run --rm -t --shm-size=500m -v /home/jenkins_slave/workspace/ut-cpp-gpu:/work/mxnet -v /home/jenkins_slave/workspace/ut-cpp-gpu/build:/work/build -u 1001:1001 -ti --entrypoint /bin/bash mxnet/build.ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_gpu_cpp
into container: False
Traceback (most recent call last):
  File "ci/build.py", line 307, in <module>
    sys.exit(main())
  File "ci/build.py", line 243, in main
    container_run(platform, docker_binary, shared_memory_size, command)
  File "ci/build.py", line 154, in container_run
    raise subprocess.CalledProcessError(ret, cmd)
subprocess.CalledProcessError: Command 'nvidia-docker run --rm -t --shm-size=500m -v /home/jenkins_slave/workspace/ut-cpp-gpu:/work/mxnet -v /home/jenkins_slave/workspace/ut-cpp-gpu/build:/work/build -u 1001:1001 mxnet/build.ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_gpu_cpp' returned non-zero exit status 134
script returned exit code 1
```
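For context on the failure signature in the log: when a C++ exception escapes a thread function, the runtime calls `std::terminate()`, which aborts the process with SIGABRT, i.e. exit status 134 = 128 + 6. That matches the `Aborted (core dumped)` and `returned non-zero exit status 134` lines above. The sketch below is a hypothetical minimal reproduction of that mechanism only (it is not MXNet's actual engine code, and `StopWorker` is a made-up name); it does not claim to identify the real root cause inside `ThreadedEnginePooled`.

```cpp
// Hypothetical sketch: reproduce the terminate/abort pattern seen in the
// Jenkins log. An exception escaping a thread function triggers
// std::terminate(), which raises SIGABRT -> exit status 134.
#include <system_error>
#include <thread>

void StopWorker() {
  // Stand-in for a low-level call failing with EPERM during shutdown;
  // what() then prints "Operation not permitted", as in the CI log.
  throw std::system_error(
      std::make_error_code(std::errc::operation_not_permitted));
}

int main() {
  std::thread t(StopWorker);
  t.join();  // the worker's escaping exception aborts the whole process
             // before a clean join can complete
  return 0;
}
```

Running this prints `terminate called after throwing an instance of 'std::system_error'` followed by `what():  Operation not permitted`, aborts with a core dump, and exits with status 134, mirroring the log output.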