zachgk opened a new issue #16359: Flaky Scala Nightly Release Profiler
URL: https://github.com/apache/incubator-mxnet/issues/16359
 
 
   There is a flaky test on the Scala nightly Jenkins pipeline that 
occasionally causes it to fail. Sample failure:
   ```
   - Example CI: Test GAN MNIST
   [ScalaTest-main-running-DiscoverySuite] INFO org.apache.mxnetexamples.profiler.ProfilerSuite - Running profiler test...
   [ScalaTest-main-running-DiscoverySuite] INFO org.apache.mxnetexamples.profiler.ProfilerSuite - profile file save to /tmp
   terminate called after throwing an instance of 'dmlc::Error'
     what():  [20:31:17] src/c_api/c_api_profile.cc:141: Check failed: !thread_profiling_data.calls_.empty():
   Stack trace:
     [bt] (0) /tmp/mxnet6726847146594737253/libmxnet.so(+0x49240b) [0x7feab739740b]
     [bt] (1) /tmp/mxnet6726847146594737253/libmxnet.so(mxnet::on_exit_api()+0x38a) [0x7feab947e7ea]
     [bt] (2) /tmp/mxnet6726847146594737253/libmxnet.so(MXExecutorFree+0x27) [0x7feab9451af7]
     [bt] (3) [0x7feb71018407]

   Aborted (core dumped)
   ```
   
   See pipeline at 
http://jenkins.mxnet-ci.amazon-ml.com/job/restricted-publish-artifacts/job/master
 and a sample failure at 
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-publish-artifacts/detail/master/121/pipeline/.
 The failing test suite has passed on other runs. The same commit was used to build 
for CPU and did not show errors. The same build (binary, tests, jar) was 
also run on both centos7 and ubuntu18.04 without issues. So the failure seems to be 
rare and caused by something flaky at execution time rather than by the build.
   
   After some initial investigation by @samskalicky and me, the test that ran 
seems to be the Scala ProfilerSuite 
(https://github.com/apache/incubator-mxnet/blob/master/scala-package/examples/src/test/scala/org/apache/mxnetexamples/profiler/ProfilerSuite.scala#L33).
 It sets the profiler to running with a temp file as its target, runs through a 
number of tests, and then stops the profiler. None of the tests appear to have 
run before this error occurred, so the failure probably happens while 
starting the profiler. That step runs the Scala method 
https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/org/apache/mxnet/Profiler.scala#L46
 which calls the JNI method 
https://github.com/apache/incubator-mxnet/blob/master/scala-package/native/src/main/native/org_apache_mxnet_native_c_api.cc#L2699
 which in turn calls `MXSetProfilerState(1)` in the engine.
   
   @samskalicky:
   It looks like it fails at this line:
   src/c_api/c_api_profile.cc:141: Check failed: 
!thread_profiling_data.calls_.empty()
   
   Here's the relevant code:
   
https://github.com/apache/incubator-mxnet/blob/f5ba7358d7ff0629f48445cf9dc1ce7fe2fd8e84/src/c_api/c_api_profile.cc#L130-L141
 
   
   So it looks like we push data on line 130 and check whether there is any data 
on line 141.
   
   `on_enter_api` is called at the beginning of an API call.
   `on_exit_api` is called when that same API call exits.
   
   Is it possible that Scala is setting the profiling option while some things 
are already running, so that profiling is disabled when an API is entered but 
enabled by the time it exits?
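
The scenario in that question can be modeled in a few lines of C++. This is a minimal sketch and not the MXNet source: `ProfilerModel`, `ThreadProfilingData`, `exit_check_fails`, and the assumption that both hooks are gated on a single `enabled` flag are hypothetical and exist only for illustration. Only the names `on_enter_api`, `on_exit_api`, and `calls_` come from the stack trace and the failing check.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-thread profiling record. Only `calls_`
// is named in the failing check; the rest is illustrative.
struct ThreadProfilingData {
  std::vector<std::string> calls_;
};

// Hypothetical model of the profiler state and its two API hooks.
struct ProfilerModel {
  bool enabled = false;
  ThreadProfilingData data;

  // Called at the start of an API: records the call only while profiling is on
  // (assumed behavior).
  void on_enter_api(const std::string& name) {
    if (enabled) data.calls_.push_back(name);
  }

  // Called when the same API exits: while profiling is on, expects a record
  // pushed by the matching on_enter_api (assumed behavior).
  void on_exit_api() {
    if (!enabled) return;
    if (data.calls_.empty()) {
      throw std::runtime_error(
          "Check failed: !thread_profiling_data.calls_.empty()");
    }
    data.calls_.pop_back();
  }
};

// Returns true if the exit check fails. With enable_mid_call == true,
// profiling is switched on between API entry and API exit.
bool exit_check_fails(bool enable_mid_call) {
  ProfilerModel p;
  if (!enable_mid_call) p.enabled = true;  // enabled before the API is entered
  p.on_enter_api("MXExecutorFree");
  if (enable_mid_call) p.enabled = true;   // enabled after entry, before exit
  try {
    p.on_exit_api();
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

Under these assumptions, `exit_check_fails(true)` reports a failed check while `exit_check_fails(false)` does not, which would be consistent with a failure that only appears when the profiler is enabled while another API call is in flight.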
