barry-jin opened a new issue #19420:
URL: https://github.com/apache/incubator-mxnet/issues/19420


   ## Description
   
   1. Running the GluonNLP [full suite of 
tests](https://github.com/dmlc/gluon-nlp/tree/master/tests) with `pytest` on 
`mxnet-cu102==2.0.0b20201022` triggers a threading-related crash 
(`Fatal Python error: Aborted`; see Error Message). 
   2. Running the full suite of tests on `mxnet-cu102==2.0.0b20201016` does not 
trigger this error. 
   3. Running these tests separately does not trigger this error either. 
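
To make the "separately" comparison repeatable, here is a minimal sketch that runs every test file in its own interpreter. It assumes "separately" means one `pytest` process per file, which the report does not spell out. The helper names `per_file_commands` and `run_isolated` are hypothetical; the `--device=gpu` and `--runslow` options are copied from the reproduce command in this issue.

```python
import subprocess
import sys
from pathlib import Path


def per_file_commands(test_dir, extra_args=("--device=gpu", "--runslow")):
    """Build one pytest command line per test_*.py file in test_dir."""
    files = sorted(Path(test_dir).glob("test_*.py"))
    return [[sys.executable, "-m", "pytest", str(f), *extra_args] for f in files]


def run_isolated(test_dir):
    """Run each test file in a fresh interpreter and collect return codes."""
    results = {}
    for cmd in per_file_commands(test_dir):
        # cmd[3] is the test file path. On POSIX a negative return code means
        # the process was killed by a signal (for example SIGABRT).
        results[cmd[3]] = subprocess.run(cmd).returncode
    return results


if __name__ == "__main__":
    for path, code in run_isolated("./tests").items():
        print(code, path)
```

If every file passes in isolation but the combined run aborts, that points at state carried over between test files within a single process rather than at any single test.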
   
   ### Error Message
   <details>
   <summary>Run GluonNLP pytest on `mxnet-cu102==2.0.0b20201022`</summary>
   
   ```
   [2020-10-22T21:15:51.430Z] ============================= test session starts 
==============================
   [2020-10-22T21:15:51.430Z] platform linux -- Python 3.6.9, pytest-6.1.1, 
py-1.9.0, pluggy-0.13.1
   [2020-10-22T21:15:51.432Z] rootdir: /workspace/gluon-nlp, configfile: 
pytest.ini
   [2020-10-22T21:15:51.432Z] plugins: cov-2.10.1
   [2020-10-22T21:15:52.426Z] collected 1283 items
   [2020-10-22T21:16:01.630Z] tests/test_attention_cell.py 
........................................... [  3%]
   [2020-10-22T21:16:06.668Z] 
......................................................................   [  8%]
   [2020-10-22T21:16:06.796Z] tests/test_data_batchify.py 
............................................ [ 12%]
   [2020-10-22T21:16:21.672Z] .................................                 
                       [ 14%]
   [2020-10-22T21:16:30.051Z] tests/test_data_filtering.py .....                
                       [ 15%]
   [2020-10-22T21:16:36.895Z] tests/test_data_loading.py .                      
                       [ 15%]
   [2020-10-22T21:16:37.213Z] tests/test_data_sampler.py 
............................................. [ 18%]
   [2020-10-22T21:16:38.566Z] 
........................................................................ [ 24%]
   [2020-10-22T21:16:40.003Z] 
........................................................................ [ 30%]
   [2020-10-22T21:16:40.579Z] 
........................................................................ [ 35%]
   [2020-10-22T21:16:41.143Z] 
........................................................................ [ 41%]
   [2020-10-22T21:16:42.040Z] 
........................................................................ [ 46%]
   [2020-10-22T21:16:42.299Z] ...............                                   
                       [ 48%]
   [2020-10-22T21:18:34.088Z] tests/test_data_tokenizers.py ..............      
                       [ 49%]
   [2020-10-22T21:18:34.095Z] tests/test_data_vocab.py .                        
                       [ 49%]
   [2020-10-22T21:22:22.268Z] tests/test_embedding.py ..                        
                       [ 49%]
   [2020-10-22T21:22:59.289Z] tests/test_gluon_block.py .....                   
                       [ 49%]
   [2020-10-22T21:22:59.328Z] tests/test_initializer.py ...                     
                       [ 49%]
   [2020-10-22T21:23:00.225Z] tests/test_layers.py ...........................  
                       [ 52%]
   [2020-10-22T21:23:00.312Z] tests/test_loss.py ........................       
                       [ 53%]
   [2020-10-22T21:37:39.851Z] tests/test_models.py 
................................................    [ 57%]
   [2020-10-22T21:38:46.438Z] tests/test_models_albert.py .................     
                       [ 59%]
   [2020-10-22T21:39:38.599Z] tests/test_models_bart.py ......                  
                       [ 59%]
   [2020-10-22T21:44:18.743Z] tests/test_models_bert.py ............            
                       [ 60%]
   [2020-10-22T21:46:00.142Z] tests/test_models_electra.py ........             
                       [ 61%]
   [2020-10-22T21:49:47.086Z] tests/test_models_gpt2.py .......F                
                       [ 61%]
   [2020-10-22T21:49:57.226Z] tests/test_models_mobilebert.py .....             
                       [ 62%]
   [2020-10-22T21:51:27.552Z] tests/test_models_roberta.py ....FF               
                       [ 62%]
   [2020-10-22T21:52:10.783Z] tests/test_models_transformer.py 
....................................... [ 65%]
   [2020-10-22T21:53:33.876Z] 
........................................................................ [ 71%]
   [2020-10-22T21:54:26.540Z] ..........................................FFFFF   
                       [ 74%]
   [2020-10-22T21:54:34.975Z] tests/test_models_transformer_xl.py ......        
                       [ 75%]
   [2020-10-22T21:55:47.820Z] tests/test_models_xlmr.py .FF                     
                       [ 75%]
   [2020-10-22T21:55:48.122Z] tests/test_op.py 
....................................................... [ 79%]
   [2020-10-22T21:55:48.754Z] 
........................................................................ [ 85%]
   [2020-10-22T21:55:49.195Z] ....                                              
                       [ 85%]
   [2020-10-22T21:56:20.712Z] tests/test_optimizer.py .                         
                       [ 85%]
   [2020-10-22T21:56:20.716Z] tests/test_pytest.py .                            
                       [ 85%]
   [2020-10-22T21:56:21.005Z] tests/test_sequence_sampler.py 
......................................... [ 89%]
   [2020-10-22T21:56:21.522Z] 
........................................................................ [ 94%]
   [2020-10-22T21:56:33.345Z] .......................................           
                       [ 97%]
   [2020-10-22T21:56:33.590Z] Fatal Python error: Aborted
   [2020-10-22T21:56:33.590Z] Thread 0x00007f92b9fff700 (most recent call 
first):
   [2020-10-22T21:56:33.590Z]   File "/usr/lib/python3.6/threading.py", line 
299 in wait
   [2020-10-22T21:56:33.590Z]   File "/usr/lib/python3.6/threading.py", line 
551 in wait
   [2020-10-22T21:56:33.590Z]   File 
"/usr/local/lib/python3.6/dist-packages/tqdm/_monitor.py", line 59 in run
   [2020-10-22T21:56:33.590Z]   File "/usr/lib/python3.6/threading.py", line 
916 in _bootstrap_inner
   [2020-10-22T21:56:33.590Z]   File "/usr/lib/python3.6/threading.py", line 
884 in _bootstrap
   [2020-10-22T21:56:33.590Z] Current thread 0x00007f9457153740 (most recent 
call first):
   [2020-10-22T21:56:33.590Z]   File 
"/usr/lib/python3.6/multiprocessing/popen_fork.py", line 66 in _launch
   [2020-10-22T21:56:33.590Z]   File 
"/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19 in __init__
   [2020-10-22T21:56:33.590Z]   File 
"/usr/lib/python3.6/multiprocessing/context.py", line 277 in _Popen
   [2020-10-22T21:56:33.590Z]   File 
"/usr/lib/python3.6/multiprocessing/process.py", line 105 in start
   [2020-10-22T21:56:33.590Z]   File 
"/usr/lib/python3.6/multiprocessing/pool.py", line 239 in _repopulate_pool
   [2020-10-22T21:56:33.591Z]   File 
"/usr/lib/python3.6/multiprocessing/pool.py", line 174 in __init__
   [2020-10-22T21:56:33.591Z]   File 
"/usr/lib/python3.6/multiprocessing/context.py", line 119 in Pool
   [2020-10-22T21:56:33.591Z]   File 
"/workspace/gluon-nlp/tests/test_utils_misc.py", line 87 in verify_download
   [2020-10-22T21:56:33.591Z]   File 
"/workspace/gluon-nlp/tests/test_utils_misc.py", line 102 in test_download_s3
   [2020-10-22T21:56:33.591Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/python.py", line 184 in 
pytest_pyfunc_call
   [2020-10-22T21:56:33.591Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/callers.py", line 187 in 
_multicall
   [2020-10-22T21:56:33.591Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 87 in 
<lambda>
   [2020-10-22T21:56:33.591Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 93 in 
_hookexec
   [2020-10-22T21:56:33.591Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
   [2020-10-22T21:56:33.591Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/python.py", line 1627 in 
runtest
   [2020-10-22T21:56:33.591Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 163 in 
pytest_runtest_call
   [2020-10-22T21:56:33.591Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/callers.py", line 187 in 
_multicall
   [2020-10-22T21:56:33.591Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 87 in 
<lambda>
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 93 in 
_hookexec
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 256 in 
<lambda>
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 310 in 
from_call
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 256 in 
call_runtest_hook
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 216 in 
call_and_report
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 127 in 
runtestprotocol
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 110 in 
pytest_runtest_protocol
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/callers.py", line 187 in 
_multicall
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 87 in 
<lambda>
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 93 in 
_hookexec
   [2020-10-22T21:56:33.592Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/main.py", line 338 in 
pytest_runtestloop
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/callers.py", line 187 in 
_multicall
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 87 in 
<lambda>
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 93 in 
_hookexec
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/main.py", line 313 in _main
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/main.py", line 257 in 
wrap_session
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/main.py", line 306 in 
pytest_cmdline_main
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/callers.py", line 187 in 
_multicall
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 87 in 
<lambda>
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 93 in 
_hookexec
   [2020-10-22T21:56:33.593Z]   File 
"/root/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
   [2020-10-22T21:56:33.594Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/config/__init__.py", line 165 
in main
   [2020-10-22T21:56:33.594Z]   File 
"/root/.local/lib/python3.6/site-packages/_pytest/config/__init__.py", line 187 
in console_main
   [2020-10-22T21:56:33.594Z]   File 
"/root/.local/lib/python3.6/site-packages/pytest/__main__.py", line 5 in 
<module>
   [2020-10-22T21:56:33.594Z]   File "/usr/lib/python3.6/runpy.py", line 85 in 
_run_code
   [2020-10-22T21:56:33.594Z]   File "/usr/lib/python3.6/runpy.py", line 193 in 
_run_module_as_main
   [2020-10-22T22:00:07.664Z] ./gluon_nlp_job.sh: line 39:    44 Aborted        
         (core dumped) /bin/bash -o pipefail -c "$COMMAND"
   ```
   
   </details>
   
   ## To Reproduce
   ```
   Compute Environment: 
   Instance type: g4dn.4x
   vCPUs: 16 
   
   $ python3 -m pip install -U --quiet --pre "mxnet-cu102==2.0.0b20201022" -f https://dist.mxnet.io/python
   $ git remote set-url origin https://github.com/dmlc/gluon-nlp
   $ git fetch origin master:working
   $ git checkout working
   $ python3 -m pip install --quiet -e .[extras]
   $ python3 -m pytest --cov=. --cov-config=./.coveragerc --cov-report=xml --durations=50 --device="gpu" --runslow ./tests/
   ```
   
   ## What have you tried to solve it?
   
   Some observations: 
   
   1. The failed tests (marked `F` in the log) all use `mx.npx.waitall()`.
   2. The abort happens in `test_download_s3` (`tests/test_utils_misc.py`) 
when it creates a `multiprocessing.Pool()`.
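
The two observations suggest a smaller experiment: do some MXNet work, call `mx.npx.waitall()`, then create a fork-based `multiprocessing.Pool`. The sketch below is only a candidate reproducer, not a confirmed one; whether it actually aborts on the `20201022` build is untested. The function name `pool_after_waitall` is hypothetical, and the MXNet import is guarded so the script still runs (and simply exercises the pool) on machines without MXNet.

```python
import multiprocessing


def _square(x):
    return x * x


def pool_after_waitall(num_workers=2):
    """Touch MXNet, call waitall, then start a fork-based worker pool."""
    try:
        import mxnet as mx  # assumption: an MXNet 2.0 nightly is installed
    except ImportError:
        mx = None  # without MXNet only the pool part is exercised

    if mx is not None:
        mx.npx.set_np()
        x = mx.np.ones((2, 2))
        (x + x).asnumpy()
        mx.npx.waitall()

    # The stack trace goes through popen_fork.py, so request "fork" explicitly.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(num_workers) as pool:
        return pool.map(_square, range(4))


if __name__ == "__main__":
    print(pool_after_waitall())
```

Running it against both nightly builds (`2.0.0b20201016` and `2.0.0b20201022`) would show whether the regression can be reproduced outside the GluonNLP test suite.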
   
   ## Environment
   
   ***We recommend using our script for collecting the diagnostic information 
with the following command***
   `curl --retry 10 -s 
https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py
 | python3`
   
   <details>
   <summary>Environment Information</summary>
   
   ```
   Instance type: g4dn.4x
   MXNet version: mxnet-cu102==2.0.0b20201022
   python version: 3.6.9
   CUDNN_VERSION: 7.6.5.32
   CUDA_VERSION: 10.2.89
   NCCL_VERSION: 2.7.8
   ```
   
   </details>
   

