eric-haibin-lin opened a new issue #17332: Race condition in downloading model from model zoo in parallel
URL: https://github.com/apache/incubator-mxnet/issues/17332

When I use Horovod for training and call `model = get_model(model_name, pretrained=True)`, it fails with:

```
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "tests/python/unittest/test_gluon_model_zoo.py", line 32, in fn
    model = get_model(model_name, pretrained=True, root='parallel_model/')
  File "/home/ubuntu/src/mxnet/python/mxnet/gluon/model_zoo/vision/__init__.py", line 152, in get_model
    return models[name](**kwargs)
  File "/home/ubuntu/src/mxnet/python/mxnet/gluon/model_zoo/vision/mobilenet.py", line 375, in mobilenet_v2_0_25
    return get_mobilenet_v2(0.25, **kwargs)
  File "/home/ubuntu/src/mxnet/python/mxnet/gluon/model_zoo/vision/mobilenet.py", line 250, in get_mobilenet_v2
    get_model_file('mobilenetv2_%s' % version_suffix, root=root), ctx=ctx)
  File "/home/ubuntu/src/mxnet/python/mxnet/gluon/model_zoo/model_store.py", line 115, in get_model_file
    os.remove(zip_file_path)
FileNotFoundError: [Errno 2] No such file or directory: 'parallel_model/mobilenetv2_0.25-ae8f9392.zip'
```

The `get_model` API breaks when multiple processes call it at the same time.
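
The traceback ends in `os.remove(zip_file_path)`, which is consistent with one worker deleting the downloaded zip before another worker reaches the same line. One possible workaround, sketched here under assumptions and not taken from the MXNet code base, is to serialize the download step with an advisory file lock. The helper names `exclusive_lock` and `fetch_once` and the `download_fn` callback are hypothetical, and `fcntl.flock` is POSIX-only.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path (hypothetical helper, POSIX only)."""
    with open(lock_path, "w") as lock_file:
        # Blocks until no other worker holds the lock on this file.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def fetch_once(target_path, download_fn):
    """Call download_fn(target_path) only if target_path does not exist yet.

    The existence check and the download both happen while the lock is held,
    so cooperating workers that arrive later find the finished file and skip
    the download. download_fn is an assumed callback, not an MXNet API.
    """
    root = os.path.dirname(target_path) or "."
    os.makedirs(root, exist_ok=True)
    with exclusive_lock(target_path + ".lock"):
        if not os.path.exists(target_path):
            download_fn(target_path)
    return target_path
```

In a training script, the same idea could be applied by wrapping the `get_model(model_name, pretrained=True, root=...)` call in `exclusive_lock` with a lock file under the same `root`, so that only one worker at a time runs the download, extract, and remove sequence. This only helps workers that share a filesystem where `flock` is honored.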