ThomasDelteil commented on a change in pull request #10628: [MXNET-342] Fix the multi worker Dataloader
URL: https://github.com/apache/incubator-mxnet/pull/10628#discussion_r183161704
 
 

 ##########
 File path: python/mxnet/gluon/data/dataset.py
 ##########
 @@ -173,8 +173,15 @@ class RecordFileDataset(Dataset):
         Path to rec file.
     """
     def __init__(self, filename):
-        idx_file = os.path.splitext(filename)[0] + '.idx'
-        self._record = recordio.MXIndexedRecordIO(idx_file, filename, 'r')
+        self._filename = filename
+        self.reload_recordfile()
+
+    def reload_recordfile(self):
+        """
+        Reload the record file.
+        """
+        idx_file = os.path.splitext(self._filename)[0] + '.idx'
+        self._record = recordio.MXIndexedRecordIO(idx_file, self._filename, 'r')
 
 Review comment:
   Ok, digging a bit more, it seems that the `multiprocessing` package does not 
close file descriptors, since it simply calls `os.fork()`. I have updated 
the description of the PR to reflect the issue. tl;dr: a `file description` 
keeps track of the current position in the file. When forking, all processes get 
a duplicate of the original `file descriptor` referring to the same `file 
description`, and they all try to move the current offset of that `file 
description` at the same time, causing a crash.
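
A minimal sketch of the problem (not MXNet code; POSIX-only, since it uses `os.fork()` directly, which is how `multiprocessing` forks workers on Linux). The temp-file setup is illustrative; the point is that the duplicated descriptor shares one file description, so the child's read moves the parent's offset, while reopening the file after the fork yields an independent offset:

```python
import os
import tempfile

# Create a small file to read from.
path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write(b"0123456789")

fd = os.open(path, os.O_RDONLY)  # one descriptor -> one file description

pid = os.fork()
if pid == 0:
    # Child: its duplicated descriptor refers to the SAME file description,
    # so this read advances the offset seen by the parent too.
    os.read(fd, 5)
    os._exit(0)
os.waitpid(pid, 0)

# Parent's offset moved even though the parent never read anything.
pos_shared = os.lseek(fd, 0, os.SEEK_CUR)  # 5

# The fix, in spirit: reopen the file in each worker after forking, so
# every process gets its own file description with a private offset.
fd2 = os.open(path, os.O_RDONLY)
pos_new = os.lseek(fd2, 0, os.SEEK_CUR)    # 0

os.close(fd)
os.close(fd2)
os.remove(path)
```

This is why the patch above stores `self._filename` and exposes `reload_recordfile()`: each worker can reopen the record file after the fork instead of seeking on a shared file description.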

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
