ThomasDelteil commented on a change in pull request #10628: [MXNET-342] Fix the multi worker Dataloader
URL: https://github.com/apache/incubator-mxnet/pull/10628#discussion_r183161704
 
 

 ##########
 File path: python/mxnet/gluon/data/dataset.py
 ##########
 @@ -173,8 +173,15 @@ class RecordFileDataset(Dataset):
         Path to rec file.
     """
     def __init__(self, filename):
-        idx_file = os.path.splitext(filename)[0] + '.idx'
-        self._record = recordio.MXIndexedRecordIO(idx_file, filename, 'r')
+        self._filename = filename
+        self.reload_recordfile()
+
+    def reload_recordfile(self):
+        """
+        Reload the record file.
+        """
+        idx_file = os.path.splitext(self._filename)[0] + '.idx'
 +        self._record = recordio.MXIndexedRecordIO(idx_file, self._filename, 'r')
 
 Review comment:
   Ok, digging a bit more: it seems that the `multiprocessing` package does not close file descriptors, since it simply calls `os.fork()`. I have updated the description of the PR to reflect the issue. tldr; a `file description` keeps track of the current byte offset within the file. When forking, every child process gets a duplicate of the original `file descriptor`, but all of those descriptors refer to the same underlying `file description`, so when the processes move the offset of that shared `file description` at the same time, they cause a crash.
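   For reference, here is a minimal POSIX-only sketch of that shared-offset behaviour (this is not MXNet code; the file name `data.rec` is made up, and the low-level `os` calls stand in for what recordio does internally):

   ```python
   import os

   # Write a small file: ten 'A's followed by ten 'B's.
   with open('data.rec', 'wb') as fh:
       fh.write(b'A' * 10 + b'B' * 10)

   fd = os.open('data.rec', os.O_RDONLY)  # creates one kernel file description
   os.read(fd, 10)                        # its offset is now 10, at the 'B's

   pid = os.fork()                        # the child inherits a duplicate
   if pid == 0:                           # descriptor pointing at the SAME
       os.lseek(fd, 0, os.SEEK_SET)       # file description; rewinding here
       os._exit(0)                        # moves the parent's offset as well

   os.waitpid(pid, 0)
   print(os.read(fd, 10))                 # b'AAAAAAAAAA', not the expected 'B's
   os.close(fd)
   ```

   Reopening the record file in each worker after the fork, which is what `reload_recordfile` enables, gives every process its own `file description` and therefore an independent offset.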
