ThomasDelteil commented on a change in pull request #10628: [MXNET-342] Fix the 
multi worker Dataloader
URL: https://github.com/apache/incubator-mxnet/pull/10628#discussion_r183161704
 
 

 ##########
 File path: python/mxnet/gluon/data/dataset.py
 ##########
 @@ -173,8 +173,15 @@ class RecordFileDataset(Dataset):
         Path to rec file.
     """
     def __init__(self, filename):
-        idx_file = os.path.splitext(filename)[0] + '.idx'
-        self._record = recordio.MXIndexedRecordIO(idx_file, filename, 'r')
+        self._filename = filename
+        self.reload_recordfile()
+
+    def reload_recordfile(self):
+        """
+        Reload the record file.
+        """
+        idx_file = os.path.splitext(self._filename)[0] + '.idx'
+        self._record = recordio.MXIndexedRecordIO(idx_file, self._filename, 'r')
 
 Review comment:
  Ok, digging a bit more, it seems that the `multiprocessing` package does not 
close file descriptors, since it simply calls `os.fork()`. I have updated the 
description of the PR to reflect the issue. tl;dr: a file descriptor keeps 
track of its current position in the file. After forking, all processes refer 
to the same underlying file descriptor and try to move that shared position at 
the same time, causing a crash.
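
To illustrate the shared-offset behavior, here is a minimal, hypothetical sketch (POSIX-only, using `os.fork()` directly rather than MXNet's DataLoader; the file names are made up for the demo):

```python
import os
import tempfile

# Create a small file to read from.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

# Open it once BEFORE forking, as RecordFileDataset did in __init__.
fd = os.open(path, os.O_RDONLY)

pid = os.fork()
if pid == 0:
    # Child "worker": reading advances the file offset...
    os.read(fd, 4)
    os._exit(0)
os.waitpid(pid, 0)

# ...and the parent sees the moved offset, because fork() duplicates the
# descriptor but both copies share one underlying file position.
offset = os.lseek(fd, 0, os.SEEK_CUR)
print(offset)  # prints 4, not 0
os.close(fd)
```

This is why the fix re-opens the record file (`reload_recordfile`) in each worker after forking: every process then gets its own descriptor and its own independent read position.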

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
