sxjscience opened a new issue #10082: All workloads are pushed to the key queue in multi-processing DataLoader
URL: https://github.com/apache/incubator-mxnet/issues/10082

In the current implementation of the multi-processing part of the DataLoader, all batches produced by the `batch_sampler` are pushed to the key queue up front. See https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/data/dataloader.py#L218-L219 . This is inefficient and results in an endless loop if the batch_sampler generates an infinite number of batches, e.g.,

```python
class InfiniteSampler(object):
    def __iter__(self):
        while True:
            yield 1
```

I find that PyTorch incrementally pre-fetches new batches (http://pytorch.org/docs/master/_modules/torch/utils/data/dataloader.html#DataLoader). We'd better change the logic to be similar to PyTorch's.
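To illustrate the suggested behavior, here is a minimal sketch of incremental pre-fetching: the iterator pulls lazily from the (possibly infinite) batch sampler and keeps only a bounded number of batches in flight, topping up the key queue as batches are consumed. This is not MXNet's or PyTorch's actual code; the names `PrefetchingLoaderIter`, `_worker_loop`, and the `prefetch` parameter are hypothetical and chosen only for illustration.

```python
import multiprocessing as mp


def _worker_loop(dataset, key_queue, data_queue):
    """Worker: take (idx, indices) from key_queue, put (idx, batch) on data_queue."""
    while True:
        item = key_queue.get()
        if item is None:  # sentinel: shut down
            break
        idx, indices = item
        batch = [dataset[i] for i in indices]
        data_queue.put((idx, batch))


class PrefetchingLoaderIter(object):
    """Hypothetical iterator that never enqueues more than prefetch * num_workers
    outstanding batches, instead of pushing the whole sampler at once."""

    def __init__(self, dataset, batch_sampler, num_workers=2, prefetch=2):
        self._dataset = dataset
        self._sampler_iter = iter(batch_sampler)
        self._key_queue = mp.Queue()
        self._data_queue = mp.Queue()
        self._workers = [
            mp.Process(target=_worker_loop,
                       args=(dataset, self._key_queue, self._data_queue),
                       daemon=True)
            for _ in range(num_workers)
        ]
        for w in self._workers:
            w.start()
        self._send_idx = 0
        self._rcvd_idx = 0
        self._reorder = {}
        # Prime the pipeline with a bounded number of batches, not all of them.
        for _ in range(prefetch * num_workers):
            self._push_next()

    def _push_next(self):
        """Pull one batch of indices from the sampler, if any, and enqueue it."""
        try:
            indices = next(self._sampler_iter)
        except StopIteration:
            return
        self._key_queue.put((self._send_idx, indices))
        self._send_idx += 1

    def __iter__(self):
        return self

    def __next__(self):
        if self._rcvd_idx == self._send_idx:  # nothing in flight: sampler exhausted
            self._shutdown()
            raise StopIteration
        # Re-order worker results so batches come back in sampler order.
        while self._rcvd_idx not in self._reorder:
            idx, batch = self._data_queue.get()
            self._reorder[idx] = batch
        batch = self._reorder.pop(self._rcvd_idx)
        self._rcvd_idx += 1
        self._push_next()  # top up: one new key per batch consumed
        return batch

    def _shutdown(self):
        for _ in self._workers:
            self._key_queue.put(None)
        for w in self._workers:
            w.join()
```

With this scheme an infinite sampler such as `InfiniteSampler` above would simply keep the pipeline full forever instead of looping endlessly while enqueueing keys, and a finite sampler drains naturally once all outstanding batches have been received.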