sxjscience opened a new issue #10082: All workloads are pushed to the key queue in multi-processing DataLoader
URL: https://github.com/apache/incubator-mxnet/issues/10082

In the current implementation of the multi-processing part of the DataLoader, all batches produced by the `batch_sampler` are pushed to the key queue up front. See https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/data/dataloader.py#L218-L219 . This is inefficient and results in an endless loop if the batch_sampler generates an infinite number of batches, e.g.,

```python
class InfiniteSampler(object):
    def __iter__(self):
        while True:
            yield 1
```

I find that PyTorch incrementally pre-fetches new batches (http://pytorch.org/docs/master/_modules/torch/utils/data/dataloader.html#DataLoader). We'd better change the logic to be similar to PyTorch's.
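To illustrate the suggested behavior, here is a minimal sketch of incremental pre-fetching: the iterator pulls lazily from the (possibly infinite) batch sampler and keeps only a bounded number of batches in flight, topping up the key queue as batches are consumed. This is not MXNet's or PyTorch's actual code; the names `PrefetchingLoaderIter`, `_worker_loop`, and the `prefetch` parameter are hypothetical and chosen only for illustration.

```python
import multiprocessing as mp


def _worker_loop(dataset, key_queue, data_queue):
    """Worker: take (idx, indices) from key_queue, put (idx, batch) on data_queue."""
    while True:
        item = key_queue.get()
        if item is None:  # sentinel: shut down
            break
        idx, indices = item
        batch = [dataset[i] for i in indices]
        data_queue.put((idx, batch))


class PrefetchingLoaderIter(object):
    """Hypothetical iterator that never enqueues more than prefetch * num_workers
    outstanding batches, instead of pushing the whole sampler at once."""

    def __init__(self, dataset, batch_sampler, num_workers=2, prefetch=2):
        self._dataset = dataset
        self._sampler_iter = iter(batch_sampler)
        self._key_queue = mp.Queue()
        self._data_queue = mp.Queue()
        self._workers = [
            mp.Process(target=_worker_loop,
                       args=(dataset, self._key_queue, self._data_queue),
                       daemon=True)
            for _ in range(num_workers)
        ]
        for w in self._workers:
            w.start()
        self._send_idx = 0
        self._rcvd_idx = 0
        self._reorder = {}
        # Prime the pipeline with a bounded number of batches, not all of them.
        for _ in range(prefetch * num_workers):
            self._push_next()

    def _push_next(self):
        """Pull one batch of indices from the sampler, if any, and enqueue it."""
        try:
            indices = next(self._sampler_iter)
        except StopIteration:
            return
        self._key_queue.put((self._send_idx, indices))
        self._send_idx += 1

    def __iter__(self):
        return self

    def __next__(self):
        if self._rcvd_idx == self._send_idx:  # nothing in flight: sampler exhausted
            self._shutdown()
            raise StopIteration
        # Re-order worker results so batches come back in sampler order.
        while self._rcvd_idx not in self._reorder:
            idx, batch = self._data_queue.get()
            self._reorder[idx] = batch
        batch = self._reorder.pop(self._rcvd_idx)
        self._rcvd_idx += 1
        self._push_next()  # top up: one new key per batch consumed
        return batch

    def _shutdown(self):
        for _ in self._workers:
            self._key_queue.put(None)
        for w in self._workers:
            w.join()
```

With this scheme an infinite sampler such as `InfiniteSampler` above would simply keep the pipeline full forever instead of looping endlessly while enqueueing keys, and a finite sampler drains naturally once all outstanding batches have been received.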