We use petastorm's `make_batch_reader`. It does not provide random access; we partly work around that by making the DataLoader sequential and letting the petastorm reader shuffle row groups. In practice this means that within each row group the 100-400 samples always arrive in the same on-disk order, which is quite an issue, but it is the best solution we have found so far. We also suspect that training on Parquet this way is very I/O heavy, so we are struggling to increase GPU utilization. A rough sketch of the setup is below.
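For reference, a minimal sketch of what we described (the dataset URL is a placeholder; we just iterate the reader sequentially while petastorm shuffles at row-group granularity):

```python
from petastorm import make_batch_reader

# Hypothetical dataset location; any file:// / hdfs:// / s3:// URL works.
DATASET_URL = "file:///data/train.parquet"

# make_batch_reader has no random access: it reads one Parquet row group
# at a time, so shuffling can only happen at the row-group level.
with make_batch_reader(DATASET_URL, shuffle_row_groups=True) as reader:
    for row_group_batch in reader:
        # Each item is a namedtuple of column arrays for one row group.
        # Row groups arrive in shuffled order, but the rows *inside*
        # each group always come back in the same on-disk order.
        ...
```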