We use petastorm's `make_batch_reader`. It does not provide random access, so we partly work around that by making the DataLoader sequential and letting the petastorm reader shuffle row groups. In practice that means each row group yields its 100-400 samples in the same order every epoch, which is quite an issue, but it is the best solution we have found so far. We also suspect that this way of training on parquet is very I/O-heavy, which is why we are struggling to raise GPU utilization.
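For reference, a minimal sketch of the reading side of this setup (the dataset URL and `train_step` are placeholders, and our DataLoader wrapping is omitted):

```python
from petastorm import make_batch_reader

# Sketch only; 'file:///path/to/dataset' is a placeholder URL.
# shuffle_row_groups=True randomizes the order in which row groups are
# read, but rows *within* a row group still come back in on-disk order,
# which is where the repeated 100-400-sample sequences come from.
with make_batch_reader('file:///path/to/dataset',
                       shuffle_row_groups=True,
                       num_epochs=1) as reader:
    for batch in reader:
        # Each `batch` is a namedtuple of columnar numpy arrays
        # covering (roughly) one row group.
        train_step(batch)  # placeholder for the actual training step
```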
