We use petastorm's `make_batch_reader`. It does not provide random access, so we partly work around that by making the DataLoader sequential and letting the petastorm reader shuffle row groups. In practice that means each row group yields its 100-400 samples in the same order every epoch, which is quite an issue, but it is the best solution we have found so far. We also suspect that this way of training on parquet is very I/O-heavy, which is why we are struggling to raise GPU utilization.
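For reference, a minimal sketch of the reading side of this setup (the dataset URL and `train_step` are placeholders, and our DataLoader wrapping is omitted):

```python
from petastorm import make_batch_reader

# Sketch only; 'file:///path/to/dataset' is a placeholder URL.
# shuffle_row_groups=True randomizes the order in which row groups are
# read, but rows *within* a row group still come back in on-disk order,
# which is where the repeated 100-400-sample sequences come from.
with make_batch_reader('file:///path/to/dataset',
                       shuffle_row_groups=True,
                       num_epochs=1) as reader:
    for batch in reader:
        # Each `batch` is a namedtuple of columnar numpy arrays
        # covering (roughly) one row group.
        train_step(batch)  # placeholder for the actual training step
```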
