It is my understanding that data parallelism within a group should split the batch evenly among the workers in the group. However, I noticed that each worker is loading the exact same records. For example, consider a batch size of 10, two workers in a group, and a partition dimension of 0 (the batch dimension) on the network. I would expect the first and second workers to be given records 0-4 and 5-9 respectively. Instead, both workers load a copy of records 0-4.
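To make the expectation concrete, here is a minimal sketch (hypothetical helper, not part of any framework API) of the sharding I assumed data parallelism would perform: each worker gets a distinct contiguous slice of the batch along dimension 0.

```python
def expected_shard(records, num_workers, worker_rank):
    """Split `records` evenly along the batch dimension; worker `worker_rank`
    receives its own contiguous, non-overlapping slice."""
    shard_size = len(records) // num_workers
    start = worker_rank * shard_size
    return records[start:start + shard_size]

records = list(range(10))  # batch of 10 records
print(expected_shard(records, num_workers=2, worker_rank=0))  # → [0, 1, 2, 3, 4]
print(expected_shard(records, num_workers=2, worker_rank=1))  # → [5, 6, 7, 8, 9]
```

What I observe instead is both workers behaving as if they were rank 0, each loading records 0-4.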
If this is the intended behavior, it would be great if someone could clear up why the data parallelism configuration causes multiple workers in a group to load the same records.

Regards,
Richard Platania
