Hi, I am probably not the only one who has made this mistake: it is usually quite bad to use a power of 2 for the batch size!
Relevant documentation by NVIDIA:
https://docs.nvidia.com/deeplearning/performance/dl-performance-convolutional/index.html#quant-effects

The documentation is not extremely clear, so I figured out the formula:

N = int((n * (1 << 14) * SM) / (H * W * C))

SM is the number of multiprocessors (80 for the V100 or Titan V, 68 for the RTX 2080 Ti), and n is an integer (usually n=1 is slightly worse than n>1). So the efficient batch size is 63 for 9x9 Go on a V100 with 256-channel layers, and 53 on the RTX 2080 Ti.

Here is my tweet with an empirical plot:
https://twitter.com/Remi_Coulom/status/1259188988646129665

I created a new CGOS account to play with this improvement. Probably not a huge difference in strength, but it is good to get such an improvement so easily.

Rémi
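The formula above can be sketched as a small Python helper; the function name is mine, and the values plugged in are the ones from the message (9x9 Go, 256-channel layers):

```python
# Batch size suggested by the formula N = int((n * (1 << 14) * SM) / (H * W * C)).
# sm: number of streaming multiprocessors; h, w, c: layer spatial size and channels;
# n: the integer multiplier from the formula (n=1 is usually slightly worse than n>1).
def efficient_batch_size(sm, h, w, c, n=1):
    return int((n * (1 << 14) * sm) / (h * w * c))

print(efficient_batch_size(80, 9, 9, 256))  # V100 / Titan V -> 63
print(efficient_batch_size(68, 9, 9, 256))  # RTX 2080 Ti -> 53
```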
_______________________________________________ Computer-go mailing list Computer-go@computer-go.org http://computer-go.org/mailman/listinfo/computer-go