Hi,

I am probably not the only one who made this mistake: it is usually very
bad to use a power of 2 for the batch size!

Relevant documentation by NVIDIA:
https://docs.nvidia.com/deeplearning/performance/dl-performance-convolutional/index.html#quant-effects

The documentation is not extremely clear, so I figured out the formula:
N = int((n * (1 << 14) * SM) / (H * W * C))

H and W are the spatial dimensions of the layer, C is the number of
channels, and SM is the number of multiprocessors (80 for the V100 or
Titan V, 68 for the RTX 2080 Ti).
n is a positive integer (usually n=1 is slightly worse than n>1).

So the efficient batch size is 63 for 9x9 Go with 256-channel layers on a
V100, and 53 on the RTX 2080 Ti.

Here is my tweet with an empirical plot:
https://twitter.com/Remi_Coulom/status/1259188988646129665

I created a new CGOS account to play with this improvement. Probably not a
huge difference in strength, but it is good to get such an improvement so
easily.

Rémi