Gian-Carlo Pascutto: <a0b16b16-6591-a195-1f93-93dbe8833...@sjeng.org>:
>On 23-05-17 10:51, Hideki Kato wrote:
>> (2) The number of possible positions (input of the value net) in
>> real games is at least 10^30 (10^170 in theory). Can the value
>> net recognize them all? L&Ds depend on very small differences in
>> the placement of stones or liberties. Can we provide the necessary
>> amount of training data? Does the network have enough capacity?
>> The answer is almost obvious from the theory of function
>> approximation. (An ANN is just a non-linear function
>> approximator.)
>
>DCNNs clearly have some ability to generalize from learned data and
>perform OK even on unseen examples. So I don't find this a very
>compelling argument. It's not like Monte Carlo playouts are going to
>handle all sequences correctly either.
CNNs can generalize if global shapes can be built from smaller local shapes. L&D of a large group is an exception because it is too sensitive to the details of the position (i.e., it can be very global). We can't expect much from such generalization in L&D. In our experiments, the value net thinks a group is alive if it has a large enough space. That's all. #Actually, it's the opposite: the value net thinks a group is dead if and only if it is short of liberties. Some nakade shapes can be solved if the outer liberties are almost filled. Additionally, the value net frequently reads false eyes as true eyes, especially on the first line. (This problem can also be very global and very hard to solve without search.) The value net itself cannot handle L&D correctly, but it enables so much deeper search that this problem stays hidden (i.e., hard to notice).

>Evaluations are heuristic guidance for the search, and a help when the
>search terminates in an unresolved position. Having multiple independent
>ones improves the accuracy of the heuristic - a basic ensemble.

The value net approximates the "true" value function of Go very coarsely. Rollouts (MC simulations) fill in the detail. This could be the best ensemble.

>> (3) CNN cannot learn the exclusive-or function due to the ReLU
>> activation function, instead of the traditional sigmoid (tangent
>> hyperbolic). CNN is good at approximating continuous (analog)
>> functions but not Boolean (digital) ones.
>
>Are you sure this is correct? Especially if we allow leaky ReLU?

Do you know that the success of "DEEP" CNNs comes from the use of ReLU? Sigmoid easily makes gradients vanish while ReLU does not. However, ReLU cannot represent sharp edges while sigmoid can. A DCNN (with ReLU) approximates functions in a piecewise-linear style.

Hideki
--
Hideki Kato <mailto:hideki_ka...@ybb.ne.jp>
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
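[Editor's note] The two claims at the end of the message above can be checked numerically: sigmoid gradients vanish for large inputs while ReLU's do not, and a ReLU network computes a piecewise-linear function. A minimal sketch (not from the original email; the tiny one-hidden-layer network and its random weights are illustrative, not any program discussed here):

```python
# Minimal sketch: (1) sigmoid's derivative decays exponentially in |x|,
# while ReLU's derivative is exactly 1 for any positive input;
# (2) a ReLU network is piecewise linear in its input.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Gradient vanishing: already tiny at x = 10 (versus 1.0 for ReLU).
print(dsigmoid(10.0))  # ~4.5e-5

# One hidden layer of ReLUs: f(x) = w2 . relu(w1*x + b1).
# Weights are arbitrary random values for illustration.
rng = np.random.default_rng(0)
w1 = rng.normal(size=8)
b1 = rng.normal(size=8)
w2 = rng.normal(size=8)

def f(x):
    return w2 @ np.maximum(0.0, w1 * x + b1)

# f is exactly linear between the kinks where some w1*x + b1 crosses
# zero, so second differences on a fine grid vanish almost everywhere.
xs = np.linspace(-3.0, 3.0, 601)
ys = np.array([f(x) for x in xs])
flat = np.abs(np.diff(ys, 2)) < 1e-8
print(flat.mean())  # close to 1: only the few kink points are nonlinear
```

With 8 hidden units there are at most 8 kinks among ~600 grid points, so nearly all second differences are zero; this is the "piecewise-linear style" of approximation, and why sharp (step-like) edges need many units where a single sigmoid would do.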