The main difference, if I understand correctly (and I know very little here), is to bootstrap from the ground up: no pre-computed input features, just let the network figure it out by self-play.
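To make that concrete, here is a minimal TD(0) self-play training sketch in Python. Everything touching the actual game (new_game(), roll_dice(), legal_plays(), is_over(), white_wins(), white_on_roll(), and the raw-board encode()) is a hypothetical stand-in, not gnubg code, and a real trainer would want TD(lambda) with eligibility traces rather than plain TD(0):

import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID = 196, 80                            # raw-board encoding size is just a placeholder
w1 = rng.normal(0.0, 0.1, (N_IN, N_HID))
w2 = rng.normal(0.0, 0.1, (N_HID, 1))

def value(x):
    """Net's estimate of P(white wins), plus the hidden activations."""
    h = np.tanh(x @ w1)
    v = 1.0 / (1.0 + np.exp(-(h @ w2)[0]))
    return v, h

def grads(x, h, v):
    """Gradients of the single sigmoid output with respect to both weight layers."""
    dv = v * (1.0 - v)
    g2 = np.outer(h, dv)                               # shape (N_HID, 1)
    g1 = np.outer(x, dv * w2[:, 0] * (1.0 - h * h))    # shape (N_IN, N_HID)
    return g1, g2

def self_play_game(alpha=0.05):
    """One game of greedy self-play, with a TD(0) weight update after every move."""
    global w1, w2
    pos = new_game()                             # hypothetical engine call
    x = encode(pos)                              # raw board -> vector, no expert features
    v, h = value(x)
    while not is_over(pos):
        # legal_plays() is assumed to return the candidate resulting positions.
        plays = legal_plays(pos, roll_dice())
        # Each side greedily picks the play the current net rates best for itself.
        if white_on_roll(pos):
            pos = max(plays, key=lambda p: value(encode(p))[0])
        else:
            pos = min(plays, key=lambda p: value(encode(p))[0])
        x_next = encode(pos)
        target = float(white_wins(pos)) if is_over(pos) else value(x_next)[0]
        g1, g2 = grads(x, h, v)
        w1 += alpha * (target - v) * g1          # TD(0): nudge v toward the next estimate
        w2 += alpha * (target - v) * g2
        x = x_next
        v, h = value(x)

Starting from random weights and only raw board inputs, millions of such games are what "letting the network figure it out" amounts to.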
We have a great test case in that we can start with just racing. That said, I think we will need a net for each match score, since cubeless -> cubeful is where things get messy. Also, given that 0-ply rollouts are relatively fast, when playing against a human, if you can wait a second or two, you can play using cubeful 0-ply. Testing how good this is will be problematic.

-Joseph

On Thu, 5 Dec 2019 at 09:23, Øystein Schønning-Johansen <[email protected]> wrote:

> But let's chat about the idea instead. What would it actually mean to 'apply "AlphaZero methods" to backgammon'?
>
> AlphaZero (and AlphaGo and Lc0 and SugaR NN) is more or less the same thing as reinforcement learning in backgammon. So, from my understanding, it is rather AlphaZero that has applied the backgammon methods. Both the chess and go variants train with reinforcement learning, pretty much like the original GNU Backgammon, Jellyfish and Snowie. In Go they had to make a move-selection subroutine based on human play and then add MCTS to train. Also, the neural networks are deeper and more complex. The NN input features are also more complex and can to some extent resemble the convolutions known from convolutional neural networks (and the inputs are not properly described in the high-level articles).
>
> Apart from that, it is actually the same thing: reinforcement learning.
>
> But how can we improve? We believe (at least I do) that current backgammon bots are so strong that they play close to perfectly in standard positions. It is in uncommon, long-term-planning positions (like deep backgames and snake-rolling prime positions) that bots can still improve. Let me throw some ideas up in the air for discussion:
>
> Can we make an RL algorithm that is so fast that it can learn on the fly? Say that during play we find a position where some indicator (which may be another challenge) shows that it requires long-term planning. If we then had the ability to RL-train a neural net for that specific position, that could be a huge improvement in my opinion. (Lots of details missing.)
>
> And then, could the evaluations be improved if we specialize neural networks into specific position types, and then make a kind of NN selection system based on k-means of the input features? I tried that many years ago with only four classes. Those experiments showed that it's not a hopeless approach, and with faster computers it could easily create many more than just four classes (four was only the first number that popped into my head in those days).
>
> Then the next idea: what about huge-scale distributed rollouts? Maybe we could have a system like BOINC do rollouts on the fly? I'm not sure how this would be used in a practical sense, and I'm not sure how hard it would be to implement (with or without the BOINC framework), but I'm just brainstorming here.
>
> -Øystein
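For what it's worth, the NN-selection-by-k-means idea above could look roughly like this, using scikit-learn's KMeans; encode() (position to input-feature vector), train_net(), and the training_positions corpus are hypothetical stand-ins:

import numpy as np
from sklearn.cluster import KMeans

K = 16                                           # many more classes than the old four

# Cluster on the same input features the evaluation nets see.
X = np.array([encode(p) for p in training_positions])
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)

# Train one specialised net per cluster, on that cluster's positions only.
nets = []
for k in range(K):
    members = [p for p, c in zip(training_positions, kmeans.labels_) if c == k]
    nets.append(train_net(members))

def evaluate(position):
    """Route a position to the specialised net owning the nearest centroid."""
    x = encode(position).reshape(1, -1)
    k = int(kmeans.predict(x)[0])
    return nets[k].value(x)

The open question is whether centroids found this way line up with position types that actually need specialised evaluation (deep backgames, priming battles, races), or whether the classes would still need some seeding by hand.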
> On Wed, Dec 4, 2019 at 6:47 PM Joseph Heled <[email protected]> wrote:
>
>> I was intentionally rude because I thought his original post was inappropriate.
>>
>> -Joseph
>>
>> On Thu, 5 Dec 2019 at 06:42, Ralph Corderoy <[email protected]> wrote:
>>
>> > Hi Joseph,
>> >
>> > > I thought so.
>> > >
>> > > I had the same idea the day I heard they cracked go, but just saying something is a good idea is not helpful at all in my book.
>> >
>> > I think you're wrong. And also a bit rude to boot.
>> >
>> > It's fine for Tim to suggest or ponder an idea to the list. It may encourage another subscriber, or draw out news of what a lurker has been working on that's related.
>> >
>> > --
>> > Cheers, Ralph.
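Back on the cubeless -> cubeful point at the top of this message, here is a toy illustration of why the conversion gets messy: a Janowski-style interpolation for the simplest possible case, a gammonless money game with a centred cube. The 0.68 cube-efficiency default is only a commonly quoted ballpark and is an assumption here, not a gnubg constant:

def cubeful_equity(p_win, x=0.68):
    """Toy cubeless -> cubeful conversion (gammonless, money play, centred cube).

    Interpolates between the dead-cube equity (plain cubeless equity) and the
    fully live-cube equity, weighted by a cube-efficiency parameter x.
    """
    dead = 2.0 * p_win - 1.0            # cubeless equity
    if p_win >= 0.8:                    # at or past the cash point: double and claim
        live = 1.0
    elif p_win <= 0.2:                  # at or below the take point
        live = -1.0
    else:                               # linear between the take and cash points
        live = (p_win - 0.5) / 0.3
    return x * live + (1.0 - x) * dead

At a match score the take and cash points shift with the match equity table, and gammons change the picture again, which is why per-score handling (whether per-score nets or per-score cube formulas) starts to look necessary.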
