But let's chat about the idea instead. What would it actually mean to
'apply "AlphaZero methods" to backgammon'?

AlphaZero (and AlphaGo and Lc0 and SugaR NN) is more or less the same
thing as reinforcement learning in backgammon. So, from my understanding,
it is rather AlphaZero that has applied the backgammon methods. Both the
chess and Go variants are trained with reinforcement learning, pretty
much like the original GNU Backgammon, Jellyfish and Snowie. In Go they had
to make a move selection subroutine based on human play and then add MCTS
to train. Also, the neural networks are deeper and more complex. The nn
input features are also more complex and can to some extent resemble the
convolutions known from convolutional neural networks. (And the inputs
are not properly described in the high-level articles.)

Apart from that, it is actually the same thing: Reinforcement learning.

But how can we improve? We believe (at least I do) that current
backgammon bots are so strong that they play close to perfectly in standard
positions. It is in uncommon and long-term-plan positions (like deep
backgames and snake rolling prime positions) that bots can still improve.
Let me throw some ideas up in the air for discussion:

Can we make an RL algorithm that is so fast that it can learn on the fly?
Say that during play we find a position where some indicator (which may be
another challenge) signals that this position requires long-term
planning. If we then had the ability to RL-train a neural net for
that specific position, that could be a huge improvement in my opinion.
(Lots of details missing.)

And then, could the evaluations be improved if we specialized neural
networks into specific position types, and then made a kind of nn
selection system based on k-means of the input features? I tried that many
years ago with only four classes. Those experiments showed that it's not
a hopeless approach, and with faster computers we could easily create many
more than just four classes (four was only the first number that popped
into my head in those days).
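
A minimal sketch of what such a selection system could look like, written
in plain Python. Everything here is an assumption made for illustration:
the names kmeans, nearest and evaluate are hypothetical, the "networks"
are placeholder functions, and the feature vectors are toy data rather
than real backgammon inputs. It is not GNU Backgammon code.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, features):
    """Index of the centroid closest to the feature vector."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], features))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its old centroid
                centroids[i] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids

def evaluate(features, centroids, networks):
    """Route the position to the network that owns the nearest centroid."""
    return networks[nearest(centroids, features)](features)

# Toy feature vectors forming two obvious position classes (assumed data).
points = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2],
          [5.0, 5.0], [5.1, 5.1], [5.2, 5.2]]
centroids = kmeans(points, k=2)

# Placeholder "networks": each just reports which class it serves.
networks = [lambda f, i=i: i for i in range(len(centroids))]
```

In a real system each class would have its own net trained only on the
positions assigned to that class, and the number of classes would be a
tuning parameter.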

Then the next idea: What about huge-scale distributed rollouts? Maybe we
could have a system like BOINC to do rollouts on the fly? I'm not sure how
this should be used in a practical sense, and I'm not sure how hard it
would be to implement (with or without the BOINC framework), but I'm just
kind of brainstorming here.
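
To make the rollout idea a bit more concrete, here is a rough sketch of
the split-and-merge part only: cut a rollout into independently seeded
work units, run them in parallel, and combine the averages. All names are
hypothetical. play_out is a stand-in that just draws a noisy number
instead of playing a game, and local threads stand in for remote
machines. Nothing here uses any real volunteer-computing API.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def play_out(position, rng):
    """Hypothetical single trial. A real one would play the position to
    the end with a bot; this one returns a noisy value so the sketch runs."""
    return position["true_equity"] + rng.uniform(-1.0, 1.0)

def run_work_unit(position, trials, seed):
    """One work unit: its own seed, returns (sum of results, trial count)."""
    rng = random.Random(seed)
    total = sum(play_out(position, rng) for _ in range(trials))
    return total, trials

def distributed_rollout(position, total_trials, workers):
    # Split the trials as evenly as possible across the workers.
    base, extra = divmod(total_trials, workers)
    sizes = [base + (1 if i < extra else 0) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_work_unit, position, n, seed)
                   for seed, n in enumerate(sizes)]
        results = [f.result() for f in futures]
    # Merge: weight each unit by the number of trials it actually ran.
    total = sum(t for t, _ in results)
    count = sum(n for _, n in results)
    return total / count, count

equity, trials_done = distributed_rollout({"true_equity": 0.25}, 10000, 4)
```

Giving each work unit its own seed keeps the units independent, and
returning sums plus counts (rather than averages) lets units of different
sizes be merged correctly.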

-Øystein


On Wed, Dec 4, 2019 at 6:47 PM Joseph Heled <[email protected]> wrote:

> I was intentionally rude because I thought his original post was
> inappropriate.
>
> -Joseph
>
> On Thu, 5 Dec 2019 at 06:42, Ralph Corderoy <[email protected]> wrote:
> >
> > Hi Joseph,
> >
> > > I thought so.
> > >
> > > I had the same idea the day I heard they cracked go, but just saying
> > > something is a good idea is not helpful at all in my book.
> >
> > I think you're wrong.  And also a bit rude to boot.
> >
> > It's fine for Tim to suggest or ponder an idea to the list.  It may
> > encourage another subscriber, or draw out news of what a lurker has been
> > working on that's related.
> >
> > --
> > Cheers, Ralph.
> >
>
>
