The main difference, if I understand correctly (and I know very little
here), is to bootstrap from the ground up: no pre-computed inputs, just
let the network figure it out by self-play.

We have a great test case in that we can start with just racing.
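
To make the racing idea concrete, here is roughly what I have in mind: a toy
TD(0) self-play loop that learns the win probability of a bare pip-count race
from nothing but the two pip counts. Everything in it (the encoding, the net
size, the learning rate, the number of games) is made up for illustration,
and real checker play is left out entirely; a real attempt would need move
generation and the full board as input.

    # Toy sketch: TD(0) self-play on a bare pip-count race.  The net sees
    # only the two raw pip counts -- no pre-computed race features.
    import random
    import numpy as np

    HIDDEN = 20          # hidden units, picked arbitrarily
    ALPHA = 0.05         # learning rate
    START_PIPS = 80      # both sides start this far from home

    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 0.1, (HIDDEN, 2))   # input -> hidden weights
    W2 = rng.normal(0, 0.1, HIDDEN)        # hidden -> output weights

    def encode(my_pips, opp_pips):
        # Raw inputs only: the two pip counts, scaled.
        return np.array([my_pips, opp_pips]) / START_PIPS

    def forward(x):
        h = np.tanh(W1 @ x)
        p = 1.0 / (1.0 + np.exp(-(W2 @ h)))   # P(player on roll wins)
        return p, h

    def td_update(x, target):
        # One gradient step pulling the net's output toward the TD target.
        global W1, W2
        p, h = forward(x)
        grad_out = (target - p) * p * (1 - p)
        W2 += ALPHA * grad_out * h
        W1 += ALPHA * np.outer(grad_out * W2 * (1 - h * h), x)

    def roll():
        d1, d2 = random.randint(1, 6), random.randint(1, 6)
        return 4 * d1 if d1 == d2 else d1 + d2

    def self_play_game():
        pips = [START_PIPS, START_PIPS]
        on_roll = 0
        x_prev = encode(pips[on_roll], pips[1 - on_roll])
        while True:
            pips[on_roll] = max(0, pips[on_roll] - roll())
            if pips[on_roll] == 0:
                td_update(x_prev, 1.0)   # player on roll in x_prev won
                return
            on_roll = 1 - on_roll
            x_next = encode(pips[on_roll], pips[1 - on_roll])
            p_next, _ = forward(x_next)
            # Perspective flip: prev player's winning chance is 1 - p_next.
            td_update(x_prev, 1.0 - p_next)
            x_prev = x_next

    for _ in range(20000):
        self_play_game()

    print(forward(encode(20, 60))[0])   # should end up well above 0.5

The point is just that nothing hand-crafted goes in: the target for each
position is the net's own evaluation of the next one, as in TD-Gammon.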

That said, I think we will need a net for each match score, since cubeless
-> cubeful is where things get messy.

Also, given that 0-ply rollouts are relatively fast, when playing against a
human - if you can wait a second or two - you can play using cubeful 0-ply.
Testing how good this is will be problematic.

-Joseph


On Thu, 5 Dec 2019 at 09:23, Øystein Schønning-Johansen <[email protected]>
wrote:

> But let's chat about the idea instead. What would it actually mean to
> 'apply "AlphaZero methods" to backgammon'?
>
> AlphaZero (and AlphaGo and Lc0 and SugaR NN) is more or less the same
> thing as reinforcement learning in backgammon. So, from my understanding,
> it is rather AlphaZero that has applied the backgammon methods. Both the
> chess and Go variants train with reinforcement learning, pretty much like
> the original GNU Backgammon, Jellyfish and Snowie. In Go they had to make
> a move selection subroutine based on human play and then add MCTS to
> train. Also, the neural networks are deeper and more complex. The nn input
> features are also more complex and can to some extent resemble the
> convolutions known from convolutional neural networks. (And the inputs
> are not properly described in the high-level articles.)
>
> Apart from that, it is actually the same thing: reinforcement learning.
>
> But how can we improve? We believe (at least I do) that the current
> backgammon bots are so strong that they play close to perfect in standard
> positions. It is in uncommon, long-term-planning positions (like deep
> backgames and snake-rolling prime positions) that bots can still improve.
> Let me throw some ideas up in the air for discussion:
>
> Can we make an RL algorithm so fast that it can learn on the fly? Say
> that during play we find a position where some indicator (which may be
> another challenge in itself) tells us that it requires long-term
> planning. If we then had the ability to RL-train a neural net for that
> specific position, that could be a huge improvement in my opinion.
> (Lots of details missing.)
>
> And then, could the evaluations be improved if we specialized neural
> networks into specific position types, and then made a kind of nn
> selection system based on k-means clustering of the input features? I
> tried that many years ago with only four classes. Those experiments
> showed that it's not a hopeless approach, and with faster computers we
> could easily create many more than just four classes (four was only the
> first number that popped into my head in those days).
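>
> Roughly, the selection part could look something like this (the class
> count, feature sizes and the per-class "nets" below are just placeholders
> to show the routing; in practice each class would get its own trained
> net):
>
>     # Sketch: cluster positions by their raw input features with k-means,
>     # then route each position to a specialist evaluator for its cluster.
>     import numpy as np
>     from sklearn.cluster import KMeans
>
>     N_CLASSES = 8          # four in my old experiment; just a guess here
>     N_FEATURES = 250       # size of the raw input encoding (placeholder)
>
>     # Stand-in for a database of encoded training positions.
>     positions = np.random.rand(5000, N_FEATURES)
>
>     kmeans = KMeans(n_clusters=N_CLASSES, n_init=10, random_state=0)
>     kmeans.fit(positions)
>
>     # Dummy per-class evaluators; really these would be nets trained only
>     # on the positions assigned to their cluster.
>     nets = [lambda x, c=c: float(c) for c in range(N_CLASSES)]
>
>     def evaluate(position_features):
>         # Pick the specialist net for this position's cluster.
>         cls = int(kmeans.predict(position_features.reshape(1, -1))[0])
>         return nets[cls](position_features)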
>
> Then the next idea: what about huge-scale distributed rollouts? Maybe we
> could have a system like BOINC to do rollouts on the fly? I'm not sure how
> this would be used in a practical sense, and I'm not sure how hard it would
> be to implement (with or without the BOINC framework), but I'm just kind of
> brainstorming here.
>
> -Øystein
>
>
> On Wed, Dec 4, 2019 at 6:47 PM Joseph Heled <[email protected]> wrote:
>
>> I was intentionally rude because I thought his original post was
>> inappropriate.
>>
>> -Joseph
>>
>> On Thu, 5 Dec 2019 at 06:42, Ralph Corderoy <[email protected]>
>> wrote:
>> >
>> > Hi Joseph,
>> >
>> > > I thought so.
>> > >
>> > > I had the same idea the day I heard they cracked go, but just saying
>> > > something is a good idea is not helpful at all in my book.
>> >
>> > I think you're wrong.  And also a bit rude to boot.
>> >
>> > It's fine for Tim to suggest or ponder an idea to the list.  It may
>> > encourage another subscriber, or draw out news of what a lurker has been
>> > working on that's related.
>> >
>> > --
>> > Cheers, Ralph.
>> >
>>
>>
