Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2018-01-03 Thread Adrian Petrescu

On 12/19/2017 12:25 PM, Marc Landgraf wrote:

There is not much to achieve there though.

It is expected that an AI will be able to outplay a human opponent simply
through micro tricks. Perfect single-unit micromanagement across the entire map
can easily gain a large enough edge that the strategic decision-making
with imperfect information doesn't have to reach very high levels.
So most likely any win the AI achieves will be discredited by the human
players for this reason.


They've gone to great lengths to mitigate this exact criticism. The 
output actions are limited to human-like APM levels, and they aren't 
omnipresent - the AI has to pan the screen, click the cursor, etc., the 
same as a human would, at human-like speeds. You should read the paper - 
they want to make sure the advantage comes purely from the quality of 
the decision-making.



--
  Adrian


Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-19 Thread Marc Landgraf
There is not much to achieve there though.

It is expected that an AI will be able to outplay a human opponent simply
through micro tricks. Perfect single-unit micromanagement across the entire map
can easily gain a large enough edge that the strategic decision-making
with imperfect information doesn't have to reach very high levels.
So most likely any win the AI achieves will be discredited by the human
players for this reason.

2017-12-19 16:26 GMT+01:00 Andy :

> Google has already announced their next step -- StarCraft 2. But so far the
> results they published aren't mind-blowing like these.
>
>
> 2017-12-19 9:15 GMT-06:00 Fidel Santiago :
>
>> Hello,
>>
>> I was thinking about this development and what it may mean from the point
>> of view of a more general AI. I daresay the next experiment would be to
>> have just one neural net playing the three games, right? To my
>> understanding we still have three instances of the same *methodology*
>> but not yet a single one playing different games.
>>
>> Best regards,
>>
>> Fidel Santiago.
>>

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-19 Thread Roel van Engelen
>I was thinking about this development and what it may mean from the point
>of view of a more general AI.
>I daresay the next experiment would be to have just one neural net playing
>the three games, right?
>To my understanding we still have three instances of the same *methodology* but
>not yet a single one playing different games.

DeepMind did some research on that topic with the Atari games:
https://deepmind.com/blog/enabling-continual-learning-in-neural-networks/

And yes, what you describe would be a more general AI, but it would be more
interesting to include all 48 Atari games from previous research as well,
although I suspect a real general AI will be developed from a different
line of research.

As for what DeepMind will be researching next, that is always a guess, but
I think we will hear more about tabula rasa approaches, since several
real-world problems like the travelling salesman problem have been adapted
to be solved with MCTS. But we will have to wait for their next paper/blog
to know for sure.

On 19 December 2017 at 16:15, Fidel Santiago  wrote:

> Hello,
>
> I was thinking about this development and what it may mean from the point
> of view of a more general AI. I daresay the next experiment would be to
> have just one neural net playing the three games, right? To my
> understanding we still have three instances of the same *methodology* but
> not yet a single one playing different games.
>
> Best regards,
>
> Fidel Santiago.
>

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-19 Thread Andy
Google has already announced their next step -- StarCraft 2. But so far the
results they published aren't mind-blowing like these.


2017-12-19 9:15 GMT-06:00 Fidel Santiago :

> Hello,
>
> I was thinking about this development and what it may mean from the point
> of view of a more general AI. I daresay the next experiment would be to
> have just one neural net playing the three games, right? To my
> understanding we still have three instances of the same *methodology* but
> not yet a single one playing different games.
>
> Best regards,
>
> Fidel Santiago.
>

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-19 Thread Fidel Santiago
Hello,

I was thinking about this development and what it may mean from the point
of view of a more general AI. I daresay the next experiment would be to
have just one neural net playing the three games, right? To my
understanding we still have three instances of the same *methodology* but
not yet a single one playing different games.

Best regards,

Fidel Santiago.

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
AZ scalability looks good in that diagram, and it is certainly a good start, 
but it only goes out through 10 sec/move. Also, if the hardware is 7x better 
for AZ than SF, then should we elongate the curve for AZ by 7x? Or compress the 
curve for SF by 7x? Or some combination? Or take the data at face value?

I just noticed that AZ has some losses when the opening was forced into 
specific variations as in Table 2. So we know that AZ is not perfect, but 19 
losses in 1200 games is hard to extrapolate. (Curious: SF was a net winner over 
AZ with White in a B40 Sicilian, the only position/color combination out of 24 
in which SF had an edge.)

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Rémi Coulom
Sent: Thursday, December 7, 2017 11:51 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

>My concern about many of these points of comparison is that they presume how 
>AZ scales. In the absence of data, I would guess that AZ gains much less from 
>hardware than SF. I am basing this guess on two known facts. First is that AZ 
>did not lose a game, so the upper bound on its strength is perfection. Second, 
>AZ is a knowledge intensive program, so it is counting on judgement to a 
>larger degree.

Doesn't Figure 2 in the paper indicate convincingly that AZ scales better than 
Stockfish?

Rémi

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Ingo Althöfer
Hi Jim,


> In November 2002, I created a Go adaptation, Abchij, which 
> I think might not be easily conquered by these 
> algorithms. It's funny, I did so in anticipation of 
> thwarting any sort of brute force algorithms that might 
> emerge to "solve" Go, as I hated how those were the 
> solution to Chess. 
>
> If you are interested, I would be happy to post the 
> rules for the game.

I can only write for myself. But I would definitely
like to see your rules.

Namaste,
Ingo.

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Jim O'Flaherty
In November 2002, I created a Go adaptation, Abchij, which I think might not be
easily conquered by these algorithms. It's funny, I did so in anticipation
of thwarting any sort of brute force algorithms that might emerge to
"solve" Go, as I hated how those were the solution to Chess. If you are
interested, I would be happy to post the rules for the game.


Namaste,

Jim O'Flaherty
Founder/CEO
Precision Location Intelligence, Inc. • Irving, TX, USA
469-358-0633 • jim.oflaherty...@gmail.com •
www.linkedin.com/in/jimoflahertyjr


On Wed, Dec 6, 2017 at 8:28 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de>
wrote:

> It seems we are living in extremely
> heavy times ...
>
> I want to go to bed now and meditate for three days.
>
> > DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero
> method.
> > Mastering Chess and Shogi by Self-Play with a General Reinforcement
> Learning Algorithm
> > https://arxiv.org/pdf/1712.01815.pdf
> >
> > AlphaZero(Chess) outperformed Stockfish after 4 hours,
> > AlphaZero(Shogi) outperformed elmo after 2 hours.
>
> It may sound strange, but at the moment my only hopes for
> games too difficult for AlphaZero might be
>
> * a connection game like Hex (on 19x19 board)
>
> * a game like Clobber (based on CGT)
>
> Mastering Clobber would mean that the concepts of
> combinatorial game theory would also be "easily" learnable.
>
>
> Side question: Would the classic Nim game be
> a trivial nut for AlphaZero?
>
> Ingo (is now starting to hope for an AlphaZero type program
> that can do "general" mathematical research).

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Rémi Coulom
>My concern about many of these points of comparison is that they presume how 
>AZ scales. In the absence of data, I would guess that AZ gains much less from 
>hardware than SF. I am basing this guess on two known facts. First is that AZ 
>did not lose a game, so the upper bound on its strength is perfection. Second, 
>AZ is a knowledge intensive program, so it is counting on judgement to a 
>larger degree.

Doesn't Figure 2 in the paper indicate convincingly that AZ scales better than 
Stockfish?

Rémi

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
> Which is IMHO missing the point a bit ;-)

I saw it the same way, while conceding that the facts are accurate.

It makes sense for SF to internalize the details before making decisions. At 
some point there will be a realization that AZ is a fundamental change.


>What about the data point that AlphaGo Zero gained 2100 Elo from its tree 
>search? In a game commonly considered less tactical?

That is a common perception, especially among those who have never debugged a 
Go program. :-)

I was coming at it from the other direction, reasoning that since SF and AZ are 
close to perfect at chess, there is less to gain from speed. (Whereas I 
doubt that AGZ is close to perfect at Go.)

All of this is subject to my money back guarantee: my opinions are guaranteed 
wrong, or your money back. :-)


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Thursday, December 7, 2017 8:17 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

On 7/12/2017 13:20, Brian Sheppard via Computer-go wrote:
> The conversation on Stockfish's mailing list focused on how the match 
> was imbalanced.

Which is IMHO missing the point a bit ;-)

> My concern about many of these points of comparison is that they 
> presume how AZ scales. In the absence of data, I would guess that AZ 
> gains much less from hardware than SF. I am basing this guess on two 
> known facts. First is that AZ did not lose a game, so the upper bound 
> on its strength is perfection. Second, AZ is a knowledge intensive 
> program, so it is counting on judgement to a larger degree.

What about the data point that AlphaGo Zero gained 2100 Elo from its tree 
search? In a game commonly considered less tactical?

--
GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Gian-Carlo Pascutto
On 7/12/2017 13:20, Brian Sheppard via Computer-go wrote:
> The conversation on Stockfish's mailing list focused on how the
> match was imbalanced.

Which is IMHO missing the point a bit ;-)

> My concern about many of these points of comparison is that they 
> presume how AZ scales. In the absence of data, I would guess that AZ 
> gains much less from hardware than SF. I am basing this guess on two 
> known facts. First is that AZ did not lose a game, so the upper
> bound on its strength is perfection. Second, AZ is a knowledge
> intensive program, so it is counting on judgement to a larger
> degree.

What about the data point that AlphaGo Zero gained 2100 Elo from its
tree search? In a game commonly considered less tactical?

-- 
GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
The conversation on Stockfish's mailing list focused on how the match was 
imbalanced.

- AZ's TPU hardware was estimated at several times (7 times?) the computational 
power of Stockfish's.
- Stockfish's transposition table size (1 GB) was considered much too small for 
a 64 core machine.
- Stockfish's opening book is disabled, whereas AZ has, in effect, memorized a 
huge opening book.
- The match was against SF 8 (one year old) rather than the latest dev version.

To this I would add that the losses of Stockfish that I played through seemed 
to be largely self-similar, so it is possible that Stockfish has a relatively 
limited number of weaknesses that AZ does not, but the format of the match 
amplifies the issue.

So the attitude among the SF core is pretty competitive. Which is great news 
for continued development.

My concern about many of these points of comparison is that they presume how AZ 
scales. In the absence of data, I would guess that AZ gains much less from 
hardware than SF. I am basing this guess on two known facts. First is that AZ 
did not lose a game, so the upper bound on its strength is perfection. Second, 
AZ is a knowledge intensive program, so it is counting on judgement to a larger 
degree.

But I could be wrong. Maybe AZ falls apart tactically without 80K pops. There 
is no data, so all WAGs are valid.


-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Thursday, December 7, 2017 4:13 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

On 06-12-17 22:29, Brian Sheppard via Computer-go wrote:
> The chess result is 64-36: a 100 rating point edge! I think the
> Stockfish open source project improved Stockfish by ~20 rating points in
> the last year.

It's about 40-45 Elo FWIW.

> AZ would dominate the current TCEC. 

I don't think you'll get to 80 knps with a regular 22 core machine or
whatever they use. Remember that AZ hardware is about 16 x 1080 Ti's.
You'll lose that (70 - 40 = 30 Elo) advantage very, very quickly.

IMHO this makes it all the more clear how silly it is that so much
attention is given to TCEC with its completely arbitrary hardware choice.

> The Stockfish team will have some self-examination going forward for
> sure. I wonder what they will decide to do.

Probably the same as the Zen team did. Ignore a large part of the result
because people's actual computers - let alone mobile phones - can't run
a neural net at TPU speeds.

The question is if resizing the network makes the resulting program more
competitive, enough to overcome the speed difference. And, aha, in which
direction are you going to try to resize? Bigger or smaller?

-- 
GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Gian-Carlo Pascutto
On 06-12-17 22:29, Brian Sheppard via Computer-go wrote:
> The chess result is 64-36: a 100 rating point edge! I think the
> Stockfish open source project improved Stockfish by ~20 rating points in
> the last year.

It's about 40-45 Elo FWIW.

> AZ would dominate the current TCEC. 

I don't think you'll get to 80 knps with a regular 22 core machine or
whatever they use. Remember that AZ hardware is about 16 x 1080 Ti's.
You'll lose that (70 - 40 = 30 Elo) advantage very, very quickly.

IMHO this makes it all the more clear how silly it is that so much
attention is given to TCEC with its completely arbitrary hardware choice.

> The Stockfish team will have some self-examination going forward for
> sure. I wonder what they will decide to do.

Probably the same as the Zen team did. Ignore a large part of the result
because people's actual computers - let alone mobile phones - can't run
a neural net at TPU speeds.

The question is if resizing the network makes the resulting program more
competitive, enough to overcome the speed difference. And, aha, in which
direction are you going to try to resize? Bigger or smaller?

-- 
GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Gian-Carlo Pascutto
On 06-12-17 21:19, Petr Baudis wrote:
> Yes, that also struck me.  I think it's good news for the community
> to see it reported that this works, as it makes the training process
> much more straightforward.  They also use just 800 simulations, which
> is more good news.  (Both were among the first tradeoffs I made in
> Nochi.)

The 800 simulations are a setting that works across all 3 games. It's not
necessarily as good for 19x19 Go (more legal moves than in the other games,
so shallower trees).

As for both the lack of testing and this parameter, someone has remarked
on github that the DeepMind hardware is fixed, so this also represents a
tuning between the speed of the learning machine and the speed of the
self-play machines.

In my experience, just continuing to train the network further (when no
new data is batched in) often regresses the performance by 200 or more
Elo. So it's not clear this step is *entirely* ignorable unless you have
already tuned the speed of the other two aspects.

> Another interesting tidbit: they use the TPUs to also generate the 
> selfplay games.

I think this was already known.

-- 
GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
I see the same dynamics that you do, Darren. The 400-game match always has some 
probability of being won by the challenger. It is just much more likely if the 
challenger is stronger than the champion.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Darren Cook
Sent: Wednesday, December 6, 2017 7:55 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

>> One of the changes they made (bottom of p.3) was to continuously 
>> update the neural net, rather than require a new network to beat it 
>> 55% of the time to be used. (That struck me as strange at the time, 
>> when reading the AlphaGoZero paper - why not just >50%?)

Gian wrote:
> I read that as a simple way of establishing confidence that the result 
> was statistically significant > 0. (+35 Elo over 400 games...

Brian Sheppard also:
> Requiring a margin > 55% is a defense against a random result. A 55% 
> score in a 400-game match is 2 sigma.

Good point. That makes sense.

But (where A is best so far, and B is the newer network) in A vs. B, if B wins 
50.1%, there is a slightly greater than 50-50 chance that B is better than A. 
In the extreme case of 54.9% win rate there is something like a 94%-6% chance 
(?) that B is better, but they still throw B away.

If B just got lucky, and A was better, well the next generation is just more 
likely to de-throne B, so long-term you won't lose much.

On the other hand, at very strong levels, this might prevent improvement, as a 
jump to 55% win rate in just one generation sounds unlikely to happen. (Did I 
understand that right? As B is thrown away, and A continues to be used, there 
is only that one generation within which to improve on it, each time?)

Darren

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Eric Boesch
I could be drawing wrong inferences from incomplete information, but as
Darren pointed out, this paper does leave the impression Alpha Zero is not
as strong as the real AlphaGo Zero, in which case it would be clearer to
say so explicitly. Of course the chess and shogi results are impressive
regardless. (In chess, the 28/100 wins is good, but 0 losses is even
better. Entering a drawn sequence starting from an inferior position --
such as playing black -- is a desirable result for even a perfect program
without contempt, so failing to win as black is not a good indicator of
strength.)

Comparing the Elo charts in this new paper and the Nature paper on AlphaGo
Zero, and assigning AlphaGo Lee a reference rating of 0 Elo, it appears
that the order in strength of go play is Alpha Zero (~900 Elo), AlphaGo
Master (~1400 Elo), then the full-strength AlphaGo Zero (~1500 Elo).

I would also think Alpha Zero's 8 hours of training with the help of an
immense network of 5,000 first-generation TPUs is more expensive, and only
faster in a strictly chronological sense, than AlphaGo Zero's 20-block,
3-day training with 4 second-generation TPUs.
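
Taking those rough chart readings at face value, the standard logistic Elo
model turns the gaps into expected scores. A quick sketch in Python (the
~900/~1400/~1500 figures are the eyeballed estimates above, not published
numbers):

    def expected_score(elo_gap):
        # Expected score of the stronger side under the logistic Elo model.
        return 1 / (1 + 10 ** (-elo_gap / 400))

    # Eyeballed gaps from the two papers' charts (AlphaGo Lee = 0 reference):
    print(round(expected_score(1500 - 900), 3))  # AlphaGo Zero vs Alpha Zero: ~0.969
    print(round(expected_score(1400 - 900), 3))  # AlphaGo Master vs Alpha Zero: ~0.947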


On Wed, Dec 6, 2017 at 4:29 PM, Brian Sheppard via Computer-go <
computer-go@computer-go.org> wrote:

> The chess result is 64-36: a 100 rating point edge! I think the Stockfish
> open source project improved Stockfish by ~20 rating points in the last
> year. Given the number of people/computers involved, Stockfish’s annual
> effort level seems comparable to the AZ effort.
>
>
>
> Stockfish is really, really tweaked out to do exactly what it does. It is
> very hard to improve anything about Stockfish. To be clear: I am not
> disparaging the code or people or project in any way. The code is great,
> people are great, project is great. It is really easy to work on Stockfish,
> but very hard to make progress given the extraordinarily fine balance of
> resources that already exists.  I tried hard for about 6 months last year
> without any success. I tried dozens (maybe 100?) experiments, including
> several that were motivated by automated tuning or automated searching for
> opportunities. No luck.
>
>
>
> AZ would dominate the current TCEC. Stockfish didn’t lose a game in the
> semi-final, failing to make the final because of too many draws against the
> weaker players.
>
>
>
> The Stockfish team will have some self-examination going forward for sure.
> I wonder what they will decide to do.
>
>
>
> I hope this isn’t the last we see of these DeepMind programs.
>
>
>
> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
> Behalf Of *Richard Lorentz
> *Sent:* Wednesday, December 6, 2017 12:50 PM
> *To:* computer-go@computer-go.org
> *Subject:* Re: [Computer-go] Mastering Chess and Shogi by Self-Play with
> a General Reinforcement Learning Algorithm
>
>
>
> One chess result stood out for me, namely, just how much easier it was for
> AlphaZero to win with white (25 wins, 25 draws, 0 losses) rather than with
> black (3 wins, 47 draws, 0 losses).
>
> Maybe we should not give up on the idea of White to play and win in chess!
>
> On 12/06/2017 01:24 AM, Hiroshi Yamashita wrote:
>
> Hi,
>
> DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero
> method.
>
> Mastering Chess and Shogi by Self-Play with a General Reinforcement
> Learning Algorithm
> https://arxiv.org/pdf/1712.01815.pdf
>
> AlphaZero(Chess) outperformed Stockfish after 4 hours,
> AlphaZero(Shogi) outperformed elmo after 2 hours.
>
> Search is MCTS.
> AlphaZero(Chess) searches 80,000 positions/sec.
> Stockfish searches 70,000,000 positions/sec.
> AlphaZero(Shogi) searches 40,000 positions/sec.
> elmo searches 35,000,000 positions/sec.
>
> Thanks,
> Hiroshi Yamashita
>

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Darren Cook
>> One of the changes they made (bottom of p.3) was to continuously 
>> update the neural net, rather than require a new network to beat
>> it 55% of the time to be used. (That struck me as strange at the
>> time, when reading the AlphaGoZero paper - why not just >50%?)

Gian wrote:
> I read that as a simple way of establishing confidence that the 
> result was statistically significant > 0. (+35 Elo over 400 games...

Brian Sheppard also:
> Requiring a margin > 55% is a defense against a random result. A 55% 
> score in a 400-game match is 2 sigma.

Good point. That makes sense.

But (where A is best so far, and B is the newer network) in
A vs. B, if B wins 50.1%, there is a slightly greater than 50-50 chance
that B is better than A. In the extreme case of 54.9% win rate there is
something like a 94%-6% chance (?) that B is better, but they still
throw B away.
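
For what it's worth, a quick back-of-the-envelope check under a binomial
model (ignoring draws and assuming a flat prior, so the numbers are rough):

    from math import erf, sqrt

    def p_better(score, games):
        # One-sided normal approximation: probability that the challenger's
        # true strength exceeds 50%, given the observed score fraction.
        sigma = sqrt(0.25 / games)       # std dev of the score near 50%
        z = (score - 0.5) / sigma
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(p_better(0.549, 400))   # ~0.975
    print(p_better(0.550, 400))   # ~0.977, i.e. 55% of 400 games is ~2 sigma

Under these assumptions the 94%-6% guess is conservative; it is closer to
97.5%-2.5%.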

If B just got lucky, and A was better, well the next generation is just
more likely to de-throne B, so long-term you won't lose much.

On the other hand, at very strong levels, this might prevent
improvement, as a jump to 55% win rate in just one generation sounds
unlikely to happen. (Did I understand that right? As B is thrown away,
and A continues to be used, there is only that one generation within
which to improve on it, each time?)

Darren

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
Requiring a margin > 55% is a defense against a random result. A 55% score in a 
400-game match is 2 sigma.

But I like the AZ policy better, because it does not require arbitrary 
parameters. It also improves more fluidly by always drawing training examples 
from the current probability distribution, and when the program is close to 
perfect you would be able to capture the last 5% of skill.

I am not sure what to make of the AZ vs AGZ result. Mathematically, there 
should be a degree of training sufficient for AZ to exceed any fixed level of 
skill, such as AGZ's 40/40 level. So there must be a reason why DeepMind did 
not report such a result, but it is unclear what that is.

-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Darren Cook
Sent: Wednesday, December 6, 2017 12:58 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

> Mastering Chess and Shogi by Self-Play with a General Reinforcement 
> Learning Algorithm https://arxiv.org/pdf/1712.01815.pdf

One of the changes they made (bottom of p.3) was to continuously update the 
neural net, rather than require a new network to beat it 55% of the time to be 
used. (That struck me as strange at the time, when reading the AlphaGoZero 
paper - why not just >50%?)

The AlphaZero paper shows it out-performs AlphaGoZero, but they are comparing 
to the 20-block, 3-day version. Not the 40-block, 40-day version that was even 
stronger.

As papers rarely show failures, can we take it to mean they couldn't 
out-perform their best go bot, do you think? If so, I wonder how hard they 
tried?

In other words, do you think the changes they made from AlphaGo Zero to Alpha 
Zero have made it weaker (when just viewed from the point of view of making the 
strongest possible go program).

Darren

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
The chess result is 64-36: a 100 rating point edge! I think the Stockfish open 
source project improved Stockfish by ~20 rating points in the last year. Given 
the number of people/computers involved, Stockfish’s annual effort level seems 
comparable to the AZ effort.
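
For reference, under the standard logistic Elo model a 64% score works out
to almost exactly 100 points; a minimal check:

    from math import log10

    def elo_diff(score):
        # Elo difference implied by an expected score (logistic model).
        return -400 * log10(1 / score - 1)

    print(round(elo_diff(0.64)))   # ~100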

 

Stockfish is really, really tweaked out to do exactly what it does. It is very 
hard to improve anything about Stockfish. To be clear: I am not disparaging the 
code or people or project in any way. The code is great, people are great, 
project is great. It is really easy to work on Stockfish, but very hard to make 
progress given the extraordinarily fine balance of resources that already 
exists.  I tried hard for about 6 months last year without any successes. I 
tried dozens (maybe 100?) experiments, including several that were motivated by 
automated tuning or automated searching for opportunities. No luck.

 

AZ would dominate the current TCEC. Stockfish didn’t lose a game in the 
semi-final, failing to make the final because of too many draws against the 
weaker players.

 

The Stockfish team will have some self-examination going forward for sure. I 
wonder what they will decide to do.

 

I hope this isn’t the last we see of these DeepMind programs.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Richard Lorentz
Sent: Wednesday, December 6, 2017 12:50 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

 

One chess result stood out for me, namely, just how much easier it was for 
AlphaZero to win with white (25 wins, 25 draws, 0 losses) rather than with 
black (3 wins, 47 draws, 0 losses).

Maybe we should not give up on the idea of White to play and win in chess!

On 12/06/2017 01:24 AM, Hiroshi Yamashita wrote:

Hi, 

DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero method. 

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning 
Algorithm 
https://arxiv.org/pdf/1712.01815.pdf

AlphaZero(Chess) outperformed Stockfish after 4 hours, 
AlphaZero(Shogi) outperformed elmo after 2 hours. 

Search is MCTS. 
AlphaZero(Chess) searches 80,000 positions/sec. 
Stockfish searches 70,000,000 positions/sec. 
AlphaZero(Shogi) searches 40,000 positions/sec. 
elmo searches 35,000,000 positions/sec. 

Thanks, 
Hiroshi Yamashita 


 


Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Petr Baudis
On Wed, Dec 06, 2017 at 09:57:42AM -0800, Darren Cook wrote:
> > Mastering Chess and Shogi by Self-Play with a General Reinforcement
> > Learning Algorithm
> > https://arxiv.org/pdf/1712.01815.pdf
> 
> One of the changes they made (bottom of p.3) was to continuously update
> the neural net, rather than require a new network to beat it 55% of the
> time to be used. (That struck me as strange at the time, when reading
> the AlphaGoZero paper - why not just >50%?)

  Yes, that also struck me.  I think it's good news for the community to
see it reported that this works, as it makes the training process much
more straightforward.  They also use just 800 simulations, which is more
good news.  (Both were among the first tradeoffs I made in Nochi.)

  Another interesting tidbit: they use the TPUs to also generate the
selfplay games.

> The AlphaZero paper shows it out-performs AlphaGoZero, but they are
> comparing to the 20-block, 3-day version. Not the 40-block, 40-day
> version that was even stronger.
> 
> As papers rarely show failures, can we take it to mean they couldn't
> out-perform their best go bot, do you think? If so, I wonder how hard
> they tried?

  IMHO the most likely explanation is that this research has been going
on for a while, and when they started in this direction, that early
version was their state-of-the-art baseline.  This kind of chronology, with
the 40-block version being almost "a last-minute addition", is IMHO
apparent even in the text of the Nature paper.

  Also, the 3-day version simply had roughly the same training time
available as AlphaZero did.

-- 
Petr Baudis, Rossum
Run before you walk! Fly before you crawl! Keep moving forward!
If we fail, I'd rather fail really hugely.  -- Moist von Lipwig

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Gian-Carlo Pascutto
On 6/12/2017 19:48, Xavier Combelle wrote:
> Another result is that chess is really drawish, in contrast to shogi

We sort-of knew that, but OTOH isn't that also because the resulting
engine strength was close to Stockfish, unlike in other games?

-- 
GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Ingo Althöfer
> The AlphaZero paper shows it out-performs AlphaGoZero, but they are
> comparing to the 20-block, 3-day version. Not the 40-block, 40-day
> version that was even stronger.
> As papers rarely show failures, can we take it to mean they couldn't
> out-perform their best go bot, do you think? ...
> 
> In other words, do you think the changes they made from AlphaGo Zero to
> Alpha Zero have made it weaker ...

Just some speculation:

The article on AlphaGo Zero is in NATURE.
Perhaps they did the AlphaZero research simultaneously,
and when facing problems with acceptance in a journal (like NATURE)
they decided to publish a preprint on AlphaZero on arXiv.
So, perhaps the 40-block 40-day experiment had not yet been done when
they wrote the AlphaZero paper.

Just speculating...
Ingo.

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Gian-Carlo Pascutto
On 6/12/2017 18:57, Darren Cook wrote:
>> Mastering Chess and Shogi by Self-Play with a General Reinforcement
>> Learning Algorithm
>> https://arxiv.org/pdf/1712.01815.pdf
> 
> One of the changes they made (bottom of p.3) was to continuously update
> the neural net, rather than require a new network to beat it 55% of the
> time to be used. (That struck me as strange at the time, when reading
> the AlphaGoZero paper - why not just >50%?)

I read that as a simple way of establishing confidence that the result
was statistically significant > 0. (+35 Elo over 400 games - I don't
know by heart how large the typical error margin of 400 games is, but I
think it won't be far off!)
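
For the record, a rough calculation (binomial model, ignoring draws)
suggests the 2-sigma error margin of a 400-game match is itself about 35
Elo, so a +35 Elo result sits right at the edge of significance:

    from math import log10, sqrt

    def elo(score):
        # Elo difference implied by an expected score (logistic model).
        return -400 * log10(1 / score - 1)

    n, score = 400, 0.55               # the 55% gating threshold
    sigma = sqrt(0.25 / n)             # ~0.025 std dev of the score
    print(round(elo(score)))                           # ~35 Elo
    print(round(elo(score + 2 * sigma) - elo(score)))  # 2-sigma margin: ~36 Elo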

-- 
GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Xavier Combelle
Another result is that chess is really drawish, in contrast to shogi.


On 06/12/2017 at 18:50, Richard Lorentz wrote:
> One chess result stood out for me, namely, just how much easier it was
> for AlphaZero to win with white (25 wins, 25 draws, 0 losses) rather
> than with black (3 wins, 47 draws, 0 losses).
>
> Maybe we should not give up on the idea of White to play and win in chess!
>
> On 12/06/2017 01:24 AM, Hiroshi Yamashita wrote:
>> Hi,
>>
>> DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero
>> method.
>>
>> Mastering Chess and Shogi by Self-Play with a General Reinforcement
>> Learning Algorithm
>> https://arxiv.org/pdf/1712.01815.pdf
>>
>>
>> AlphaZero(Chess) outperformed Stockfish after 4 hours,
>> AlphaZero(Shogi) outperformed elmo after 2 hours.
>>
>> Search is MCTS.
>> AlphaZero(Chess) searches 80,000 positions/sec.
>> Stockfish    searches 70,000,000 positions/sec.
>> AlphaZero(Shogi) searches 40,000 positions/sec.
>> elmo searches 35,000,000 positions/sec.
>>
>> Thanks,
>> Hiroshi Yamashita
>>

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Ingo Althöfer
"Joshua Shriver"  asked:
> What about Arimaa?

My personal impression: Arimaa should be rather easy for the
AlphaZero approach.


My questions:
* How well does the AlphaZero approach
perform in non-zero-sum games
(or in games with more than two players)?

* How well does the AlphaZero approach
perform in games with a robot component
(for instance in Frisbee Go)?
https://www.althofer.de/robot-play/frisbee-robot-go.jpg

* How well does AlphaZero perform in games where "we"
know the best moves by mathematical analysis (for instance
the Nim game; see the sketch below), or where we know that
the second player has a mirror strategy to secure a draw?
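
For the Nim question: the mathematically perfect strategy is tiny, which is
what makes it such a clean probe. A sketch in Python of the known-optimal
play, for reference:

    from functools import reduce
    from operator import xor

    def nim_best_move(heaps):
        # Optimal move under normal play (taking the last object wins):
        # move to a position whose nim-sum (XOR of heap sizes) is zero.
        s = reduce(xor, heaps)        # nim-sum; zero means a lost position
        if s == 0:
            return None               # every move loses against perfect play
        for i, h in enumerate(heaps):
            if h ^ s < h:
                return i, h ^ s       # shrink heap i to size h ^ s

    print(nim_best_move([3, 4, 5]))   # (0, 1): play to [1, 4, 5], nim-sum 0

The open question is whether a self-play learner would ever represent that
XOR structure, or just approximate it.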

Ingo.

PS. For a long time I thought that Boston Dynamics was
the best horse in Google's stable. But it seems that
DeepMind was and is better...

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Darren Cook
> Mastering Chess and Shogi by Self-Play with a General Reinforcement
> Learning Algorithm
> https://arxiv.org/pdf/1712.01815.pdf

One of the changes they made (bottom of p.3) was to continuously update
the neural net, rather than require a new network to beat it 55% of the
time to be used. (That struck me as strange at the time, when reading
the AlphaGoZero paper - why not just >50%?)
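
Schematically, the difference between the two update rules as I read the
papers (a sketch, not DeepMind's code; self_play, train and evaluate are
made-up stand-ins):

    import random

    # Stubs so the sketch runs; the real systems' equivalents are elided.
    def self_play(net): return ["self-play game records"]
    def train(net, data): return net + 1                 # stands in for updated weights
    def evaluate(candidate, best, games): return random.random()  # match score

    # AlphaGo Zero style: gate the candidate behind a 55% evaluation match.
    def agz_iteration(best_net):
        data = self_play(best_net)           # the champion generates the games
        candidate = train(best_net, data)
        if evaluate(candidate, best_net, games=400) >= 0.55:
            return candidate                 # promoted to champion
        return best_net                      # otherwise keep the old champion

    # AlphaZero style: no gate; always self-play with the latest weights.
    def az_iteration(net):
        data = self_play(net)
        return train(net, data)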

The AlphaZero paper shows it out-performs AlphaGoZero, but they are
comparing to the 20-block, 3-day version. Not the 40-block, 40-day
version that was even stronger.

As papers rarely show failures, can we take it to mean they couldn't
out-perform their best go bot, do you think? If so, I wonder how hard
they tried?

In other words, do you think the changes they made from AlphaGo Zero to
Alpha Zero have made it weaker (when just viewed from the point of view
of making the strongest possible go program).

Darren

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Richard Lorentz
One chess result stood out for me, namely, just how much easier it was 
for AlphaZero to win with white (25 wins, 25 draws, 0 losses) rather 
than with black (3 wins, 47 draws, 0 losses).


Maybe we should not give up on the idea of White to play and win in chess!

On 12/06/2017 01:24 AM, Hiroshi Yamashita wrote:

Hi,

DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero 
method.


Mastering Chess and Shogi by Self-Play with a General Reinforcement 
Learning Algorithm
https://arxiv.org/pdf/1712.01815.pdf



AlphaZero(Chess) outperformed Stockfish after 4 hours,
AlphaZero(Shogi) outperformed elmo after 2 hours.

Search is MCTS.
AlphaZero(Chess) searches 80,000 positions/sec.
Stockfish    searches 70,000,000 positions/sec.
AlphaZero(Shogi) searches 40,000 positions/sec.
elmo searches 35,000,000 positions/sec.

Thanks,
Hiroshi Yamashita


Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Joshua Shriver
What about Arimaa?

On Wed, Dec 6, 2017 at 9:28 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de> wrote:
> It seems we are living in extremely
> heavy times ...
>
> I want to go to bed now and meditate for three days.
>
>> DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero method.
>> Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning 
>> Algorithm
>> https://arxiv.org/pdf/1712.01815.pdf
>>
>> AlphaZero(Chess) outperformed Stockfish after 4 hours,
>> AlphaZero(Shogi) outperformed elmo after 2 hours.
>
> It may sound strange, but at the moment my only hopes for
> games too difficult for AlphaZero might be
>
> * a connection game like Hex (on 19x19 board)
>
> * a game like Clobber (based on CGT)
>
> Mastering Clobber would mean that the concepts of
> combinatorial game theory would also be "easily" learnable.
>
>
> Side question: Would the classic Nim game be
> a trivial nut for AlphaZero?
>
> Ingo (is now starting to hope for an AlphaZero type program
> that can do "general" mathematical research).

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread David Wu
Hex:
https://arxiv.org/pdf/1705.08439.pdf

This is not on a 19x19 board, and it was not tested against the current
state of the art (MoHex 1.0 was the state of the art at its time, but is at
least several years old now, I think), but they do get several hundred Elo
points stronger than this old version of MoHex, have training curves that
suggest that they still haven't reached the limit of improvement, and are
doing it with orders of magnitude less computation than Google would have
available.

So, I think it is likely that Hex is not going to be too difficult for
AlphaZero or a similar architecture.


On Wed, Dec 6, 2017 at 9:28 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de>
wrote:

> It seems we are living in extremely
> heavy times ...
>
> I want to go to bed now and meditate for three days.
>
> > DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero
> method.
> > Mastering Chess and Shogi by Self-Play with a General Reinforcement
> Learning Algorithm
> > https://arxiv.org/pdf/1712.01815.pdf
> >
> > AlphaZero(Chess) outperformed Stockfish after 4 hours,
> > AlphaZero(Shogi) outperformed elmo after 2 hours.
>
> It may sound strange, but at the moment my only hopes for
> games too difficult for AlphaZero might be
>
> * a connection game like Hex (on 19x19 board)
>
> * a game like Clobber (based on CGT)
>
> Mastering Clobber would mean that the concepts of
> combinatorial game theory would also be "easily" learnable.
>
>
> Side question: Would the classic Nim game be
> a trivial nut for AlphaZero?
>
> Ingo (is now starting to hope for an AlphaZero type program
> that can do "general" mathematical research).

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Ingo Althöfer
It seems we are living in extremely
heavy times ...

I want to go to bed now and meditate for three days.
 
> DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero method.
> Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning 
> Algorithm
> https://arxiv.org/pdf/1712.01815.pdf
> 
> AlphaZero(Chess) outperformed Stockfish after 4 hours,
> AlphaZero(Shogi) outperformed elmo after 2 hours.
 
It may sound strange, but at the moment my only hopes for
games too difficult for AlphaZero might be 

* a connection game like Hex (on 19x19 board)

* a game like Clobber (based on CGT)

Mastering Clobber would mean that the concepts of
combinatorial game theory would also be "easily" learnable.


Side question: Would the classic Nim game be 
a trivial nut for AlphaZero?

Ingo (is now starting to hope for an AlphaZero type program
that can do "general" mathematical research).

[Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Hiroshi Yamashita

Hi,

DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero method.

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning 
Algorithm
https://arxiv.org/pdf/1712.01815.pdf

AlphaZero(Chess) outperformed Stockfish after 4 hours,
AlphaZero(Shogi) outperformed elmo after 2 hours.

Search is MCTS.
AlphaZero(Chess) searches 80,000 positions/sec.
Stockfish searches 70,000,000 positions/sec.
AlphaZero(Shogi) searches 40,000 positions/sec.
elmo searches 35,000,000 positions/sec.

Thanks,
Hiroshi Yamashita
