Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 22-05-17 21:01, Marc Landgraf wrote:
> But what you should really look at here is Leelas evaluation of the game.

Note that this is completely irrelevant for the discussion about
tactical holes and the position I posted. You could literally plug any
evaluation into it (save for a static oracle, in which case why search
at all...) and it would still have the tactical blindness being discussed.

It's an issue of limitations of the policy network, combined with the
way one uses the UCT formula. I'll use the one from the original AlphaGo
paper here, because it's public and should behave even worse:

u(s, a) = c_puct * P(s, a) * sqrt(total_visits) / (1 + child_visits)

Note that P(s, a) is a direct factor here, which means that for a move
ignored by the policy network, the UCT term will almost vanish. In other
words, unless the win is immediately visible (and for tactics it won't
be), you're not going to find it. Also note that this is a deviation from
regular UCT or PUCT, which do not have such a direct term and hence only
have a disappearing prior, making the search eventually more exploratory.
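
To make the effect concrete, here is a rough numeric sketch (Python,
with made-up visit counts - not Leela code):

import math

def puct_term(c_puct, prior, total_visits, child_visits):
    return c_puct * prior * math.sqrt(total_visits) / (1.0 + child_visits)

total = 100000   # visits at the parent node
# An unvisited move the policy net likes, vs. one it effectively ignores:
print(puct_term(5.0, 0.3,     total, 0))   # ~474  -> will get explored
print(puct_term(5.0, 0.00001, total, 0))   # ~0.02 -> drowned out by the
                                           # winrate differences between
                                           # the other moves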

Now, even the original AlphaGo played moves that surprised human pros
and were contrary to established sequences. So where did those come
from? Enough computation power to overcome the low probability?
Synthesized by inference from the (much larger than mine) policy network?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 23-05-17 10:51, Hideki Kato wrote:
> (2) The number of possible positions (input of the value net) in 
> real games is at least 10^30 (10^170 in theory).  Can the value 
> net recognize them all?  L&Ds depend on very small differences in 
> the placement of stones or liberties.  Can we provide the necessary 
> amount of training data?  Does the network have enough capacity?  
> The answer is almost obvious from the theory of function 
> approximation.  (An ANN is just a non-linear function 
> approximator.)

DCNNs clearly have some ability to generalize from learned data and
perform OK even with unseen examples. So I don't find this a very
compelling argument. It's not like Monte Carlo playouts are going to
handle all sequences correctly either.

Evaluations are heuristic guidance for the search, and a help when the
search terminates in an unresolved position. Having multiple independent
ones improves the accuracy of the heuristic - a basic ensemble.

> (3) CNNs cannot learn the exclusive-or function due to the ReLU 
> activation function, as opposed to the traditional sigmoid (hyperbolic 
> tangent).  CNNs are good at approximating continuous (analog) 
> functions but not Boolean (digital) ones.

Are you sure this is correct? Especially if we allow leaky ReLU?
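
For what it's worth, here is a hand-weighted sketch of a two-layer ReLU
network that represents XOR exactly (whether a given training setup
actually finds such weights is of course a separate question):

def relu(x):
    return max(0.0, x)

def xor_net(x1, x2):
    h1 = relu(x1 + x2)          # counts the ones
    h2 = relu(x1 + x2 - 1.0)    # fires only when both inputs are one
    return h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # prints 0, 1, 1, 0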

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 23-05-17 17:19, Hideki Kato wrote:
> Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c75...@sjeng.org>:
> 
>> Now, even the original AlphaGo played moves that surprised human pros
>> and were contrary to established sequences. So where did those come
>> from? Enough computation power to overcome the low probability?
>> Synthesized by inference from the (much larger than mine) policy network?
> 
> Demis Hassabis said in a talk:
> After the game with Sedol, the team used "adversarial learning" in 
> order to fill the holes in the policy net (such as Lee Sedol's winning 
> move in game 4).

I said the "original AlphaGo", i.e. the one used in the match against
Lee Sedol. According to the Nature paper, the policy net was trained
with supervised learning only [1]. And yet...

In the attached SGF, AlphaGo played P10, which was considered a very
surprising move by all commentators. Presumably, this means it's not
seen in high-level human play, and would not get a high rating from the
policy net. I can sort of confirm this:

0.295057654 (E13)
...(60 more moves follow)...
0.11952 (P10)

So, 0.001% probability. Demis commented that Lee Sedol's winning move in
game 4 was a one in 10 000 move. This is a 1 in 100 000 move.
Differently trained policy nets might rate it a bit higher or lower, but
simply because it was considered a very un-human move to play, it seems
unlikely to ever be rated highly by a policy net based on supervised
learning.

So in AlphaGo's formula, you're dealing with a reduction of the UCT term
by a factor of 100 000, plus or minus an order of magnitude.

  D6 -> 1359934 (W: 53.21%) (U: 49.34%) (V: 55.15%:  38918) (N:  6.3%)
PV: D6 F6 E7 F7 C8 B8 D7 B7 E9 C9 F8 H7 H9 K7 H3 K9
...many moves...
 P10 -> 421 (W: 52.68%) (U: 50.09%) (V: 53.98%:  8) (N:  0.0%)
PV: P10 Q10 P8 Q9

Now, of course AlphaGo had a few orders of magnitude more hardware, but
you can see from the above that it's, eh, not easy for P10 to overtake
the top moves here in playout count.

And yet, that's the move that was played.

[1] I'm assuming that what played the match corresponds to what they
published there - maybe that is my mistake. I'm not sure I remember the
relevant timeline correctly.

-- 
GCP


sedol.sgf
Description: application/go-sgf
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Xeon Phi result

2017-06-07 Thread Gian-Carlo Pascutto
Hi all,

I managed to get a benchmark off an Intel® Xeon Phi™ Processor 7250
(16GB, 1.40 GHz, 68 cores / 272 threads) system.

I used a version of Leela essentially identical to the public Leela
0.10.0, but compiled with -march=knl (using gcc 5.3), using an
appropriate version of Intel MKL (2017.1 for MIC) and increasing the
maximum number of threads.

benchmark:

~ 151000 g/s (557 g/s per thread)

netbench:

predictions ->   670 p/s
evaluations ->  3007 p/s

This was with the 16G HBM bound as addressable memory. Using the regular
DDR4 cuts "netbench" numbers in half, but has no big impact on "benchmark".

This means it's about 5 times faster (in integer operations) than a
quad-core desktop with HT, and roughly similar in floating-point
performance to a mid-range video card.

It's a nice machine, if a bit pricey.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-06-07 Thread Gian-Carlo Pascutto
On 24-05-17 05:33, "Ingo Althöfer" wrote:
>> So, 0.001% probability. Demis commented that Lee Sedol's winning move in
>> game 4 was a one in 10 000 move. This is a 1 in 100 000 move.
> 
> In Summer 2016 I checked the games of AlphaGo vs Lee Sedol
> with repeated runs of CrazyStone DL:
> In 3 of 20 runs the program selected P10. It
> turned out that a rather early "switch" in the search was
> necessary to arrive at P10. But if CS did that it
> remained with this candidate.

I guess it's possible this move is selected by a policy other than the
neural network. Or perhaps the probability can be much higher with a
differently trained policy net.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Value network that doesn't want to learn.

2017-06-19 Thread Gian-Carlo Pascutto
On 19-06-17 17:38, Vincent Richard wrote:

> During my research, I’ve trained a lot of different networks, first on
> 9x9 then on 19x19, and as far as I remember all the nets I’ve worked
> with learned quickly (especially during the first batches), except the
> value net which has always been problematic (diverge easily, doesn't
> learn quickly,...). I have been stuck on the 19x19 value network for a
> couple of months now. I’ve tried countless inputs (feature planes) and
> lots of different models, even using the exact same code as others. Yet,
> whatever I try, the loss value doesn’t move an inch and accuracy stays
> at 50% (even after days of training). I've tried to change the learning
> rate (increase/decrease), it doesn't change. However, if I feed a stupid
> value as target output (for example black always win) it has no trouble
> learning.
> It is even more frustrating that training any other kind of network
> (predicting next move, territory,...) goes smoothly and fast.
> 
> Has anyone experienced a similar problem with value networks or has an
> idea of the cause?

1) What is the training data for the value network? How big is it, how
is it presented/shuffled/prepared?

2) What is the *exact* structure of the network and training setup?

My best guess would be an error in the construction of the final layers.
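
For comparison, here is a sketch (PyTorch, not your code) of one common
construction of those final layers - an AlphaGo-style value head on a
hypothetical 256-plane trunk for 19x19. A mismatch between the output
nonlinearity and the target range is a classic way to get stuck at 50%:

import torch
import torch.nn as nn

class ValueHead(nn.Module):
    def __init__(self, trunk_planes=256, board=19):
        super().__init__()
        self.conv = nn.Conv2d(trunk_planes, 1, kernel_size=1)  # reduce to 1 plane
        self.fc1 = nn.Linear(board * board, 256)
        self.fc2 = nn.Linear(256, 1)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        return torch.tanh(self.fc2(x))   # value in [-1, 1]; targets must match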

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Value network that doesn't want to learn.

2017-06-19 Thread Gian-Carlo Pascutto
On 19/06/2017 21:31, Vincent Richard wrote:
> - The data is then analyzed by a script which extracts all kind of
> features from games. When I'm training a network, I load the features I
> want from this analysis to build the batch. I have 2 possible methods
> for the batch construction. I can either add moves one after the other
> (the fast mode) or pick random moves among different games (slower but
> reduces the variance). 

You absolutely need the latter, especially since, for outcome prediction,
moves from the same game are not independent samples.
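
A minimal sketch of that second method (illustrative only, assuming your
analysis step gives you a list of (position, outcome) pairs per game):

import random

def sample_batch(games, batch_size):
    # games: list of games, each a list of (position, outcome) pairs
    batch = []
    for _ in range(batch_size):
        game = random.choice(games)
        batch.append(random.choice(game))
    return batch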

> During some of the tests, all the networks I was training had the same
> layers except for the last. So as you suggested, I was also wondering if
> this last layer wasn’t the problem. Yet, I haven’t found any error.
...
> However, if I feed a stupid
> value as target output (for example black always win) it has no trouble
> learning.

A problem with side to move/won side marking in the input or feature
planes, or with the expected outcome (0 vs 1 vs -1)?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] July KGS bot tournament

2017-07-08 Thread Gian-Carlo Pascutto
On 8/07/2017 9:07, Nick Wedd wrote:
> The July KGS bot tournament will be on Sunday, July 7th, starting at
> 08:00 UTC and end by 15:00 UTC.  It will use 19x19 boards, with
> time limits of 14 minutes each and  very fast Canadian overtime, and
> komi of 7½.  It will be a Swiss tournament.
>  See http://www.gokgs.com/tournInfo.jsp?id=1116
> 

The announced time control doesn't match up with the one on the web page.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] KGS Bot tournament July

2017-07-09 Thread Gian-Carlo Pascutto
On 9/07/2017 17:41, "Ingo Althöfer" wrote:
> Hello,
> 
> it seems that the KGS bot tournament did not start, yet.
> What is the matter?

The tournament was played; I am not sure why the standings did not update.

If I'm reading the game histories correctly:

1. Zen    7 pts
2. Leela  4 pts
3. Aya    3 pts
4. gnugo  0 pts

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Possible idea - decay old simulations?

2017-07-24 Thread Gian-Carlo Pascutto
On 23-07-17 18:24, David Wu wrote:
> Has anyone tried this sort of idea before?

I haven't tried it, but (with my computer chess hat on) these kinds of
proposals behave pretty badly when you get into situations where your
evaluation is off and there are horizon effects. The top move drops off
and now every alternative that has had less search looks better (because
it hasn't seen the disaster yet). You do not want discounting in this
situation.

It's true that a move with a better winrate than the move with the most
simulations is a good candidate to actually be the better move. Some
engines will extend their thinking time when this happens. Leela will
play it, under certain conditions.

> I recall a paper published on this basis. A paper presumably about 
> CrazyStone: Efficient Selectivity and Backup Operators in
> Monte-Carlo Tree Search.

I'm reasonably sure this did not include forgetting/discounting, only
shifting between average and maximum by focusing simulations near the
maximum. It's the predecessor of UCT.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Possible idea - decay old simulations?

2017-07-24 Thread Gian-Carlo Pascutto
On 24-07-17 16:07, David Wu wrote:
> Hmm. Why would discounting make things worse? Do you mean that you
> want the top move to drop off slower (i.e. for the bot to take longer
> to achieve the correct valuation of the top move) to give it "time"
> to search the other moves enough to find that they're also bad?

I don't want the top move to drop off slower, I just don't want to play
other moves until they've been searched to comparable "depth".

If there's a disaster lurking behind the main variation that we have only
just started to understand, the odds are that the same disaster also lurks
in a few of the alternative moves.

> I would have thought that with typical exploration policies, whether
> the top move drops off a little faster or a little slower, once its
> winrate drops down close to the other moves, the other moves should
> get a lot of simulations as well.

Yes. But the goal of the discounting is that a new move can overtake the
old one despite having had less total search effort.

My point is that it is not always clear this is a positive effect.

> I know that there are ways to handle this at the root, via time
> control or otherwise.

The situation isn't necessarily different here, if you consider that at
the root the best published technique is still "think longer so the new
move can overtake the old one", not "play the new move".

Anyway, not saying this can't work. Just pointing out the problem areas.

I would be a bit surprised if discounting worked for Go: it has been
published for other areas (e.g. Amazons), but I don't remember any
reports of success in Go. Then again, the devil can be in the details
(i.e. the discounting formula) for tricks like this.
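
Purely for illustration, one simple discounting scheme one could imagine
(hypothetical, not taken from any paper):

class Node:
    def __init__(self):
        self.visits = 0.0
        self.wins = 0.0
        self.children = []

def decay(node, gamma=0.99):
    # Scale down accumulated statistics so that older simulations weigh
    # less than newer ones as the search progresses.
    node.visits *= gamma
    node.wins *= gamma
    for child in node.children:
        decay(child, gamma)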

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Deep Blue the end, AlphaGo the beginning?

2017-08-18 Thread Gian-Carlo Pascutto
On 17-08-17 21:35, Darren Cook wrote:
> "I'm sure some things were learned about parallel processing... but the
> real science was known by the 1997 rematch... but AlphaGo is an entirely
> different thing. Deep Blue's chess algorithms were good for playing
> chess very well. The machine-learning methods AlphaGo uses are
> applicable to practically anything."
> 
> Agree or disagree?

Deep Thought (the predecessor of Deep Blue) used a Supervised Learning
approach to set the initial evaluation weights. The details might be
lost in time but it's reasonable to assume some were carried over to
Deep Blue. Deep Blue itself used hill-climbing to find evaluation
features that did not seem to correlate with strength much, and improve
them.

A lot of the strength of AlphaGo comes from a fast, parallelized tree
search.

Uh, what was the argument again?

Maybe we should stop inventing artificial differences and appreciate
that the tools in our toolbox have become much sharper over the years.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Deep Blue the end, AlphaGo the beginning?

2017-08-18 Thread Gian-Carlo Pascutto
On 18-08-17 16:56, Petr Baudis wrote:
>> Uh, what was the argument again?
> 
>   Well, unrelated to what you wrote :-) - that Deep Blue implemented
> existing methods in a cool application, while AlphaGo introduced
> some very new methods (perhaps not entirely fundamentally, but still
> definitely a ground-breaking work).

I just fundamentally disagree with this characterization, which I think
is grossly unfair to the Chiptest/Deep Thought/Deep Blue lineage.
Remember there were 12 years in-between those programs.

They did not just...re-implement the same "existing methods" over and
over again all that time. Implementation details and exact workings are
very important [1]. I imagine the main reason this false distinction
(i.e. the "artificial difference" from my original post) is being made
is, IMHO, that you're all aware of the fine nuances of how AlphaGo's DCNN
usage (for example) differs from previous efforts, but you're not
aware of the same nuances in Chiptest and its successors etc.

[1] As is speed, another dirty word in AI circles that is nevertheless
damn important for practical performance.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Deep Blue the end, AlphaGo the beginning?

2017-08-18 Thread Gian-Carlo Pascutto
On 18/08/2017 20:34, Petr Baudis wrote:
>   You may be completely right!  And yes, I was thinking about Deep Blue
> in isolation, not that aware about general computer chess history.  Do
> you have some suggested reading regarding Deep Blue and its lineage and
> their contributions to the field of AI at large?

I sure do. I *love* Hsu's papers, although it's hard now to imagine the
context some of his statements were made in, 30 years ago, because we
know the outcome. Back then, I'm sure many people thought he was a
raving lunatic.

His key idea was that, given ample evidence that program strength scaled
fairly well with computer speed, we should make the programs faster. He
did so by converting the chess search to a literal hardware circuit, in an
implementation that was both algorithmically more efficient and faster,
achieving about *3 orders of magnitude* improvement over what was then
the state of the art. And then designing a parallel search that worked
with it and scaled well enough.

Saying that these "implemented existing methods" is factually wrong, and
betrays a deep misunderstanding of the importance of computing power in
AI research. But I'll get back to this, later.

The original paper, "A Two-Million Moves/s CMOS Single-Chip Chess Move
Generator" was published in 1987. In the conclusion, it states "The best
chess machine now in existence is still about 400 to 500 rating points
below the Human World Chess Champion. Earlier experimental evidence
shows that each doubling in machine speed roughly corresponds to a 100
rating points increase...It is questionable that this remains true at
high level play. But nonetheless with a potential 100-1000 fold speed-up
at the door, *something interesting is probably about to happen*."

In his PhD thesis, he goes further, and draws the scaling graph with
Kasparov on it, at the end, and says in the introduction: "This
dissertation is mainly a collection of exploratory work on what I call
the "ultimate" chess machine - a chess machine that is capable of
searching at least 100 million nodes per second and possibly beyond 1
billion nodes per second. Current evidence seems to indicate that such a
machine will have an overwhelming chance of defeating the human World
Chess Champion."

He wrote that in 1989!

Kasparov, the same expert whose claims about Go and chess started this
very thread, had said the year before that no Grandmaster would be
defeated in tournament play before the year 2000. That gives you some
idea how outlandish Hsu's ideas seemed at the time. Or, for that matter,
how reliable Kasparov's opinion is in these matters.

Hsu achieved his goal in 1997, with 3 years to spare. Kasparov's
response was to call him a cheater.

Now, now, you might be thinking, was it all about speed? It was not -
the above was just the shtick of Hsu, who was one member of the team. But
do not for a moment make the mistake of underestimating just how
important speed is.

Do you know why, decades after having discarded them, we suddenly
started using neural networks again, and why they now turn out to work
well for Go?

It's because we have several orders of magnitude more computing power.
Made possible by dedicated chips for neural network computations (OK, so
maybe they were intended for computer games - turns out the functions
are pretty similar, not to speak of TPUs).

And Hsu? He's working on FPGAs at Microsoft, which mainly uses them
to accelerate AI research and applications. In one of his last
interviews, in 2007, he predicted that a "world-champion-level Go machine
can be built within 10 years." He got the details of the approach wrong,
though.

Other team members also published several papers, namely Murray Campbell,
Thomas Anantharaman and Andreas Nowatzyk.

Nowatzyk has published the original automated evaluation tuning code
used by Deep Thought. It's available, together with an explanation, at
http://www.tim-mann.org/deepthought.html

This was significant, because software-based programs at the time had to
trade off evaluation terms for speed, so they mostly had very few and
could rely on manual tuning. Existing methods, you say?

Anantharaman's best-known work is the publication of Singular
Extensions. The contribution of this method is somewhat hazy - with Hsu
admitting that they overestimated the initial gain from it due to
measurement errors - but improved versions are in fact in use in current
top-of-the-line chess engines.

Campbell has published on a bunch of subjects. A ton on parallel game
tree search, and a method for biasing the tree search based on human
games. We call that a policy net, nowadays. Ok, maybe I'm stretching a
bit here.

Now, as to how these methods "contributed to the AI field at large",
which I interpret as asking how well they generalize, that's an
interesting question. But it's also an interesting question that you can
ask of AlphaGo's contributions. Doing move prediction with a DCNN was
first done by Clark and Storkey of the uni

Re: [Computer-go] Deep Blue the end, AlphaGo the beginning?

2017-08-18 Thread Gian-Carlo Pascutto
On 18/08/2017 23:07, uurtamo . wrote:
> They run on laptops. A program that could crush a grandmaster will run
> on my laptop. That's an assertion I can't prove, but I'm asking you to
> verify it or suggest otherwise.

Sure.

> Now the situation with go is different.

For what it's worth, I would expect the next release of Zen to make this
achievable as well. Especially if it supports GPU acceleration, and you
have one of those laptops with a GTX 1080 in it :-) But yes, chess is
comparatively further ahead against humans.

> But if we do agree that the problem itself is fundamentally harder,
> (which I believe it is) and we don't want to ascribe its solution simply
> to hardware (which people tried to do with big blue), then we should
> acknowledge that it required more innovation.
> 
> I do agree, and hope that you do, that this innovation is all part of a
> continuum of innovation that is super exciting to understand.

Of course I do. That is the whole point I was making with "appreciating
the sharpened tools".

My objection was to the claim that making Deep Blue didn't require any
innovation or new methods at all. They beat Kasparov in 1997, not 2017!

There is a secondary argument whether the methods used for Deep Blue
generalize as well as the methods used for AlphaGo. I think that
argument may not be as simple and clear-cut as Kasparov implied, because
for one, there are similarities and crossover in which methods both
programs used.

But I understand where it comes from. SL/RL and DCNN's (more associated
with AlphaGo) seem like a broader hammer than tree search (more
associated with Deep Blue).

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Gian-Carlo Pascutto
On 18/10/2017 19:50, cazen...@ai.univ-paris8.fr wrote:
> 
> https://deepmind.com/blog/
> 
> http://www.nature.com/nature/index.html

Select quotes that I find interesting from a brief skim:

1) Using a residual network was more accurate, achieved lower error, and
improved performance in AlphaGo by over 600 Elo.

2) Combining policy and value together into a single network slightly
reduced the move prediction accuracy, but reduced the value error and
boosted playing performance in AlphaGo by around another 600 Elo.

These gains sound very high (much higher than previous experiments with
them reported here), but are likely due to the joint training.

3) The raw neural network, without using any lookahead, achieved an Elo
rating of 3,055. ... AlphaGo Zero achieved a rating of 5,185.

The increase of 2000 Elo from tree search sounds very high, but this may
just mean the value network is simply very good - and perhaps relatively
better than the policy one. (They previously had the problem that SL >
RL for the policy network guiding the tree search - but I'm not sure
there's any relation.)

4) History features X_t, Y_t are necessary because Go is not fully
observable solely from the current stones, as repetitions are forbidden.

This is a weird statement. Did they need 17 planes just to check for ko?
It seems more likely that history features are very helpful for the
internal understanding of the network as an optimization. That sucks
though - it's annoying for analysis and position setup.

Lastly, the entire training procedure is actually not very complicated
at all, and hopefully the training is "faster" than previous
approaches - but many things look fast if you can throw 64 GPU workers
at a problem.

In this context, the graphs of differing network architectures
causing huge strength discrepancies are both good and bad news. Make a
good pick and you can get massively better results; take a bad
pick and you won't come close.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Gian-Carlo Pascutto
On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> A stunning result. The NN uses a standard vision architecture (no Go
> adaptation beyond what is necessary to represent the game state).

The paper says that Master (4858 rating) uses Go-specific features, is
initialized by SL, and otherwise uses the same technique. Without Go
features, and without initialization, it's Zero (5185 rating).

The obvious question is, what would be the result of using go features
and not initializing?

I would expect that providing liberties is a useful shortcut (see my
remark about game history!). But I'm willing to be surprised :-)

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Gian-Carlo Pascutto
On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> This paper is required reading. When I read this team’s papers, I think
> to myself “Wow, this is brilliant! And I think I see the next step.”
> When I read their next paper, they show me the next *three* steps.

Hmm, interesting way of seeing it. Once they had the Lee Sedol AlphaGo,
it was somewhat obvious that just self-playing it should lead to an
improved policy and value net.

And before someone accuses me of Captain Hindsighting here, this was
pointed out on this list:
http://computer-go.org/pipermail/computer-go/2017-January/009786.html

It looks to me like the real devil is in the details. Don't use a
residual stack? -600 Elo. Don't combine the networks? -600 Elo.
Bootstrap the learning? -300 Elo

We made 3 perfectly reasonable choices and somehow lost 1500 Elo along
the way. I can't get over that number, actually.

Getting the details right makes a difference. And they're getting them
right, either because they're smart, because of experience from other
domains, or because they're trying a ton of them. I'm betting on all 3.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-19 Thread Gian-Carlo Pascutto
On 18-10-17 19:50, cazen...@ai.univ-paris8.fr wrote:
> 
> https://deepmind.com/blog/
> 
> http://www.nature.com/nature/index.html

Another interesting tidbit:

The inputs don't contain a reliable board edge. The "white to move"
plane contains it, but only when white is to move.

So until AG Zero "black" learned that a go board is 19 x 19, the white
player had a serious advantage.

I think I will use 18 input layers :-)

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-20 Thread Gian-Carlo Pascutto
On 19-10-17 13:23, Álvaro Begué wrote:
> Summing it all up, I get 22,837,864 parameters for the 20-block network
> and 46,461,544 parameters for the 40-block network.
> 
> Does this seem correct?

My Caffe model file is 185887898 bytes / 32-bit floats = 46 471 974

So yes, that seems pretty close. I'll send the model file and some
observations in a separate post.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-20 Thread Gian-Carlo Pascutto
On 19-10-17 13:00, Aja Huang via Computer-go wrote:
> Hi Hiroshi,
> 
> I think these are good questions. You can ask them at 
> https://www.reddit.com/r/MachineLearning/comments/76xjb5/ama_we_are_david_silver_and_julian_schrittwieser/

It seems the question was indeed asked but not answered:
https://www.reddit.com/r/MachineLearning/comments/76xjb5/ama_we_are_david_silver_and_julian_schrittwieser/dol03aq/

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Zero performance

2017-10-20 Thread Gian-Carlo Pascutto
I reconstructed the full AlphaGo Zero network in Caffe:
https://sjeng.org/dl/zero.prototxt

I did some performance measurements, with what should be
state-of-the-art on consumer hardware:

GTX 1080 Ti
NVIDIA-Caffe + CUDA 9 + cuDNN 7
batch size = 8

Memory use is about ~2G. (It's much more for learning, the original
minibatch size of 32 wouldn't fit on this card!)

Running 2000 iterations takes 93 seconds.

In the AlphaGo paper, they claim 0.4 seconds to do 1600 MCTS
simulations, and they expand 1 node per visit (if I got it right) so
that would be 1600 network evaluations as well, or 200 of my iterations.

So it would take me ~9.3s to produce a self-play move, compared to 0.4s
for them.

I would like to extrapolate how long it will take to reproduce the
research, but I think I'm missing how many GPUs are in each self-play
worker (4 TPU or 64 GPU or ?), or perhaps the average length of the games.

Let's say the latter is around 200 moves. They generated 29 million
games for the final result, which means it's going to take me about 1700
years to replicate this. I initially estimated 7 years based on the
reported 64 GPU vs 1 GPU, but this seems far worse. Did I miss anything
in the calculations above, or was it really a *pile* of those 64 GPU
machines?
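
For reference, the back-of-the-envelope arithmetic behind that number:

games    = 29_000_000   # self-play games reported in the paper
moves    = 200          # assumed average game length
sec_move = 9.3          # measured above on one GTX 1080 Ti
print(games * moves * sec_move / (3600 * 24 * 365))   # ~1710 years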

Because the performance on playing seems reasonable (you would be able
to actually run the MCTS on a consumer machine, and hence end up with a
strong program), I would be interested in setting up a distributed
effort for this. But realistically there will be maybe 10 people
joining, 80 if we're very lucky (looking at Stockfish numbers). That
means it'd still take 20 to 170 years.

Someone please tell me I missed a factor of 100 or more somewhere. I'd
love to be wrong here.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zero performance

2017-10-20 Thread Gian-Carlo Pascutto
On 20-10-17 19:44, Gian-Carlo Pascutto wrote:
> Memory use is about ~2G. (It's much more for learning, the original
> minibatch size of 32 wouldn't fit on this card!)

Whoops, this is not true.

It fits! Barely: 10307MiB / 11171MiB

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-20 Thread Gian-Carlo Pascutto
On Fri, Oct 20, 2017, 21:48 Petr Baudis  wrote:

>   Few open questions I currently have, comments welcome:
>
>   - there is no input representing the number of captures; is this
> information somehow implicit or can the learned winrate predictor
> never truly approximate the true values because of this?
>

They are using Chinese rules, so prisoners don't matter. There are simply
fewer stones of one color on the board.


>   - what ballpark values for c_{puct} are reasonable?
>

The original paper has the value they used. But this likely needs tuning. I
would tune with a supervised network to get started, but you need games for
that. Does it even matter much early on? The network is random :)


>   - why is the dirichlet noise applied only at the root node, if it's
> useful?
>

It's only used to get some randomness in the move selection, no? It's not
actually useful for anything besides that.


>   - the training process is quite lazy - it's not like the network sees
> each game immediately and adjusts, it looks at last 500k games and
> samples 1000*2048 positions, meaning about 4 positions per game (if
> I understood this right) - I wonder what would happen if we trained
> it more aggressively, and what AlphaGo does during the initial 500k
> games; currently, I'm training on all positions immediately, I guess
> I should at least shuffle them ;)
>

I think the laziness may be related to the concern that reinforcement
methods can easily "forget" things they had learned before. The value
network training also likes positions from distinct games.


-- 

GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zero performance

2017-10-20 Thread Gian-Carlo Pascutto
I agree. Even on 19x19 you can use smaller searches. An MCTS with 400
iterations is probably already a lot stronger than the raw network,
especially if you are expanding every node (very different from a normal
program at 400 playouts!). Some tuning of these mini searches is important.
Surely you don't want to explore every child node for the first-play
urgency... I remember this little algorithmic detail was missing from the
first paper as well.

So that's a factor 32 gain (a factor of 8 from the smaller network times
a factor of 4 from 400 instead of 1600 simulations). Because the network
is smaller, it should learn much faster too. Someone on reddit posted a
comparison of 20 blocks vs 40 blocks.

With 10 people you can probably get some results in a few months. The
question is, how much Elo have we lost on the way...

Another advantage would be that, as long as you keep all the SGF, you can
bootstrap a bigger network from the data! So, nothing is lost from starting
small. You can "upgrade" if the improvements start to plateau.

On Fri, Oct 20, 2017, 23:32 Álvaro Begué  wrote:

> I suggest scaling down the problem until some experience is gained.
>
> You don't need the full-fledge 40-block network to get started. You can
> probably get away with using only 20 blocks and maybe 128 features (from
> 256). That should save you about a factor of 8, plus you can use larger
> mini-batches.
>
> You can also start with 9x9 go. That way games are shorter, and you
> probably don't need 1600 network evaluations per move to do well.
>
> Álvaro.
>
>
> On Fri, Oct 20, 2017 at 1:44 PM, Gian-Carlo Pascutto 
> wrote:
>
>> I reconstructed the full AlphaGo Zero network in Caffe:
>> https://sjeng.org/dl/zero.prototxt
>>
>> I did some performance measurements, with what should be
>> state-of-the-art on consumer hardware:
>>
>> GTX 1080 Ti
>> NVIDIA-Caffe + CUDA 9 + cuDNN 7
>> batch size = 8
>>
>> Memory use is about ~2G. (It's much more for learning, the original
>> minibatch size of 32 wouldn't fit on this card!)
>>
>> Running 2000 iterations takes 93 seconds.
>>
>> In the AlphaGo paper, they claim 0.4 seconds to do 1600 MCTS
>> simulations, and they expand 1 node per visit (if I got it right) so
>> that would be 1600 network evaluations as well, or 200 of my iterations.
>>
>> So it would take me ~9.3s to produce a self-play move, compared to 0.4s
>> for them.
>>
>> I would like to extrapolate how long it will take to reproduce the
>> research, but I think I'm missing how many GPUs are in each self-play
>> worker (4 TPU or 64 GPU or ?), or perhaps the average length of the games.
>>
>> Let's say the latter is around 200 moves. They generated 29 million
>> games for the final result, which means it's going to take me about 1700
>> years to replicate this. I initially estimated 7 years based on the
>> reported 64 GPU vs 1 GPU, but this seems far worse. Did I miss anything
>> in the calculations above, or was it really a *pile* of those 64 GPU
>> machines?
>>
>> Because the performance on playing seems reasonable (you would be able
>> to actually run the MCTS on a consumer machine, and hence end up with a
>> strong program), I would be interested in setting up a distributed
>> effort for this. But realistically there will be maybe 10 people
>> joining, 80 if we're very lucky (looking at Stockfish numbers). That
>> means it'd still take 20 to 170 years.
>>
>> Someone please tell me I missed a factor of 100 or more somewhere. I'd
>> love to be wrong here.
>>
>
>> --
>> GCP
>
>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go

-- 

GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zero performance

2017-10-21 Thread Gian-Carlo Pascutto
On 20/10/2017 22:41, Sorin Gherman wrote:
> Training of AlphaGo Zero has been done on thousands of TPUs,
> according to this source: 
> https://www.reddit.com/r/baduk/comments/777ym4/alphago_zero_learning_from_scratch_deepmind/dokj1uz/?context=3
>
>  Maybe that should explain the difference in orders of magnitude that
> you noticed?

That would make a lot more sense, for sure. It would also explain the
25M USD number from Hassabis. That would be a lot of money to spend on
"only" 64 GPUs, or 4 TPU (which are supposed to be ~1 GPU).

There's no explanation where the number came from, but it seems that he
did similar math as in the original post here.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zero performance

2017-10-21 Thread Gian-Carlo Pascutto
On 20/10/2017 22:48, fotl...@smart-games.com wrote:
> The paper describes 20 and 40 block networks, but the section on
> comparison says AlphaGo Zero uses 20 blocks. I think your protobuf
> describes a 40 block network. That's a factor of two 😊

They compared with both; the final 5185 Elo number is for the 40-block
one. For the 20-block one, the numbers stop around 4300 Elo.
See for example:

https://www.reddit.com/r/baduk/comments/77hr3b/elo_table_of_alphago_zero_selfplay_games/

A factor of 2 isn't much, but sure, it seems sensible to start with the
smaller one, given how intractable the problem looks right now.

> Your time looks reasonable when calculating the time to generate the
> 29M games at about 10 seconds per move. This is only the time to
> generate the input data. Do you have an estimate of the additional
> time it takes to do the training? It's probably small in comparison,
> but it might not be.

So far I've assumed that it's zero, because it can happen in parallel
and the time to generate the self-play games dominates. From the revised
hardware estimates, we can also see that the training machines used 64
GPUs, which is a lot smaller than the 1500+ TPU estimate for the
self-play machines.

Training on the GTX 1080 Ti does 4 batches of 32 positions per second.
They use 2048 position batches, and train for 1000 batches before
checkpointing. So the GTX can produce a checkpoint every 4.5 hours [1].
Testing that over 400 games takes 8.5 days (400 x 200 x 9.3s).
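
For reference, the arithmetic behind those two figures:

positions  = 2048 * 1000    # one checkpoint's worth of training positions
per_second = 4 * 32         # measured training throughput on the 1080 Ti
print(positions / per_second / 3600)    # ~4.4 hours per checkpoint
print(400 * 200 * 9.3 / 86400)          # ~8.6 days to test over 400 games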

So again, it totally bottlenecks on playing games, not on training. At
least, if the improvement is big, one needn't play the 400 games out,
but SPRT termination can be used.

[1] To be honest, this seems very fast - even starting from 0 such a big
network barely advances in 1000 iterations (or I misinterpreted a
training parameter). But I guess it's important to have a very fast -
learn knowledge - use new knowledge - feedback cycle.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-22 Thread Gian-Carlo Pascutto
On 21/10/2017 14:21, David Ongaro wrote:
> I understand that DeepMind might be unable to release the source code
> of AlphaGo due to policy or licensing reasons, but it would be great
> (and probably much more valuable) if they could release the fully
> trained network.

The source of AlphaGo Zero is really of zero interest (pun intended). It
can be obtained by ripping out ~50% of the Ray/Rn or AQ code (everything
related to MC playouts) and making some minimal changes to evaluate the same
network for scoring and policy. Same for Leela.

It's literally possible to have a "Leela/Ray/AQ Zero" in a week or so
(it'll require a GPU or performance will be atrocious).

Of course, I can't give you the trained network to load into it. That'll
take another 88642 weeks.

So yes, the database of 29M self-play games would be immensely more
valuable. (Probably just the last 5M or so would be fine, too.) I prefer the
games over the network - with the games it's easier to train a smaller
network that gives better results on PCs that don't have 4 TPUs in them.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-24 Thread Gian-Carlo Pascutto
On 23-10-17 10:39, Darren Cook wrote:
>> The source of AlphaGo Zero is really of zero interest (pun intended).
> 
> The source code is the first-hand account of how it works, whereas an
> academic paper is a second-hand account. So, definitely not zero use.

This should be fairly accurate:

https://github.com/gcp/leela-zero

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zero is weaker than Master!?

2017-10-25 Thread Gian-Carlo Pascutto
On 24-10-17 23:10, Xavier Combelle wrote:
> How is it a fair comparison if there is only 3 days of training for Zero ?
> Master had longer training no ?

In the graph you can see that the 20-block Zero training had already
started to flatten off.

Of course predicting past the end of the graph is prone to error, but it
does seem somewhat unlikely it would make it past?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-25 Thread Gian-Carlo Pascutto
On 25-10-17 05:43, Andy wrote:
> Gian-Carlo, I didn't realize at first that you were planning to create a
> crowd-sourced project. I hope this project can get off the ground and
> running!
> 
> I'll look into installing this but I always find it hard to get all the
> tool chain stuff going.

I will provide pre-made packages for common operating systems. Right now
we (Jonathan Roy is helping with the server) are exploring what's
possible for such a crowd-sourced effort, and testing the server. I'll
provide an update here when there's something to play with.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-25 Thread Gian-Carlo Pascutto
On 25-10-17 16:00, Petr Baudis wrote:

>> The original paper has the value they used. But this likely needs tuning. I
>> would tune with a supervised network to get started, but you need games for
>> that. Does it even matter much early on? The network is random :)
> 
>   The network actually adapts quite rapidly initially, in my experience.
> (Doesn't mean it improves - it adapts within local optima of the few
> games it played so far.)

Yes, but once there's structure, you can tune the parameter with CLOP or
whatever.

>   Yes, but why wouldn't you want that randomness in the second or third
> move?

You only need to play a different move at the root in order for the game
to deviate.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-25 Thread Gian-Carlo Pascutto
On 25-10-17 17:57, Xavier Combelle wrote:
> Is there some way to distribute learning of a neural network ?

Learning as in training the DCNN, not really unless there are high
bandwidth links between the machines (AFAIK - unless the state of the
art changed?).

Learning as in generating self-play games: yes. Especially if you update
the network only every 25 000 games.

My understanding is that this task is much more bottlenecked on game
generation than on DCNN training, until you get quite a few machines
generating games.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zero is weaker than Master!?

2017-10-26 Thread Gian-Carlo Pascutto
On 26-10-17 10:55, Xavier Combelle wrote:
> It is just wild guesses  based on reasonable arguments but without
> evidence.

David Silver said they used 40 layers for AlphaGo Master. That's more
evidence than there is for the opposite argument that you are trying to
make. The paper certainly doesn't talk about a "small" and a "big" Master.

You seem to be arguing from a bunch of misreadings and
misunderstandings. For example, Figure 3 in the paper shows the Elo plot
for the 20 block/40 layer version, and it compares to Alpha Go Lee, not
Alpha Go Master. The Alpha Go Master line would be above the flattening
part of the 20 block/40 layer AlphaGo Zero. I guess you missed this when
you say that they "only mention it to compare on kifu prediction"?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero

2017-10-26 Thread Gian-Carlo Pascutto
On 25-10-17 16:00, Petr Baudis wrote:
> That makes sense.  I still hope that with a much more aggressive 
> training schedule we could train a reasonable Go player, perhaps at
> the expense of worse scaling at very high elos...  (At least I feel 
> optimistic after discovering a stupid bug in my code.)

By the way, a trivial observation: the initial network is random, so
there's no point in using it for playing the first batch of games. It
won't do anything useful until it has run a learning pass on a bunch of
"win/loss" scored games and it can at least tell who is the likely
winner in the final position (even if it mostly won't be able to make
territory at first).

This suggests that bootstrapping probably wants 500k starting games with
just random moves.

FWIW, it does not seem easy to get the value part of the network to
converge in the dual-res architecture, even when taking the appropriate
steps (1% weighting on error, strong regularizer).

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-26 Thread Gian-Carlo Pascutto
On 26-10-17 15:55, Roel van Engelen wrote:
> @Gian-Carlo Pascutto
> 
> Since training uses a ridiculous amount of computing power i wonder
> if it would be useful to make certain changes for future research,
> like training the value head with multiple komi values
> <https://arxiv.org/pdf/1705.10701.pdf>

Given that the game data will be available, it will be trivial for
anyone to train a different network architecture on the result and see
if they get better results, or a program that handles multiple komi
values, etc.

The problem is getting the *data*, not the training.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zero is weaker than Master!?

2017-10-26 Thread Gian-Carlo Pascutto
Figure 6 has the same graph as Figure 3 but for 40 blocks. You can compare
the Elo.

On Thu, Oct 26, 2017, 23:35 Xavier Combelle 
wrote:

> Unless I mistake figure 3 shows the plot of supervised learning to
> reinforcement learning, not 20 bloc/40 block
>
> For searching mention of the 20 blocks I search for 20 in the whole
> paper and did not found any other mention
>
> than of the kifu thing.
>
>
> Le 26/10/2017 à 15:10, Gian-Carlo Pascutto a écrit :
> > On 26-10-17 10:55, Xavier Combelle wrote:
> >> It is just wild guesses  based on reasonable arguments but without
> >> evidence.
> > David Silver said they used 40 layers for AlphaGo Master. That's more
> > evidence than there is for the opposite argument that you are trying to
> > make. The paper certainly doesn't talk about a "small" and a "big"
> Master.
> >
> > You seem to be arguing from a bunch of misreadings and
> > misunderstandings. For example, Figure 3 in the paper shows the Elo plot
> > for the 20 block/40 layer version, and it compares to Alpha Go Lee, not
> > Alpha Go Master. The Alpha Go Master line would be above the flattening
> > part of the 20 block/40 layer AlphaGo Zero. I guess you missed this when
> > you say that they "only mention it to compare on kifu prediction"?
> >
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go

-- 

GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-27 Thread Gian-Carlo Pascutto
On 27-10-17 00:33, Shawn Ligocki wrote:
> But the data should be different for different komi values, right? 
> Iteratively producing self-play games and training with the goal of 
> optimizing for komi 7 should converge to a different optimal player 
> than optimizing for komi 5.

For the policy (head) network, yes, definitely. It makes no difference
to the value (head) network.

> But maybe having high quality data for komi 7 will still save a lot
> of the work for training a komi 5 (or komi agnostic) network?

I'd suspect so.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] November KGS bot tournament

2017-10-27 Thread Gian-Carlo Pascutto
On 26-10-17 09:43, Nick Wedd wrote:
> Please register by emailing me at mapr...@gmail.com, with the words
> "KGS Tournament Registration"
> in the email title.
> With the falling interest in these events since the advent of AlphaGo,
> it is likely that this will be the last of the series of KGS bot
> tournaments.

Thank you for organizing them for so long!

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Zero is weaker than Master!?

2017-10-27 Thread Gian-Carlo Pascutto
On 27-10-17 10:15, Xavier Combelle wrote:
> Maybe I'm wrong but both curves for alphago zero looks pretty similar
> except than the figure 3 is the zoom in of figure 6

The blue curve in figure 3 is flat at around 60 hours (2.5 days). In
figure 6, at 2.5 days the line is near vertical. So it is not a zoom.

Maybe this can help you:
https://www.reddit.com/r/baduk/comments/77hr3b/elo_table_of_alphago_zero_selfplay_games/

Note the huge Elo advantage of the 20-block version early on (it can
learn faster, but it also stalls out sooner).

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread Gian-Carlo Pascutto
On 7/11/2017 19:07, Imran Hendley wrote:
> Am I understanding this correctly?

Yes.

It's possible they had in-betweens or experimented with variations at
some point, then settled on the simplest case. You can vary the
randomness if you define the move selection as a softmax with varying
temperature; that's harder if you only define the policy as "select best"
or "select proportionally".
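
A minimal sketch of what that looks like (illustrative Python, selecting
a move from MCTS visit counts; temperature 1 is "proportional",
temperature 0 is "best"):

import numpy as np

def pick_move(visit_counts, temperature):
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature == 0:
        return int(np.argmax(counts))          # deterministic: best move
    probs = counts ** (1.0 / temperature)      # t = 1: proportional to visits
    probs /= probs.sum()
    return int(np.random.choice(len(counts), p=probs))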

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AlphaGo Zero Loss

2017-11-07 Thread Gian-Carlo Pascutto
On 7/11/2017 19:08, Petr Baudis wrote:
> Hi!
> 
> Does anyone knows why the AlphaGo team uses MSE on [-1,1] as the
> value output loss rather than binary crossentropy on [0,1]?  I'd say
> the latter is way more usual when training networks as typically
> binary crossentropy yields better result, so that's what I'm using
> in https://github.com/pasky/michi/tree/nnet for the time being, but
> maybe I'm missing some good reason to use MSE instead?

Not that I know of. You can certainly get some networks to converge
better by using cross-entropy over MSE.

Maybe it's related to the nature of the errors? More avoidance of the
output being entirely wrong? Or habit? MSE is generally preferred for
regression-like problems, but you can argue whether a go position is
being regressed to some winrate%, or to win/loss...
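
For concreteness, the two options on a single position (a sketch; v and z
in [-1, 1] as in the paper, p and y in [0, 1] for the cross-entropy
variant):

import numpy as np

def mse_loss(v, z):
    return (v - z) ** 2

def bce_loss(p, y):
    eps = 1e-7                        # avoid log(0)
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))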

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-10 Thread Gian-Carlo Pascutto
On 10/11/2017 1:47, Petr Baudis wrote:

>   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.

How many filters per layer?

FWIW a 7-layer resnet (14 + 2 layers) is still pretty huge - larger than
the initial AlphaGo. Given the number of games you have, and the size of
the board, I would not be surprised if your neural net program is
"outbooking" the opponent by remembering the sequences rather than
learning more generic things.

(But hey, outbooking is learning too!)

>   * The neural network is updated after _every_ game, _twice_, on _all_
> positions plus 64 randomly sampled positions from the entire history,
> this all done four times - on original position and the three
> symmetry flips (but I was too lazy to implement 90\deg rotation).

The reasoning being to give a stronger and faster reinforcement with the
latest data?

>   * Value function is trained with cross-entropy rather than MSE,
> no L2 regularization, and plain Adam rather than hand-tuned SGD (but
> the annealing is reset time by time due to manual restarts of the
> script from a checkpoint).

I never really had good results with Adam and friends compared to SGD
(even momentum does not always help - but of course it's much faster
early on).

>   * No resign auto-threshold but it is important to play 25% games
> without resigning to escale local "optima".

This makes sense because both sides will miscount in exactly the same way.

>   * 1/Temperature is 2 for first three moves.
>   * Initially I used 1000 "simulations" per move, but by mistake, last
> 1500 games when the network improved significantly (see below) were
> run with 2000 simulations per move.  So that might matter.
> 
>   This has been running for two weeks, self-playing 8500 games.  A week
> ago its moves already looked a bit natural but it was stuck in various
> local optima.  Three days ago it has beaten GNUGo once across 20 games.
> Now five times across 20 games - so I'll let it self-play a little longer
> as it might surpass GNUGo quickly at this point?  Also this late
> improvement coincides with the increased simulation number.

The simulation number is one of the big black boxes in this setup, I
think. If the policy network does not have a strong opinion yet, it
seems that one has to make it sufficiently larger than the number of
legal moves. If not, first-play urgency will cause every successor
position to be evaluated with no look-ahead, which means MCTS
can't discover anything.

So a few times 361 makes sense for 19x19, but don't ask me why 1600 and
not 1200 etc.

With only 50-ish moves to consider on 7x7, it's interesting that you see
a big improvement by making it (relatively) much larger than DeepMind did.

But uh, you're not simply matching it against GNUGo with more
simulations, are you? I mean, it would be quite normal to win more when
searching deeper.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] A question on source code of leela-zero

2017-11-15 Thread Gian-Carlo Pascutto
On 13-11-17 02:06, Chao wrote:
> Hello, all,
> 
> I have a question from the code of leela-zero:
> 
> https://github.com/gcp/leela-zero
> 
> In UCTSearch.cpp, function play_simulation:
> 
> When we have two consecutive passes to end the game, the final node (a
> second pass) will not create any new child, and the result.valid() will
> be always return false. In this case, all of the parent nodes during the
> MCTS process will get invalid result, and hence their node status will
> not be updated.
> 
> My question is how can we update the node status here if we always get
> invalid result after game finished?

This was properly filed as an issue on github:
https://github.com/gcp/leela-zero/issues/38

And it turns out that, despite the behavior being intended, this person
was right to call it out as problematic!

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Nochi: Slightly successful AlphaGo Zero replication

2017-11-15 Thread Gian-Carlo Pascutto
On 11-11-17 00:58, Petr Baudis wrote:
>>>   * The neural network is updated after _every_ game, _twice_, on _all_
>>> positions plus 64 randomly sampled positions from the entire history,
>>> this all done four times - on original position and the three
>>> symmetry flips (but I was too lazy to implement 90\deg rotation).
>>
>> The reasoning being to give a stronger and faster reinforcement with the
>> latest data?
> 
> Yes.

One thing I wonder about, given the huge size of the network and the
strong reinforcement, don't you get total overfitting?

I guess the next few games will quickly "point out" the overfit, but I
still wonder whether keeping the overfit under control wouldn't be
better than the see-sawing this would seem to cause.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] what is reachable with normal HW

2017-11-15 Thread Gian-Carlo Pascutto
On 15-11-17 10:51, Petri Pitkanen wrote:
> I think the interesting question left now is: How strong GO-program one
> can have in normal Laptop? TPU and GPU are fine for showing what can be
> done but as practical tool for a go player the bot  has to run something
> people can afford. And can buy from shop? From KGS 100 list I can spot
> 8d bots but I do not know how big HW they are using. 
> 
> Could todays laptop with best possible SW beat best humans?

What does "best possible SW" mean? The one that isn't written yet? :-)

Zero was reportedly very strong with 4 TPU. If we say 1 TPU = 1 GTX 1080
Ti the Elo loss from the slowdown from 4 to 1 would still make it far
stronger than the best humans.

As for things that are available right now:

The latest Zen is very strong even without a GPU.

My bot was 8d on KGS with a GTX 1080 Ti and a Ryzen 1700 (roughly ~1000
USD hardware). I don't run on KGS anymore but people from Tygem told me
it is equal to lower ranked pros there, on a smaller system.

You can get very strong software right now, but nothing will change the
fact that better hardware always helps. The difference between a laptop
and a desktop with a real GPU will always be there.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Gian-Carlo Pascutto
On 16/11/2017 16:43, Petr Baudis wrote:
> But now, we expand the nodes literally all the time, breaking the 
> stationarity possibly in drastic ways.  There are no reevaluations
> that would improve your estimate.

First of all, you don't expect the network evaluations to drastically
vary between parent and children, unless there are tactics that you are
not understanding.

Secondly, the evaluations are rather noisy, so averaging still makes sense.

Third, evaluating with a different rotation effectively forms an
ensemble that improves the estimate.

> Therefore, can't we take the next step, and do away with MCTS?  Is 
> there a theoretical viewpoint from which it still makes sense as the
> best policy improvement operator?

People have posted results with that on this list and IIRC programs
using regular alpha-beta were weaker.

As for a theoretical viewpoint: the value net is an estimation of the
value of some fixed amount of Monte Carlo rollouts.

> What would you say is the current state-of-art game tree search for 
> chess?  That's a very unfamiliar world for me, to be honest all I
> really know is MCTS...

The same as it was 20 years ago: alpha-beta. Though one could certainly
make the argument that an alpha-beta searcher using late move reductions
(searching everything but the best moves less deeply) is searching a
tree of a very similar shape to a UCT searcher with a small exploration
constant.
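
If that sounds abstract, here is a toy sketch of late move reductions
over a synthetic game tree (pure Python, nothing to do with any real
engine; the branching factor, move ordering and reduction rule are all
made up for illustration):

import random
from math import inf

def children(node):
    # Toy game tree: every node has 8 moves, already "ordered" so that
    # earlier moves are the ones a move-ordering heuristic would prefer.
    return [(node * 8 + i, i) for i in range(8)]

def evaluate(node):
    # Arbitrary deterministic leaf values; only the tree shape matters here.
    return random.Random(node).uniform(-1.0, 1.0)

def search(node, depth, alpha, beta):
    if depth == 0:
        return evaluate(node)
    best = -inf
    for child, move_index in children(node):
        # Late move reductions: moves far down the ordering are searched
        # one ply shallower, and only re-searched at full depth if the
        # reduced search suggests they might beat alpha.
        reduction = 1 if (move_index >= 3 and depth >= 2) else 0
        score = -search(child, depth - 1 - reduction, -beta, -alpha)
        if reduction and score > alpha:
            score = -search(child, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best

print(search(1, 4, -inf, inf))

The moves the ordering likes get full depth, everything else gets less:
much the same shape as UCT giving most of its visits to the children
the prior likes.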

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Gian-Carlo Pascutto
On 16-11-17 18:15, "Ingo Althöfer" wrote:
> Something like MCTS would not work in chess, because in
> contrast to Go (and Hex and Amazons and ...) Chess is
> not a "game with forward direction".

Ingo, I think the reason Petr brought the whole thing up is that AlphaGo
Zero uses "MCTS" but it does not actually use Monte Carlo Playouts. I
think your condition about "forward direction" only applies to the
randomized playouts, yes? A neural network evaluation on the other hand
is very much like a classical static chess evaluation.

There are publications about Parallel Randomized Best-First Search in
chess; read them and notice how it compares to MCTS.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-17 Thread Gian-Carlo Pascutto
On 17-11-17 02:15, Hideki Kato wrote:
> Stephan K: 
>> 2017-11-16 17:37 UTC+01:00, Gian-Carlo Pascutto :
>>> Third, evaluating with a different rotation effectively forms an
>>> ensemble that improves the estimate.
>>
>> Could you expand on that? I understand rotating the board has an
>> impact for a neural network, but how does that change anything for a
>> tree search? Or is it because the monte carlo tree search relies on
>> the policy network?
> 
> The author of AQ told me that majority-voting based on the 
> orientation (4 or 8) makes AQ stronger.

How do you majority vote with 4 (or 8...) real numbers?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Is MCTS needed?

2017-11-17 Thread Gian-Carlo Pascutto
On 16-11-17 18:24, Stephan K wrote:
> 2017-11-16 17:37 UTC+01:00, Gian-Carlo Pascutto :
>> Third, evaluating with a different rotation effectively forms an
>> ensemble that improves the estimate.
> 
> Could you expand on that? I understand rotating the board has an
> impact for a neural network, but how does that change anything for a
> tree search? Or is it because the monte carlo tree search relies on
> the policy network?

It was a response to the statement "There are no reevaluations that
would improve your estimate."

Consider a quiet position where the tree search wouldn't reveal any
tactics. Normally, searching deeper won't give an immediate benefit. But
because of the rotations, the value network's score is improved from a
single estimate to an ensemble.

In chess/alpha-beta terms, the quiescence search resolves the tactics
(if any), so running it again with part of the tactics resolved would
produce the same score. But with value nets, this is not entirely true.
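
Concretely, something like this (rough numpy sketch; value_net is a
stand-in for whatever evaluator you have, and the toy "net" at the end
is deliberately orientation-dependent just to show the averaging):

import numpy as np

def symmetries(board):
    # The 8 symmetries of the square: 4 rotations, each optionally mirrored.
    for k in range(4):
        rotated = np.rot90(board, k)
        yield rotated
        yield np.fliplr(rotated)

def ensemble_value(board, value_net):
    # Average the evaluation over all 8 orientations; each extra
    # orientation is another (correlated) sample of the same position.
    return float(np.mean([value_net(b) for b in symmetries(board)]))

board = np.zeros((19, 19))
board[0, 3] = 1
print(ensemble_value(board, lambda b: float(b[0].sum())))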

> Could it be possible to train a value net using only the results of
> already finished games, rather than monte carlo rollouts?

Isn't this how it works already?

> My (extremely vague and possibly fallacious) understanding of the
> situation was that monte carlo tree search was less effective for
> chess because of the more sudden changes there might be when
> evaluating chess positions. For instance, a player with an apparently
> lesser position might actually be a few moves away from a checkmate
> (or just from a big gain), which might be missed by the monte carlo
> tree search because it depends on one particular branch of the tree.

Life and death and capture races behave the same. The inability of MCTS
to switch to a new PV instantly isn't necessarily very different from
the requirement in chess that all moves are searched to an equal
(nominal!) depth. In practical alpha-beta implementations, failing high
on a new best move requires a re-search as well.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-06 Thread Gian-Carlo Pascutto
On 03-12-17 17:57, Rémi Coulom wrote:
> They have a Q(s,a) term in their node-selection formula, but they
> don't tell what value they give to an action that has not yet been
> visited. Maybe Aja can tell us.

FWIW I already asked Aja this exact question a bit after the paper came
out and he told me he cannot answer questions about unpublished details.

This is not very promising regarding reproducibility considering the AZ
paper is even lighter on them.

Another issue which is up in the air is whether the choice of the number
of playouts for the MCTS part represents an implicit balancing between
self-play and training speed. This is particularly relevant if the
evaluation step is removed. But it's possible even DeepMind doesn't know
the answer for sure. They had a setup, and they optimized it. It's not
clear which parts generalize.

(Usually one wonders about such things in terms of algorithms, but here
one wonders about it in terms of hardware!)

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-06 Thread Gian-Carlo Pascutto
On 06-12-17 11:47, Aja Huang wrote:
> All I can say is that first-play-urgency is not a significant 
> technical detail, and what's why we didn't specify it in the paper.

I will have to disagree here. Of course, it's always possible I'm
misunderstanding something, or I have a program bug that I'm mixing up
with this.

Or maybe you mean that you expect the program to improve regardless of
this setting. In any case, I've now seen people state here twice that
this is a detail that doesn't matter. But practical results suggest otherwise.

For a strong supervised network, FPU=0 (i.e. not exploring all successor
nodes for a longer time, relying strongly on policy priors) is much
stronger. I've seen this in Leela Zero after we tested it, and I've
known it to be true from regular Leela for a long time. IIRC, the strong
open source Go bots also use some form of progressive widening, which
produces the same effect.

For a weak RL network without much useful policy priors, FPU>1 is much
stronger than FPU=0.

Now these are relative scores of course, so one could argue they don't
affect the learning process. But they actually do that as well!

The new AZ paper uses MCTS playouts = 800, and plays proportionally
according to MCTS output. (Previous AGZ had playouts = 1600,
proportional for first 30 moves).

Consider what this means for the search probability outputs, exactly the
thing the policy network has to learn. With FPU=1, the move
probabilities are much more uniform, and the moves played are
consequently much more likely to be bad or even blunders, because
there are fewer playouts that can be spent on the best move, even if it
was found.

> The initial value of Q is not very important because Q+U is
> dominated by the U piece when the number of visits is small.

a = Q(s, a) + coeff * P(s, a) *
    sqrt(parent->visits) / (1.0f + child->visits());

Assume parent->visits = 100, sqrt = 10
Assume child->visits = 0
Assume P(s, a) = 0.0027 (near uniform prior for "weak" network)

The right most side of this (U term) is ~1. This clearly does not
dominate the Q term. If Q > 1 (classic FPU) then every child node will
get expanded. If Q = 0 (Q(s, a) = 0) then the first picked child
(largest policy prior) will get something like 10 expansions before
another child gets picked. That's a massive difference in search tree
shape, *especially* with only 800 total playouts.
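
To put numbers on that (pure Python; the coefficient and the 10x prior
for the "best" move are assumptions purely for illustration, and parent
visits are kept fixed for simplicity):

from math import sqrt

parent_visits = 100
coeff = 1.0           # assumed exploration coefficient; it cancels out below
p_uniform = 0.0027    # near-uniform prior from a weak network
p_best = 0.027        # assume the top move gets ~10x the uniform prior

def u_term(prior, child_visits):
    return coeff * prior * sqrt(parent_visits) / (1.0 + child_visits)

# FPU = 1 (classic): an unvisited child scores 1 + U, which beats any
# visited child with Q < 1, so every child gets expanded once.
print("unvisited child, FPU=1:", 1.0 + u_term(p_uniform, 0))

# FPU = 0: assume the top-prior child keeps evaluating to Q ~ 0 (a lost
# position) and count how many visits it soaks up before an unvisited
# sibling (also Q = 0, but with the uniform prior) overtakes it.
n = 0
while u_term(p_best, n) > u_term(p_uniform, 0):
    n += 1
print("visits to the top-prior child before a sibling is tried:", n)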

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Gian-Carlo Pascutto
On 6/12/2017 18:57, Darren Cook wrote:
>> Mastering Chess and Shogi by Self-Play with a General Reinforcement
>> Learning Algorithm
>> https://arxiv.org/pdf/1712.01815.pdf
> 
> One of the changes they made (bottom of p.3) was to continuously update
> the neural net, rather than require a new network to beat it 55% of the
> time to be used. (That struck me as strange at the time, when reading
> the AlphaGoZero paper - why not just >50%?)

I read that as a simple way of establishing confidence that the result
was statistically significant > 0. (+35 Elo over 400 games - I don't
know by heart how large the typical error margin of 400 games is, but I
think it won't be far off!)
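
Back of the envelope (pure Python; assumes independent games and
ignores draws):

from math import sqrt, log10

n, p = 400, 0.55                # games played, required score
se = sqrt(p * (1 - p) / n)      # standard error of the observed score
elo = 400 * log10(p / (1 - p))  # that score expressed as an Elo difference
print(se)    # ~0.025, so 55% sits about two standard errors above 50%
print(elo)   # ~35 Elo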

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Gian-Carlo Pascutto
On 6/12/2017 19:48, Xavier Combelle wrote:
> Another result is that chess is really drawish, at the opposite of shogi

We sort-of knew that, but OTOH isn't that also because the resulting
engine strength was close to Stockfish, unlike in other games?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Gian-Carlo Pascutto
On 06-12-17 21:19, Petr Baudis wrote:
> Yes, that also struck me.  I think it's good news for the community
> to see it reported that this works, as it makes the training process
> much more straightforward.  They also use just 800 simulations,
> another good news.  (Both were one of the first tradeoffs I made in
> Nochi.)

The 800 simulations are a setting that works over all 3 games. It's not
necessarily as good for 19x19 Go (more legal moves than the other games,
so less deep trees).

As for both the lack of testing and this parameter, someone has remarked
on github that the DeepMind hardware is fixed, so this also represents a
tuning between the speed of the learning machine and the speed of the
self-play machines.

In my experience, just continuing to train the network further (when no
new data is batched in) often regresses the performance by 200 or more
Elo. So it's not clear this step is *entirely* ignorable unless you have
already tuned the speed of the other two aspects.

> Another interesting tidbit: they use the TPUs to also generate the 
> selfplay games.

I think this was already known.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Gian-Carlo Pascutto
On 06-12-17 22:29, Brian Sheppard via Computer-go wrote:
> The chess result is 64-36: a 100 rating point edge! I think the
> Stockfish open source project improved Stockfish by ~20 rating points in
> the last year.

It's about 40-45 Elo FWIW.

> AZ would dominate the current TCEC. 

I don't think you'll get to 80 knps with a regular 22 core machine or
whatever they use. Remember that AZ hardware is about 16 x 1080 Ti's.
You'll lose that (70 - 40 = 30 Elo) advantage very, very quickly.

IMHO this makes it all the more clear how silly it is that so much
attention is given to TCEC with its completely arbitrary hardware choice.

> The Stockfish team will have some self-examination going forward for
> sure. I wonder what they will decide to do.

Probably the same thing the Zen team did: ignore a large part of the
result, because people's actual computers - let alone mobile phones -
can't run a neural net at TPU speeds.

The question is if resizing the network makes the resulting program more
competitive, enough to overcome the speed difference. And, aha, in which
direction are you going to try to resize? Bigger or smaller?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-07 Thread Gian-Carlo Pascutto
On 03-12-17 21:39, Brian Lee wrote:
> It should default to the Q of the parent node. Otherwise, let's say that
> the root node is a losing position. Upon choosing a followup move, the Q
> will be updated to a very negative value, and that node won't get
> explored again - at least until all 362 top-level children have been
> explored and revealed to have negative values. So without initializing Q
> to the parent's Q, you would end up wasting 362 MCTS iterations.

Note that the same argument could be made for making it 0, which some
people think the AGZ paper implies, so the above can't be the entire
explanation.

That said, empirical testing indicates that initializing Q(s, a) to the
parent is indeed a well performing setting for both strong and weak
policy networks.
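
In code it amounts to something like this (schematic Python sketch, not
a quote from any engine; the Node fields and the coeff value are made
up for illustration):

from math import sqrt

class Node:
    def __init__(self, prior):
        self.prior = prior     # P(s, a) from the policy network
        self.visits = 0
        self.value_sum = 0.0

    def q(self, default):
        # Mean value of this child, or the fallback for unvisited nodes.
        return self.value_sum / self.visits if self.visits else default

def select_child(children, parent_visits, parent_q, coeff=1.0):
    # Unvisited children fall back to the parent's Q ("FPU = parent Q"),
    # so they are neither force-expanded (FPU=1) nor shunned (FPU=0).
    def score(c):
        u = coeff * c.prior * sqrt(parent_visits) / (1.0 + c.visits)
        return c.q(default=parent_q) + u
    return max(children, key=score)

kids = [Node(0.5), Node(0.0027)]
print(select_child(kids, parent_visits=100, parent_q=0.4).prior)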

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Gian-Carlo Pascutto
On 7/12/2017 13:20, Brian Sheppard via Computer-go wrote:
> The conversation on Stockfish's mailing list focused on how the
> match was imbalanced.

Which is IMHO missing the point a bit ;-)

> My concern about many of these points of comparison is that they 
> presume how AZ scales. In the absence of data, I would guess that AZ 
> gains much less from hardware than SF. I am basing this guess on two 
> known facts. First is that AZ did not lose a game, so the upper
> bound on its strength is perfection. Second, AZ is a knowledge
> intensive program, so it is counting on judgement to a larger
> degree.

What about the data point that AlphaGo Zero gained 2100 Elo from its
tree search? In a game commonly considered less tactical?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Nvidia Titan V!

2017-12-08 Thread Gian-Carlo Pascutto
On 08-12-17 09:29, Rémi Coulom wrote:
> Hi,
> 
> Nvidia just announce the release of their new GPU for deep learning: 
> https://www.theverge.com/2017/12/8/16750326/nvidia-titan-v-announced-specs-price-release-date
>
>  "The Titan V is available today and is limited to two per
> customer."
> 
> $2,999, 110 TFLOPS!

You can test Voltas on AWS; the prices are very acceptable.

I had problems getting good convergence with fp16 training, even taking
into account all the tricks in NVIDIA's "mixed precision learning"
document and using the respective NVIDIA-caffe branches. It worked for
the policy network, but not for the value network.

You only get 110 TFLOPS when using the mixed precision fp16 into fp32
accumulator matrix multipliers from the Tensor Cores, otherwise it's not
so different from a 1080 Ti in speed. It has a lot of cores, but the
clock-speed is much lower.

I also had the impression that using the Tensor Cores disables the
Winograd transform, perhaps due to accuracy issues? So you lose a factor
~3 in speedup.

Things to consider before plunking down 3000 USD :-)

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Project Leela Zero

2018-01-07 Thread Gian-Carlo Pascutto
On 30/12/2017 10:31, mic wrote:
> I would like to have a non-GPU version of the WINDOWS-program of LeelaZ
> to be able to run it on my good old machine.
> -Michael.

This is now available:
https://github.com/gcp/leela-zero/releases/tag/v0.10

But note that playing strength and performance are very closely coupled
to your system performance. Lacking a GPU will make the program slower
and hence weaker.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] MiniGo open sourced

2018-01-30 Thread Gian-Carlo Pascutto
On 30-01-18 02:50, Brian Lee wrote:
> We're not aiming for a top-level Go AI; we're merely aiming for a 
> correct, very readable implementation of the AlphaGoZero algorithm

I had a look around to see how you resolved what I'd consider the
ambiguities in the original paper:
https://github.com/gcp/leela-zero/issues/785

> Of course, in the end, strength is the best way to tell that your 
> implementation is correct :)

In other words, do not take "correct" as necessarily meaning "matching
the published research".

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] MiniGo open sourced

2018-01-30 Thread Gian-Carlo Pascutto
On 30-01-18 20:59, Álvaro Begué wrote:
> Chrilly Donninger's quote was probably mostly true in the 90s, but
> it's now obsolete. That intellectual protectionism was motivated by
> the potential economic profit of having a strong engine. It probably
> slowed down computer chess for decades, until the advent of strong
> open-source programs. Paradoxically, when the economic incentive to
> create strong engines was removed, we saw an explosion in strength.

There still seems to be an economic incentive to improve [1] strong
engines and try to sell them.

It should be noted that until Stockfish came along, open source computer
chess engines were a graveyard where every strong enough engine just got
cloned or plagiarized and real enduring cooperation was essentially
nonexistent. You just had 10 non-cooperating forks (some closed source,
and some allegedly commercial ones) that added anywhere from less than
-20 to more than +100 Elo.

There had been open source engines as early as GNUChess (or probably
earlier...), and very strong ones like Fruit.

I don't know for sure what allowed Stockfish to (mostly) escape the same
fate. Right now I would say fishtest is a huge factor, but it might've
been doing fine before that.

[1] I originally wrote "create" here but that might not be correct.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread Gian-Carlo Pascutto
On 13-02-18 16:05, "Ingo Althöfer" wrote:
> Hello,
> 
> what is known about proper MCTS procedures for games
> which do not only have wins and losses, but also draws
> (like chess, Shogi or Go with integral komi)?
> 
> Should neural nets provide (win, draw, loss)-probabilities
> for positions in such games?

I treat a draw the same as a 50% win-rate score. It works well enough;
I don't really see what advantage treating it separately would give.
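
That is, something as trivial as:

def result_to_score(result):
    # Back a win up as 1, a loss as 0 and a draw as a 50% result; MCTS
    # then averages these like any other outcome.
    return {"win": 1.0, "draw": 0.5, "loss": 0.0}[result]

print(result_to_score("draw"))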

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] 9x9 is last frontier?

2018-03-05 Thread Gian-Carlo Pascutto
On 02-03-18 17:07, Dan wrote:
> Leela-chess is not performing well enough 

I don't understand how one can say that, given that they only started
from the random network last week, with just a few clients. Of course
it's bad! That doesn't say anything about the approach.

Leela Zero has gotten strong but it has been learning for *months* with
~400 people. It also took a while to get to 30 kyu.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Crazy Stone is back

2018-03-05 Thread Gian-Carlo Pascutto
On 28-02-18 07:13, Rémi Coulom wrote:
> Hi,
> 
> I have just connected the newest version of Crazy Stone to CGOS. It
> is based on the AlphaZero approach.

In that regard, are you still using Monte Carlo playouts?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] 9x9 is last frontier?

2018-03-05 Thread Gian-Carlo Pascutto
On 5/03/2018 10:54, Dan wrote:
> I believe this is a problem of the MCTS used and not due
> to for lack of training. 
> 
> Go is a strategic game so that is different from chess that is full of
> traps.     

Does the Alpha Zero result not indicate the opposite, i.e. that MCTS is
workable?

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Crazy Stone is back

2018-03-05 Thread Gian-Carlo Pascutto
On 5/03/2018 12:28, valky...@phmp.se wrote:
> Remi twittered more details here (see the discussion with gghideki:
> 
> https://twitter.com/Remi_Coulom/status/969936332205318144

Thank you. So Remi gave up on rollouts as well. Interesting "difference
of opinion" there with Zen.

Last time I tested this in regular Leela, playouts were beneficial, but
this was before combined value+policy nets and much more training data
was available. I do not know what the current status would be.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] PUCT formula

2018-03-09 Thread Gian-Carlo Pascutto
On 08-03-18 18:47, Brian Sheppard via Computer-go wrote:
> I recall that someone investigated this question, but I don’t recall the
> result. What is the formula that AGZ actually uses?

The one mentioned in their paper, I assume.

I investigated both that and the original from the referenced paper, but
after tuning I saw little meaningful strength difference.

One thing of note is that (IIRC) the AGZ formula keeps scaling the
exploration term by the policy prior forever. In the original formula,
it is a diminishing term.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] PUCT formula

2018-03-09 Thread Gian-Carlo Pascutto
On 09-03-18 18:03, Brian Sheppard via Computer-go wrote:

> I am guessing that Chenjun and Martin decided (or knew) that the AGZ
> paper was incorrect and modified the equation accordingly.
> 

I doubt it's just the paper that was incorrect, given that the formula
has been given without log already in the original Alpha Go Lee Sedol paper.

Of course it would be funny if it was a mistake and just got copy pasted.

I never tried "fixing" the formula, I just tried the original (with
priors), and what they gave, and strength was rather similar.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Leela Zero on 9x9

2018-04-30 Thread Gian-Carlo Pascutto
There has been some discussion whether value networks can "work" on 9x9
and whether the bots can beat the best humans.

While I don't expect this to resolve the discussion, Leela Zero now tops
the CGOS 9x9 list. This seems to be entirely the work of a single user
who has run 3.2M self-play games on a single GPU over the course of 3
months. He has made the resulting weight file available.

https://github.com/gcp/leela-zero/issues/1291

There was an interesting trick done with switching komi, which you can
read about above.

FWIW, BayesElo suggests there may have been another bot that is very
close in strength, but the name "Maximus_160B_512F" rather suggests that
this is also a DCNN-based bot...

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Message by Facebook AI group

2018-05-04 Thread Gian-Carlo Pascutto
On 3/05/2018 5:24, "Ingo Althöfer" wrote:
> Hello,
> 
> in the German computer go forum a link to this message by the
> Facebook AI Research group was posted: 
> https://research.fb.com/facebook-open-sources-elf-opengo/

FYI, we were able to convert the Facebook network into Leela Zero
format, which should make it a lot easier to play against or test with.

https://github.com/gcp/leela-zero/releases
https://github.com/gcp/leela-zero/issues/1329

> I think this action will speed up "the" development.

Depends on what "the" is, I guess.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Message by Facebook AI group

2018-05-05 Thread Gian-Carlo Pascutto
On 5/05/2018 7:30, "Ingo Althöfer" wrote:
> It was meant from the viewpoint of an
> outside observer/commentator.
> 
> In Germany we have a proverb:
> "Konkurrenz belebt das Geschaeft."
> Roughly translated:
> "Competition enlivens the bbusiness."

So does cooperation.

Thanks to Facebook for making (so far part of) the data public. They
have soundly beaten some of their competition in this regard.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AI Ryusei 2018 result

2018-12-18 Thread Gian-Carlo Pascutto
On 17/12/18 01:53, Hiroshi Yamashita wrote:
> Hi,
> 
> AI Ryusei 2018 was held on 15,16th December in Nihon-kiin, Japan.
> 14 programs played preliminary swiss 7 round, and top 6 programs
>  played round-robin final. Then, Golaxy won.
> 
> Result
> https://www.igoshogi.net/ai_ryusei/01/en/result.html

It appears the 2nd place finisher after Golaxy was a hybrid of Rn and
Leela Zero, using rollouts to compensate for Leela's network being
trained with the "wrong" komi for this competition:

https://github.com/zakki/Ray/issues/171#issuecomment-447637052
https://img.igoshogi.net/ai_ryusei/01/data/11.pdf

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] GCP passing on the staff ...

2019-01-29 Thread Gian-Carlo Pascutto
On 29/01/19 11:23, Petri Pitkanen wrote:
> Just purely curiosity: How strong is Leela now? googling up gives that
> it is better than best humans already? Is that true?

The network is over 100 Elo stronger than the second generation of ELF,
which was about 100 Elo stronger than the first generation, which
defeated a set of Korean top professional players 14-0.

Differences in implementation speed will shift the strength difference
around a bit, but not enough to change the conclusion that it's likely a
lot better than the best humans now.

I hear rumors it's not 100% undefeatable, and that with some trial and
error you can occasionally still find a weakness to pounce on.

It is used by professionals for analysis, e.g.:
https://lifein19x19.com/viewtopic.php?f=13&t=16074

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] A new ELF OpenGo bot and analysis of historical Go games

2019-02-19 Thread Gian-Carlo Pascutto
On 17/02/19 23:24, Hiroshi Yamashita wrote:
> Hi Ingo,
> 
>> * How strong is the new ELF bot in comparison with Leela-Zero?
> 
> from CGOS BayesElo, new ELF(ELFv2) is about +100 stronger than Leela-Zero.

We ran a test match and ELFv2 lost 34 - 62 against LZ-204 at 1600 visits
each, so that's about +100 Elo in favor of LZ at visits rather than time
parity.

This would mean going 800p -> 400p gives -200 Elo? Seems more than I
would expect.

I think I'd want to do more testing before forming an opinion :-)

> Leela Zero's playout is half. Because its net size is double.
> http://www.yss-aya.com/cgos/19x19/bayes.html

Smaller networks have more overhead (heads etc are fixed), so it's
closer to ~60%. Probably depends on the GPU too.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Accelerating Self-Play Learning in Go

2019-03-11 Thread Gian-Carlo Pascutto
On 8/03/19 16:14, David Wu wrote:
> I suspect Leela Zero would come off as far *less* favorable if one
> tried to do such a comparison using their actual existing code rather
> than abstracting down to counting neural net evals, because as far as
> I know in Leela Zero there is no cross-game batching of neural net
> evaluations, which makes a huge difference in the ability to use a
> strong GPU efficiently.

We found that the speedup from batching mostly depended on whether:

- you're using cuDNN (i.e. NVIDIA hardware)
- you're using cards with Tensor Cores
- you're using smaller boards

Not so coincidentally all of these are true for *you*, i.e. a
homogeneous farm of powerful NVIDIA Volta cards where you control the
entire software stack, and are mixing in different sized boards in the
games.

But that's not quite the platform Leela Zero was targeted at, it's
almost the exact opposite :-)

There are extensive benchmarks in the github issues of cuDNN vs TensorRT
vs generic OpenCL performance with varying batch sizes, and you'll see
that for the majority of hardware there wasn't that much to gain by
adding batching. Once Tensor Core support was added to the OpenCL code,
batching immediately made a huge difference (on Volta/RTX at least...)
and was thus merged shortly after.

> Only in the last couple months or so based on
> what I've been seeing in chat and pull requests, Leela Zero
> implemented within-search batching of neural net evals, but clients
> still only play one game at a time.

Playing multiple games at the same time has been supported since very
early on (the -g switch in AutoGTP, or simply by running multiple
clients); it's batching network evaluations across multiple games that
wasn't (and still isn't) implemented in the default client.

I do think it's useful to have, it's just that for Leela Zero this
wasn't - and probably still isn't - very important compared to
everything else.

Anyway, I agree and note this is all completely tangential to the
question of "extra computation overhead" for the changes, which should
be negligible.

-- 
GCP
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
