Re: [computer-go] A cluster version of Zen is running on cgos 19x19
> In your (or Sylvain's?) recent paper, you wrote less than one second interval was useless. I've observed similar. I'm now evaluating the performance with 0.2, 0.4, 1 and 4 second intervals for a 5-second-per-move setting on a 19x19 board on 32 nodes of the HA8000 cluster.

Yes, one second is fine for 5 seconds per move. Maybe you can check whether you get a linear speed-up if you artificially simulate zero communication time? My guess is that the communication time should not be a problem, but if you don't use MPI, maybe there's something in your implementation of communications?

By the way, a cluster parallelization in MPI can be developed very quickly, and MPI is efficient: mpi_all_reduce has a computational cost logarithmic in the number of nodes.

Good luck,
Olivier
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] A cluster version of Zen is running on cgos 19x19
Olivier Teytaud: aa5e3c330911250005v1d434a5bj8a09067a620ef...@mail.gmail.com:
> > In your (or Sylvain's?) recent paper, you wrote less than one second interval was useless. I've observed similar. I'm now evaluating the performance with 0.2, 0.4, 1 and 4 second intervals for a 5-second-per-move setting on a 19x19 board on 32 nodes of the HA8000 cluster.
>
> Yes, one second is fine for 5 seconds per move. Maybe you can check whether you get a linear speed-up if you artificially simulate zero communication time? My guess is that the communication time should not be a problem, but if you don't use MPI, maybe there's something in your implementation of communications?

Hmm, I don't think my communication code is the problem.

> By the way, a cluster parallelization in MPI can be developed very quickly, and MPI is efficient: mpi_all_reduce has a computational cost logarithmic in the number of nodes.

Even if the sum-up is done in logarithmic time (binary-tree style), isn't the time to collect all information from all nodes proportional to the number of nodes if the master node has few communication ports?

MPI is the best choice for dedicated HPC clusters, I agree. It imposes, however, several constraints: for example, nodes cannot be unplugged or plugged in during operation. Also, MPI cannot be installed on some computers with uncommon operating systems, or on small computers without enough memory, such as game consoles. I just want a freer parallel and distributed computing environment for MCTS than MPI. My code is now running on a mini PC cluster at my home, and I don't want to install MPI on my computers :).

By the way, have you experimented with a scheme that just adds instead of averaging? When I tested it, my code had some bugs and no success.

Thanks,
Hideki
--
g...@nue.ci.i.u-tokyo.ac.jp (Kato)
Re: [computer-go] A cluster version of Zen is running on cgos 19x19
> Even if the sum-up is done in logarithmic time (binary-tree style), isn't the time to collect all information from all nodes proportional to the number of nodes if the master node has few communication ports?

No (unless I misunderstood what you mean, sorry in that case)! Use a tree of nodes to aggregate the information, and everything is logarithmic. This is done implicitly in MPI. If you have 8 nodes A, B, C, D, E, F, G, H, then:

(i) first layer:
A and B send their information to B;
C and D send their information to D;
E and F send their information to F;
G and H send their information to H.

(ii) second layer:
B and D send their information to D;
F and H send their information to H.

(iii) third layer:
D and H send their information to H.

Then do the same in the reverse order, so that the accumulated information is sent back to all nodes.

> By the way, have you experimented with a scheme that just adds instead of averaging? When I tested it, my code had some bugs and no success.

Yes, we have tested it. Surprisingly, no significant difference. But I don't know whether this would still hold today, as we now have some pattern-based exploration. For a program whose score depends almost only on percentages, it is not surprising that averaging and summing are equivalent.

Best regards,
Olivier
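[The eight-node layering above can be sketched as a small simulation. This is hypothetical Python of my own for illustration, not code from MoGo, Zen, or MPI itself: each round halves the number of active senders, so the reduction finishes in log2(N) rounds, and replaying the same layers in reverse broadcasts the total back to every node.]

```python
import math

def tree_allreduce(values):
    """Simulate the pairwise tree reduction described above.

    Assumes len(values) is a power of two. At reduction step s
    (stride = 2**s), the node at index i passes its partial sum to
    the node at i + stride, exactly like the (i)/(ii)/(iii) layers.
    The same layers run in reverse to broadcast the total back.
    Returns (final per-node values, number of communication rounds).
    """
    n = len(values)
    vals = list(values)
    steps = int(math.log2(n))
    rounds = 0
    # Reduction: partial sums flow toward the last node (node H above).
    for s in range(steps):
        stride = 1 << s
        for i in range(stride - 1, n, 2 * stride):
            vals[i + stride] += vals[i]  # e.g. layer (i): A's data combined at B
        rounds += 1
    # Broadcast: run the layers in reverse so every node gets the total.
    for s in reversed(range(steps)):
        stride = 1 << s
        for i in range(stride - 1, n, 2 * stride):
            vals[i] = vals[i + stride]
        rounds += 1
    return vals, rounds
```

With 8 nodes this is 3 reduction rounds plus 3 broadcast rounds, i.e. 2 * log2(8) = 6 communication rounds in total, independent of how much data each node contributes.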
Re: [computer-go] A cluster version of Zen is running on cgos 19x19
Olivier Teytaud: aa5e3c330911250119x5e01fa32w2e5f3db68704d...@mail.gmail.com:
> No (unless I misunderstood what you mean, sorry in that case)! Use a tree of nodes to aggregate the information, and everything is logarithmic. This is done implicitly in MPI. [...] Then do the same in the reverse order, so that the accumulated information is sent back to all nodes.

Interesting; surely the order is logarithmic. But how long does it take a packet to pass through a layer? I'm afraid the actual delay may increase.

> Yes, we have tested it. Surprisingly, no significant difference. But I don't know whether this would still hold today, as we now have some pattern-based exploration. For a program whose score depends almost only on percentages, it is not surprising that averaging and summing are equivalent.

Simple adding has the advantage that no synchronization to sum up the statistics of all computers is required, so the time from sending a statistics packet to adding it at the root node is reduced. This advantage, however, may not be effective in MPI environments, because the number of packets increases from N to N^2 if real (i.e., UDP) broadcasting is not used. So it's not so surprising that there was no significant difference in MPI environments. Ah, if a tree structure is used to broadcast packets, things may vary.
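[To make the packet-count argument concrete, here is a back-of-the-envelope sketch. This is hypothetical Python of my own, not from the thread: per synchronization, root collection and a tree all-reduce both send O(N) point-to-point packets, while an all-to-all exchange without real broadcasting sends O(N^2); the tree's extra benefit is that its packets are spread over only O(log N) sequential rounds.]

```python
import math

def packets_per_sync(n):
    """Point-to-point packet counts for one synchronization of n nodes.

    Returns (root_collect, all_to_all, tree_allreduce, tree_rounds):
    - root_collect: every worker sends once to a single root, O(N)
    - all_to_all: everyone sends to everyone else, O(N^2)
    - tree_allreduce: reduce plus broadcast along a binary tree, O(N)
    - tree_rounds: sequential rounds the tree scheme needs, O(log N)
    """
    root_collect = n - 1
    all_to_all = n * (n - 1)
    tree_allreduce = 2 * (n - 1)
    tree_rounds = 2 * math.ceil(math.log2(n))
    return root_collect, all_to_all, tree_allreduce, tree_rounds
```

For the 32-node setting discussed in this thread: 31 packets for root collection, 992 for all-to-all, and 62 for the tree all-reduce, the latter spread over 10 rounds.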
Thanks a lot,
Hideki
--
g...@nue.ci.i.u-tokyo.ac.jp (Kato)
Re: [computer-go] A cluster version of Zen is running on cgos 19x19
> Interesting; surely the order is logarithmic. But how long does it take a packet to pass through a layer? I'm afraid the actual delay may increase.

With gigabit ethernet, my humble opinion is that you should have no problem. But testing what happens if you artificially cancel the time of the messages might confirm or refute this.

If you have trouble due to the communication time, I'm sure you can optimize it. MPI provides plenty of well-designed primitives for encoding communications. Unless you need very precise optimization, it's not worth working directly with sockets.

Olivier
Re: [computer-go] A cluster version of Zen is running on cgos 19x19
The performance gap is perhaps due to the algorithms. Almost all cluster versions of current strong programs (MoGo, MFG, Fuego and Zen) use root parallelism, while shared-memory computers allow us to use thread parallelism, which gives better performance.

I think you should not have trouble with your network, at least with the number of machines you are considering. Perhaps you should increase a little the time between two communications? With something like mpi_all_reduce for averaging the statistics over the whole tree at each communication, more than 3 or 4 communications per second is useless. Averaging statistics in nodes with less than 5% of the total number of simulations might be useless as well.

Best regards,
Olivier
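[As an illustration of the root-parallel averaging step, here is a hypothetical Python sketch. The function and data layout are my own, not MoGo's or Zen's actual code: each worker reports per-move (visits, wins) statistics, the root averages them across workers, and entries holding less than 5% of the combined simulations are skipped, mirroring the heuristic above.]

```python
def average_root_stats(worker_stats, min_fraction=0.05):
    """Average per-move (visits, wins) statistics from each worker.

    worker_stats: list of dicts, one per worker, mapping move -> (visits, wins).
    Moves whose combined visits fall below min_fraction of the total
    number of simulations are dropped before averaging.
    Returns move -> (avg_visits, avg_wins) averaged over the workers.
    """
    n_workers = len(worker_stats)
    combined = {}
    # Sum each worker's contribution per move.
    for stats in worker_stats:
        for move, (visits, wins) in stats.items():
            v, w = combined.get(move, (0, 0))
            combined[move] = (v + visits, w + wins)
    total = sum(v for v, _ in combined.values())
    # Keep only moves with a meaningful share of the simulations.
    return {
        move: (v / n_workers, w / n_workers)
        for move, (v, w) in combined.items()
        if v >= min_fraction * total
    }
```

In a real root-parallel scheme this merge would run at every communication interval (e.g. once per second), with the averaged statistics pushed back into each worker's root node.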
Re: [computer-go] A cluster version of Zen is running on cgos 19x19
Thank you Olivier,

Olivier Teytaud: aa5e3c330911242304tc6b9e1bk466b1f08cb65d...@mail.gmail.com:
> The performance gap is perhaps due to the algorithms. Almost all cluster versions of current strong programs (MoGo, MFG, Fuego and Zen) use root parallelism, while shared-memory computers allow us to use thread parallelism, which gives better performance.
>
> I think you should not have trouble with your network, at least with the number of machines you are considering. Perhaps you should increase a little the time between two communications? With something like mpi_all_reduce for averaging the statistics over the whole tree at each communication, more than 3 or 4 communications per second is useless. Averaging statistics in nodes with less than 5% of the total number of simulations might be useless as well.

In your (or Sylvain's?) recent paper, you wrote less than one second interval was useless. I've observed similar. I'm now evaluating the performance with 0.2, 0.4, 1 and 4 second intervals for a 5-second-per-move setting on a 19x19 board on 32 nodes of the HA8000 cluster. Though I don't have enough games yet, the current best is the 1-second interval, which improves about 400 Elo in self-play.

Then, why do we get similar results from different implementations of root parallelism, based on different programs and on different clusters? I don't use MPI for the cluster version of Zen. Zen's playouts are slower than MoGo's. Etc... One second is a mysterious time :(.

Hideki
--
g...@nue.ci.i.u-tokyo.ac.jp (Kato)