[julia-users] Re: Is there a tutorial on how to set up my own Julia cluster?
Perhaps the [count*] notation means "repeat the line count times", i.e.:

    hd1
    hd1
    hd2
    hd2

I have had no time to dig or test yet, so this is just another guess.

On Wednesday, October 28, 2015 at 9:36:08 PM UTC+2, Ismael VC wrote:
> If I change it to:
>
>     2 hd1
>     2 hd2
>     ...
>
> I get the same error I mentioned:
>
>     [root@hd0 ~]# julia -p 2 --machinefile Beowulf2
>     ssh: connect to host 2 port 22: Invalid argument
>     ...
[julia-users] Re: Is there a tutorial on how to set up my own Julia cluster?
Thank you Seth. The count argument is not supported in 0.3.x; I'll update to 0.4.x shortly.

On Wednesday, October 28, 2015 at 13:42:55 (UTC-6), Seth wrote:
> I don't think your interpretation is correct. I think the "*" is syntax
> for "(this many) times". Did you try appending an asterisk after the
> number? That is, "2* user@host"?
[julia-users] Re: Is there a tutorial on how to set up my own Julia cluster?
Thank you very much Greg, that worked! :D

On Wednesday, October 28, 2015 at 13:31:53 (UTC-6), Greg Plowman wrote:
> On v0.3 try multiple entries (lines) in machine file, one for each worker.
[julia-users] Re: Is there a tutorial on how to set up my own Julia cluster?
On Wednesday, October 28, 2015 at 10:20:00 AM UTC-7, Ismael VC wrote:
> How can I start 2 workers on each node, using Julia 0.3.11?
>
>     [count*][user@]host[:port] [bind_addr[:port]]
>
> The way I understand:
>
>     [count*][user@]host[:port] [bind_addr[:port]]
>
> Is that `count` is an integer while `*` means zero or more repetitions in
> REGEX lang, at first it seems it doesn't need a space character between
> the count and the `user@host`, but I have tried several forms and it
> doesn't work:

I don't think your interpretation is correct. I think the "*" is syntax for "(this many) times". Did you try appending an asterisk after the number? That is, "2* user@host"?
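Seth's reading of the documented form `[count*][user@]host[:port]` can be illustrated with a small sketch that writes a machine file using the `2*host` prefix. The host names hd1..hd19 come from this thread; the output file name `Beowulf_v04` is an assumption made for the example. Elsewhere in this thread it is reported that the count prefix is not supported on 0.3.x, so this form is only expected to work on a version that implements it.

```shell
#!/bin/sh
# Sketch: machine file using the documented "count*host" form.
# hd1..hd19 are the hosts named in the thread; Beowulf_v04 is a
# hypothetical file name chosen for this example.
out=Beowulf_v04

: > "$out"                      # create or truncate the file
for i in $(seq 1 19); do
    echo "2*hd$i" >> "$out"     # request 2 workers on node hd$i
done
```

The resulting file would then be passed to Julia with `julia --machinefile Beowulf_v04`.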
[julia-users] Re: Is there a tutorial on how to set up my own Julia cluster?
Hello everyone,

I have successfully added all nodes and I can start Julia like this:

    [root@hd0 ~]# julia -p 2 --machinefile Beowulf
                   _
       _       _ _(_)_     |  A fresh approach to technical computing
      (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
       _ _   _| |_  __ _   |  Type "help()" for help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 0.3.11 (2015-07-27 06:18 UTC)
     _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
    |__/                   |  x86_64-unknown-linux-gnu

    julia> nprocs()
    22

    julia> nworkers()
    21

    julia>

Where the Beowulf file is like this:

    hd1
    hd2
    hd3
    hd4
    hd5
    hd6
    hd7
    hd8
    hd9
    hd10
    hd11
    hd12
    hd13
    hd14
    hd15
    hd16
    hd17
    hd18
    hd19

If I change it to:

    2 hd1
    2 hd2
    2 hd3
    2 hd4
    2 hd5
    2 hd6
    2 hd7
    2 hd8
    2 hd9
    2 hd10
    2 hd11
    2 hd12
    2 hd13
    2 hd14
    2 hd15
    2 hd16
    2 hd17
    2 hd18
    2 hd19

I get the same error I mentioned:

    [root@hd0 ~]# julia -p 2 --machinefile Beowulf2
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ^CERROR: interrupt
     in match at ./regex.jl:119
     in parse_connection_info at multi.jl:1090
     in read_worker_host_port at multi.jl:1037
     in read_cb_response at multi.jl:1015
     in start_cluster_workers at multi.jl:1027
     in addprocs_internal at multi.jl:1234
     in addprocs at multi.jl:1244
     in process_options at ./client.jl:240
     in _start at ./client.jl:354
    [root@hd0 ~]#
[julia-users] Re: Is there a tutorial on how to set up my own Julia cluster?
On v0.3 try multiple entries (lines) in machine file, one for each worker.
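Greg's suggestion amounts to listing each host once per desired worker. A minimal sketch of generating such a file follows; the host names hd1..hd19 come from this thread, while the file name `Beowulf_v03` and the `workers_per_node` variable are assumptions introduced for the example.

```shell
#!/bin/sh
# Sketch of the v0.3 workaround: one machine-file line per worker.
# hd1..hd19 are the hosts named in the thread; the file name and the
# variable name below are hypothetical.
workers_per_node=2
out=Beowulf_v03

: > "$out"                          # create or truncate the file
for i in $(seq 1 19); do
    n=0
    while [ "$n" -lt "$workers_per_node" ]; do
        echo "hd$i" >> "$out"       # repeat the host once per worker
        n=$((n + 1))
    done
done
```

With two lines per host, `julia --machinefile Beowulf_v03` should start one worker per line, that is, two workers on each node.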
[julia-users] Re: Is there a tutorial on how to set up my own Julia cluster?
How can I start 2 workers on each node, using Julia 0.3.11?

    [count*][user@]host[:port] [bind_addr[:port]]

I have a machine file with only one node (one line). These examples are the forms that work, but they add only one worker per node. I'm using the default port for now and not using a different bind address:

- Only host: 555.555.555.555
- User and host: root@555.555.555.555

The way I understand

    [count*][user@]host[:port] [bind_addr[:port]]

is that `count` is an integer while `*` means zero or more repetitions in regex syntax. At first it seems it doesn't need a space character between the count and the `user@host`, but I have tried several forms and it doesn't work:

* Use `2` as `count`, separated by a space, with `my_file` being either:

      2 555.555.555.555

  or

      2 root@555.555.555.555

      [root@example ~]# julia --machinefile my_file
      ssh: connect to host 2 port 22: Invalid argument

It seems to me it tries to use the 2 as the host address :(

Could anyone please give me an example of a machine file which specifies the worker count?

Thanks in advance, cheers!

On Friday, September 25, 2015 at 16:42:59 (UTC-5), Ismael VC wrote:
> Hello everyone!
>
> I am trying to set up a Julia cluster with 20 nodes, this is the very
> first time I've tried something like this. I have looked around for
> examples, but documentation is not very helpful for me:
>
> *Julia can be started in parallel mode with either the -p or
> the --machinefile options. -p n will launch an additional n worker
> processes, while --machinefile file will launch a worker for each line in
> file file. The machines defined in file must be accessible via a
> passwordless ssh login, with Julia installed at the same location as the
> current host. Each machine definition takes the
> form [count*][user@]host[:port] [bind_addr[:port]] . user defaults to
> current user, port to the standard ssh port. count is the number of workers
> to spawn on the node, and defaults to 1. The
> optional bind-to bind_addr[:port] specifies the ip-address and port that
> other workers should use to connect to this worker.*
>
> This is what I think I have understood so far:
>
> Ok I list the machines on a machine file, that's easy, I have a file like
> this:
>
>     n user@555.555.555.555
>     n user@555.555.555.556
>     n user@555.555.555.555
>
> *The machines defined in file must be accessible via a
> passwordless ssh login,*
>
> This is the part that is the most difficult for me, it says that machines
> must be accessible via passwordless ssh
>
> *with Julia installed at the same location as the current host.*
>
> I understand this as I need to install Julia on every node in the same
> location, so I have 20 nodes, same software and hardware stacks. Does this
> mean that the nodes must be of the same operating system? the same bits
> (32/64) only?
>
> Right now I have *20 CentOS 6.7 (64 bits)* nodes with *julia-0.3.11*
> installed from the *generic linux binaries (64bits)*, all of them
> installed at */opt/julia-0.3.11/bin* (added to the PATH and already
> exported in /etc/profile)
>
> Now the plan in my mind is to use my laptop *(windows 7 64 bits,
> julia-0.3.11 64 bits)* as master node and control the cluster with that,
> so according to what I understand, I'll need to do (leaving password blank):
>
>     ssh-keygen -t rsa
>
> From my Windows laptop (I plan to install Arch Linux soon), in order to
> create my ssh key and then:
>
>     cat ~/.ssh/id_rsa.pub | ssh user@hostname 'cat >> .ssh/authorized_keys'
>
> To every node? So I have to be running the ssh server at every one of them?
> (I understand I'll need it at the master node) This is where I simply don't
> understand anymore, I haven't seen any tutorial, or article, or something
> like that, just that paragraph in the manual, I know there is
> ClusterManagers.jl but that sounds even more complicated for me right now.
>
> I also want to help David Sanders to set up another cluster (once I got this
> figured out) in his lab at Science Faculty, UNAM. I promise to enhance the
> documentation around this topic once I understand this.
>
> What do you guys think, do I have it all wrong?
>
> If anyone can help me, I'll be very grateful, thanks in advance!
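The key-copy step in the quoted post has to be repeated for every worker node, since the master logs in to each worker over ssh and each worker therefore needs a running ssh server. One cautious way to do that, sketched below, is to write the per-node commands to a script that can be reviewed before it is run. The user name `root`, the three host names, and the file name `copy_keys.sh` are placeholders assumed for the example; only the `cat ... | ssh ...` command itself is taken from the post.

```shell
#!/bin/sh
# Sketch: generate (not run) the key-copy command for each node.
# user, hosts and out are hypothetical values for illustration.
user=root
hosts="hd1 hd2 hd3"
out=copy_keys.sh

echo '#!/bin/sh' > "$out"
for h in $hosts; do
    # Same command as in the post, written once per node.
    echo "cat ~/.ssh/id_rsa.pub | ssh $user@$h 'cat >> .ssh/authorized_keys'" >> "$out"
done
```

Running `sh copy_keys.sh` afterwards would prompt for each node's password once; later logins with the key should then need no password.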
[julia-users] Re: Is there a tutorial on how to set up my own Julia cluster?
A Julia cluster is just a cluster with Julia installed on all the nodes. One way of achieving this is to create a cluster using PelicanHPC, and then do one of:

1) install julia in the /home/user directory, for example, by compiling from source. This directory is NFS shared by all nodes, when the cluster is set up.

or

2) run apt-get install julia on all the nodes.

A PHPC cluster is a reasonable solution for a single user. I used to develop it, and used it for a number of years on a 4 node cluster. It's Debian-based.

On Friday, September 25, 2015 at 11:42:59 PM UTC+2, Ismael VC wrote:
> I am trying to set up a Julia cluster with 20 nodes, this is the very
> first time I've tried something like this. I have looked around for
> examples, but documentation is not very helpful for me: