Awesome, thanks for looking into it!

Cheers,
Adam

On 29 Nov 2013 at 02:03:44, Ole Tange ([email protected]) wrote:
>
>On Sat, Nov 23, 2013 at 1:19 AM, Ole Tange wrote:
>> On Wed, Nov 20, 2013 at 11:11 AM, Adam Lindberg wrote:
>>> Running the following command results in a crash
>> [...]
>>> $ parallel --filter-hosts --controlmaster -j 128 --nonall --tag --slf
>>> servers 'ps aux | grep [o]psworks | wc -l'
>
>I have worked a bit on this. The problem seems to be that
>--filter-hosts starts 4 ssh connections to each server in parallel. If
>the connections are proxied through a single machine (e.g. using SSH's
>ProxyCommand or ControlMaster) then this single machine's ssh daemon
>may be overloaded and reject some ssh connections.
>
>The problem only arises when there are a lot of machines (e.g. if you
>have 3 machines then it will never happen), and only when you do not
>connect directly (i.e. no proxy = no problems).
>
>My experiments show that putting a delay (--delay 0.1) in for every
>ssh command makes the problem much smaller. The same is true if the
>connections are retried (--retries 3).
>
>The problem by using these is that it makes --filter-hosts slower, and
>if you have many hosts and you connect directly to these hosts, then
>you will be paying a price without getting any benefit.
>
>I have chosen safety over speed, so --filter-hosts ought to work
>better now - albeit slower: 0.4 seconds per host + 18 seconds if one
>or more hosts are down.
>
>
>/Ole
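
For anyone wanting to see the delay-and-retry idea from the quoted message in isolation, here is a minimal sketch in plain shell. It is not how GNU parallel implements --filter-hosts. `retry_with_delay` is a hypothetical helper name, and the values in the commented usage line (3 attempts, 0.1 second pause) are just the --retries 3 and --delay 0.1 figures mentioned above; `server1` is a placeholder host.

```shell
# Hypothetical helper (not part of GNU parallel): run a command up to
# MAX_ATTEMPTS times, sleeping DELAY seconds between failed attempts.
# Usage: retry_with_delay MAX_ATTEMPTS DELAY command [args...]
retry_with_delay() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    # Keep running the command until it succeeds.
    until "$@"; do
        # Give up once the attempt budget is used.
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        # Pause so a shared proxy host is not hit again immediately.
        sleep "$delay"
    done
    return 0
}

# Example (assumes a reachable host named server1; not run here):
# retry_with_delay 3 0.1 ssh server1 true
```

The pause spaces out connection attempts through the proxy, and the retry covers the connections that still get rejected, at the cost of extra time when a host really is down.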
