On 05/31/2016 07:29 PM, Dietmar Maurer wrote:
>> Further another problem would still be open if we tried to patch the
>> SSH Forward method we currently use - which we solve for free with
>> the approach of this patch - namely the problem that the method
>> to get an available port (next_migration_port) has a serious race
>> condition
> Why is there a race condition exactly? If so, we have to fix that.
It's not directly in next_unused_port, as that is flock'ed. But if the program which requests a port takes too long to open it, its reservation may be treated as timed out by next_unused_port and another program gets assigned the same port. Then both may try to open/connect to it.

As we did not have the SSH option ExitOnForwardFailure enabled, the second migration's SSH tunnel did not fail when it could not bind to the local port, and QEMU then also writes to a port where it gets a connection refused (as the other migration is running on it).

This may also cause trouble for other programs that (indirectly) use next_unused_port, but at least the race condition should trigger very rarely. I'll look into a way to fix that after v3 of these patches.

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
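To illustrate the race explained above, here is a simplified, hypothetical Python model. It is not the actual PVE::Tools next_unused_port (which is Perl and uses flock on a file). The class name PortReserver, the 5-second reservation timeout and the 60000-60049 port range are assumptions chosen only for illustration. The sketch shows that serializing the allocator alone is not enough: once a reservation expires before the caller has actually bound the port, a second caller is handed the same port.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class PortReserver:
    """Hypothetical, simplified model of a lock-protected port allocator.

    A reservation only protects a port for `timeout` seconds. This is an
    illustration of the race, NOT the real PVE::Tools implementation.
    """
    start: int
    end: int          # exclusive upper bound of the port range
    timeout: float    # seconds a reservation stays valid (assumed value)
    reserved: dict = field(default_factory=dict)  # port -> reservation time
    bound: set = field(default_factory=set)       # ports a program has opened
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def next_unused_port(self, now: float) -> int:
        # The lock serializes callers (like flock), so two concurrent calls
        # never get the same port while a reservation is still valid.
        with self._lock:
            for port in range(self.start, self.end):
                ts = self.reserved.get(port)
                if ts is not None and now - ts < self.timeout:
                    continue  # reservation still valid, skip it
                if port in self.bound:
                    continue  # a program is actually listening on it
                self.reserved[port] = now
                return port
            raise RuntimeError("no free port in range")

    def open_port(self, port: int) -> None:
        # Models the caller finally binding its port; a second bind of the
        # same port fails, like EADDRINUSE from a real socket.
        with self._lock:
            if port in self.bound:
                raise OSError(f"port {port} already in use")
            self.bound.add(port)


if __name__ == "__main__":
    r = PortReserver(start=60000, end=60050, timeout=5.0)
    a = r.next_unused_port(now=0.0)  # migration A gets a port
    # A is slow and has not opened its port yet when B asks 6 s later,
    # so A's reservation has already expired:
    b = r.next_unused_port(now=6.0)
    print(a, b)  # both are 60000: the same port was handed out twice
    r.open_port(a)
    try:
        r.open_port(b)
    except OSError as e:
        print("second bind failed:", e)
```

In this model the second bind fails loudly. An SSH tunnel started without `-o ExitOnForwardFailure=yes` only warns when its local forward cannot bind and keeps running, which matches the symptom described in the mail: the second migration appears to have a tunnel, but its traffic ends up on the port already used by the first migration.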