[ https://issues.apache.org/jira/browse/LIBCLOUD-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Müller updated LIBCLOUD-532:
-----------------------------------
Description:
h2. Observed behaviour:
When I'm starting EC2 nodes with {{deploy_node(ssh_key=...)}}, I occasionally
(about 50% of the time) get an error message indicating that my key is not a
valid DSA key.
This seems a bit odd, since I'm using an RSA key.
h2. Cause
It turns out the cause lies elsewhere:
When starting a node, there is a short window during which the SSH daemon is
already up and running, but the public key has not yet been put into the
{{authorized_keys}} file. Apparently the SSH daemon is started before Amazon's
key-injection magic has finished.
During this window (I'd guess about a second), SSH rejects the private key
with an authentication error.
libcloud then tries some other means of authentication, during which it
apparently tries to parse the key as a DSA key, causing the reported error.
Note that the extra-long timeout used for the SSH connection attempt does not
help in this case, since the SSH server is already replying.
h2. Suggested Fix
I suggest reacting to a failed authentication with a few retries, with a delay
of a second or two between them, similar to {{wait_until_running()}}.
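A minimal sketch of the suggested retry, assuming the SSH connect step can be wrapped as a zero-argument callable. {{retry_on_auth_failure}} and {{AuthenticationFailed}} are hypothetical names, not libcloud API; in a real fix, {{retry_on}} would be whatever authentication exception the underlying SSH client raises. Only authentication failures are retried, so a genuinely unreachable host still fails through the normal timeout path.

```python
import time


class AuthenticationFailed(Exception):
    """Hypothetical stand-in for the SSH client's authentication error."""


def retry_on_auth_failure(connect, attempts=5, delay=2.0,
                          retry_on=(AuthenticationFailed,), sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are exhausted.

    Only exceptions listed in retry_on are retried; anything else
    (e.g. a plain connection timeout) propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # give up: re-raise the last authentication error
            sleep(delay)  # give the key injection time to finish


if __name__ == "__main__":
    # Simulate a node whose authorized_keys is populated after two failed tries.
    state = {"tries": 0}

    def fake_connect():
        state["tries"] += 1
        if state["tries"] < 3:
            raise AuthenticationFailed("public key not yet authorized")
        return "connected"

    print(retry_on_auth_failure(fake_connect, delay=0.1))  # prints: connected
```

The {{sleep}} parameter is injectable only so the delay can be skipped in tests; the fixed delay mirrors the polling style of {{wait_until_running()}} rather than an exponential backoff.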
h2. Workaround
{code}
deploy_node(..., ssh_alternate_usernames=["root" for _ in range(10)])
{code}
This causes libcloud to make several authentication attempts, which is slow
enough to bridge the delay until the public key is in place. It solves the
problem reliably, but not elegantly :)
was:
h2. Observed behaviour:
When I'm starting EC2 nodes with {{deploy_node(ssh_key=...)}}, I occasionally
(about 50% of the time) get an error message indicating that my key is not a
valid DSA key.
This seems a bit odd, since I'm using an RSA key.
h2. Cause
It turns out the cause lies elsewhere:
When starting a node, there is a short window during which the SSH daemon is
already up and running, but the public key has not yet been put into the
{{authorized_keys}} file. Apparently the SSH daemon is started before Amazon's
key-injection magic has finished.
During this window (I'd guess about a second), SSH rejects the private key
with an authentication error.
libcloud then tries some other means of authentication, during which it
apparently tries to parse the key as a DSA key, causing the reported error.
Note that the extra-long timeout used for the SSH connection attempt does not
help in this case, since the SSH server is already replying.
h3. Suggested Fix
I suggest reacting to a failed authentication with a few retries, with a delay
of a second or two between them, similar to {{wait_until_running()}}.
h3. Workaround
{code}
deploy_node(..., ssh_alternate_usernames=["root" for _ in range(10)])
{code}
This causes libcloud to make several authentication attempts, which is slow
enough to bridge the delay until the public key is in place. It solves the
problem reliably, but not elegantly :)
> deploy_node(..) occasionally fails on EC2
> -----------------------------------------
>
> Key: LIBCLOUD-532
> URL: https://issues.apache.org/jira/browse/LIBCLOUD-532
> Project: Libcloud
> Issue Type: Bug
> Components: Compute
> Environment: apache-libcloud 0.14.1, Windows 7
> Reporter: Stefan Müller
--
This message was sent by Atlassian JIRA
(v6.2#6252)