Aled Sage created BROOKLYN-115:
----------------------------------
Summary: ssh failure repeatedly (out of retries: Timeout expired)
on health-check after successfully provisioning jboss in CentOS on
vcloud-director
Key: BROOKLYN-115
URL: https://issues.apache.org/jira/browse/BROOKLYN-115
Project: Brooklyn
Issue Type: Bug
Affects Versions: 0.7.0-SNAPSHOT
Reporter: Aled Sage
I successfully provisioned a JBoss7 entity to a CentOS 6.4 VM in VMware's
vCloud Air (running vcloud-director 5.5, going over NAT).
However, several hours after the VM was provisioned, the entity went on-fire
because the check-running script (which goes over ssh to ensure the process is
still running) began to fail repeatedly. Note this is using the new "async
exec", but that is unrelated because it is *every* ssh command that fails, not
just the long-poll.
{noformat}
Caused by: brooklyn.util.internal.ssh.SshException: (amp@23.92.230.21:11955)
(amp@23.92.230.21:11955) error acquiring Shell(command=[[touch
/tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stdout
/tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stderr
/tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.exitstatus
/tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.pid, (
/tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.sh >
/tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stdout 2>
/tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stderr <
/dev/null ; echo $? >
/tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.exitstatus
) & disown, echo $! >
/tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.pid,
RESULT=$?, echo Executing async
/tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.sh, exit
$RESULT]]) (attempt 1/1, in time 1m/2m); out of retries: Timeout expired
    at brooklyn.util.internal.ssh.SshAbstractTool.propagate(SshAbstractTool.java:169) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
    at brooklyn.util.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:584) ~[patch-ssh-longpolling-retry.jar:na]
    at brooklyn.util.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:537) ~[patch-ssh-longpolling-retry.jar:na]
    at brooklyn.util.internal.ssh.sshj.SshjTool$3.run(SshjTool.java:354) ~[patch-ssh-longpolling-retry.jar:na]
    at brooklyn.util.internal.ssh.sshj.SshjTool.execScriptAsyncAndPoll(SshjTool.java:478) ~[patch-ssh-longpolling-retry.jar:na]
    at brooklyn.util.internal.ssh.sshj.SshjTool.execScript(SshjTool.java:320) ~[patch-ssh-longpolling-retry.jar:na]
    at brooklyn.util.task.system.internal.ExecWithLoggingHelpers$1.exec(ExecWithLoggingHelpers.java:83) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
    at brooklyn.util.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:167) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
    at brooklyn.util.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:1) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
    at brooklyn.util.pool.BasicPool.exec(BasicPool.java:147) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
    at brooklyn.location.basic.SshMachineLocation.execSsh(SshMachineLocation.java:495) ~[patch-ssh-longpolling-retry.jar:0.7.0-SNAPSHOT]
    at brooklyn.location.basic.SshMachineLocation$11.execWithTool(SshMachineLocation.java:635) ~[patch-ssh-longpolling-retry.jar:0.7.0-SNAPSHOT]
    at brooklyn.util.task.system.internal.ExecWithLoggingHelpers.execWithLogging(ExecWithLoggingHelpers.java:165) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
    at brooklyn.util.task.system.internal.ExecWithLoggingHelpers.execScript(ExecWithLoggingHelpers.java:81) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
    at brooklyn.location.basic.SshMachineLocation.execScript(SshMachineLocation.java:628) ~[patch-ssh-longpolling-retry.jar:0.7.0-SNAPSHOT]
    at brooklyn.entity.basic.AbstractSoftwareProcessSshDriver.execute(AbstractSoftwareProcessSshDriver.java:322) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
    at brooklyn.entity.basic.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:363) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
    ... 8 common frames omitted
Caused by: net.schmizz.sshj.connection.ConnectionException: Timeout expired
    at net.schmizz.sshj.connection.ConnectionException$1.chain(ConnectionException.java:32) ~[sshj-0.8.1.jar:na]
    at net.schmizz.sshj.connection.ConnectionException$1.chain(ConnectionException.java:26) ~[sshj-0.8.1.jar:na]
    at net.schmizz.concurrent.Promise.retrieve(Promise.java:139) ~[sshj-0.8.1.jar:na]
    at net.schmizz.concurrent.Event.await(Event.java:103) ~[sshj-0.8.1.jar:na]
    at net.schmizz.sshj.connection.channel.AbstractChannel.join(AbstractChannel.java:282) ~[sshj-0.8.1.jar:na]
    at brooklyn.util.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:932) ~[patch-ssh-longpolling-retry.jar:na]
    at brooklyn.util.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:1) ~[patch-ssh-longpolling-retry.jar:na]
    at brooklyn.util.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:551) ~[patch-ssh-longpolling-retry.jar:na]
    ... 23 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Timeout expired
    ... 29 common frames omitted
{noformat}
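For readability, the command list logged in the exception above amounts to roughly the following wrapper. This is only a sketch reconstructed from that logged command: `BASE` and the stand-in script body are hypothetical (the real files are the /tmp/brooklyn-...-check-running_JBoss7ServerImpl.* paths shown in the trace), and the final `exit $RESULT` is left out so the snippet can be run standalone.

```shell
#!/usr/bin/env bash
# Sketch of the "async exec" wrapper pattern seen in the logged command.
# BASE is a hypothetical prefix standing in for the /tmp/brooklyn-... paths.
BASE="$(mktemp -d)/check-running"

# Hypothetical stand-in for the real check-running script.
printf '#!/bin/sh\necho checking\nexit 0\n' > "$BASE.sh"
chmod +x "$BASE.sh"

# 1. Pre-create the result files so a later poll always finds them.
touch "$BASE.stdout" "$BASE.stderr" "$BASE.exitstatus" "$BASE.pid"

# 2. Run the script in a background subshell, detached from the session,
#    and record its exit status once it finishes.
( "$BASE.sh" > "$BASE.stdout" 2> "$BASE.stderr" < /dev/null ; echo $? > "$BASE.exitstatus" ) & disown

# 3. Record the PID of the background job, then report.
echo $! > "$BASE.pid"
RESULT=$?
echo "Executing async $BASE.sh"
```

In the failure reported here the timeout occurs while acquiring the shell, which suggests none of these steps ever ran on the VM; that is consistent with the problem being at the ssh layer rather than in the async wrapper itself.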
Trying to ssh manually to the VM (from the Brooklyn VM), I get:
{noformat}
[amp@AMP amp]$ ssh amp@23.92.230.21 -p 11955
Connection to 23.92.230.21 closed by remote host.
Connection to 23.92.230.21 closed.
{noformat}
From that Brooklyn VM, I can successfully ssh to the box as a different user
though (which uses a password rather than an ssh key).
From my Mac laptop, it is similar(ish). I can ssh as the different user, but
when I try as the user amp I get a slightly different error:
{noformat}
Aleds-MacBook-Pro:vchs-ssh-hangs-20141222 aled$ ssh -i ~/.ssh/id_rsa-canopy-tai-server amp@23.92.230.21 -p 11955
Write failed: Broken pipe
{noformat}