[
https://issues.apache.org/jira/browse/BROOKLYN-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271197#comment-14271197
]
Aled Sage commented on BROOKLYN-115:
------------------------------------
Looking at the VM, it had too many open files, so it was not letting us ssh in.
We saw this via `lsof` and by checking `ulimit -a`.
The reason for too many open files was a bug in the async-exec (that I said was
unconnected!). It was leaving a bunch of `tail` commands running from each of
the polls, so we ended up with 1000ish processes running, each with open
file(s).
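For anyone wanting to check a VM for the same symptom, here is a minimal sketch that counts processes with a given command name and the file descriptors they hold. It assumes a Linux `/proc` filesystem (as on the CentOS VM). The function names and the `PROC_ROOT` override are made up for illustration and are not part of Brooklyn.

```shell
# Root of the proc filesystem. Overridable (an assumption made for this
# sketch) so the functions can be tried against a fake directory tree.
PROC_ROOT=${PROC_ROOT:-/proc}

# Print the PIDs of processes whose command name is exactly $1.
pids_named() {
    for comm_file in "$PROC_ROOT"/[0-9]*/comm; do
        # Skip unreadable entries, and the unexpanded glob when nothing matches.
        [ -r "$comm_file" ] || continue
        if [ "$(cat "$comm_file" 2>/dev/null)" = "$1" ]; then
            pid_dir=${comm_file%/comm}
            echo "${pid_dir##*/}"
        fi
    done
}

# Print how many processes are named $1.
count_procs() {
    pids_named "$1" | wc -l | tr -d ' '
}

# Print the total number of open file descriptors held by processes named $1.
count_fds() {
    total=0
    for pid in $(pids_named "$1"); do
        n=$(ls "$PROC_ROOT/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
        total=$((total + n))
    done
    echo "$total"
}
```

Running `count_procs tail` and `count_fds tail` on the affected machine, and comparing against `ulimit -n` for the user in question, should show whether leftover `tail` processes are what is using up the descriptors. Reading another user's `/proc/<pid>/fd` usually needs root.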
> ssh failure repeatedly (out of retries: Timeout expired) on health-check
> after successfully provisioning jboss in CentOS on vcloud-director
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: BROOKLYN-115
> URL: https://issues.apache.org/jira/browse/BROOKLYN-115
> Project: Brooklyn
> Issue Type: Bug
> Affects Versions: 0.7.0-SNAPSHOT
> Reporter: Aled Sage
>
> I successfully provisioned a Jboss7 entity to a CentOS 6.4 VM in VMware's
> vcloud air (running vcloud-director 5.5, going over NAT).
> However, several hours after the VM was provisioned it went on-fire because
> the check-running script (which goes over ssh to ensure the script is still
> running) began to fail repeatedly. Note this is using the new "async exec",
> but that is unconnected because it is *every* ssh command that fails rather
> than the long-poll.
> {noformat}
> Caused by: brooklyn.util.internal.ssh.SshException: ([email protected]:11955)
> ([email protected]:11955) error acquiring Shell(command=[[touch
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stdout
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stderr
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.exitstatus
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.pid, (
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.sh >
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stdout
> 2>
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stderr <
> /dev/null ; echo $? >
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.exitstatus
> ) & disown, echo $! >
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.pid,
> RESULT=$?, echo Executing async
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.sh, exit
> $RESULT]]) (attempt 1/1, in time 1m/2m); out of retries: Timeout expired
> at
> brooklyn.util.internal.ssh.SshAbstractTool.propagate(SshAbstractTool.java:169)
> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.util.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:584)
> ~[patch-ssh-longpolling-retry.jar:na]
> at
> brooklyn.util.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:537)
> ~[patch-ssh-longpolling-retry.jar:na]
> at brooklyn.util.internal.ssh.sshj.SshjTool$3.run(SshjTool.java:354)
> ~[patch-ssh-longpolling-retry.jar:na]
> at
> brooklyn.util.internal.ssh.sshj.SshjTool.execScriptAsyncAndPoll(SshjTool.java:478)
> ~[patch-ssh-longpolling-retry.jar:na]
> at
> brooklyn.util.internal.ssh.sshj.SshjTool.execScript(SshjTool.java:320)
> ~[patch-ssh-longpolling-retry.jar:na]
> at
> brooklyn.util.task.system.internal.ExecWithLoggingHelpers$1.exec(ExecWithLoggingHelpers.java:83)
> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.util.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:167)
> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.util.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:1)
> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at brooklyn.util.pool.BasicPool.exec(BasicPool.java:147)
> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.location.basic.SshMachineLocation.execSsh(SshMachineLocation.java:495)
> ~[patch-ssh-longpolling-retry.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.location.basic.SshMachineLocation$11.execWithTool(SshMachineLocation.java:635)
> ~[patch-ssh-longpolling-retry.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.util.task.system.internal.ExecWithLoggingHelpers.execWithLogging(ExecWithLoggingHelpers.java:165)
> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.util.task.system.internal.ExecWithLoggingHelpers.execScript(ExecWithLoggingHelpers.java:81)
> ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.location.basic.SshMachineLocation.execScript(SshMachineLocation.java:628)
> ~[patch-ssh-longpolling-retry.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.entity.basic.AbstractSoftwareProcessSshDriver.execute(AbstractSoftwareProcessSshDriver.java:322)
> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> at
> brooklyn.entity.basic.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:363)
> ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> ... 8 common frames omitted
> Caused by: net.schmizz.sshj.connection.ConnectionException: Timeout expired
> at
> net.schmizz.sshj.connection.ConnectionException$1.chain(ConnectionException.java:32)
> ~[sshj-0.8.1.jar:na]
> at
> net.schmizz.sshj.connection.ConnectionException$1.chain(ConnectionException.java:26)
> ~[sshj-0.8.1.jar:na]
> at net.schmizz.concurrent.Promise.retrieve(Promise.java:139)
> ~[sshj-0.8.1.jar:na]
> at net.schmizz.concurrent.Event.await(Event.java:103)
> ~[sshj-0.8.1.jar:na]
> at
> net.schmizz.sshj.connection.channel.AbstractChannel.join(AbstractChannel.java:282)
> ~[sshj-0.8.1.jar:na]
> at
> brooklyn.util.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:932)
> ~[patch-ssh-longpolling-retry.jar:na]
> at
> brooklyn.util.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:1)
> ~[patch-ssh-longpolling-retry.jar:na]
> at
> brooklyn.util.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:551)
> ~[patch-ssh-longpolling-retry.jar:na]
> ... 23 common frames omitted
> Caused by: java.util.concurrent.TimeoutException: Timeout expired
> ... 29 common frames omitted
> {noformat}
> Trying to ssh manually to the VM (from the Brooklyn VM), I get:
> {noformat}
> [amp@AMP amp]$ ssh [email protected] -p 11955
> Connection to 23.92.230.21 closed by remote host.
> Connection to 23.92.230.21 closed.
> {noformat}
> From that Brooklyn VM, I can successfully ssh to the box as a different user
> though (which uses password rather than ssh key).
> From my mac laptop, it is similar(ish). I can ssh as the different user, but
> when I try as the user amp I get a slightly different error:
> {noformat}
> Aleds-MacBook-Pro:vchs-ssh-hangs-20141222 aled$ ssh -i
> ~/.ssh/id_rsa-canopy-tai-server [email protected] -p 11955
> Write failed: Broken pipe
> {noformat}