[ 
https://issues.apache.org/jira/browse/BROOKLYN-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271197#comment-14271197
 ] 

Aled Sage commented on BROOKLYN-115:
------------------------------------

Looking at the VM, we had too many open files, so it was not letting us ssh. We 
saw this via `lsof` and by checking `ulimit -a`.
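
For comparison, the same kind of check (open descriptors against the open-file limit) can be made from inside a single process. This is only a rough sketch, assuming Linux or macOS where `/dev/fd` is available; `open_fd_count` and `nofile_limits` are hypothetical helper names, and it inspects only the calling process, not the whole VM as `lsof` does.

```python
import os
import resource


def open_fd_count():
    """Number of file descriptors currently open in this process."""
    # /dev/fd lists the calling process's descriptors on Linux and macOS.
    # The count includes the descriptor opened briefly to read the directory.
    return len(os.listdir("/dev/fd"))


def nofile_limits():
    """Soft and hard open-file limits for this process (what `ulimit -n` reports)."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open fds: {open_fd_count()}  soft limit: {soft}  hard limit: {hard}")
```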

The reason for too many open files was a bug in the async-exec (which I had said 
was unconnected!). Each poll was leaving a `tail` command running, so we ended 
up with around 1000 processes, each with open file(s).
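
The leak pattern described above, and one way to avoid it, can be sketched in Python. This is not Brooklyn's code and not necessarily how the bug was fixed: `poll_once` is a hypothetical helper, and the child command stands in for the `tail` started by each poll. The point is only that a poll which times out should kill and reap its child, rather than leave it running with its files open.

```python
import subprocess
import sys


def poll_once(cmd, timeout_s):
    """Run one poll as a child process; never leave the child running.

    Returns the child's stdout (stripped) if it finishes in time,
    or None if it had to be killed after the timeout.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return out.decode().strip()
    except subprocess.TimeoutExpired:
        # Without these two lines the child would outlive the poll,
        # which is the kind of leak described in the comment.
        proc.kill()
        proc.communicate()  # reap the child and close its pipes
        return None


if __name__ == "__main__":
    # Stand-in for a long-running `tail`: a child that would sleep for a minute.
    slow = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(poll_once(slow, timeout_s=0.5))
```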

> ssh failure repeatedly (out of retries: Timeout expired) on health-check 
> after successfully provisioning jboss in CentOS on vcloud-director
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BROOKLYN-115
>                 URL: https://issues.apache.org/jira/browse/BROOKLYN-115
>             Project: Brooklyn
>          Issue Type: Bug
>    Affects Versions: 0.7.0-SNAPSHOT
>            Reporter: Aled Sage
>
> I successfully provisioned a JBoss7 entity to a CentOS 6.4 VM in VMware's 
> vcloud air (running vcloud-director 5.5, going over NAT).
> However, several hours after the VM was provisioned it went on-fire because 
> the check-running script (which goes over ssh to ensure the script is still 
> running) began to fail repeatedly. Note this is using the new "async exec", 
> but that is unconnected because it is *every* ssh command that fails rather 
> than the long-poll.
> {noformat}
> Caused by: brooklyn.util.internal.ssh.SshException: (amp@23.92.230.21:11955) 
> (amp@23.92.230.21:11955) error acquiring Shell(command=[[touch 
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stdout 
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stderr 
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.exitstatus
>  /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.pid, ( 
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.sh > 
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stdout 
> 2> 
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.stderr < 
> /dev/null ; echo $? > 
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.exitstatus
>  ) & disown, echo $! > 
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.pid, 
> RESULT=$?, echo Executing async 
> /tmp/brooklyn-20150109-063334594-LtUy-check-running_JBoss7ServerImpl.sh, exit 
> $RESULT]]) (attempt 1/1, in time 1m/2m); out of retries: Timeout expired
>         at 
> brooklyn.util.internal.ssh.SshAbstractTool.propagate(SshAbstractTool.java:169)
>  ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>         at 
> brooklyn.util.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:584) 
> ~[patch-ssh-longpolling-retry.jar:na]
>         at 
> brooklyn.util.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:537) 
> ~[patch-ssh-longpolling-retry.jar:na]
>         at brooklyn.util.internal.ssh.sshj.SshjTool$3.run(SshjTool.java:354) 
> ~[patch-ssh-longpolling-retry.jar:na]
>         at 
> brooklyn.util.internal.ssh.sshj.SshjTool.execScriptAsyncAndPoll(SshjTool.java:478)
>  ~[patch-ssh-longpolling-retry.jar:na]
>         at 
> brooklyn.util.internal.ssh.sshj.SshjTool.execScript(SshjTool.java:320) 
> ~[patch-ssh-longpolling-retry.jar:na]
>         at 
> brooklyn.util.task.system.internal.ExecWithLoggingHelpers$1.exec(ExecWithLoggingHelpers.java:83)
>  ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>         at 
> brooklyn.util.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:167)
>  ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>         at 
> brooklyn.util.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:1)
>  ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>         at brooklyn.util.pool.BasicPool.exec(BasicPool.java:147) 
> ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>         at 
> brooklyn.location.basic.SshMachineLocation.execSsh(SshMachineLocation.java:495)
>  ~[patch-ssh-longpolling-retry.jar:0.7.0-SNAPSHOT]
>         at 
> brooklyn.location.basic.SshMachineLocation$11.execWithTool(SshMachineLocation.java:635)
>  ~[patch-ssh-longpolling-retry.jar:0.7.0-SNAPSHOT]
>         at 
> brooklyn.util.task.system.internal.ExecWithLoggingHelpers.execWithLogging(ExecWithLoggingHelpers.java:165)
>  ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>         at 
> brooklyn.util.task.system.internal.ExecWithLoggingHelpers.execScript(ExecWithLoggingHelpers.java:81)
>  ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>         at 
> brooklyn.location.basic.SshMachineLocation.execScript(SshMachineLocation.java:628)
>  ~[patch-ssh-longpolling-retry.jar:0.7.0-SNAPSHOT]
>         at 
> brooklyn.entity.basic.AbstractSoftwareProcessSshDriver.execute(AbstractSoftwareProcessSshDriver.java:322)
>  ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>         at 
> brooklyn.entity.basic.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:363)
>  ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>         ... 8 common frames omitted
> Caused by: net.schmizz.sshj.connection.ConnectionException: Timeout expired
>         at 
> net.schmizz.sshj.connection.ConnectionException$1.chain(ConnectionException.java:32)
>  ~[sshj-0.8.1.jar:na]
>         at 
> net.schmizz.sshj.connection.ConnectionException$1.chain(ConnectionException.java:26)
>  ~[sshj-0.8.1.jar:na]
>         at net.schmizz.concurrent.Promise.retrieve(Promise.java:139) 
> ~[sshj-0.8.1.jar:na]
>         at net.schmizz.concurrent.Event.await(Event.java:103) 
> ~[sshj-0.8.1.jar:na]
>         at 
> net.schmizz.sshj.connection.channel.AbstractChannel.join(AbstractChannel.java:282)
>  ~[sshj-0.8.1.jar:na]
>         at 
> brooklyn.util.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:932)
>  ~[patch-ssh-longpolling-retry.jar:na]
>         at 
> brooklyn.util.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:1) 
> ~[patch-ssh-longpolling-retry.jar:na]
>         at 
> brooklyn.util.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:551) 
> ~[patch-ssh-longpolling-retry.jar:na]
>         ... 23 common frames omitted
> Caused by: java.util.concurrent.TimeoutException: Timeout expired
>         ... 29 common frames omitted
> {noformat}
> Trying to ssh manually to the VM (from the Brooklyn VM), I get:
> {noformat}
> [amp@AMP amp]$ ssh amp@23.92.230.21 -p 11955
> Connection to 23.92.230.21 closed by remote host.
> Connection to 23.92.230.21 closed.
> {noformat}
> From that Brooklyn VM, I can successfully ssh to the box as a different user 
> though (which uses password rather than ssh key).
> From my mac laptop, it is similar(ish). I can ssh as the different user, but 
> when I try as the user amp I get a slightly different error:
> {noformat}
> Aleds-MacBook-Pro:vchs-ssh-hangs-20141222 aled$ ssh -i 
> ~/.ssh/id_rsa-canopy-tai-server amp@23.92.230.21 -p 11955
> Write failed: Broken pipe
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
