[
https://issues.apache.org/jira/browse/WHIRR-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135890#comment-13135890
]
Alex Heneveld commented on WHIRR-410:
-------------------------------------
I'm currently seeing ERROR messages intermittently but they seem to be benign
(chef tests passing), using default 32-bit Ubuntu ami-ab36fbc2 us-east small.
(I think this should be classed as a bug in jclouds, described more below, but
not a serious one.)
Part of me suspects there are other issues with these Ubuntu images because
I've seen more worrying failures (timeouts a couple of minutes into setup) but I
can't isolate them and they might be my own fault (in some cases I know they
are my own fault, whirr picking up active servers from previous test runs).
As for jclouds' default image selection strategy -- _give me some decent
standard recent Linux_ -- I like it, and the querying is really useful when I
care. To guarantee image reliability, my preference would be for jclouds to
maintain and publish a list of favourite images for popular OS/arch combos at
popular providers+locations, and to support looking these up at runtime
(caching them in ~/.jclouds might also be faster than the current query
strategy). It would be interesting to know whether jclouds users would like
that.
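To make the "favourite images" idea concrete, here is a minimal sketch of a runtime lookup keyed by provider, location, OS and architecture. Everything in it is hypothetical: the class name {{FavouriteImagesSketch}}, the {{publish}}/{{lookup}} methods and the key format are illustrations, not an existing jclouds API. The only data point is the AMI mentioned above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch only: not part of jclouds or whirr.
final class FavouriteImagesSketch {
    // Maps "provider/location/os/arch" to a known-good image id.
    private final Map<String, String> images = new HashMap<>();

    // Build a case-insensitive lookup key (the key format is an assumption).
    static String key(String provider, String location, String os, String arch) {
        return String.join("/", provider, location, os, arch).toLowerCase(Locale.ROOT);
    }

    // Record a curated image for one provider/location/OS/arch combination.
    void publish(String provider, String location, String os, String arch, String imageId) {
        images.put(key(provider, location, os, arch), imageId);
    }

    // Return the curated image if one is known; callers would fall back
    // to the normal image query when the result is empty.
    Optional<String> lookup(String provider, String location, String os, String arch) {
        return Optional.ofNullable(images.get(key(provider, location, os, arch)));
    }

    public static void main(String[] args) {
        FavouriteImagesSketch favourites = new FavouriteImagesSketch();
        // The AMI id comes from this comment; the other strings are illustrative labels.
        favourites.publish("aws-ec2", "us-east", "ubuntu", "32", "ami-ab36fbc2");
        System.out.println(favourites.lookup("aws-ec2", "us-east", "ubuntu", "32"));
    }
}
```

A miss returns an empty {{Optional}}, so the existing query-based selection could remain the fallback when no curated entry exists.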
*Benign ERROR Messages*
The worrying but benign messages I'm seeing are below.
Based on {{auth.log}} on the server, I think these are due to sshj connection
attempts made while the server is in a race: {{sshd}} has started but the host
key is not yet set up ({{auth.log}} shows {{error: Could not load host key:
/etc/ssh/ssh_host_rsa_key}} and the same for {{dsa}}), causing {{sshd}} to break
the transport, and sshj to complain loudly. These should probably be logged at
INFO or lower in jclouds, suppressing sshj's eagerness to log ERROR.
{code:title=terminal}
2011-10-26 11:54:14,606 ERROR [net.schmizz.sshj.transport.TransportImpl]
(reader) Dying because - net.schmizz.sshj.transport.TransportException: Broken
transport; encountered EOF
2011-10-26 11:54:14,607 ERROR [net.schmizz.concurrent.Promise] (user thread 3)
<<kex done>> woke to: net.schmizz.sshj.transport.TransportException: Broken
transport; encountered EOF
2011-10-26 11:54:14,632 WARN [jclouds.ssh] (user thread 3) <<
(ubuntu:rsa[fingerprint(e8:73:76:c8:b7:3a:22:dc:64:f1:10:60:c9:bc:a9:2d),sha1(ca:30:55:b6:ba:34:83:78:9d:15:2c:a1:c9:20:4e:4f:37:4a:8a:d4)]@107.22.32.196:22)
error acquiring SSHClient(timeout=60000): Broken transport; encountered EOF
net.schmizz.sshj.transport.TransportException: Broken transport; encountered EOF
at net.schmizz.sshj.transport.Reader.run(Reader.java:70)
...
{code}
The above doesn't end up in any of the {{test-data/}} logs (using default whirr
test logging), but I do see:
{code:title=jclouds.log}
2011-10-26 11:54:14,634 DEBUG
[org.jclouds.http.handlers.BackoffLimitedRetryHandler] (user thread 3) Retry
1/7: delaying for 200 ms:
(ubuntu:rsa[fingerprint(e8:73:76:c8:b7:3a:22:dc:64:f1:10:60:c9:bc:a9:2d),sha1(ca:30:55:b6:ba:34:83:78:9d:15:2c:a1:c9:20:4e:4f:37:4a:8a:d4)]@107.22.32.196:22)
error acquiring SSHClient(timeout=60000): Broken transport; encountered EOF
2011-10-26 11:54:15,099 DEBUG
[org.jclouds.http.handlers.BackoffLimitedRetryHandler] (user thread 3) Retry
2/7: delaying for 800 ms:
(ubuntu:rsa[fingerprint(e8:73:76:c8:b7:3a:22:dc:64:f1:10:60:c9:bc:a9:2d),sha1(ca:30:55:b6:ba:34:83:78:9d:15:2c:a1:c9:20:4e:4f:37:4a:8a:d4)]@107.22.32.196:22)
error acquiring SSHClient(timeout=60000): Broken transport; encountered EOF
2011-10-26 11:54:16,184 DEBUG
[org.jclouds.http.handlers.BackoffLimitedRetryHandler] (user thread 3) Retry
3/7: delaying for 1800 ms:
(ubuntu:rsa[fingerprint(e8:73:76:c8:b7:3a:22:dc:64:f1:10:60:c9:bc:a9:2d),sha1(ca:30:55:b6:ba:34:83:78:9d:15:2c:a1:c9:20:4e:4f:37:4a:8a:d4)]@107.22.32.196:22)
error acquiring SSHClient(timeout=60000): Broken transport; encountered EOF
2011-10-26 11:54:18,343 DEBUG
[org.jclouds.http.handlers.BackoffLimitedRetryHandler] (user thread 3) Retry
4/7: delaying for 2000 ms:
(ubuntu:rsa[fingerprint(e8:73:76:c8:b7:3a:22:dc:64:f1:10:60:c9:bc:a9:2d),sha1(ca:30:55:b6:ba:34:83:78:9d:15:2c:a1:c9:20:4e:4f:37:4a:8a:d4)]@107.22.32.196:22)
error acquiring SSHClient(timeout=60000): Broken transport; encountered EOF
{code}
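The four delays in the {{jclouds.log}} excerpt above (200, 800, 1800, 2000 ms) are consistent with a quadratic backoff of 200 ms times the square of the retry number, capped at 2000 ms. The sketch below reproduces those four values. The constants and the formula are inferred from the log only and are an assumption about, not a copy of, what {{BackoffLimitedRetryHandler}} actually does. The class and method names are hypothetical.

```java
// Hypothetical sketch: reproduces the delays seen in the log excerpt.
final class SshBackoffSketch {
    // Both constants are inferred from the log, not read from jclouds source.
    static final long BASE_DELAY_MS = 200;
    static final long MAX_DELAY_MS = 2000;

    // Delay before the given retry (1-based): base * attempt^2, capped.
    static long delayMs(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        long quadratic = BASE_DELAY_MS * attempt * attempt;
        return Math.min(quadratic, MAX_DELAY_MS);
    }

    public static void main(String[] args) {
        // Only the first four retries appear in the log, so only those are printed.
        for (int attempt = 1; attempt <= 4; attempt++) {
            System.out.printf("Retry %d/7: delaying for %d ms%n", attempt, delayMs(attempt));
        }
    }
}
```

If that reading is right, the later retries would wait at the cap, so a server whose host key takes more than a few seconds to appear would still be retried several times before giving up.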
We could probably do with cleaner logging in jclouds: there's no clue what
sshj is doing at the time (e.g. machine status, script, or script status
check), and that would be fantastically useful.
(Will raise this issue at jclouds once we're clear what is jclouds, what is
whirr, and what is ubuntu/amazon.)
> Review automatic image selection
> --------------------------------
>
> Key: WHIRR-410
> URL: https://issues.apache.org/jira/browse/WHIRR-410
> Project: Whirr
> Issue Type: Bug
> Reporter: Andrei Savu
> Fix For: 0.7.0
>
> Attachments: WHIRR-410.patch
>
>
> While I was testing WHIRR-400 I have noticed that the ZooKeeper integration
> tests are failing on aws-ec2 with the automatically selected AMI but they are
> working as expected with the Amazon Linux AMI. The tests are also working as
> expected on cloudservers-us. This makes me think the failure is not related
> to our code changes and we should look for an external factor as the root
> cause.
> As part of this issue we should think about how to improve the automatic AMI
> selection mechanism in order to make it more robust and less likely to fail
> due to AMI upgrades and other external changes.