I also meant to ask if you could do a simple `ansible -m ping all` test
(before and after changing the ulimit settings), to see if you still see
slowness with that simple test or if it is directly related to
fact-gathering.

Thanks!

On Mon, Sep 29, 2014 at 10:28 AM, James Cammarata <jcammar...@ansible.com>
wrote:

> Hi Barry,
>
> One thing I did notice when testing your configuration was that, with my
> default ulimit settings, large -f settings were causing similar tracebacks
> and failures. In my case setting `ulimit -u 4096` (may also have to do
> `ulimit -f 4096`) resolved that issue. I noticed this when using the
> "ansible" command vs. "ansible-playbook", the later of which may have been
> hidding the underlying issue.
>
> We are still looking into the host-key checking issue to see if we can
> replicate that.
>
> Thanks!
>
>
> On Wed, Sep 24, 2014 at 2:23 PM, Michael DeHaan <mich...@ansible.com>
> wrote:
>
>> Hmm, curious.
>>
>> Yeah there's not really any extra SSH debug detail in the above.
>>
>> The error in question occurs in two places - one when the pipe slams shut
>> for no good reason, and another when ssh exists with error 255 (aka unknown
>> error).
>>
>> We're still looking into the known hosts awareness question.
>>
>>
>>
>>
>>
>> On Wed, Sep 24, 2014 at 3:17 PM, Barry Morrison <bdmorri...@gmail.com>
>> wrote:
>>
>>> Support Request #2904 has known_hosts file attached to it
>>>
>>> Hopefully this is the pertinent part from failed facts gathering:
>>>
>>> <server1020.prod.domain> ESTABLISH CONNECTION FOR USER: bmorriso
>>> <server1020.prod.domain> REMOTE_MODULE setup CHECKMODE=True
>>> <server1020.prod.domain> EXEC ['ssh', '-C', '-vvv', '-o',
>>> 'ControlMaster=auto', '-o', 'ControlPersist=5m', '-o',
>>> 'ControlPath=/home/bmorriso/.ansible/cp/ansible-ssh-%h-%p-%r', '-o',
>>> 'StrictHostKeyChecking=no', '-o', 'Port=3422', '-o',
>>> 'KbdInteractiveAuthentication=no', '-o',
>>> 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey',
>>> '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10',
>>> 'server1020.prod.domain', u"/bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python'"]
>>> ok: [server88.prod.domain]
>>> <server3033.prod.domain> ESTABLISH CONNECTION FOR USER: bmorriso
>>> <server3033.prod.domain> REMOTE_MODULE setup CHECKMODE=True
>>> <server3033.prod.domain> EXEC ['ssh', '-C', '-vvv', '-o',
>>> 'ControlMaster=auto', '-o', 'ControlPersist=5m', '-o',
>>> 'ControlPath=/home/bmorriso/.ansible/cp/ansible-ssh-%h-%p-%r', '-o',
>>> 'StrictHostKeyChecking=no', '-o', 'Port=3422', '-o',
>>> 'KbdInteractiveAuthentication=no', '-o',
>>> 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey',
>>> '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10',
>>> 'server3033.prod.domain', u"/bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python'"]
>>> fatal: [server4214.prod.domain] => SSH Error: data could not be sent to
>>> the remote host. Make sure this host can be reached over ssh
>>> <server1028.prod.domain> ESTABLISH CONNECTION FOR USER: bmorriso
>>> <server1028.prod.domain> REMOTE_MODULE setup CHECKMODE=True
>>> <server1028.prod.domain> EXEC ['ssh', '-C', '-vvv', '-o',
>>> 'ControlMaster=auto', '-o', 'ControlPersist=5m', '-o',
>>> 'ControlPath=/home/bmorriso/.ansible/cp/ansible-ssh-%h-%p-%r', '-o',
>>> 'StrictHostKeyChecking=no', '-o', 'Port=3422', '-o',
>>> 'KbdInteractiveAuthentication=no', '-o',
>>> 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey',
>>> '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10',
>>> 'server1028.prod.domain', u"/bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python'"]
>>> fatal: [server1020.prod.domain] => SSH Error: data could not be sent to
>>> the remote host. Make sure this host can be reached over ssh
>>>
>>>
>>> specifically server1020.prod.domain
>>>
>>> And I was trying to get the tasks to fail as they had many times earlier
>>> -- nothing failed. Everything worked as expected and completed in 59s. I'm
>>> not convinced its fixed, but it's behaving. I'll poke at it in the AM. It
>>> was too easy to reproduce earlier.
>>>
>>> On Tuesday, September 23, 2014 7:43:12 PM UTC-7, Michael DeHaan wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Sep 23, 2014 at 10:29 PM, Barry Morrison <bdmor...@gmail.com>
>>>> wrote:
>>>>
>>>>> It is not consistent each attempt as far as which hosts fail. On one
>>>>> attempt a server will fail, on the next attempt the same server will not
>>>>> fail and if I attempt to gather facts manually after it failed, it is able
>>>>> to gather facts successfully each time.
>>>>>
>>>>> But here is a host that failed on the last attempt:
>>>>>
>>>>> /usr/bin/ansible server74.prod.domain -m ping -vvvv -c ssh
>>>>> <server74.prod.domain> ESTABLISH CONNECTION FOR USER: bmorriso
>>>>> <server74.prod.domain> REMOTE_MODULE ping
>>>>> <server74.prod.domain> EXEC ['ssh', '-C', '-vvv', '-o',
>>>>> 'ControlMaster=auto', '-o', 'ControlPersist=5m', '-o',
>>>>> 'ControlPath=/home/bmorriso/.ansible/cp/ansible-ssh-%h-%p-%r', '-o',
>>>>> 'StrictHostKeyChecking=no', '-o', 'Port=3422', '-o',
>>>>> 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=
>>>>> gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o',
>>>>> 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10',
>>>>> 'server74.prod.domain', u"/bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python'"]
>>>>> server74.prod.domain | success >> {
>>>>>     "changed": false,
>>>>>     "ping": "pong"
>>>>> }
>>>>>
>>>>
>>>> Ok so this one is successful and SSH debug levels are not helping.  I'm
>>>> going to need to see one that fails, unfortunately.  That may need a
>>>> capture from the long run....
>>>>
>>>>
>>>>>
>>>>>
>>>>> --forks was set to 50 and I saw ~35 hosts fail
>>>>> --forks set to 25, only 5 failed and it ran in 47s
>>>>> --forks set to 15, none failed and it ran in 53s
>>>>> --forks set to 20, none failed and it ran in 45s
>>>>>
>>>>> The above are all with host checking off.
>>>>>
>>>>> Here is another "twist". With --forks passed, if the fact gathering
>>>>> doesn't fail, a task will, and like fact gathering before, it's never the
>>>>> same task that fails.
>>>>>
>>>>
>>>>
>>>> This feels to me like there may be some problem keeping ControlPersist
>>>> sockets open.
>>>>
>>>> One thing to note is they do typically consume about ~1MB per host,
>>>> though at -f 50 this shouldn't be a problem.
>>>>
>>>> Also that version of Ubuntu should be perfectly fine.
>>>>
>>>> I've occasionally heard of issues with network hardware in the way - a
>>>> particularly badly misconfigured switch clamping things down.
>>>>
>>>> In this particular case, once discovered, the user was soon managing
>>>> thousands of nodes at a time.
>>>>
>>>> Though it's hard to say.  More digging is definitely required.
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Task fails with: "ssh connection closed waiting for sudo or su
>>>>> password prompt"
>>>>>
>>>>>
>>>>> With host checking on
>>>>>
>>>>> --forks 25 = 10m
>>>>> --forks 50 = 10m
>>>>>
>>>>> FWIW, If I set forks: 50 in /etc/ansible/ansible.cfg -- it still acts
>>>>> as if it is set to 5, only when I pass --forks 50 in the command does it
>>>>> actually seem to run at 50.
>>>>>
>>>>
>>>>
>>>> This is curious.   Possibly a permissions issue on ansible.cfg keeping
>>>> it from being read, or the value out of the right section.
>>>> [defaults] vs [default] or something is possible.
>>>>
>>>> If you can email us the file, I'd be interested in seeing it.
>>>>
>>>> Again, also interested in your known_hosts to try to see if we can tell
>>>> why it might not be detecting that your host is in the file.
>>>>
>>>> That SSH is asking shows it's there, but for some reason Ansible is
>>>> thinking it may need to ask you.
>>>>
>>>> Again, about 65-75% of our users are using these default options vs
>>>> paramiko - and haven't heard this reported recently  - so hope to get to
>>>> the bottom of this.
>>>>
>>>> Help with the above questions and info would be greatly appreciated!
>>>>
>>>>
>>>>>
>>>>> Also, no "old" version of Ansible
>>>>>
>>>>> which ansible
>>>>> /usr/bin/ansible
>>>>>
>>>>> /usr/bin/ansible --version
>>>>> ansible 1.7.1
>>>>>
>>>>> Hope this helps, but fear it may add to the confusion.
>>>>>
>>>>> On Tuesday, September 23, 2014 6:39:47 PM UTC-7, Michael DeHaan wrote:
>>>>>>
>>>>>> "With it commented, no failures, I'm able to communicate with all
>>>>>> servers. "
>>>>>>
>>>>>> This part is a little interesting.
>>>>>>
>>>>>> Turning off host checking and going slow you can talk to all your
>>>>>> hosts.  Going fast you cannot?
>>>>>>
>>>>>> (If this is repeatable, I wonder if maybe you have an SSH jumphost
>>>>>> configured that might be getting overwhelmed?   Or perhaps something
>>>>>> similar on the network?)
>>>>>>
>>>>>> Can I ask what --forks is set to?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 23, 2014 at 9:36 PM, Michael DeHaan <mic...@ansible.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Ok Barry,
>>>>>>>
>>>>>>> We'll get you sorted before you wander off and lose a limb :)
>>>>>>>
>>>>>>> These things seem to be unrelated.
>>>>>>>
>>>>>>> (A)
>>>>>>>
>>>>>>> This has happened in the past when the host key of a host doesn't
>>>>>>> *appear* to Ansible's ssh.py connection type to be in the known hosts 
>>>>>>> file,
>>>>>>> and it creates a serial lock to ask you the question about whether it
>>>>>>> should be added - but for whatever reason, knew it was actually there.
>>>>>>> The result of this is that --forks is not used on the first task per 
>>>>>>> host,
>>>>>>> which makes things not be parallel.   It's frustrating.
>>>>>>>
>>>>>>> This was fixed long ago, when we added knowledge about hashed
>>>>>>> known_hosts entries, and should be quite good today, especially on a 
>>>>>>> well
>>>>>>> tested OS like 14.04, basically at the top of our test matrix.   
>>>>>>> Finding it
>>>>>>> again now is curious.
>>>>>>>
>>>>>>> I'd worry if something else might be interferring with the lock.
>>>>>>> My first question is if (maybe privately), we could see your known_hosts
>>>>>>> file?
>>>>>>>
>>>>>>> So we're not quite out of that territory yet with host key checking
>>>>>>> on, but I'm still curious about why it may still be doing that.
>>>>>>>
>>>>>>> There may be a slim chance you're actually using an older ansible
>>>>>>> version, or they are hashed weirdly for some reason.
>>>>>>>
>>>>>>> I'll assume this is happening with "-c ssh".
>>>>>>>
>>>>>>> (I'd also be curious if this happens on the development branch, but
>>>>>>> I don't anticipate any changes there)
>>>>>>>
>>>>>>> (B)
>>>>>>>
>>>>>>> On the second question, I'm expecting these 10 hosts are
>>>>>>> consistently doing that between runs, as in the same hosts?
>>>>>>>
>>>>>>> Can I get the result of an /usr/bin/ansible hostname -m ping -vvvv
>>>>>>> -c ssh against one of them?
>>>>>>>
>>>>>>> That will engage SSH debug mode and tell us a little more about what
>>>>>>> may be up.
>>>>>>>
>>>>>>> They could actually be down, but I'm guessing you checked that.
>>>>>>> That being returned extraneously is not expected.
>>>>>>>
>>>>>>> It could also be that ansible_ssh_port or something needs to be set
>>>>>>> in inventory or whatever, and it's not normally set, firewall issues, or
>>>>>>> things like that?
>>>>>>>
>>>>>>> Let's start with the "-vvvv" part.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 23, 2014 at 9:20 PM, Barry Morrison <bdmor...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Oh, FWIW, I'm touching over 350 servers with this playbook and
>>>>>>>> gathering facts from all of them.
>>>>>>>>
>>>>>>>> On Tuesday, September 23, 2014 6:17:53 PM UTC-7, Barry Morrison
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Spawned from Conversation with Michael on Twitter
>>>>>>>>> https://twitter.com/esacteksab/status/514558427217936384
>>>>>>>>>
>>>>>>>>> Uncommenting host_key_checking = False, a playbook runs in 35s
>>>>>>>>> Commenting host_key_checking = False, the playbook runs in 9m25s
>>>>>>>>>
>>>>>>>>> But with it uncommented, ~10% of the servers return: "SSH Error:
>>>>>>>>> data could not be sent to the remote host. Make sure this host can be
>>>>>>>>> reached over ssh"
>>>>>>>>>
>>>>>>>>> With it commented, no failures, I'm able to communicate with all
>>>>>>>>> servers.
>>>>>>>>>
>>>>>>>>> This is a topic for to troubleshoot further, because Twitter and
>>>>>>>>> 140 chars isn't all that great.
>>>>>>>>>
>>>>>>>>> Ansible is 1.7.1 on Ubuntu 14.04
>>>>>>>>> Servers are a combination of Ubuntu 12.04 and 14.04
>>>>>>>>>
>>>>>>>>  --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "Ansible Project" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to ansible-proje...@googlegroups.com.
>>>>>>>> To post to this group, send email to ansible...@googlegroups.com.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/ansible-project/78fd3ef2-
>>>>>>>> 1b80-4167-b2f6-99d49569a177%40googlegroups.com
>>>>>>>> <https://groups.google.com/d/msgid/ansible-project/78fd3ef2-1b80-4167-b2f6-99d49569a177%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>>
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>  --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Ansible Project" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to ansible-proje...@googlegroups.com.
>>>>> To post to this group, send email to ansible...@googlegroups.com.
>>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>>> msgid/ansible-project/819205f6-e110-47d2-a43c-
>>>>> 1b93897322f6%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/ansible-project/819205f6-e110-47d2-a43c-1b93897322f6%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "Ansible Project" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to ansible-project+unsubscr...@googlegroups.com.
>>> To post to this group, send email to ansible-project@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/ansible-project/327689b5-e7d1-4246-84a6-7f395b31fd1f%40googlegroups.com
>>> <https://groups.google.com/d/msgid/ansible-project/327689b5-e7d1-4246-84a6-7f395b31fd1f%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "Ansible Project" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to ansible-project+unsubscr...@googlegroups.com.
>> To post to this group, send email to ansible-project@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/ansible-project/CA%2BnsWgz7M1gwcx1e_RDC8Vr%2BuAd96xS5rw5LZMn4AUjEofLmdw%40mail.gmail.com
>> <https://groups.google.com/d/msgid/ansible-project/CA%2BnsWgz7M1gwcx1e_RDC8Vr%2BuAd96xS5rw5LZMn4AUjEofLmdw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to ansible-project+unsubscr...@googlegroups.com.
To post to this group, send email to ansible-project@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/ansible-project/CAMFyvFh0f0qARoHPbga3iT8GwSzD6iUXdJeA5dezgja-00ajGg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to