Thanks for sharing!

For what it's worth, the
https://github.com/gocd-contrib/docker-swarm-elastic-agent-plugin has not
been released for a long time. It probably works OK with respect to GoCD
interfaces (as these have not changed), but may or may not work correctly
with the latest Swarm features and likely has outdated dependent libs. While
I have been merging some PRs for dependencies, the lack of a release is partly
because of a lack of perceived interest and partly because I don't personally
have the experience with Swarm to sanity test it, or an understanding of the
ecosystem since Docker's spin-offs and such. Worth keeping in mind if you
go this path, rather than, say, Kubernetes, plain Docker on a single host or
cloud provider plugins (which have had releases).

-Chad

On Thu, 18 May 2023, 08:35 'Hans Dampf' via go-cd, <go-cd@googlegroups.com>
wrote:

> So we basically "fixed" our problem. The problem was that mitogen scales badly
> with multiple playbooks running at the same time on the same machine. It seems
> it does not support multicore CPUs. CPU 1 on every server is always running at
> 100%; we suspect this is because mitogen only uses this one core for its
> calculations, and then it blocks itself. If you can keep the usage of this
> one core below 100%, everything seems fine and the acceleration by mitogen
> is noticeable again.
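One way to check the observation that a single core sits at 100% while the others have headroom is to sample the kernel's per-core counters twice and compare them. The sketch below is Linux-only; the function name `per_core_busy` and the 1-second default interval are illustrative choices, and `/proc/stat` numbers cores from `cpu0`, so "CPU 1" may appear as `cpu0`. If the sysstat package is installed, `mpstat -P ALL 1` reports similar figures without a script.

```shell
#!/bin/sh
# Per-core busy percentage over a short interval, read from /proc/stat (Linux).
# Field order per core line: user nice system idle iowait irq softirq steal ...
per_core_busy() {
  interval="${1:-1}"   # seconds between the two samples (arbitrary default)
  before=$(grep '^cpu[0-9]' /proc/stat)
  sleep "$interval"
  after=$(grep '^cpu[0-9]' /proc/stat)
  printf '%s\n%s\n' "$before" "$after" | awk '
    {
      idle = $5 + $6                       # idle + iowait
      total = 0
      for (f = 2; f <= 9 && f <= NF; f++)  # user through steal
        total += $f
      # First sample for this core: remember the counters and move on.
      if (!($1 in t0)) { t0[$1] = total; i0[$1] = idle; next }
      # Second sample: busy share of the elapsed ticks.
      dt = total - t0[$1]
      busy = (dt > 0) ? 100 * (dt - (idle - i0[$1])) / dt : 0
      printf "%s %5.1f%%\n", $1, busy
    }'
}

per_core_busy 1
```

Running it once while several playbooks are active and once while the host is idle should show whether the load really concentrates on one core.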
>
> The fix now was to add more servers running go-agents and reduce the
> number of go-agents on each server. It's more of a workaround, but I don't
> expect mitogen will ever get any bigger updates again.
> Currently, we have 20 servers with 10 go-agents each. To be honest, I think
> 10 agents are still too many, and if all 10 are running, mitogen will start
> slowing down again.
>
> Next, we will try to get the go-agent running in Docker Swarm. We hope this
> scales better.
>
> On Friday, 5 May 2023 at 08:58:23 UTC+2, Chad Wilson wrote:
>
>> What is a "workernode" in this context? This isn't GoCD terminology, so
>> it's unclear what it means here.
>>
>> GoCD agents simply fork processes to run your tasks within the 'go' user
>> context of the agent process. IIRC the entire "wrapping" environment from
>> the agent process should be propagated to the tasks, so there could be
>> differences there depending on how you install and launch your agents.
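Since tasks inherit the agent's environment, that environment can be inspected directly on Linux through `/proc/<pid>/environ`, which holds the NUL-separated `VAR=value` entries the process was started with (tasks additionally receive GoCD's own `GO_*` variables). In the sketch below, `proc_env` is a hypothetical helper, and locating the agent by matching `agent.jar` on its command line is an assumption that may not fit how your agents are launched; reading another user's process environment usually needs root.

```shell
#!/bin/sh
# Print a process's startup environment, one VAR=value per line, sorted (Linux).
# /proc/<pid>/environ is NUL-separated; reading another user's file needs root.
proc_env() {
  tr '\0' '\n' < "/proc/$1/environ" | sort
}

# Assumption: the agent JVM's command line contains "agent.jar". Adjust the
# pattern to however your agents are actually launched.
AGENT_PID=$(pgrep -f 'agent.jar' | head -n 1)
if [ -n "$AGENT_PID" ]; then
  proc_env "$AGENT_PID"
else
  echo "no agent process matched" >&2
fi
```

Diffing this output against `env | sort` from an interactive shell as the `go` user highlights variables that only one side has.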
>>
>> There's not really any magic here, and the server has no role
>> (synchronously) once the agent knows what job needs to be run, and starts
>> cloning/fetching materials and kicking off tasks. You can see what the
>> agent is doing for each job/task in the console log to see where the time
>> is being spent.
>>
>> If the agents are "static" and the jobs create mutable content locally
>> (e.g. virtualenvs or other such stuff), you also might want to consider
>> whether you should enable "Clean working directory" on the stage level to
>> ensure a clean state before your jobs' tasks run.
>>
>> Other than that, it seems likely to me that there is some kind of
>> configuration at your host or OS user level (as Ketan hints at) that is
>> affecting mitogen/ansible. Perhaps the way mitogen, ansible or python are
>> installed, something different in the python environment, or some kind of
>> different configuration that is applied when run via the agent vs.
>> directly on the node (ssh config? mitogen or ansible config?).
>>
>> I'd dump both env and tool config from within a GoCD task and compare
>> between "good" and "bad" setups. There is likely *something* different
>> there in how things are running.
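A minimal sketch of that comparison, assuming a POSIX shell on the agent hosts: run the same script once as a GoCD task and once by hand as the `go` user, then `diff -r` the two output directories. The output location and the tool list are illustrative assumptions. `ansible-config dump --only-changed` lists only settings that differ from Ansible's defaults, which is where a mitogen strategy setting would show up.

```shell
#!/bin/sh
# Hypothetical snapshot script: run once as a GoCD task and once by hand as
# the go user, then `diff -r` the two output directories.
# The default output path is an assumption; pass a directory as $1 instead.
OUT_DIR="${1:-/tmp/env-snapshot-$(date +%s)}"
mkdir -p "$OUT_DIR"

# Environment variables, sorted so the diff is stable.
env | sort > "$OUT_DIR/env.txt"

# Which binaries PATH actually resolves to (the tool list is illustrative).
for tool in python3 ansible ansible-playbook ssh; do
  printf '%s: %s\n' "$tool" "$(command -v "$tool" || echo 'not found')"
done > "$OUT_DIR/tools.txt"

# Ansible settings that differ from the defaults, if ansible-config is present.
if command -v ansible-config > /dev/null 2>&1; then
  ansible-config dump --only-changed > "$OUT_DIR/ansible-config.txt" 2>&1
fi

# Per-user ssh client config, if there is one.
if [ -f "$HOME/.ssh/config" ]; then
  cp "$HOME/.ssh/config" "$OUT_DIR/ssh_config"
fi

echo "snapshot written to $OUT_DIR"
```

For example, pass `/tmp/snap-agent` as the argument from the task and `/tmp/snap-shell` from the interactive shell, then run `diff -r /tmp/snap-agent /tmp/snap-shell`.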
>>
>> -Chad
>>
>>
>> On Fri, May 5, 2023 at 2:07 PM 'Hans Dampf' via go-cd <
>> go...@googlegroups.com> wrote:
>>
>>> OK, we did more testing and built a new setup from scratch. As expected,
>>> the performance was very good.
>>> Then we moved one of the old "broken" workernodes from the old setup to
>>> the new setup and, unexpectedly, the performance was also very good again.
>>>
>>> So there seems to be some slowdown on the go-server side or with the
>>> communication with the nodes.
>>>
>>> On Thursday, 4 May 2023 at 12:06:13 UTC+2, ketanpad...@gmail.com
>>> wrote:
>>>
>>>> > Is there maybe a cache file or lock file created by the agents which
>>>> does not get deleted on uninstall?
>>>>
>>>> This might help find anything owned by the go user.
>>>>
>>>> $ sudo find / -user go
>>>>
>>>> - Ketan
>>>>
>>>>
>>>>
>>>> On Thu, May 4, 2023 at 3:16 PM 'Hans Dampf' via go-cd <
>>>> go...@googlegroups.com> wrote:
>>>>
>>>>>
>>>>> It's not just one task, it's the whole playbook which is slower.
>>>>> Yes, locally as user go.
>>>>> This runs with normal performance:
>>>>> go@host1:~$ ansible-playbook slowplaybook.yaml -i inventory
>>>>>
>>>>> On the same machine, the same playbook executed by the go-agent is
>>>>> slow.
>>>>> It ran fast in the past, until the incident with heavy load on the
>>>>> agents and a big backlog:
>>>>> 100% usage of all 150 agents + 200 jobs in the backlog.
>>>>> Besides this, there were no changes to the playbook or the settings of
>>>>> the agents (env variables).
>>>>>
>>>>> Normally we only use about 40-50 agents and have no backlog.
>>>>>
>>>>> Is there maybe a cache file or lock file created by the agents which
>>>>> does not get deleted on uninstall?
>>>>>
>>>>> On Thursday, 4 May 2023 at 10:43:29 UTC+2, ketanpad...@gmail.com
>>>>> wrote:
>>>>>
>>>>>> It's unclear from your problem description if the entire job is
>>>>>> taking 10-30 minutes, or the task is taking 10-30 minutes. You mention
>>>>>> that running locally from the agent is quick, but it is unclear if you're
>>>>>> running your task as `go` user or `root` user. For context, there are
>>>>>> other overheads in jobs, for example checking out code and cleaning
>>>>>> the working directory (if configured to do so). At the end of all tasks,
>>>>>> the agent will also upload all artifacts/console logs back to the GoCD
>>>>>> server.
>>>>>>
>>>>>> If I were in your place, I would do the following next steps:
>>>>>>
>>>>>> - See if the script can be run in quiet mode. Maybe redirect the
>>>>>> output to /dev/null, if possible, and check how long it takes to run
>>>>>> just ansible+mitogen. This is to eliminate possible issues or slowness
>>>>>> with GoCD taking time to "read" the output from your deployment.
>>>>>> - Next, turn on more debug/verbose output in ansible + mitogen to
>>>>>> see if there are things that the GoCD agent might be doing that could be
>>>>>> affecting your deploy timings. For example, any spurious environment
>>>>>> variables that GoCD might be setting, or perhaps some SSH configs that
>>>>>> might be affecting the deployment.
>>>>>> - Run the `env` command before your job to dump any environment
>>>>>> variables that are applicable for that job. You can then `export` these
>>>>>> environment variables from the shell (as `go` user) and then run the
>>>>>> script to see if there is any difference.
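The environment dump and the quiet timing run from the list above can be wrapped in a small script used as the task's command. This is a sketch under assumptions: `PLAYBOOK` and `INVENTORY` default to the names from the command quoted in this thread, and the function and file names are hypothetical. `-vvvv` is Ansible's maximum verbosity; mitogen's own debug options are not covered here.

```shell
#!/bin/sh
# Sketch of a GoCD task wrapper. PLAYBOOK and INVENTORY are assumptions that
# default to the names used in this thread; override them via the environment.
PLAYBOOK="${PLAYBOOK:-slowplaybook.yaml}"
INVENTORY="${INVENTORY:-inventory}"

# Record the environment the agent hands to this task, sorted so it can be
# diffed against `env | sort` from an interactive go shell.
env | sort > task-env.txt

# Run with all output discarded and report wall-clock seconds, so any time
# the agent spends consuming console output is taken out of the picture.
quiet_run() {
  start=$(date +%s)
  ansible-playbook "$PLAYBOOK" -i "$INVENTORY" > /dev/null 2>&1
  rc=$?
  end=$(date +%s)
  echo "quiet run: exit=$rc elapsed=$((end - start))s"
}

# The same run at maximum Ansible verbosity, written to a log file
# (the default file name is hypothetical) instead of the console.
verbose_run() {
  ansible-playbook "$PLAYBOOK" -i "$INVENTORY" -vvvv > "${1:-ansible-verbose.log}" 2>&1
}

quiet_run
```

If the quiet run is fast under the agent, output handling is the likely suspect; if it is still slow, calling `verbose_run` and comparing its log with one from a manual run as `go` may show where the extra time goes.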
>>>>>>
>>>>>> - Ketan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 4, 2023 at 2:03 PM 'Hans Dampf' via go-cd <
>>>>>> go...@googlegroups.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Our setup consists of 10 workers with 15 agents each. We run ansible
>>>>>>> + mitogen on the agents. Currently, we have a problem with go-agent +
>>>>>>> mitogen.
>>>>>>>
>>>>>>> Mitogen itself is a tool to speed up ansible runs by "tunneling"
>>>>>>> multiple tasks over one ssh connection.
>>>>>>> https://mitogen.networkgenomics.com/ansible_detailed.html
>>>>>>>
>>>>>>> If we use it on the worker directly on the CLI, without the agent, it
>>>>>>> runs very well:
>>>>>>>
>>>>>>> Basic Ansible: ~5 min
>>>>>>> Ansible + Mitogen: ~1.5 min
>>>>>>> Ansible + Mitogen + Go-agent (expected): ~2 min
>>>>>>> Ansible + Mitogen + Go-agent (currently): ~10-30 min
>>>>>>>
>>>>>>> Now, if we start ansible with mitogen enabled IN the go-agent, the
>>>>>>> runtime is significantly longer than the basic run.
>>>>>>> Some runs slow down to 10-30 min, which is highly unusual since they
>>>>>>> should only take 2-5 min. Run directly on the CLI, it's as fast as
>>>>>>> expected.
>>>>>>>
>>>>>>> Strangely, this was not the case from the beginning. It only started
>>>>>>> after an incident where we had to stress all 150 agents at once.
>>>>>>>
>>>>>>> We already reinstalled ansible, mitogen and the go-agent itself, but
>>>>>>> the degraded performance persists.
>>>>>>>
>>>>>>> I hope somebody can help with how to debug this further, since the last
>>>>>>> resort would be to completely reinstall all the workernodes.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "go-cd" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to go-cd+un...@googlegroups.com.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/go-cd/2464860e-407e-4be6-ae6c-3db0c68a7d95n%40googlegroups.com
>>>>>>> <https://groups.google.com/d/msgid/go-cd/2464860e-407e-4be6-ae6c-3db0c68a7d95n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>>
