Thanks so much for sharing back! This does make sense. I have also
experienced issues with EFS related to things like this. I would have
suggested checking disk performance if I had realised a GoCD server
replacement+upgrade was part of what had changed :-)

Generally I have had better experience mounting EBS volumes for use by GoCD
(rather than network-based stores such as EFS), although EBS does limit which
AZ your GoCD server can run in (without manual intervention), so whether that
is acceptable depends on your wider deployment architecture.

I can't think of any major reason the GoCD server version change on its own
would cause higher throughput usage than your older GoCD server version.
One thing worth thinking about is that, in my recollection, EFS in bursting
mode does vary speeds a lot based on the size of the storage. If your *new
server* has much lower storage/use of EFS than your *old server* then the
limits may be different (e.g. if you wiped a lot of artifacts while
re-using the same EFS volume, or created a new EFS volume which is a lot
smaller). I'd suggest comparing the AWS-side metrics for EFS throughput
between the two file systems to see what their usage, burst credits, and
limits are, per https://docs.aws.amazon.com/efs/latest/ug/performance.html.

For small EFS volumes, the baseline throughput is pretty terrible (roughly
15 MiBps read and 5 MiBps write, continuously), and GoCD servers tend to be
rather write-heavy if you have heavy use of artifacts within GoCD itself.
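
If it helps, a rough sketch of pulling the relevant CloudWatch numbers for
both file systems via the AWS CLI (assuming GNU date; substitute your real
file system IDs for fs-OLD / fs-NEW):

   # Compare permitted throughput, burst credits and metered IO for the two
   # EFS file systems over the last 24 hours.
   for FS in fs-OLD fs-NEW; do
     for METRIC in PermittedThroughput BurstCreditBalance MeteredIOBytes; do
       aws cloudwatch get-metric-statistics \
         --namespace AWS/EFS \
         --metric-name "$METRIC" \
         --dimensions Name=FileSystemId,Value="$FS" \
         --start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
         --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
         --period 3600 \
         --statistics Average Maximum
     done
   done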

I am not sure if use of git-over-https has any major implications for disk
usage on the git side of things compared to ssh, but I would not have thought
it'd majorly change the throughput requirements. If EFS volume size doesn't
explain the issues, and you have changed all of the material URLs to https://,
perhaps you want to compare other aspects of your material configuration for
changes in (there's a rough sketch for pulling these counts after the list):

   - the # of distinct materials known to GoCD on the Materials tab (old vs
   new)
   - the # of these materials that are auto-updating (polling, the default)
   compared to having auto-update disabled (e.g. if you also use Webhooks)
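
A sketch of pulling those counts from the materials API - this assumes the
/go/api/config/materials endpoint and the response shape I remember (an
_embedded.materials list with an attributes.auto_update flag), so adjust the
URL, credentials and jq paths to whatever your GoCD version actually returns:

   curl -s -u "$GOCD_USER:$GOCD_PASS" \
     -H 'Accept: application/vnd.go.cd.v1+json' \
     'https://gocd.example.com/go/api/config/materials' > materials.json
   # total number of distinct materials known to the server
   jq '._embedded.materials | length' materials.json
   # materials with auto-update explicitly disabled
   jq '[._embedded.materials[] | select(.attributes.auto_update == false)] | length' materials.json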

-Chad

On Wed, Aug 16, 2023 at 1:01 PM Komgrit Aneksri <tanakacu...@gmail.com>
wrote:

> Hello Chad,
>
> Thank you for your suggestion. Just for your information.
>
> I fixed this issue last week.
>
> The root cause was related to our EFS throughput for the stored flyweight,
> /home/go, artifacts, ....
>
> I solved the issue by changing the EFS throughput mode from bursting to
> elastic.
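>
> For reference, a sketch of the equivalent CLI change (fs-xxxxxxxx is a
> placeholder for the real file system ID):
>
>    aws efs update-file-system \
>      --file-system-id fs-xxxxxxxx \
>      --throughput-mode elastic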
>
> But I am still investigating why our new GoCD server (v23.1.0) uses
> higher throughput than the current GoCD server (v22.1.0).
>
> So they have the same configuration; the only differences are the GoCD
> version and that the new GoCD server uses only git over https.
>
> Best Regards,
> Komgrit
>
> On Tuesday, August 8, 2023 at 11:23:25 PM UTC+7 Chad Wilson wrote:
>
>> That is quite a lot of forked git processes. If there are constantly the
>> same amount of forked git processes (over, say, a 1 minute period), that is
>> likely the server at maximum default "material updates" concurrency. This
>> may mean git operations are queued behind each other and you possibly can't
>> fetch/check for git updates fast enough. The server typically logs
>> something when this is happening - you might want to inspect the logs more
>> closely.
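>>
>> A rough way to watch this from inside the server container (via kubectl
>> exec or similar) is something like:
>>
>>    # print the number of forked git processes every 5 seconds; if it stays
>>    # pinned at the same maximum for ~1 minute, updates are likely queueing
>>    while true; do
>>      echo "$(date -u +%H:%M:%S) $(ps -eo comm= | grep -cx git) git processes"
>>      sleep 5
>>    done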
>>
>> Git operations being queued is possibly also what is happening to your
>> config repositories, which is why you see them constantly "refreshing".
>>
>> Since all the processes seem to be at low CPU usage, this implies to me
>> that they are probably waiting for the network or your GitLab server (or
>> theoretically the local disk in the container is slow, but less likely a
>> cause). As suggested earlier, I think you are going to need to analyze git
>> speed further, to see what is happening with your network connectivity to
>> the GitLab server, and possibly check the GitLab server metrics itself. If
>> you upgraded or changed something on GitLab, I'd suggest comparing its
>> metrics from before the change/upgrade to afterwards and that type of thing.
>>
>> -Chad
>>
>> On Tue, Aug 8, 2023 at 11:54 PM Komgrit Aneksri <tanak...@gmail.com>
>> wrote:
>>
>>> Hello Chad,
>>>
>>> I looked in top and ps but did not see any weird or stuck processes.
>>>
>>> I have attached pictures.
>>>
>>> [image: 1691509554118.jpg]
>>>
>>> [image: 1691509751182.jpg]
>>>
>>> BR,
>>> Komgrit
>>> On Tuesday, August 8, 2023 at 9:24:43 PM UTC+7 Chad Wilson wrote:
>>>
>>>> Hmm, given your description and the basic metrics you shared, the
>>>> behaviour sounds strange.
>>>>
>>>> To step back slightly and confirm the issue, please look inside the
>>>> container at the process tree (via ps, top etc) and see whether there are a
>>>> large number of forked git processes. If there *are*, we want to see
>>>> what they are doing and focus there. If there *are not*, the problem
>>>> may be somewhere else inside GoCD causing the config repo loads to get
>>>> stuck, and we need to look in a different area.
>>>>
>>>> If you changed nothing on GoCD or its hardware/host/config, that seems
>>>> to point to something outside GoCD as the source of the problem, unless it
>>>> is a problem that started as a side effect of restarting GoCD.
>>>>
>>>> -Chad
>>>>
>>>> On Tue, Aug 8, 2023 at 9:27 PM Komgrit Aneksri <tanak...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Chad,
>>>>>
>>>>> What did you change around the time this behaviour started happening?
>>>>> -GoCD = no
>>>>> -Gitlab = upgrade from 16.2.0 to 16.2.3
>>>>>
>>>>> Which container image for the server are you using? (only
>>>>> gocd-server-centos-9
>>>>> <https://hub.docker.com/r/gocd/gocd-server-centos-9/> is built for
>>>>> ARM64 - if you are trying to run emulated you will probably have many
>>>>> problems)
>>>>> - I am using official gocd/gocd-server-centos-9:v23.1.0
>>>>>
>>>>> What are the CPU requests and limits that you have assigned to the
>>>>> gocd server pod? And are you deploying with the standard GoCD helm chart?
>>>>> If the limits are too low, or the requests are low and other processes on
>>>>> the same node are using too much CPU you can still end up with CPU
>>>>> starvation, even if it looks like the pod isn't using much CPU, because it
>>>>> may be throttled.
>>>>> - I am using the official helm chart v2.1.6, with no CPU or memory
>>>>> limits, and these CPU and memory requests:
>>>>>  resources:
>>>>>     requests:
>>>>>       memory: 2048Mi
>>>>>       cpu: 1000m
>>>>> PS. The GoCD server is running on a c6g.large instance type, and below
>>>>> is the top node result:
>>>>> NAME                                              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
>>>>> ip-xx-xxx-xx-xx.ap-southeast-1.compute.internal   74m          3%     2570Mi          81%
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Komgrit
>>>>>
>>>>> On Tuesday, August 8, 2023 at 4:11:18 PM UTC+7 Chad Wilson wrote:
>>>>>
>>>>>> If you are getting logs like that, it sounds like the container is
>>>>>> experiencing CPU starvation.
>>>>>>
>>>>>>    - What did you change around the time this behaviour started
>>>>>>    happening?
>>>>>>    - Which container image for the server are you using? (only
>>>>>>    gocd-server-centos-9
>>>>>>    <https://hub.docker.com/r/gocd/gocd-server-centos-9/> is built
>>>>>>    for ARM64 - if you are trying to run emulated you will probably have
>>>>>>    many problems)
>>>>>>    - What are the CPU requests and limits that you have assigned to
>>>>>>    the gocd server pod? And are you deploying with the standard GoCD
>>>>>>    helm chart? If the limits are too low, or the requests are low and
>>>>>>    other processes on the same node are using too much CPU, you can
>>>>>>    still end up with CPU starvation, even if it looks like the pod
>>>>>>    isn't using much CPU, because it may be throttled (see the sketch
>>>>>>    below this list).
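>>>>>>
>>>>>> A quick way to check for throttling from inside the container (a
>>>>>> sketch; the cgroup path depends on whether the node uses cgroup v2
>>>>>> or v1):
>>>>>>
>>>>>>    # nr_throttled / throttled_usec (v2) or throttled_time (v1) growing
>>>>>>    # over time means the container is being CPU-throttled
>>>>>>    cat /sys/fs/cgroup/cpu.stat 2>/dev/null || cat /sys/fs/cgroup/cpu/cpu.stat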
>>>>>>
>>>>>> -Chad
>>>>>>
>>>>>> On Tue, Aug 8, 2023 at 4:06 PM Komgrit Aneksri <tanak...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Chad,
>>>>>>>
>>>>>>> Thank you for your advice.
>>>>>>>
>>>>>>> I tried running git clone with the debug environment variables in the
>>>>>>> GoCD server pod. The command ran successfully and there were no
>>>>>>> abnormal logs.
>>>>>>>
>>>>>>> And CPU and memory usage are not high.
>>>>>>>
>>>>>>> We tried restarting the GoCD server pods many times but it did not help.
>>>>>>>
>>>>>>> And I dug into the logs for timeouts. There is this log message:
>>>>>>>  INFO   | wrapper  | 2023/08/08 07:43:48 | Wrapper Process has not
>>>>>>> received any CPU time for 22 seconds.  Extending timeouts.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Komgrit
>>>>>>> On Tuesday, August 8, 2023 at 12:16:45 PM UTC+7 Chad Wilson wrote:
>>>>>>>
>>>>>>>> That error message is not relevant to the issue; you can ignore it.
>>>>>>>> Are there other errors or timeouts in the logs?
>>>>>>>>
>>>>>>>> To refresh config repos, GoCD forks regular git processes to clone
>>>>>>>> and then fetches. You might want to exec into the container and see 
>>>>>>>> what
>>>>>>>> these processes are doing (high CPU? stuck somehow?)
>>>>>>>>
>>>>>>>> There are also a few general suggestions on similar issues around
>>>>>>>> which you might want to check:
>>>>>>>>
>>>>>>>>    - https://github.com/gocd/gocd/issues/10480
>>>>>>>>    - https://github.com/gocd/gocd/issues/9588
>>>>>>>>    - https://github.com/gocd/gocd/issues/8565
>>>>>>>>
>>>>>>>> If this is happening after a restart of the GoCD server, or due to
>>>>>>>> some other change you've made, it's possible GitLab is throttling the
>>>>>>>> requests.
>>>>>>>>
>>>>>>>> Depending on whether you are using https or ssh connections, you
>>>>>>>> may want to use standard git environment variables to debug what's
>>>>>>>> happening (GIT_TRACE=1, GIT_CURL_VERBOSE=1 etc).
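>>>>>>>>
>>>>>>>> For example (a sketch - the URL is just a placeholder, point it at
>>>>>>>> one of the repositories GoCD is polling):
>>>>>>>>
>>>>>>>>    GIT_TRACE=1 GIT_CURL_VERBOSE=1 GIT_TRACE_PACKET=1 \
>>>>>>>>      git ls-remote https://gitlab.example.com/group/repo.git 2>&1 | head -50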
>>>>>>>>
>>>>>>>> -Chad
>>>>>>>>
>>>>>>>> On Tue, Aug 8, 2023 at 12:29 PM Komgrit Aneksri <tanak...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello GoCD team,
>>>>>>>>>
>>>>>>>>> I am facing an issue on the Config Repositories page.
>>>>>>>>>
>>>>>>>>> All Config Repositories are always in fetching status for
>>>>>>>>> GitLab (16.2.3).
>>>>>>>>> Our GoCD is version 23.1.0, running on Kubernetes on ARM64.
>>>>>>>>>
>>>>>>>>> In the server log, there is this error message
>>>>>>>>>
>>>>>>>>> "jvm 1    | 2023-08-08 04:18:17,151 ERROR [qtp1962126505-38]
>>>>>>>>> VariableReplacer:385 - function ${escape:} type 'escape:' not a valid 
>>>>>>>>> type"
>>>>>>>>>
>>>>>>>>> when I refresh or open the Config Repositories page.
>>>>>>>>> [image: 1691468556134.jpg]
>>>>>>>>>
>>>>>>>>> Please help us fix it.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Komgrit
>>>>>>>>>