Re: [go-cd] Re: GOCD AWS ECS Elastic Agent allocation is falling

Chad Wilson Mon, 09 Dec 2024 07:30:38 -0800

This problem should be fixed now with the new plugin version here (or a
later release)
https://github.com/gocd/gocd-ecs-elastic-agent/releases/tag/v8.0.0-775


Validated with Amazon Linux 2023 / Docker 25.0.6 via
al2023-ami-ecs-hvm-2023.0.20241115-kernel-6.1-x86_64 (and the arm64
version).

-Chad
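
For anyone still on a plugin release older than v8.0.0-775, the two checks discussed further down this thread (which Docker major version an instance runs, and whether an Amazon Linux 2 ECS AMI is from release 20240625 or earlier) can be sketched roughly as below. This is only an illustration: the function names and the `LAST_DOCKER20_AL2_RELEASE` variable are hypothetical and not part of GoCD, the plugin, or any AWS tooling. It assumes AL2 ECS AMI names follow the `...-2.0.YYYYMMDD-...` pattern seen in the names quoted in this thread.

```shell
#!/bin/sh
# Sketch only: all names here are illustrative assumptions.

# Last AL2 ECS AMI release reported in this thread to still ship Docker 20.10.
LAST_DOCKER20_AL2_RELEASE=20240625

# Print the Docker major version parsed from `docker --version` style output.
# Prints nothing if the text does not look like that output.
docker_major() {
  printf '%s\n' "$1" | sed -n 's/^Docker version \([0-9][0-9]*\)\..*/\1/p'
}

# Print the YYYYMMDD release stamp embedded in an AL2 ECS AMI name.
# Prints nothing if the name does not contain "-2.0.YYYYMMDD-".
al2_ami_release() {
  printf '%s\n' "$1" | sed -n 's/.*-2\.0\.\([0-9]\{8\}\)-.*/\1/p'
}

# Succeed only if the AMI name carries a release stamp no newer than the
# last release said to work with plugin versions before v8.0.0-775.
al2_ami_ok_for_old_plugin() {
  _stamp=$(al2_ami_release "$1")
  [ -n "$_stamp" ] && [ "$_stamp" -le "$LAST_DOCKER20_AL2_RELEASE" ]
}
```

On an instance, `docker_major "$(docker --version)"` would print the major version; a value of 25 or higher is the case reported as failing before the plugin fix.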

On Wed, Sep 4, 2024 at 4:16 PM Chad Wilson <[email protected]> wrote:

> With some trial and error, it seems these are us-east-1 AMIs. The last one
> you shared is indeed too new - this won't work. (2024-08-21).
>
> {
>   "body": {
>     "VirtualizationType": "hvm",
>     "Description": "Amazon Linux AMI 2.0.20240821 x86_64 ECS HVM GP2",
>     "Hypervisor": "xen",
>     "ImageOwnerAlias": "amazon",
>     "EnaSupport": true,
>     "SriovNetSupport": "simple",
>     "ImageId": "ami-0a5f593ecaa0f722d",
>     "State": "available",
>     "BlockDeviceMappings": [
>       {
>         "DeviceName": "/dev/xvda",
>         "Ebs": {
>           "DeleteOnTermination": true,
>           "SnapshotId": "snap-0dc7b37b7792952a7",
>           "VolumeSize": 30,
>           "VolumeType": "gp2",
>           "Encrypted": false
>         }
>       }
>     ],
>     "Architecture": "x86_64",
>     "ImageLocation": "amazon/amzn2-ami-ecs-hvm-2.0.20240821-x86_64-ebs",
>     "RootDeviceType": "ebs",
>     "OwnerId": "591542846629",
>     "RootDeviceName": "/dev/xvda",
>     "CreationDate": "2024-08-22T20:53:11.000Z",
>     "Public": true,
>     "ImageType": "machine",
>     "Name": "amzn2-ami-ecs-hvm-2.0.20240821-x86_64-ebs"
>   }
> }
>
>
> The earlier one you shared is this one: (2024-02-01)
>
> {
>   "body": {
>     "VirtualizationType": "hvm",
>     "Description": "Amazon Linux AMI 2.0.20240201 x86_64 ECS HVM GP2",
>     "Hypervisor": "xen",
>     "ImageOwnerAlias": "amazon",
>     "EnaSupport": true,
>     "SriovNetSupport": "simple",
>     "ImageId": "ami-0ba9fb6bc8faf1fe0",
>     "State": "available",
>     "BlockDeviceMappings": [
>       {
>         "DeviceName": "/dev/xvda",
>         "Ebs": {
>           "DeleteOnTermination": true,
>           "SnapshotId": "snap-0ca36cd61121c93d2",
>           "VolumeSize": 30,
>           "VolumeType": "gp2",
>           "Encrypted": false
>         }
>       }
>     ],
>     "Architecture": "x86_64",
>     "ImageLocation": "amazon/amzn2-ami-ecs-hvm-2.0.20240201-x86_64-ebs",
>     "RootDeviceType": "ebs",
>     "OwnerId": "591542846629",
>     "RootDeviceName": "/dev/xvda",
>     "CreationDate": "2024-02-03T00:52:53.000Z",
>     "Public": true,
>     "ImageType": "machine",
>     "Name": "amzn2-ami-ecs-hvm-2.0.20240201-x86_64-ebs"
>   }
> }
>
>
> This second one *might* work, as it at least had Docker 20.10 on it, but
> since you have shared two different AMIs and I'm not sure which log is from
> which, I don't know what the problem is here.
>
> https://build.gocd.org is using
> *amzn2-ami-ecs-kernel-5.10-hvm-2.0.20240625-x86_64-ebs* so this one
> definitely works. Find the AMI ID for your region (us-east-1 it seems) and
> try that?
>
> -Chad
>
> On Wed, Sep 4, 2024 at 4:02 PM Chad Wilson <[email protected]> wrote:
>
>> *Something* must have changed, e.g. you changed the AMI, or when instances
>> start they now upgrade pre-installed software during cloud-init to
>> different versions of pre-installed tools. In future, you need to share the
>> specific name of the AMI, the release date and the region etc - an AMI ID
>> on its own is not useful to look up.
>>
>> The plugin doesn't work with Docker 25, so I doubt it was using the same
>> AMI before - did you see
>> https://github.com/gocd/gocd-ecs-elastic-agent/issues/345 ? You'll have
>> to find/use an Amazon Linux 2 (*not 2023*) AMI which still has Docker
>> 20.10 pre-installed until the plugin can be modified to support Docker 25.
>>
>> According to https://alas.aws.amazon.com/announcements/2024-009.html as
>> of September 3 a yum upgrade --security on AL2 will cause Docker to
>> upgrade to Docker 25, which would break the plugin. Likely if you are using
>> a new ECS AMI it is pre-upgraded. However additionally, the *last AL2
>> AMI that will work* is
>> https://github.com/aws/amazon-ecs-ami/releases/tag/20240625
>>
>> Any Amazon Linux 2 ECS AMIs *newer* than 2024-06-25 will not work, as
>> Docker has been upgraded to v25:
>>
>>    - https://github.com/aws/amazon-ecs-ami/blob/main/CHANGELOG.md#20240709
>>    - https://github.com/aws/amazon-ecs-ami/pull/267
>>
>> Since the plugin is still working for https://build.gocd.org which uses
>> the ECS plugin, it's definitely possible to have it work - but it does mean
>> using an unpatched ECS image, or managing the patching yourself to upgrade
>> everything *except* Docker.
>>
>> -Chad
>>
>>
>>
>> On Tue, Sep 3, 2024 at 9:55 PM pradeep devaraj <[email protected]>
>> wrote:
>>
>>> Hi Sriram,
>>>
>>> - does the ECS consumer get created and registered if you remove the
>>> user data script? : *yes*.
>>>     We are using the marketplace AMI ami-0a5f593ecaa0f722d. If we
>>> create the server manually via the launch template and add it to the ECS
>>> cluster, it works. If we do the same step from GoCD via the GoCD
>>> Elastic Agent Plugin for Amazon ECS, it fails, and Docker and ECS are not
>>> running.
>>> Docker version: *Docker version 25.0.5, build 5dc9bcc*
>>>
>>> - what changed between when this ECS used to work vs now?
>>> nothing has changed, it was working till last Thursday night
>>>
>>>
>>>
>>>
>>> On Tuesday, September 3, 2024 at 6:06:02 PM UTC+5:30 Sriram Narayanan
>>> wrote:
>>>
>>>> ( I am ill so please excuse the limited questions)
>>>> - does the ECS consumer get created and registered if you remove the
>>>> user data script?
>>>> - what changed between when this ECS used to work vs now?
>>>>
>>>> — Sriram
>>>>
>>>> On Tue, 3 Sep 2024 at 7:23 PM, pradeep devaraj <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Team / Chad Wilson.
>>>>>
>>>>> The Docker service and ECS service fail when a new server comes up.
>>>>> AMI ID: ami-0a5f593ecaa0f722d (a community one). When we manually spin up
>>>>> the server and attach it via the ASG, it registers with the cluster. When
>>>>> we try the same from the GoCD ECS cluster profile (AWS ECS Elastic plugin),
>>>>> it does not work and the Docker service and ECS service fail.
>>>>>
>>>>>
>>>>>
>>>>> On Monday, September 2, 2024 at 11:21:06 PM UTC+5:30 pradeep devaraj
>>>>> wrote:
>>>>>
>>>>>> Adding++
>>>>>>
>>>>>> We are getting the agent creation and deletion in a loop:
>>>>>> [go] Received a request to create an agent for the job:
>>>>>> [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
>>>>>> [go] No running instance(s) found to build the ECS Task to perform
>>>>>> current job.
>>>>>> [go] Creating a new container instance to schedule ECS Task.
>>>>>> [go] Waiting for instance(s) ([i-061187c3d2ea07317]) to register with
>>>>>> cluster.
>>>>>> [go] Received a request to create an agent for the job:
>>>>>> [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
>>>>>> [go] No running instance(s) found to build the ECS Task to perform
>>>>>> current job.
>>>>>> [go] Creating a new container instance to schedule ECS Task.
>>>>>> [go] Waiting for instance(s) ([i-00bb68d594121ab15]) to register with
>>>>>> cluster.
>>>>>> [go] Received a request to create an agent for the job:
>>>>>> [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
>>>>>> [go] No running instance(s) found to build the ECS Task to perform
>>>>>> current job.
>>>>>> [go] Creating a new container instance to schedule ECS Task.
>>>>>>
>>>>>> On Monday, September 2, 2024 at 9:55:48 PM UTC+5:30 pradeep devaraj
>>>>>> wrote:
>>>>>>
>>>>>>> We are using a GOCD AWS ECS elastic agent plugin.
>>>>>>> GoCD version: 23.4.0
>>>>>>>
>>>>>>> GoCD Elastic Agent Plugin for Amazon ECS
>>>>>>>
>>>>>>>    - Version 7.3.0-416
>>>>>>>
>>>>>>>
>>>>>>> *AMI id: *ami-0ba9fb6bc8faf1fe0
>>>>>>>
>>>>>>>
>>>>>>> *The elastic instance is coming up but it's not getting assigned to the
>>>>>>> ECS cluster. We logged in to the server and found the below error.*
>>>>>>>
>>>>>>> [root@ip-******* ~]# systemctl restart docker
>>>>>>> Job for docker.service failed because start of the service was
>>>>>>> attempted too often. See "systemctl status docker.service" and 
>>>>>>> "journalctl
>>>>>>> -xe" for details.
>>>>>>> To force a start use "systemctl reset-failed docker.service"
>>>>>>> followed by "systemctl start docker.service" again.
>>>>>>> [root@ip- *******   ~]# journalctl -xe
>>>>>>> -- Defined-By: systemd
>>>>>>> -- Support:
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>>>>>> --
>>>>>>> -- Unit ecs.service has finished shutting down.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: start
>>>>>>> request repeated too quickly for docker.service
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Failed
>>>>>>> to start Docker Application Container Engine.
>>>>>>> -- Subject: Unit docker.service has failed
>>>>>>> -- Defined-By: systemd
>>>>>>> -- Support:
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>>>>>> --
>>>>>>> -- Unit docker.service has failed.
>>>>>>> --
>>>>>>> -- The result is failed.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]:
>>>>>>> docker.service failed.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]:
>>>>>>> Starting Amazon Elastic Container Service - container agent...
>>>>>>> -- Subject: Unit ecs.service has begun start-up
>>>>>>> -- Defined-By: systemd
>>>>>>> -- Support:
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>>>>>> --
>>>>>>> -- Unit ecs.service has begun starting up.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]:
>>>>>>> ecs.service: control process exited, code=exited status=1
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon
>>>>>>> amazon-ecs-init[6236]: level=info time=2024-09-02T16:03:20Z 
>>>>>>> msg="post-stop"
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon
>>>>>>> amazon-ecs-init[6236]: level=info time=2024-09-02T16:03:20Z 
>>>>>>> msg="Cleaning
>>>>>>> up the credentials endpoint setup for Amazon El
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon
>>>>>>> amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error
>>>>>>> performing action 'delete' for iptables route: ex
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon
>>>>>>> amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error
>>>>>>> performing action 'delete' for iptables route: ex
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon
>>>>>>> amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error
>>>>>>> performing action 'delete' for iptables route: ex
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon
>>>>>>> amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error
>>>>>>> performing action 'delete' for iptables route: ex
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Failed
>>>>>>> to start Amazon Elastic Container Service - container agent.
>>>>>>> -- Subject: Unit ecs.service has failed
>>>>>>> -- Defined-By: systemd
>>>>>>> -- Support:
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>>>>>> --
>>>>>>> -- Unit ecs.service has failed.
>>>>>>> --
>>>>>>> -- The result is failed.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Unit
>>>>>>> ecs.service entered failed state.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]:
>>>>>>> ecs.service failed.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [root@ipXXXX ~]# df -hT
>>>>>>> Filesystem     Type      Size  Used Avail Use% Mounted on
>>>>>>> devtmpfs       devtmpfs  7.7G     0  7.7G   0% /dev
>>>>>>> tmpfs          tmpfs     7.7G     0  7.7G   0% /dev/shm
>>>>>>> tmpfs          tmpfs     7.7G  376K  7.7G   1% /run
>>>>>>> tmpfs          tmpfs     7.7G     0  7.7G   0% /sys/fs/cgroup
>>>>>>> /dev/nvme0n1p1 xfs       100G  2.4G   98G   3% /
>>>>>>> tmpfs          tmpfs     1.6G     0  1.6G   0% /run/user/0
>>>>>>> [root@ip-10-226-11-63 ~]# docker --version
>>>>>>> Docker version 25.0.5, build 5dc9bcc
>>>>>>>
>>>>>>> Below is the user data script we are using; it gets executed while the
>>>>>>> instance spins up and ends in an error.
>>>>>>>
>>>>>>> "ECS_INSTANCE_ATTRIBUTES={"server-id":"31e424ad-e242-45d2-a5bb-0ef7be0d8306"}
>>>>>>> EOT
>>>>>>> echo 'File /etc/ecs/ecs.config successfully created.'
>>>>>>> log "Finished executing GoCD's user data script, now executing custom user data script from use, if present."
>>>>>>> #!/bin/bash
>>>>>>> echo "ECS_CLUSTER=GoCD-ECS-UAT"  >> /etc/ecs/ecs.config
>>>>>>> log "Finished executing user specified user data script."
>>>>>>> --//
>>>>>>> #cloud-config
>>>>>>> cloud_final_modules:
>>>>>>> - [scripts-user, always]
>>>>>>> --//
>>>>>>> Content-Type: text/x-shellscript; charset="us-ascii"
>>>>>>> MIME-Version: 1.0
>>>>>>> Content-Transfer-Encoding: 7bit
>>>>>>> Content-Disposition: attachment; filename="initialize_instance_store"
>>>>>>> #!/bin/bash
>>>>>>> exec > >(tee /var/log/initialize_instance_store.log | logger -t user-data -s 2>/dev/console) 2>&1
>>>>>>> function log() {
>>>>>>>     echo "[$(date "+%Y-%m-%d %H:%M:%S")] - $1" >> /var/log/initialize_instance_store.log
>>>>>>> }
>>>>>>> function try() {
>>>>>>>    $@
>>>>>>>    return 0
>>>>>>> }
>>>>>>> log "Starting to setup instance store for the docker."
>>>>>>> INSTANCE_STORES=$(ls /dev/disk/by-id/*EC2_NVMe_Instance_Storage*-ns-1)
>>>>>>> if [ -z "${INSTANCE_STORES}" ]; then
>>>>>>>     log "No instance store detected."
>>>>>>> fi
>>>>>>> VOLUMES="$INSTANCE_STORES"
>>>>>>> if [ -e "/dev/xvdcz" ]; then
>>>>>>>     log "Instance has /dev/xvdcz EBS volume. Using it for docker logical volume group."
>>>>>>>     VOLUMES="$VOLUMES /dev/xvdcz"
>>>>>>> fi
>>>>>>> if [ -z "${VOLUMES}" ]; then
>>>>>>>     log "No addition volumes. Using box standard docker setup."
>>>>>>> else
>>>>>>>     log "Available instance stores: ${VOLUMES}."
>>>>>>>     log "Setting up the docker logical volume group."
>>>>>>>     service docker stop
>>>>>>>     rm -rf /var/lib/docker/*
>>>>>>>     dmsetup remove_all
>>>>>>>     VOLUME_GROUP=docker
>>>>>>>     LOGICAL_VOLUME=docker-pool
>>>>>>>     try vgremove -y "${VOLUME_GROUP}"
>>>>>>>     try lvremove -y "${LOGICAL_VOLUME}"
>>>>>>>     vgcreate -y "${VOLUME_GROUP}" ${VOLUMES}
>>>>>>>     sleep 2
>>>>>>>     lvcreate -y -l 5%VG -n ${LOGICAL_VOLUME}\meta ${VOLUME_GROUP}
>>>>>>>     lvcreate -y -l 90%VG -n ${LOGICAL_VOLUME} ${VOLUME_GROUP}
>>>>>>>     sleep 2
>>>>>>>     lvconvert -y --zero n --thinpool ${VOLUME_GROUP}/${LOGICAL_VOLUME} --poolmetadata ${VOLUME_GROUP}/${LOGICAL_VOLUME}\meta
>>>>>>>     echo 'DOCKER_STORAGE_OPTIONS=" --storage-driver devicemapper --storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --storage-opt dm.fs=ext4 --storage-opt dm.use_deferred_deletion=true"' > /etc/sysconfig/docker-storage
>>>>>>>     test -f /bin/systemctl && systemctl reset-failed docker.service
>>>>>>>     service docker restart
>>>>>>>     test -f /bin/systemctl && systemctl enable --no-block --now ecs
>>>>>>> fi
>>>>>>> log "Setup completed."
>>>>>>> --//"
>>>>>>>
>>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "go-cd" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/go-cd/763a2904-4962-4c8b-ae2a-b8bf72701e5bn%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/go-cd/763a2904-4962-4c8b-ae2a-b8bf72701e5bn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>

