This problem should now be fixed in the new plugin version (or any later release): https://github.com/gocd/gocd-ecs-elastic-agent/releases/tag/v8.0.0-775
Validated with Amazon Linux 2023 / Docker 25.0.6 via al2023-ami-ecs-hvm-2023.0.20241115-kernel-6.1-x86_64 (and the arm64 version).

-Chad

On Wed, Sep 4, 2024 at 4:16 PM Chad Wilson <[email protected]> wrote:

> With some trial and error, seems these are us-east-1 AMIs. The last one
> you shared is indeed too new - this won't work. (2024-08-21).
>
> {
>   "body": {
>     "VirtualizationType": "hvm",
>     "Description": "Amazon Linux AMI 2.0.20240821 x86_64 ECS HVM GP2",
>     "Hypervisor": "xen",
>     "ImageOwnerAlias": "amazon",
>     "EnaSupport": true,
>     "SriovNetSupport": "simple",
>     "ImageId": "ami-0a5f593ecaa0f722d",
>     "State": "available",
>     "BlockDeviceMappings": [
>       {
>         "DeviceName": "/dev/xvda",
>         "Ebs": {
>           "DeleteOnTermination": true,
>           "SnapshotId": "snap-0dc7b37b7792952a7",
>           "VolumeSize": 30,
>           "VolumeType": "gp2",
>           "Encrypted": false
>         }
>       }
>     ],
>     "Architecture": "x86_64",
>     "ImageLocation": "amazon/amzn2-ami-ecs-hvm-2.0.20240821-x86_64-ebs",
>     "RootDeviceType": "ebs",
>     "OwnerId": "591542846629",
>     "RootDeviceName": "/dev/xvda",
>     "CreationDate": "2024-08-22T20:53:11.000Z",
>     "Public": true,
>     "ImageType": "machine",
>     "Name": "amzn2-ami-ecs-hvm-2.0.20240821-x86_64-ebs"
>   }
> }
>
>
> The earlier one you shared is this one: (2024-02-01)
>
> {
>   "body": {
>     "VirtualizationType": "hvm",
>     "Description": "Amazon Linux AMI 2.0.20240201 x86_64 ECS HVM GP2",
>     "Hypervisor": "xen",
>     "ImageOwnerAlias": "amazon",
>     "EnaSupport": true,
>     "SriovNetSupport": "simple",
>     "ImageId": "ami-0ba9fb6bc8faf1fe0",
>     "State": "available",
>     "BlockDeviceMappings": [
>       {
>         "DeviceName": "/dev/xvda",
>         "Ebs": {
>           "DeleteOnTermination": true,
>           "SnapshotId": "snap-0ca36cd61121c93d2",
>           "VolumeSize": 30,
>           "VolumeType": "gp2",
>           "Encrypted": false
>         }
>       }
>     ],
>     "Architecture": "x86_64",
>     "ImageLocation": "amazon/amzn2-ami-ecs-hvm-2.0.20240201-x86_64-ebs",
>     "RootDeviceType": "ebs",
>     "OwnerId": "591542846629",
>     "RootDeviceName": "/dev/xvda",
>     "CreationDate": "2024-02-03T00:52:53.000Z",
>     "Public": true,
>     "ImageType": "machine",
>     "Name": "amzn2-ami-ecs-hvm-2.0.20240201-x86_64-ebs"
>   }
> }
>
>
> This second one *might* work, as it at least had Docker 20.10 on it, but
> since you have shared two different AMIs and I'm not sure which log is from
> which, I don't know what the problem is here.
>
> https://build.gocd.org is using
> *amzn2-ami-ecs-kernel-5.10-hvm-2.0.20240625-x86_64-ebs* so this one
> definitely works. Find the AMI ID for your region (us-east-1 it seems) and
> try that?
>
> -Chad
>
> On Wed, Sep 4, 2024 at 4:02 PM Chad Wilson <[email protected]> wrote:
>
>> *Something* must have changed, e.g. you changed AMI, or when instances
>> start they now upgrade pre-installed software during cloud-init to
>> different versions of pre-installed tools. In future, you need to share the
>> specific name of the AMI, the release date and the region etc - an AMI ID
>> on its own is not useful to look up.
>>
>> The plugin doesn't work with Docker 25, so I doubt it was using the same
>> AMI before - did you see
>> https://github.com/gocd/gocd-ecs-elastic-agent/issues/345 ? You'll have
>> to find/use an Amazon Linux 2 (*not 2023*) AMI which still has Docker
>> 20.10 pre-installed until the plugin can be modified to support Docker 25.
>>
>> According to https://alas.aws.amazon.com/announcements/2024-009.html as
>> of September 3 a yum upgrade --security on AL2 will cause Docker to
>> upgrade to Docker 25, which would break the plugin. Likely if you are using
>> a new ECS AMI it is pre-upgraded.
>> However additionally, the *last AL2 AMI that will work* is
>> https://github.com/aws/amazon-ecs-ami/releases/tag/20240625
>>
>> Any Amazon Linux 2 ECS AMIs *newer* than 2024-06-25 will not work, as
>> Docker has been upgraded to v25:
>>
>>    - https://github.com/aws/amazon-ecs-ami/blob/main/CHANGELOG.md#20240709
>>    - https://github.com/aws/amazon-ecs-ami/pull/267
>>
>> Since the plugin is still working for https://build.gocd.org which uses
>> the ECS plugin, it's definitely possible to have it work - but it does mean
>> using an unpatched ECS image, or managing the patching yourself to upgrade
>> everything *except* Docker.
>>
>> -Chad
>>
>> On Tue, Sep 3, 2024 at 9:55 PM pradeep devaraj <[email protected]>
>> wrote:
>>
>>> Hi Sriram,
>>>
>>> - does the ECS consumer get created and registered if you remove the
>>> user data script? : *yes*.
>>> We have taken the marketplace AMI ami-0a5f593ecaa0f722d. If we
>>> create the server manually via the launch template and add it to the ECS
>>> cluster, it works. If we do the same step via GoCD (GoCD Elastic Agent
>>> Plugin for Amazon ECS), it fails, and Docker and ECS are not running.
>>> Docker version: *Docker version 25.0.5, build 5dc9bcc*
>>>
>>> - what changed between when this ECS used to work vs now?
>>> Nothing has changed; it was working until last Thursday night.
>>>
>>> On Tuesday, September 3, 2024 at 6:06:02 PM UTC+5:30 Sriram Narayanan
>>> wrote:
>>>
>>>> (I am ill so please excuse the limited questions)
>>>> - does the ECS consumer get created and registered if you remove the
>>>> user data script?
>>>> - what changed between when this ECS used to work vs now?
>>>>
>>>> -- Sriram
>>>>
>>>> On Tue, 3 Sep 2024 at 7:23 PM, pradeep devaraj <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Team / Chad Wilson.
>>>>>
>>>>> The Docker service and ECS service are failing when a new server comes up.
>>>>> AMI id: ami-0a5f593ecaa0f722d (community one).
>>>>> When we manually spin up the server and attach it via the ASG, it
>>>>> registers with the cluster. When we try the same from the GoCD ECS
>>>>> cluster profile (AWS ECS Elastic Agent plugin), it does not work and
>>>>> the Docker service and ECS service fail.
>>>>>
>>>>> On Monday, September 2, 2024 at 11:21:06 PM UTC+5:30 pradeep devaraj
>>>>> wrote:
>>>>>
>>>>>> Adding++
>>>>>>
>>>>>> We are getting agent creation and deletion in a loop:
>>>>>> [go] Received a request to create an agent for the job: [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
>>>>>> [go] No running instance(s) found to build the ECS Task to perform current job.
>>>>>> [go] Creating a new container instance to schedule ECS Task.
>>>>>> [go] Waiting for instance(s) ([i-061187c3d2ea07317]) to register with cluster.
>>>>>> [go] Received a request to create an agent for the job: [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
>>>>>> [go] No running instance(s) found to build the ECS Task to perform current job.
>>>>>> [go] Creating a new container instance to schedule ECS Task.
>>>>>> [go] Waiting for instance(s) ([i-00bb68d594121ab15]) to register with cluster.
>>>>>> [go] Received a request to create an agent for the job: [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
>>>>>> [go] No running instance(s) found to build the ECS Task to perform current job.
>>>>>> [go] Creating a new container instance to schedule ECS Task.
>>>>>>
>>>>>> On Monday, September 2, 2024 at 9:55:48 PM UTC+5:30 pradeep devaraj
>>>>>> wrote:
>>>>>>
>>>>>>> We are using the GoCD AWS ECS elastic agent plugin.
>>>>>>> GoCD version: 23.4.0
>>>>>>>
>>>>>>> GoCD Elastic Agent Plugin for Amazon ECS
>>>>>>>
>>>>>>>    - Version 7.3.0-416
>>>>>>>
>>>>>>> *AMI id:* ami-0ba9fb6bc8faf1fe0
>>>>>>>
>>>>>>> *The elastic instance comes up but is not assigned to the ECS
>>>>>>> cluster. We logged in to the server and found the error below.*
>>>>>>>
>>>>>>> [root@ip-******* ~]# systemctl restart docker
>>>>>>> Job for docker.service failed because start of the service was attempted too often. See "systemctl status docker.service" and "journalctl -xe" for details.
>>>>>>> To force a start use "systemctl reset-failed docker.service" followed by "systemctl start docker.service" again.
>>>>>>> [root@ip- ******* ~]# journalctl -xe
>>>>>>> -- Defined-By: systemd
>>>>>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>>>>>> --
>>>>>>> -- Unit ecs.service has finished shutting down.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: start request repeated too quickly for docker.service
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Failed to start Docker Application Container Engine.
>>>>>>> -- Subject: Unit docker.service has failed
>>>>>>> -- Defined-By: systemd
>>>>>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>>>>>> --
>>>>>>> -- Unit docker.service has failed.
>>>>>>> --
>>>>>>> -- The result is failed.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: docker.service failed.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Starting Amazon Elastic Container Service - container agent...
>>>>>>> -- Subject: Unit ecs.service has begun start-up
>>>>>>> -- Defined-By: systemd
>>>>>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>>>>>> --
>>>>>>> -- Unit ecs.service has begun starting up.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: ecs.service: control process exited, code=exited status=1
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=info time=2024-09-02T16:03:20Z msg="post-stop"
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=info time=2024-09-02T16:03:20Z msg="Cleaning up the credentials endpoint setup for Amazon El
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Failed to start Amazon Elastic Container Service - container agent.
>>>>>>> -- Subject: Unit ecs.service has failed
>>>>>>> -- Defined-By: systemd
>>>>>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>>>>>> --
>>>>>>> -- Unit ecs.service has failed.
>>>>>>> --
>>>>>>> -- The result is failed.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Unit ecs.service entered failed state.
>>>>>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: ecs.service failed.
>>>>>>>
>>>>>>> [root@ipXXXX ~]# df -hT
>>>>>>> Filesystem      Type      Size  Used  Avail  Use%  Mounted on
>>>>>>> devtmpfs        devtmpfs  7.7G     0   7.7G    0%  /dev
>>>>>>> tmpfs           tmpfs     7.7G     0   7.7G    0%  /dev/shm
>>>>>>> tmpfs           tmpfs     7.7G  376K   7.7G    1%  /run
>>>>>>> tmpfs           tmpfs     7.7G     0   7.7G    0%  /sys/fs/cgroup
>>>>>>> /dev/nvme0n1p1  xfs       100G  2.4G    98G    3%  /
>>>>>>> tmpfs           tmpfs     1.6G     0   1.6G    0%  /run/user/0
>>>>>>> [root@ip-10-226-11-63 ~]# docker --version
>>>>>>> Docker version 25.0.5, build 5dc9bcc
>>>>>>>
>>>>>>> Below is the user data script we are using; we get an error while
>>>>>>> the instance is spinning up.
>>>>>>>
>>>>>>> "ECS_INSTANCE_ATTRIBUTES={"server-id":"31e424ad-e242-45d2-a5bb-0ef7be0d8306"}
>>>>>>> EOT
>>>>>>> echo 'File /etc/ecs/ecs.config successfully created.'
>>>>>>> log "Finished executing GoCD's user data script, now executing custom user data script from use, if present."
>>>>>>> #!/bin/bash
>>>>>>> echo "ECS_CLUSTER=GoCD-ECS-UAT" >> /etc/ecs/ecs.config
>>>>>>> log "Finished executing user specified user data script."
>>>>>>> --//
>>>>>>> #cloud-config
>>>>>>> cloud_final_modules:
>>>>>>> - [scripts-user, always]
>>>>>>> --//
>>>>>>> Content-Type: text/x-shellscript; charset="us-ascii"
>>>>>>> MIME-Version: 1.0
>>>>>>> Content-Transfer-Encoding: 7bit
>>>>>>> Content-Disposition: attachment; filename="initialize_instance_store"
>>>>>>>
>>>>>>> #!/bin/bash
>>>>>>> exec > >(tee /var/log/initialize_instance_store.log | logger -t user-data -s 2>/dev/console) 2>&1
>>>>>>> function log() {
>>>>>>>   echo "[$(date "+%Y-%m-%d %H:%M:%S")] - $1" >> /var/log/initialize_instance_store.log
>>>>>>> }
>>>>>>> function try() {
>>>>>>>   $@
>>>>>>>   return 0
>>>>>>> }
>>>>>>> log "Starting to setup instance store for the docker."
>>>>>>> INSTANCE_STORES=$(ls /dev/disk/by-id/*EC2_NVMe_Instance_Storage*-ns-1)
>>>>>>> if [ -z "${INSTANCE_STORES}" ]; then
>>>>>>>   log "No instance store detected."
>>>>>>> fi
>>>>>>> VOLUMES="$INSTANCE_STORES"
>>>>>>> if [ -e "/dev/xvdcz" ]; then
>>>>>>>   log "Instance has /dev/xvdcz EBS volume. Using it for docker logical volume group."
>>>>>>>   VOLUMES="$VOLUMES /dev/xvdcz"
>>>>>>> fi
>>>>>>> if [ -z "${VOLUMES}" ]; then
>>>>>>>   log "No addition volumes. Using box standard docker setup."
>>>>>>> else
>>>>>>>   log "Available instance stores: ${VOLUMES}."
>>>>>>>   log "Setting up the docker logical volume group."
>>>>>>>   service docker stop
>>>>>>>   rm -rf /var/lib/docker/*
>>>>>>>   dmsetup remove_all
>>>>>>>   VOLUME_GROUP=docker
>>>>>>>   LOGICAL_VOLUME=docker-pool
>>>>>>>   try vgremove -y "${VOLUME_GROUP}"
>>>>>>>   try lvremove -y "${LOGICAL_VOLUME}"
>>>>>>>   vgcreate -y "${VOLUME_GROUP}" ${VOLUMES}
>>>>>>>   sleep 2
>>>>>>>   lvcreate -y -l 5%VG -n ${LOGICAL_VOLUME}\meta ${VOLUME_GROUP}
>>>>>>>   lvcreate -y -l 90%VG -n ${LOGICAL_VOLUME} ${VOLUME_GROUP}
>>>>>>>   sleep 2
>>>>>>>   lvconvert -y --zero n --thinpool ${VOLUME_GROUP}/${LOGICAL_VOLUME} --poolmetadata ${VOLUME_GROUP}/${LOGICAL_VOLUME}\meta
>>>>>>>   echo 'DOCKER_STORAGE_OPTIONS=" --storage-driver devicemapper --storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --storage-opt dm.fs=ext4 --storage-opt dm.use_deferred_deletion=true"' > /etc/sysconfig/docker-storage
>>>>>>>   test -f /bin/systemctl && systemctl reset-failed docker.service
>>>>>>>   service docker restart
>>>>>>>   test -f /bin/systemctl && systemctl enable --no-block --now ecs
>>>>>>> fi
>>>>>>> log "Setup completed."
>>>>>>> --//"
>>>>>>>
>>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "go-cd" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/go-cd/763a2904-4962-4c8b-ae2a-b8bf72701e5bn%40googlegroups.com

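To make the compatibility rule from this thread concrete, here is a minimal shell sketch. It is not part of GoCD or the plugin and only applies to plugin versions older than v8.0.0-775. It assumes the Amazon Linux 2 ECS AMI naming pattern seen in the AMI names in this thread (`...-2.0.YYYYMMDD-...`) and takes 20240625 as the last release with Docker 20.10 pre-installed, as stated in the thread. The function names (`ami_release_date`, `ami_predates_docker_25`, `find_ami_id`) and the `LAST_DOCKER_20_RELEASE` variable are hypothetical. `find_ami_id` wraps `aws ec2 describe-images` to look up the region-specific AMI ID by name and needs the AWS CLI with working credentials. A name check says nothing about packages that get upgraded at boot (for example by `yum upgrade --security`).

```shell
#!/bin/sh
# Sketch only: function and variable names are hypothetical,
# not part of GoCD or the ECS elastic agent plugin.

# Last AL2 ECS AMI release with Docker 20.10 pre-installed, according to the thread.
LAST_DOCKER_20_RELEASE=20240625

# Print the YYYYMMDD release date embedded in an AL2 ECS AMI name.
# Returns non-zero if the name does not contain "-2.0.YYYYMMDD-".
ami_release_date() {
  ami_date=$(printf '%s\n' "$1" | sed -n 's/.*-2\.0\.\([0-9]\{8\}\)-.*/\1/p')
  [ -n "$ami_date" ] && printf '%s\n' "$ami_date"
}

# Succeed if the AMI release is at or before the last Docker 20.10 release.
# Returns 2 when the name cannot be parsed (for example an AL2023 AMI name).
ami_predates_docker_25() {
  ami_release=$(ami_release_date "$1") || return 2
  [ "$ami_release" -le "$LAST_DOCKER_20_RELEASE" ]
}

# Look up the AMI ID for an AMI name in a region.
# Arguments: region, AMI name. Requires the AWS CLI and credentials.
find_ami_id() {
  aws ec2 describe-images \
    --region "$1" \
    --owners amazon \
    --filters "Name=name,Values=$2" \
    --query 'Images[].ImageId' \
    --output text
}
```

For example, `ami_predates_docker_25 amzn2-ami-ecs-hvm-2.0.20240821-x86_64-ebs || echo "too new"` flags the newer AMI from this thread, and `find_ami_id us-east-1 amzn2-ami-ecs-kernel-5.10-hvm-2.0.20240625-x86_64-ebs` would print the us-east-1 ID of the AMI that build.gocd.org was using.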