(I am ill, so please excuse the limited questions.)

- Does the ECS consumer get created and registered if you remove the user data script?
- What changed between when this ECS setup used to work and now?
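
In case it helps with the first question, here is a minimal sketch (not part of the plugin) for checking which storage driver the user data script asked Docker to use. The function name `storage_driver_from_file` is made up for illustration; the path /etc/sysconfig/docker-storage is the one written by the script quoted below. Whether the Docker version on the instance still supports that driver is an assumption to verify against the Docker release notes, not something confirmed in this thread.

```shell
#!/bin/bash
# Hypothetical helper: print the value passed to --storage-driver in a
# docker-storage options file. Prints nothing if the option is absent.
# Returns 1 if the file does not exist.
storage_driver_from_file() {
  local file="$1"
  [ -f "$file" ] || return 1
  # Match "--storage-driver <name>" or "--storage-driver=<name>",
  # keep only the first hit, then strip the option name.
  grep -o -- '--storage-driver[ =][^ "]*' "$file" \
    | head -n 1 \
    | sed 's/^--storage-driver[ =]//'
}
```

On the failing instance this could be called as `storage_driver_from_file /etc/sysconfig/docker-storage`, alongside `journalctl -u docker.service --no-pager` to see Docker's own startup error rather than only the systemd rate-limit message.
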
-- Sriram

On Tue, 3 Sep 2024 at 7:23 PM, pradeep devaraj <[email protected]> wrote:

> Hi Team / Chad Wilson.
>
> The Docker service and ECS service are failing when a new server comes up.
> AMI id: ami-0a5f593ecaa0f722d (community one). When we manually spin up the
> server and attach it via the ASG, it registers with the cluster. When we try
> the same from the GoCD ECS cluster profile (AWS ECS Elastic plugin), it does
> not work, and the Docker service and ECS service fail.
>
> On Monday, September 2, 2024 at 11:21:06 PM UTC+5:30 pradeep devaraj wrote:
>
>> Adding++
>>
>> We are getting agent creation and deletion in a loop:
>>
>> [go] Received a request to create an agent for the job: [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
>> [go] No running instance(s) found to build the ECS Task to perform current job.
>> [go] Creating a new container instance to schedule ECS Task.
>> [go] Waiting for instance(s) ([i-061187c3d2ea07317]) to register with cluster.
>> [go] Received a request to create an agent for the job: [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
>> [go] No running instance(s) found to build the ECS Task to perform current job.
>> [go] Creating a new container instance to schedule ECS Task.
>> [go] Waiting for instance(s) ([i-00bb68d594121ab15]) to register with cluster.
>> [go] Received a request to create an agent for the job: [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
>> [go] No running instance(s) found to build the ECS Task to perform current job.
>> [go] Creating a new container instance to schedule ECS Task.
>>
>> On Monday, September 2, 2024 at 9:55:48 PM UTC+5:30 pradeep devaraj wrote:
>>
>>> We are using a GoCD AWS ECS elastic agent plugin.
>>> GoCD version: 23.4.0
>>>
>>> GoCD Elastic Agent Plugin for Amazon ECS
>>>
>>> - Version 7.3.0-416
>>>
>>> AMI id: ami-0ba9fb6bc8faf1fe0
>>>
>>> The elastic instance comes up but does not get assigned to the ECS cluster. We logged in to the server and found the error below.
>>>
>>> [root@ip-******* ~]# systemctl restart docker
>>> Job for docker.service failed because start of the service was attempted too often. See "systemctl status docker.service" and "journalctl -xe" for details.
>>> To force a start use "systemctl reset-failed docker.service" followed by "systemctl start docker.service" again.
>>> [root@ip-******* ~]# journalctl -xe
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit ecs.service has finished shutting down.
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: start request repeated too quickly for docker.service
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Failed to start Docker Application Container Engine.
>>> -- Subject: Unit docker.service has failed
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit docker.service has failed.
>>> --
>>> -- The result is failed.
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: docker.service failed.
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Starting Amazon Elastic Container Service - container agent...
>>> -- Subject: Unit ecs.service has begun start-up
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit ecs.service has begun starting up.
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: ecs.service: control process exited, code=exited status=1
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=info time=2024-09-02T16:03:20Z msg="post-stop"
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=info time=2024-09-02T16:03:20Z msg="Cleaning up the credentials endpoint setup for Amazon El
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Failed to start Amazon Elastic Container Service - container agent.
>>> -- Subject: Unit ecs.service has failed
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit ecs.service has failed.
>>> --
>>> -- The result is failed.
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Unit ecs.service entered failed state.
>>> Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: ecs.service failed.
>>>
>>> [root@ipXXXX ~]# df -hT
>>> Filesystem     Type      Size  Used Avail Use% Mounted on
>>> devtmpfs       devtmpfs  7.7G     0  7.7G   0% /dev
>>> tmpfs          tmpfs     7.7G     0  7.7G   0% /dev/shm
>>> tmpfs          tmpfs     7.7G  376K  7.7G   1% /run
>>> tmpfs          tmpfs     7.7G     0  7.7G   0% /sys/fs/cgroup
>>> /dev/nvme0n1p1 xfs       100G  2.4G   98G   3% /
>>> tmpfs          tmpfs     1.6G     0  1.6G   0% /run/user/0
>>> [root@ip-10-226-11-63 ~]# docker --version
>>> Docker version 25.0.5, build 5dc9bcc
>>>
>>> Below is the user data script we are using; it exits with an error while the instance is spinning up.
>>>
>>> "ECS_INSTANCE_ATTRIBUTES={"server-id":"31e424ad-e242-45d2-a5bb-0ef7be0d8306"}
>>> EOT
>>> echo 'File /etc/ecs/ecs.config successfully created.'
>>> log "Finished executing GoCD's user data script, now executing custom user data script from use, if present."
>>> #!/bin/bash
>>> echo "ECS_CLUSTER=GoCD-ECS-UAT" >> /etc/ecs/ecs.config
>>> log "Finished executing user specified user data script."
>>> --//
>>> #cloud-config
>>> cloud_final_modules:
>>> - [scripts-user, always]
>>> --//
>>> Content-Type: text/x-shellscript; charset="us-ascii"
>>> MIME-Version: 1.0
>>> Content-Transfer-Encoding: 7bit
>>> Content-Disposition: attachment; filename="initialize_instance_store"
>>>
>>> #!/bin/bash
>>> exec > >(tee /var/log/initialize_instance_store.log | logger -t user-data -s 2>/dev/console) 2>&1
>>> function log() {
>>>   echo "[$(date "+%Y-%m-%d %H:%M:%S")] - $1" >> /var/log/initialize_instance_store.log
>>> }
>>> function try() {
>>>   $@
>>>   return 0
>>> }
>>> log "Starting to setup instance store for the docker."
>>> INSTANCE_STORES=$(ls /dev/disk/by-id/*EC2_NVMe_Instance_Storage*-ns-1)
>>> if [ -z "${INSTANCE_STORES}" ]; then
>>>   log "No instance store detected."
>>> fi
>>> VOLUMES="$INSTANCE_STORES"
>>> if [ -e "/dev/xvdcz" ]; then
>>>   log "Instance has /dev/xvdcz EBS volume. Using it for docker logical volume group."
>>>   VOLUMES="$VOLUMES /dev/xvdcz"
>>> fi
>>> if [ -z "${VOLUMES}" ]; then
>>>   log "No addition volumes. Using box standard docker setup."
>>> else
>>>   log "Available instance stores: ${VOLUMES}."
>>>   log "Setting up the docker logical volume group."
>>>   service docker stop
>>>   rm -rf /var/lib/docker/*
>>>   dmsetup remove_all
>>>   VOLUME_GROUP=docker
>>>   LOGICAL_VOLUME=docker-pool
>>>   try vgremove -y "${VOLUME_GROUP}"
>>>   try lvremove -y "${LOGICAL_VOLUME}"
>>>   vgcreate -y "${VOLUME_GROUP}" ${VOLUMES}
>>>   sleep 2
>>>   lvcreate -y -l 5%VG -n ${LOGICAL_VOLUME}\meta ${VOLUME_GROUP}
>>>   lvcreate -y -l 90%VG -n ${LOGICAL_VOLUME} ${VOLUME_GROUP}
>>>   sleep 2
>>>   lvconvert -y --zero n --thinpool ${VOLUME_GROUP}/${LOGICAL_VOLUME} --poolmetadata ${VOLUME_GROUP}/${LOGICAL_VOLUME}\meta
>>>   echo 'DOCKER_STORAGE_OPTIONS=" --storage-driver devicemapper --storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --storage-opt dm.fs=ext4 --storage-opt dm.use_deferred_deletion=true"' > /etc/sysconfig/docker-storage
>>>   test -f /bin/systemctl && systemctl reset-failed docker.service
>>>   service docker restart
>>>   test -f /bin/systemctl && systemctl enable --no-block --now ecs
>>> fi
>>> log "Setup completed."
>>> --//"
>>>
> --
> You received this message because you are subscribed to the Google Groups "go-cd" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/763a2904-4962-4c8b-ae2a-b8bf72701e5bn%40googlegroups.com.
