Not sure if it's related, but I'll leave this here. We have a number of EC2 instances running Ubuntu 18.04 LTS and GitLab Runner, with the latter configured to execute build jobs inside Docker containers. The instances are t2.medium, t2.xlarge and t3.medium and run in the eu-west-1 region (across different AZs).
After upgrading from kernel 4.15.0-1021-aws to 4.15.0-1023-aws we've seen a lot of build failures across all of our GitLab Runner instances on Ubuntu 18.04 LTS. The problems happen intermittently. Specifically, the build jobs fail during the startup phase, where GitLab Runner clones the git repository. The git clone operation fails with:

```
Cloning repository...
Cloning into '/builds/my-group/my-project'...
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
```

According to the documentation at https://curl.haxx.se/libcurl/c/libcurl-errors.html this means:

> CURLE_PARTIAL_FILE (18)
>
> A file transfer was shorter or larger than expected. This happens when the
> server first reports an expected transfer size, and then delivers data that
> doesn't match the previously given size.

When the problem occurred, the bandwidth used by the `git clone` operation eventually dropped to a few kByte/s or to zero (the reporting wasn't granular enough to determine which). I assume that the GitLab server then closed or timed out the connection before the git client had downloaded all of the data it expected, leaving us with a `CURLE_PARTIAL_FILE` error. We've also seen larger file downloads (tens to hundreds of MB, e.g. during `apt install`) hang indefinitely.

We rolled back the changes made during the last few weeks one by one and eventually found that downgrading to 4.15.0-1021-aws solved the issue on all of our GitLab Runner instances (running GitLab Runner 11.2.0, 11.2.1 and 11.3.1); a downgrade sketch is at the end of this comment. I couldn't spot anything in the changelog that looked obviously related to the issue: https://changelogs.ubuntu.com/changelogs/pool/main/l/linux-aws/linux-aws_4.15.0-1023.23/changelog

FWIW, this is what `docker version` reports:

```
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:56 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:21 2018
  OS/Arch:          linux/amd64
  Experimental:     false
```
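In case it helps anyone narrow this down, here is a rough sketch of how the failing clone could be traced at the curl level. The repository URL is a placeholder; `GIT_TRACE_CURL` requires git 2.11 or newer, and older clients can fall back to `GIT_CURL_VERBOSE=1`:

```
# Capture curl- and packet-level traces of the failing clone.
# The repository URL below is a placeholder.
GIT_TRACE_CURL=1 GIT_TRACE_PACKET=1 \
  git clone https://gitlab.example.com/my-group/my-project.git 2> git-trace.log

# While the transfer is stalling, inspect the TCP connection from another
# shell; a receive queue that stops draining or a shrinking window would
# point at the kernel/network rather than at the GitLab server.
ss -ti dst :443
```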
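And for anyone wanting to do the same downgrade, a sketch of one way to do it on these images. The package names are inferred from the versions above, so verify them against `apt list 'linux-image-4.15.0-1021*'` first:

```
# Install the known-good kernel next to the current one.
sudo apt-get update
sudo apt-get install -y linux-image-4.15.0-1021-aws linux-modules-4.15.0-1021-aws

# Hold the meta packages so unattended upgrades don't pull -1023 back in.
sudo apt-mark hold linux-aws linux-image-aws

# Remove the problematic kernel and reboot into -1021.
sudo apt-get remove -y linux-image-4.15.0-1023-aws
sudo reboot
```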
https://bugs.launchpad.net/bugs/1796469

Title:
  aws s3 cp --recursive hangs on the last file on a large file transfer
  to instance

Status in linux-aws package in Ubuntu:
  New

Bug description:
  aws s3 cp --recursive hangs on the last file on a large transfer to an
  instance. I have confirmed that this works on Linux/4.15.0-1021-aws.

  aws cli version:
  aws-cli/1.16.23 Python/2.7.15rc1 Linux/4.15.0-1023-aws botocore/1.12.13

  Ubuntu version:
  Description:    Ubuntu 18.04.1 LTS
  Release:        18.04

  AMI:
  eu-west-1 - ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20180912 - ami-00035f41c82244dab

  Package version:
  ```
  linux-aws:
    Installed: 4.15.0.1023.23
    Candidate: 4.15.0.1023.23
    Version table:
   *** 4.15.0.1023.23 500
          500 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
          500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
          100 /var/lib/dpkg/status
       4.15.0.1007.7 500
          500 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
  ```
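In case a reproduction helps, a minimal sketch of the scenario from the bug description. The bucket name and prefix are placeholders, and it assumes a prefix holding a few hundred MB of objects:

```
# Recursive copy of a large prefix to the instance; on 4.15.0-1023-aws
# this reportedly hangs on the last file (bucket/prefix are placeholders).
aws s3 cp --recursive s3://example-bucket/large-prefix/ /tmp/s3-test/

# In a second shell, watch the transfer's TCP sockets while it hangs;
# an ESTAB connection with a stuck Recv-Q suggests a kernel-level stall.
watch -n 2 'ss -tn state established "( dport = :443 )"'
```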