Re: [PATCH v5 1/4] Jobs based on custom runners: documentation and configuration placeholder

Philippe Mathieu-Daudé Wed, 24 Feb 2021 03:59:13 -0800

On 2/23/21 7:09 PM, Cleber Rosa wrote:
> On Tue, Feb 23, 2021 at 06:34:07PM +0100, Philippe Mathieu-Daudé wrote:
>> On 2/23/21 6:24 PM, Philippe Mathieu-Daudé wrote:
>>> On 2/23/21 5:47 PM, Cleber Rosa wrote:
>>>> On Tue, Feb 23, 2021 at 05:37:04PM +0100, Philippe Mathieu-Daudé wrote:
>>>>> On 2/23/21 12:25 PM, Thomas Huth wrote:
>>>>>> On 19/02/2021 22.58, Cleber Rosa wrote:
>>>>>>> As described in the included documentation, the "custom runner" jobs
>>>>>>> extend the GitLab CI jobs already in place.  One of their primary
>>>>>>> goals is catching and preventing regressions on a wider number of host
>>>>>>> systems than the ones provided by GitLab's shared runners.
>>>>>>>
>>>>>>> This sets the stage in which other community members can add their own
>>>>>>> machine configuration documentation/scripts, and accompanying job
>>>>>>> definitions.  As a general rule, those newly added contributed jobs
>>>>>>> should run as "non-gating", until their reliability is verified (AKA
>>>>>>> "allow_failure: true").
>>>>>>>
>>>>>>> Signed-off-by: Cleber Rosa <cr...@redhat.com>
>>>>>>> ---
>>>>>>>   .gitlab-ci.d/custom-runners.yml | 14 ++++++++++++++
>>>>>>>   .gitlab-ci.yml                  |  1 +
>>>>>>>   docs/devel/ci.rst               | 28 ++++++++++++++++++++++++++++
>>>>>>>   docs/devel/index.rst            |  1 +
>>>>>>>   4 files changed, 44 insertions(+)
>>>>>>>   create mode 100644 .gitlab-ci.d/custom-runners.yml
>>>>>>>   create mode 100644 docs/devel/ci.rst
>>>>>>>
>>>>>>> diff --git a/.gitlab-ci.d/custom-runners.yml
>>>>>>> b/.gitlab-ci.d/custom-runners.yml
>>>>>>> new file mode 100644
>>>>>>> index 0000000000..3004da2bda
>>>>>>> --- /dev/null
>>>>>>> +++ b/.gitlab-ci.d/custom-runners.yml
>>>>>>> @@ -0,0 +1,14 @@
>>>>>>> +# The CI jobs defined here require GitLab runners installed and
>>>>>>> +# registered on machines that match their operating system names,
>>>>>>> +# versions and architectures.  This is in contrast to the other CI
>>>>>>> +# jobs that are intended to run on GitLab's "shared" runners.
>>>>>>> +
>>>>>>> +# Different than the default approach on "shared" runners, based on
>>>>>>> +# containers, the custom runners have no such *requirement*, as those
>>>>>>> +# jobs should be capable of running on operating systems with no
>>>>>>> +# compatible container implementation, or no support from
>>>>>>> +# gitlab-runner.  To avoid problems that gitlab-runner can cause while
>>>>>>> +# reusing the GIT repository, let's enable the recursive submodule
>>>>>>> +# strategy.
>>>>>>> +variables:
>>>>>>> +  GIT_SUBMODULE_STRATEGY: recursive
>>>>>>
>>>>>> Is it really necessary? I thought our configure script would take care
>>>>>> of the submodules?
>>>>>
>>>>
>>>> I've done a lot of testing on bare metal systems, and the problems
>>>> that come from reusing the same system and failed cleanups can be very
>>>> frustrating.  It's unfortunate that we need this, but it was the
>>>> simplest and most reliable solution I found.  :/
>>>>
>>>> Having said that, I noticed after I posted this series that this is
>>>> affecting all other jobs.  We don't need it in the jobs based
>>>> on containers (for obvious reasons), so I see two options:
>>>>
>>>> 1) have it enabled on all jobs for consistency
>>>>
>>>> 2) have it enabled only on jobs that will reuse the repo
>>>>
>>>>> Well, if there is a failure during the first clone (I got one network
>>>>> timeout in the middle) 
>>>
>>> [This network failure is pasted at the end]
>>>
>>>>> then next time it doesn't work:
>>>>>
>>>>> Updating/initializing submodules recursively...
>>>>> Synchronizing submodule url for 'capstone'
>>>>> Synchronizing submodule url for 'dtc'
>>>>> Synchronizing submodule url for 'meson'
>>>>> Synchronizing submodule url for 'roms/QemuMacDrivers'
>>>>> Synchronizing submodule url for 'roms/SLOF'
>>>>> Synchronizing submodule url for 'roms/edk2'
>>>>> Synchronizing submodule url for
>>>>> 'roms/edk2/ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3'
>>>>> Synchronizing submodule url for
>>>>> 'roms/edk2/BaseTools/Source/C/BrotliCompress/brotli'
>>>>> Synchronizing submodule url for
>>>>> 'roms/edk2/BaseTools/Source/C/BrotliCompress/brotli/research/esaxx'
>>>>> Synchronizing submodule url for
>>>>> 'roms/edk2/BaseTools/Source/C/BrotliCompress/brotli/research/libdivsufsort'
>>>>> Synchronizing submodule url for
>>>>> 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl'
>>>>> Synchronizing submodule url for
>>>>> 'roms/edk2/MdeModulePkg/Library/BrotliCustomDecompressLib/brotli'
>>>>> Synchronizing submodule url for
>>>>> 'roms/edk2/MdeModulePkg/Universal/RegularExpressionDxe/oniguruma'
>>>>> Synchronizing submodule url for
>>>>> 'roms/edk2/UnitTestFrameworkPkg/Library/CmockaLib/cmocka'
>>
>> So far, beside the repository useful for QEMU, I cloned:
>>
>> - boringssl
>> - krb5
>> - pyca-cryptography
>> - esaxx
>> - libdivsufsort
>> - oniguruma
>> - openssl
>> - brotli
>> - cmocka
>>
> 
> Hi Phil,
> 
> I'm not following what you meant by "I cloned"... Are you experimenting
> with this on a machine of your own and manually cloning the submodules?


I meant "my test runner has been cloning ..."

>> But I reached the runner time limit of 2h.

The first failure hit the 1h limit, so I raised the job timeout to
the maximum I could use for this runner, 2h.

>> The directory reports 3GB of source code.
>>
>> I don't think the series has been tested enough before posting,
> 
> Please take into consideration that this series, although simple in
> content, touches and interacts with a lot of moving pieces, and
> possibly with personal systems that I did not have, or will have,
> access to.  As far as public testing proof goes, you can see a
> pipeline with this version of this series here:
> 
>    https://gitlab.com/cleber.gnu/qemu/-/pipelines/258982039/builds

Raise the timeout and retry the same job on the same runner
several times:

diff --git a/.gitlab-ci.d/custom-runners.yml
b/.gitlab-ci.d/custom-runners.yml
@@ -17,6 +17,7 @@ variables:
 # setup by the scripts/ci/setup/build-environment.yml task
 # "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
 ubuntu-18.04-s390x-all-linux-static:
+ timeout: 2h 30m
  allow_failure: true
  needs: []
  stage: build

Each retry clones more submodules.

I stopped at the 3rd attempt.
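
If fetching the nested submodules is what eats the time, one thing that
might be worth trying is the "normal" strategy, which per the GitLab
documentation only initializes top-level submodules. This is an untested
sketch, not something from the series: it would still fetch every
top-level submodule (roms/edk2 included), but not the ones nested below
it such as openssl or brotli. Whether it also avoids the repository
reuse problems described earlier in the thread is an open question.

```yaml
# Untested alternative to the value used in this patch.
# "normal" initializes only top-level submodules; "recursive" also
# descends into submodules of submodules (e.g. below roms/edk2).
variables:
  GIT_SUBMODULE_STRATEGY: normal
```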

> As I said elsewhere, I only noticed the recursive submodule being
> applied to the existing jobs after I submitted the series.  Mea culpa.
> But:
> 
>  * none of the jobs took noticeably longer than the previous baseline
>  * there was one *container build failure* (safe to say it's not
>    related)
>  * all other jobs passed successfully

I had less luck, then (see my comments elsewhere in this thread about
the docker-dind jobs started on the custom runner).
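
Option 2 from earlier in the thread (enable it only on the jobs that
reuse the repository) could be sketched as below. This assumes the
global "variables:" block is dropped from custom-runners.yml; the job
keys are copied from the diff above, and the "script:" entry is only a
placeholder, not the real job definition.

```yaml
ubuntu-18.04-s390x-all-linux-static:
  allow_failure: true
  needs: []
  stage: build
  variables:
    # Job-level variable: only this custom-runner job gets recursive
    # submodules; container-based jobs keep the default behavior.
    GIT_SUBMODULE_STRATEGY: recursive
  script:
    # Placeholder, the actual build steps are not shown in this mail.
    - echo "build steps go here"
```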

> And, along with the previous versions, this series was tested on all
> the previously included architectures and operating systems.  It's
> unfortunate that because of your experience at this time (my
> apologies), you don't realize the amount of testing done so far.

As I told Erik on IRC, the only thing I did differently was
to use the distribution's runner package, not the official one:

$ sudo apt-get install gitlab-runner docker.io

Then I registered it, changing the path (/usr/bin/gitlab-runner instead
of /usr/local/bin/gitlab-runner). Everything else was left unchanged.

>> I'm stopping here my experiments.
>>
>> Regards,
>>
>> Phil.
>>
> 
> I honestly appreciate your help here up to this point.
> 
> Regards,
> - Cleber.
>
