OK. I took a look. It looks like the "Core" tests do indeed briefly (and
sometimes for longer) go over 50% of the memory available on GitHub
runners. I do not think optimizing them now makes much sense - even if we
optimize them now, they will likely soon reach 50-60% of available memory
again, which - when there are other parallel tests running - might easily
lead to OOM.

It looks like only the "Core" type of tests are affected, so the solution
will be (similarly to what we did with the "Integration" tests) to separate
them out into a non-parallel run on GitHub runners.
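
For illustration, a rough sketch of the idea - the test-type names and the
pytest invocation below are placeholders, not the actual Breeze/CI
implementation:

    # Hypothetical sketch: launch most test types in parallel, then run the
    # memory-heavy ones one at a time after the parallel batch finishes,
    # so they never compete with other test containers for runner memory.
    import subprocess

    ALL_TEST_TYPES = ["Always", "API", "CLI", "Core", "Other", "Providers", "WWW"]
    SEQUENTIAL_TEST_TYPES = {"Core"}  # memory-heavy on GitHub runners

    def start_test_type(test_type: str) -> subprocess.Popen:
        # Placeholder command; the real CI runs pytest inside docker-compose.
        return subprocess.Popen(["pytest", "tests", "-m", test_type])

    # Parallel batch first.
    procs = [start_test_type(t) for t in ALL_TEST_TYPES if t not in SEQUENTIAL_TEST_TYPES]
    exit_codes = [p.wait() for p in procs]

    # Memory-heavy types only start once the parallel batch is done.
    for test_type in SEQUENTIAL_TEST_TYPES:
        exit_codes.append(subprocess.run(["pytest", "tests", "-m", test_type]).returncode)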

On Tue, Nov 9, 2021 at 9:33 PM Jarek Potiuk <[email protected]> wrote:

> Yep. Apparently one of the recent tests is using too much memory. I had
> some private errands that made me less available for the last few days - but
> I will have time to catch up tonight/tomorrow.
>
> Thanks for changing the "parallel" level in your PR - that will give me
> more data points. I've just re-run both PRs with the "debug-ci-resources"
> label. This is our "debug" label to show resource use during the build, and
> I might be able to find and fix the root cause.
>
> For the future - in case any other committer wants to investigate it,
> setting the "debug-ci-resources" label turns on the debugging mode, which
> shows the information below periodically alongside the progress of the
> tests - it can be helpful in determining what caused the OOM:
>
> CONTAINER ID   NAME                                            CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
> c46832148ff7   airflow-always-mssql_airflow_run_e59b6039c3d8   99.59%    365.1MiB / 6.789GiB   5.25%     1.62MB / 3.33MB   8.97MB / 20.5kB   8
> f4d2a192d6fc   airflow-always-mssql_mssqlsetup_1               0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
> a668cdedc717   airflow-api-mssql_airflow_run_bcc466077ac0      35.07%    431.4MiB / 6.789GiB   6.21%     2.26MB / 4.47MB   73.2MB / 20.5kB   8
> f306f4221ba1   airflow-api-mssql_mssqlsetup_1                  0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
> 7f10748e9496   airflow-api-mssql_mssql_1                       30.66%    735.5MiB / 6.789GiB   10.58%    4.47MB / 2.26MB   36.8MB / 124MB    132
> 8b5ca767ed0c   airflow-always-mssql_mssql_1                    12.59%    716.5MiB / 6.789GiB   10.31%    3.33MB / 1.63MB   36.7MB / 52.7MB   131
>
>               total        used        free      shared  buff/cache   available
> Mem:           6951        2939         200           6        3811        3702
> Swap:             0           0           0
>
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/root        84G   51G   33G  61% /
> /dev/sda15      105M  5.2M  100M   5% /boot/efi
> /dev/sdb1        14G  4.1G  9.0G  32% /mnt
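>
> Roughly, that debug mode boils down to periodically dumping the output of
> "docker stats", "free" and "df" while the tests run. A minimal Python
> sketch of the idea - an illustration only, not the actual CI code, and the
> 60-second interval is just an assumption:
>
>     import subprocess
>     import time
>
>     def dump_resource_usage() -> None:
>         # One-shot snapshot of container and host resource usage.
>         for cmd in (
>             ["docker", "stats", "--no-stream"],  # per-container CPU/memory
>             ["free", "-m"],                      # host memory, in MiB
>             ["df", "-h"],                        # disk usage per filesystem
>         ):
>             subprocess.run(cmd, check=False)
>
>     while True:
>         dump_resource_usage()
>         time.sleep(60)  # one snapshot per minute alongside the tests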
>
> J.
>
>
> On Tue, Nov 9, 2021 at 9:19 PM Oliveira, Niko <[email protected]>
> wrote:
>
>> Hey all,
>>
>>
>> Just to throw another data point in the ring, I've had a PR
>> <https://github.com/apache/airflow/pull/19410> stuck in the same way as
>> well. Several retries are all failing with the same OOM.
>>
>>
>> I've also dug through the GitHub Actions history and found a few others,
>> so it doesn't seem to be just a one-off.
>>
>>
>> Cheers,
>> Niko
>> ------------------------------
>> *From:* Khalid Mammadov <[email protected]>
>> *Sent:* Tuesday, November 9, 2021 6:24 AM
>> *To:* [email protected]
>> *Subject:* [EXTERNAL] OOM issue in the CI
>>
>>
>> Hi Devs,
>>
>> I have been working on the PR below and have run into an OOM issue during
>> testing on GitHub Actions (you can see it in the commit history).
>>
>> https://github.com/apache/airflow/pull/19139/files
>>
>> The tests for the Postgres, MySQL, etc. databases fail due to OOM and
>> Docker gets killed.
>>
>> I have *temporarily* reduced the parallelism to 1 "in the code" (the only
>> extra change in the PR), and it passes all the checks, which confirms the
>> issue.
>>
>>
>> I was hoping you could advise on the best course of action in this
>> situation - should I force the parallelism to 1 to get all the checks to
>> pass, or is there some other way to solve the OOM?
>>
>> Any help would be appreciated.
>>
>>
>> Thanks in advance
>>
>> Khalid
>>
>
