Fix being tested in: https://github.com/apache/airflow/pull/19512
(committer PR) and https://github.com/apache/airflow/pull/19514 (regular
user PR).


On Wed, Nov 10, 2021 at 11:25 AM Jarek Potiuk <[email protected]> wrote:

> OK. I took a look. It looks like the "Core" tests do indeed briefly (and
> sometimes for a longer time) go over 50% of the memory available on GitHub
> Runners. I do not think optimizing them now makes much sense - because even
> if we optimize them now, they will likely soon reach 50-60% of available
> memory again, which - when there are other parallel tests running - might
> easily lead to OOM.
>
> It looks like those are only "Core" type tests, so the solution will be
> (similarly to the "Integration" tests) to separate them out into a
> non-parallel run on GitHub runners.
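>
> For illustration, a rough Python sketch of that split (not the actual
> Breeze/CI scripts - the test-type names and runner command below are made
> up) could look like this:
>
> import subprocess
> from concurrent.futures import ThreadPoolExecutor
>
> # Assumed test-type names; the real CI has its own list and runner command.
> ALL_TEST_TYPES = ["API", "Always", "CLI", "Core", "Providers", "WWW"]
> SEQUENTIAL_TEST_TYPES = {"Core"}  # run alone, like the "Integration" tests
>
> def run_test_type(test_type: str) -> int:
>     # Placeholder runner invocation; substitute the project's real entrypoint.
>     cmd = ["pytest", f"tests/{test_type.lower()}"]
>     return subprocess.run(cmd, check=False).returncode
>
> # Memory-light test types run in parallel...
> parallel = [t for t in ALL_TEST_TYPES if t not in SEQUENTIAL_TEST_TYPES]
> with ThreadPoolExecutor(max_workers=4) as pool:
>     results = list(pool.map(run_test_type, parallel))
> # ...while the memory-hungry "Core" type runs afterwards, one at a time.
> results += [run_test_type(t) for t in sorted(SEQUENTIAL_TEST_TYPES)]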
>
> On Tue, Nov 9, 2021 at 9:33 PM Jarek Potiuk <[email protected]> wrote:
>
>> Yep. Apparently one of the recent tests is using too much memory. I had
>> some private errands that made me less available for the last few days - but
>> I will have time to catch up tonight/tomorrow.
>>
>> Thanks for changing the "parallel" level in your PR - that will give me
>> more data points. I've just re-run both PRs with the "debug-ci-resources"
>> label. This is our "debug" label that shows resource use during the build,
>> and I might be able to find and fix the root cause.
>>
>> For the future - in case any other committer wants to investigate it,
>> setting the "debug-ci-resources" label turns on a debugging mode that shows
>> the following information periodically alongside the progress of the tests -
>> it can be helpful in determining what caused the OOM:
>>
>> CONTAINER ID   NAME                                            CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
>> c46832148ff7   airflow-always-mssql_airflow_run_e59b6039c3d8   99.59%    365.1MiB / 6.789GiB   5.25%     1.62MB / 3.33MB   8.97MB / 20.5kB   8
>> f4d2a192d6fc   airflow-always-mssql_mssqlsetup_1               0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
>> a668cdedc717   airflow-api-mssql_airflow_run_bcc466077ac0      35.07%    431.4MiB / 6.789GiB   6.21%     2.26MB / 4.47MB   73.2MB / 20.5kB   8
>> f306f4221ba1   airflow-api-mssql_mssqlsetup_1                  0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
>> 7f10748e9496   airflow-api-mssql_mssql_1                       30.66%    735.5MiB / 6.789GiB   10.58%    4.47MB / 2.26MB   36.8MB / 124MB    132
>> 8b5ca767ed0c   airflow-always-mssql_mssql_1                    12.59%    716.5MiB / 6.789GiB   10.31%    3.33MB / 1.63MB   36.7MB / 52.7MB   131
>>
>>               total        used        free      shared  buff/cache   available
>> Mem:           6951        2939         200           6        3811        3702
>> Swap:             0           0           0
>>
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/root        84G   51G   33G  61% /
>> /dev/sda15      105M  5.2M  100M   5% /boot/efi
>> /dev/sdb1        14G  4.1G  9.0G  32% /mnt
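>>
>> The snapshot above is essentially what plain "docker stats --no-stream",
>> "free -m" and "df -h" print. A minimal sketch of a periodic monitor built on
>> those commands (not the actual CI debug scripts) could be:
>>
>> import subprocess
>> import time
>>
>> def print_resource_snapshot() -> None:
>>     # One-shot container stats, then overall memory and disk usage.
>>     for cmd in (["docker", "stats", "--no-stream"], ["free", "-m"], ["df", "-h"]):
>>         subprocess.run(cmd, check=False)
>>
>> # Print a snapshot every 60 seconds alongside the running tests.
>> while True:
>>     print_resource_snapshot()
>>     time.sleep(60)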
>>
>> J.
>>
>>
>> On Tue, Nov 9, 2021 at 9:19 PM Oliveira, Niko <[email protected]>
>> wrote:
>>
>>> Hey all,
>>>
>>>
>>> Just to throw another data point in the ring, I've had a PR
>>> <https://github.com/apache/airflow/pull/19410> stuck in the same way as
>>> well. Several retries are all failing with the same OOM.
>>>
>>>
>>> I've also dug through the GitHub Actions history and found a few others.
>>> So it doesn't seem to be just a one-off.
>>>
>>>
>>> Cheers,
>>> Niko
>>> ------------------------------
>>> *From:* Khalid Mammadov <[email protected]>
>>> *Sent:* Tuesday, November 9, 2021 6:24 AM
>>> *To:* [email protected]
>>> *Subject:* [EXTERNAL] OOM issue in the CI
>>>
>>> Hi Devs,
>>>
>>> I have been working on the below PR and ran into an OOM issue during
>>> testing on GitHub Actions (you can see it in the commit history).
>>>
>>> https://github.com/apache/airflow/pull/19139/files
>>>
>>> The tests for the databases (Postgres, MySQL, etc.) fail due to OOM and the
>>> Docker containers get killed.
>>>
>>> I have reduced the parallelism to 1 "in the code" *temporarily* (the only
>>> extra change in the PR) and it passes all the checks, which confirms the
>>> issue.
>>>
>>>
>>> I was hoping you could advise on the best course of action in this
>>> situation: should I force parallelism to 1 to get all the checks to pass, or
>>> is there some other way to solve the OOM?
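>>>
>>> One illustrative alternative to hard-coding 1 (just a sketch, Linux-only,
>>> and the 2 GiB per-job budget is a guess rather than a measured number)
>>> would be to derive the parallelism from the memory that is actually free:
>>>
>>> import os
>>>
>>> def safe_parallelism(per_job_gib: float = 2.0) -> int:
>>>     # Free physical memory, in GiB, from POSIX sysconf values.
>>>     page_size = os.sysconf("SC_PAGE_SIZE")
>>>     avail_gib = os.sysconf("SC_AVPHYS_PAGES") * page_size / 1024**3
>>>     by_memory = int(avail_gib // per_job_gib)
>>>     # Never go below 1, never above the number of CPUs.
>>>     return max(1, min(os.cpu_count() or 1, by_memory))
>>>
>>> print(safe_parallelism())  # e.g. 2 on a 2-core, ~7 GiB GitHub-hosted runner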
>>>
>>> Any help would be appreciated.
>>>
>>>
>>> Thanks in advance
>>>
>>> Khalid
>>>
>>
