Merged! Please rebase (Khalid, you can remove your workaround) and
let me know.

There was one failure in my tests:

https://github.com/apache/airflow/runs/4165358689?check_suite_focus=true -
but we can keep an eye on this one and try to find the reason
separately if it keeps recurring.

J.

On Wed, Nov 10, 2021 at 12:49 PM Jarek Potiuk <[email protected]> wrote:

> Fix being tested in: https://github.com/apache/airflow/pull/19512
> (committer PR) and https://github.com/apache/airflow/pull/19514 (regular
> user PR).
>
>
> On Wed, Nov 10, 2021 at 11:25 AM Jarek Potiuk <[email protected]> wrote:
>
>> OK, I took a look. It looks like the "Core" tests do indeed briefly (and
>> sometimes for a longer time) go above 50% of the memory available on GitHub
>> runners. I do not think optimizing them now makes much sense - because even
>> if we optimize them now, they will likely soon reach 50-60% of available
>> memory again, which - when there are other parallel tests running - might
>> easily lead to an OOM.
>>
>> It looks like those are only the "Core" type of tests, so the solution will
>> be (similarly to the "Integration" tests) to separate them out into a
>> non-parallel run on the GitHub runners.
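>>
>> Just to illustrate the idea - this is a hypothetical sketch only, not the
>> actual Breeze/CI scripts, and the grouping, command and function names below
>> are all assumptions:
>>
>> # Memory-hungry test types run one at a time; the rest stay parallel.
>> from concurrent.futures import ThreadPoolExecutor
>> import subprocess
>>
>> SEQUENTIAL_TEST_TYPES = {"Core", "Integration"}  # assumed grouping
>>
>> def run_test_type(test_type: str) -> int:
>>     # Placeholder command - the real invocation lives in the CI scripts.
>>     return subprocess.run(["echo", f"running {test_type} tests"]).returncode
>>
>> def run_all(test_types: list[str]) -> None:
>>     sequential = [t for t in test_types if t in SEQUENTIAL_TEST_TYPES]
>>     parallel = [t for t in test_types if t not in SEQUENTIAL_TEST_TYPES]
>>     for test_type in sequential:
>>         run_test_type(test_type)  # one at a time, to avoid OOM
>>     with ThreadPoolExecutor() as pool:
>>         list(pool.map(run_test_type, parallel))  # remaining types in parallel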
>>
>> On Tue, Nov 9, 2021 at 9:33 PM Jarek Potiuk <[email protected]> wrote:
>>
>>> Yep. Apparently one of the recent tests is using too much memory. I had
>>> some private errands that made me less available over the last few days - but
>>> I will have time to catch up tonight/tomorrow.
>>>
>>> Thanks for changing the "parallel" level in your PR - that will give me
>>> more data points. I've just re-run both PRs with the "debug-ci-resources"
>>> label. This is our "debug" label that shows resource use during the build, and
>>> I might be able to find and fix the root cause.
>>>
>>> For the future - in case any other committer wants to investigate this,
>>> setting the "debug-ci-resources" label turns on a debugging mode that prints
>>> the information below periodically alongside the progress of the tests - it
>>> can be helpful in determining what caused the OOM (a rough sketch of such a
>>> monitoring loop follows the sample output):
>>>
>>> CONTAINER ID   NAME                                            CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
>>> c46832148ff7   airflow-always-mssql_airflow_run_e59b6039c3d8   99.59%    365.1MiB / 6.789GiB   5.25%     1.62MB / 3.33MB   8.97MB / 20.5kB   8
>>> f4d2a192d6fc   airflow-always-mssql_mssqlsetup_1               0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
>>> a668cdedc717   airflow-api-mssql_airflow_run_bcc466077ac0      35.07%    431.4MiB / 6.789GiB   6.21%     2.26MB / 4.47MB   73.2MB / 20.5kB   8
>>> f306f4221ba1   airflow-api-mssql_mssqlsetup_1                  0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
>>> 7f10748e9496   airflow-api-mssql_mssql_1                       30.66%    735.5MiB / 6.789GiB   10.58%    4.47MB / 2.26MB   36.8MB / 124MB    132
>>> 8b5ca767ed0c   airflow-always-mssql_mssql_1                    12.59%    716.5MiB / 6.789GiB   10.31%    3.33MB / 1.63MB   36.7MB / 52.7MB   131
>>>
>>>               total        used        free      shared  buff/cache   available
>>> Mem:           6951        2939         200           6        3811        3702
>>> Swap:             0           0           0
>>>
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/root        84G   51G   33G  61% /
>>> /dev/sda15      105M  5.2M  100M   5% /boot/efi
>>> /dev/sdb1        14G  4.1G  9.0G  32% /mnt
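>>>
>>> As a rough illustration only - this is not the actual Airflow CI script, and
>>> the command list and the 30-second interval below are assumptions - a periodic
>>> monitor producing output like the above could be sketched in Python as:
>>>
>>> import subprocess
>>> import time
>>>
>>> # Snapshots that help diagnose OOM kills: per-container usage, overall
>>> # memory, and disk space.
>>> RESOURCE_COMMANDS = [
>>>     ["docker", "stats", "--no-stream"],
>>>     ["free", "-m"],
>>>     ["df", "-h"],
>>> ]
>>>
>>> def dump_resources() -> None:
>>>     """Print one resource snapshot, ignoring failures of individual commands."""
>>>     for cmd in RESOURCE_COMMANDS:
>>>         subprocess.run(cmd, check=False)
>>>
>>> if __name__ == "__main__":
>>>     while True:
>>>         dump_resources()
>>>         time.sleep(30)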
>>>
>>> J.
>>>
>>>
>>> On Tue, Nov 9, 2021 at 9:19 PM Oliveira, Niko
>>> <[email protected]> wrote:
>>>
>>>> Hey all,
>>>>
>>>>
>>>> Just to throw another data point in the ring, I've had a PR
>>>> <https://github.com/apache/airflow/pull/19410> stuck in the same way
>>>> as well. Several retries are all failing with the same OOM.
>>>>
>>>>
>>>> I've also dug through the GitHub Actions history and found a few
>>>> others, so it doesn't seem to be just a one-off.
>>>>
>>>>
>>>> Cheers,
>>>> Niko
>>>> ------------------------------
>>>> *From:* Khalid Mammadov <[email protected]>
>>>> *Sent:* Tuesday, November 9, 2021 6:24 AM
>>>> *To:* [email protected]
>>>> *Subject:* [EXTERNAL] OOM issue in the CI
>>>>
>>>>
>>>> Hi Devs,
>>>>
>>>> I have been working on the PR below and have run into an OOM issue during
>>>> testing on GitHub Actions (you can see it in the commit history):
>>>>
>>>> https://github.com/apache/airflow/pull/19139/files
>>>>
>>>> The tests for the databases (Postgres, MySQL, etc.) fail due to OOM and
>>>> Docker gets killed.
>>>>
>>>> I have reduced parallelism to 1 "in the code" *temporarily* (the only
>>>> extra change in the PR), and with that it passes all the checks, which
>>>> confirms the issue.
>>>>
>>>>
>>>> I was hoping you could advise on the best course of action in this
>>>> situation: should I force parallelism to 1 to get all checks passing, or is
>>>> there some other way to solve the OOM?
>>>>
>>>> Any help would be appreciated.
>>>>
>>>>
>>>> Thanks in advance
>>>>
>>>> Khalid
>>>>
>>>
