Thanks a lot Jarek, will do!

On Wed, 10 Nov 2021, 13:40 Jarek Potiuk, <[email protected]> wrote:
> Merged! Please rebase (Khalid - you can remove that workaround of yours)
> and let me know.
>
> There is one failure that happened in my tests:
> https://github.com/apache/airflow/runs/4165358689?check_suite_focus=true
> - but we can observe the results of this one and try to find the reason
> separately if it continues to repeat.
>
> J.
>
> On Wed, Nov 10, 2021 at 12:49 PM Jarek Potiuk <[email protected]> wrote:
>
>> Fix being tested in: https://github.com/apache/airflow/pull/19512
>> (committer PR) and https://github.com/apache/airflow/pull/19514 (regular
>> user PR).
>>
>> On Wed, Nov 10, 2021 at 11:25 AM Jarek Potiuk <[email protected]> wrote:
>>
>>> OK. I took a look. It looks like the "Core" tests do indeed briefly (and
>>> sometimes for longer) go over 50% of the memory available on GitHub
>>> Runners. Optimizing them now makes little sense - because even if we
>>> optimize them, they will likely soon reach 50-60% of the available
>>> memory again, which can easily lead to an OOM when other parallel tests
>>> are running.
>>>
>>> Since those are only the "Core" type of tests, the solution will be
>>> (similarly to the "Integration" tests) to separate them out into a
>>> non-parallel run on GitHub runners.
>>>
>>> On Tue, Nov 9, 2021 at 9:33 PM Jarek Potiuk <[email protected]> wrote:
>>>
>>>> Yep. Apparently one of the recent tests is using too much memory. I had
>>>> some private errands that made me less available over the last few
>>>> days, but I will have time to catch up tonight/tomorrow.
>>>>
>>>> Thanks for changing the "parallel" level in your PR - that will give me
>>>> more data points. I've just re-run both PRs with the
>>>> "debug-ci-resources" label. This is our "debug" label to show resource
>>>> use during the build, and I might be able to find and fix the root
>>>> cause.
>>>>
>>>> For the future - in case any other committer wants to investigate it,
>>>> setting the "debug-ci-resources" label turns on a debugging mode that
>>>> shows this information periodically alongside the progress of the
>>>> tests. It can be helpful in determining what caused the OOM:
>>>>
>>>> CONTAINER ID   NAME                                            CPU %    MEM USAGE / LIMIT     MEM %    NET I/O           BLOCK I/O         PIDS
>>>> c46832148ff7   airflow-always-mssql_airflow_run_e59b6039c3d8   99.59%   365.1MiB / 6.789GiB   5.25%    1.62MB / 3.33MB   8.97MB / 20.5kB   8
>>>> f4d2a192d6fc   airflow-always-mssql_mssqlsetup_1               0.00%    0B / 0B               0.00%    0B / 0B           0B / 0B           0
>>>> a668cdedc717   airflow-api-mssql_airflow_run_bcc466077ac0      35.07%   431.4MiB / 6.789GiB   6.21%    2.26MB / 4.47MB   73.2MB / 20.5kB   8
>>>> f306f4221ba1   airflow-api-mssql_mssqlsetup_1                  0.00%    0B / 0B               0.00%    0B / 0B           0B / 0B           0
>>>> 7f10748e9496   airflow-api-mssql_mssql_1                       30.66%   735.5MiB / 6.789GiB   10.58%   4.47MB / 2.26MB   36.8MB / 124MB    132
>>>> 8b5ca767ed0c   airflow-always-mssql_mssql_1                    12.59%   716.5MiB / 6.789GiB   10.31%   3.33MB / 1.63MB   36.7MB / 52.7MB   131
>>>>
>>>>               total        used        free      shared  buff/cache   available
>>>> Mem:           6951        2939         200           6        3811        3702
>>>> Swap:             0           0           0
>>>>
>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>> /dev/root        84G   51G   33G  61% /
>>>> /dev/sda15      105M  5.2M  100M   5% /boot/efi
>>>> /dev/sdb1        14G  4.1G  9.0G  32% /mnt
>>>>
>>>> J.
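The "debug-ci-resources" output quoted above can be approximated locally with a small monitoring loop. The sketch below is only an illustration, not the actual Breeze/CI implementation; it assumes the docker CLI and the standard Linux free and df tools are on PATH, and the 60-second interval is an arbitrary choice:

    #!/usr/bin/env python3
    # Illustrative resource monitor - NOT the actual "debug-ci-resources" code.
    # It periodically prints per-container and host-level usage so you can see
    # which container approaches the memory limit before the OOM killer acts.
    import subprocess
    import time

    def show(cmd):
        """Run a command and print its output; ignore failures (e.g. no docker)."""
        try:
            print(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)
        except (OSError, subprocess.CalledProcessError) as exc:
            print(f"Could not run {cmd}: {exc}")

    if __name__ == "__main__":
        while True:
            show(["docker", "stats", "--no-stream"])  # per-container CPU / memory
            show(["free", "-m"])                      # host memory, as in the output above
            show(["df", "-h"])                        # host disk usage
            time.sleep(60)                            # snapshot once a minute
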
>>>> On Tue, Nov 9, 2021 at 9:19 PM Oliveira, Niko <[email protected]> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> Just to throw another data point in the ring, I've had a PR
>>>>> <https://github.com/apache/airflow/pull/19410> stuck in the same way
>>>>> as well. Several retries are all failing with the same OOM.
>>>>>
>>>>> I've also dug through the GitHub Actions history and found a few
>>>>> others, so it doesn't seem to be just a one-off.
>>>>>
>>>>> Cheers,
>>>>> Niko
>>>>> ------------------------------
>>>>> *From:* Khalid Mammadov <[email protected]>
>>>>> *Sent:* Tuesday, November 9, 2021 6:24 AM
>>>>> *To:* [email protected]
>>>>> *Subject:* [EXTERNAL] OOM issue in the CI
>>>>>
>>>>> Hi Devs,
>>>>>
>>>>> I have been working on the PR below and have run into an OOM issue
>>>>> during testing on GitHub Actions (you can see it in the commit
>>>>> history):
>>>>>
>>>>> https://github.com/apache/airflow/pull/19139/files
>>>>>
>>>>> The tests for the Postgres, MySQL etc. databases fail due to OOM and
>>>>> docker gets killed.
>>>>>
>>>>> I have reduced parallelism to 1 "in the code" *temporarily* (the only
>>>>> extra change in the PR) and it passes all the checks, which confirms
>>>>> the issue.
>>>>>
>>>>> I was hoping you could advise on the best course of action in this
>>>>> situation - should I force parallelism to 1 to get all the checks
>>>>> passed, or is there some other way to solve the OOM?
>>>>>
>>>>> Any help would be appreciated.
>>>>>
>>>>> Thanks in advance,
>>>>> Khalid
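On the parallelism workaround mentioned above: rather than hard-coding parallelism to 1, one option is to derive it from the memory actually available on the runner. The helper below is purely hypothetical (it is not part of the Airflow CI scripts), and the 2 GB per-job budget is an assumption that would need tuning against the "debug-ci-resources" output:

    # Hypothetical helper - not part of the Airflow CI scripts.
    # Picks a test parallelism level from available memory instead of a fixed 1.
    import os

    def choose_parallelism(mem_per_job_mb=2048):
        """Return how many parallel test jobs fit into MemAvailable."""
        max_jobs = os.cpu_count() or 1
        try:
            # MemAvailable is the kernel's estimate of memory usable without swapping.
            with open("/proc/meminfo") as f:
                meminfo = dict(line.split(":", 1) for line in f)
            available_mb = int(meminfo["MemAvailable"].split()[0]) // 1024
        except (OSError, KeyError, ValueError, IndexError):
            return 1  # fall back to the safe, serial behaviour
        return max(1, min(max_jobs, available_mb // mem_per_job_mb))

    if __name__ == "__main__":
        print(f"Suggested parallelism: {choose_parallelism()}")

Whether something like this would be worth doing, versus simply splitting the memory-heavy "Core" tests into a non-parallel run as Jarek suggests above, is of course a separate discussion.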
