On Sun May 25, 2025 at 7:19 PM UTC, Ujwal Kundur wrote:
>> I'm afraid I'm too ignorant of this code to be able to suggest something
>> good here. But, can we just remove the comment and plumb the gopts
>> through to uffd_poll_thread()->uffd_handle_page_fault()->__copy_page()?
>>
>> This is not pretty but it lets us remove the global vars which is
>> clearly a step in the right direction.
>
> Perhaps Andrew can weigh in? If I understood this correctly, we're
> trying to assert that retrying a successful UFFDIO_COPY operation
> always results in EEXIST. This is being done in a somewhat racy
> fashion where a flag (test_uffdio_copy_eexist) is set every 10 seconds
> using alarm(2). IMO this is a flaky test, we should either:
> - remove this variable and associated logic entirely (preferred)
> - use a probability function to set this a % of the time instead of
>   every 10 seconds
> - use an async library that can replace the implementation without the
>   use of global vars

Sorry, I don't have an opinion on which of these is best (I can try to
find some time to form an opinion on this later!), but:

Fixing the flakiness sounds great, but I would suggest decoupling that
from the refactoring. If it's practical, focus on removing the globals
first, while leaving the fundamental logic the same, even if it's bad.
Then, as a separate series, fix the logic.