On Tue, Jan 06, 2026 at 09:19:59PM +0100, Mikulas Patocka wrote:
> 
> 
> On Tue, 6 Jan 2026, Liam R. Howlett wrote:
> 
> > * Mikulas Patocka <[email protected]> [260105 15:08]:
> > > 
> > > > If you only get the error message sometimes, does that mean there is
> > > > another signal check that isn't covered by this change - or another call
> > > > path?
> > > 
> > > This call path is also triggered by -EINTR from mm_take_all_locks:
> > > "init_user_pages -> amdgpu_hmm_register -> mmu_interval_notifier_insert
> > > -> mmu_notifier_register -> __mmu_notifier_register ->
> > > mm_take_all_locks -> return -EINTR". I am not an expert in the GPU code,
> > > so I don't know how serious it is.
> > 
> > Okay, so the other call paths also end up getting the -EINTR from this
> > function?  Can you please add that detail to the commit message?
> 
> Yes. I'd like to ask the GPU people to look at it and say how much damage 
> this -EINTR could do. I don't know - I just saw the messages "Failed to 
> register MMU notifier: -4" in the syslog.
> 
> > This means that -EINTR can no longer be returned from open(), right?
> > Otherwise you are just reducing a race condition between open() and a
> > signal entering from your timer.
> 
> EINTR can be returned from open() in cases when it was historically 
> behaving this way - such as opening a fifo when there is no matching 
> process having it open.
> 
> But I think that opening /dev/kfd doesn't fall into this category.
>

Well, it's a device - opening can and often does have side-effects.
It's not too far-fetched to return -EINTR here.

> NFS has an "intr" flag that makes the filesystem syscalls interruptible by 
> signals. It is off by default, because many programs don't expect EINTR 
> when opening, reading or writing plain files on a filesystem.
> 
> > Any other -EINTR system call will also cause you problems since you
> > continuously send signals to your process, so we'll have to change them
> > all for this to work?
> 
> I use SA_RESTART for the signals. And I retry all the syscalls on EINTR 
> just in case SA_RESTART didn't work. So, I don't experience random 
> failures in my code due to the periodic signal.
> 
> But there is code that I have no control over - such as the OpenCL shared 
> library.

Right. So I am wondering if just returning -ERESTARTSYS (whether in
mm_take_all_locks(), or in the AMD driver) would satisfy both parties.

Folks installing and using signals need to pay attention and set
SA_RESTART, but that's already best practice when dealing with third-party
code. open(2) should be transparently restartable.

WDYT?

-- 
Pedro
