Re: May close() return any error code?
On Sun, 02 Aug 2015 16:07:06 +0200, Al Viro wrote: > > On Wed, Jul 29, 2015 at 07:45:11AM -0700, Dmitry Torokhov wrote: > > > This seems coming from evdev_flush(). As there is no fd leak, it's no > > > big problem per se. But, now the question is whether returning such > > > an error code is correct behavior at all. At least, it doesn't seem > > > defined in POSIX: > > > http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html > > > > Hmm, if I checked the right version of the code close_nointr_nofail() > > expects only 0 as the return code so even if we change the kernel to > > use more conforming -EIO instead of -ENODEV systemd will still die... > > > > The question is whether we really need to propagate return value from > > f_op->flush() up to userspace in filp_close(). Why don't we ask Al? > > That's the whole damn point of having ->flush(). And yes, we do need that - > things like NFS (not to mention tapes, etc.) do rely on that. > > Whether it makes sense to do this kind of "do something that might have > a failure to report on each close()" for evdev is up to driver, obviously. So, the behavior of VFS layer is as designed. Then I suppose the fix should be rather in evdev.c. Dmitry, could you paper over it? thanks, Takashi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: May close() return any error code?
On Sun, 02 Aug 2015 16:07:06 +0200, Al Viro wrote: On Wed, Jul 29, 2015 at 07:45:11AM -0700, Dmitry Torokhov wrote: This seems coming from evdev_flush(). As there is no fd leak, it's no big problem per se. But, now the question is whether returning such an error code is correct behavior at all. At least, it doesn't seem defined in POSIX: http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html Hmm, if I checked the right version of the code close_nointr_nofail() expects only 0 as the return code so even if we change the kernel to use more conforming -EIO instead of -ENODEV systemd will still die... The question is whether we really need to propagate return value from f_op-flush() up to userspace in filp_close(). Why don't we ask Al? That's the whole damn point of having -flush(). And yes, we do need that - things like NFS (not to mention tapes, etc.) do rely on that. Whether it makes sense to do this kind of do something that might have a failure to report on each close() for evdev is up to driver, obviously. So, the behavior of VFS layer is as designed. Then I suppose the fix should be rather in evdev.c. Dmitry, could you paper over it? thanks, Takashi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: May close() return any error code?
On Wed, Jul 29, 2015 at 07:45:11AM -0700, Dmitry Torokhov wrote: > > This seems coming from evdev_flush(). As there is no fd leak, it's no > > big problem per se. But, now the question is whether returning such > > an error code is correct behavior at all. At least, it doesn't seem > > defined in POSIX: > > http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html > > Hmm, if I checked the right version of the code close_nointr_nofail() > expects only 0 as the return code so even if we change the kernel to > use more conforming -EIO instead of -ENODEV systemd will still die... > > The question is whether we really need to propagate return value from > f_op->flush() up to userspace in filp_close(). Why don't we ask Al? That's the whole damn point of having ->flush(). And yes, we do need that - things like NFS (not to mention tapes, etc.) do rely on that. Whether it makes sense to do this kind of "do something that might have a failure to report on each close()" for evdev is up to driver, obviously. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: May close() return any error code?
On Sun, Aug 02, 2015 at 09:42:20AM +0200, Pavel Machek wrote: > > This seems coming from evdev_flush(). As there is no fd leak, it's no > > big problem per se. But, now the question is whether returning such > > an error code is correct behavior at all. At least, it doesn't seem > > defined in POSIX: > > > http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html > > Returning an error from close() would imply that file descriptor is > not closed seems like bad idea. Just fix the kernel not to do it. The only thing implied here is failure to RTF{M,S}. To quote close(2): NOTES Not checking the return value of close() is a common but nevertheless serious programming error. It is quite possible that errors on a pre‐ vious write(2) operation are first reported at the final close(). Not checking the return value when closing the file may lead to silent loss of data. This can especially be observed with NFS and with disk quota. Note that the return value should only be used for diagnostics. In particular close() should not be retried after an EINTR since this may cause a reused descriptor from another thread to be closed. That's Linux one. FreeBSD one says In case of any error except EBADF, the supplied file descriptor is deal- located and therefore is no longer valid. and that matches behaviour of historical BSD variants as well. POSIX is being its usual charming self and says "if a kernel shipped by $VALUED_MEMBER does this and that cretinous thing, far be it from us to call it broken". -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: May close() return any error code?
On Wed 2015-07-29 12:46:59, Takashi Iwai wrote: > Hi, > > while debugging a problem of X and gdm with the old systemd-210, we > encountered a sudden death of systemd-logind, and this turned out to > be an unexpected errno from close(). The close() call for input > devices returns ENODEV error. The logind in systemd-210 treats this > error code as fatal, triggers assert() and eventually kills itself. > The details are found in an openSUSE bugzilla thread: > https://bugzilla.opensuse.org/show_bug.cgi?id=939571 > > This seems coming from evdev_flush(). As there is no fd leak, it's no > big problem per se. But, now the question is whether returning such > an error code is correct behavior at all. At least, it doesn't seem > defined in POSIX: > http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html Returning an error from close() would imply that file descriptor is not closed seems like bad idea. Just fix the kernel not to do it. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: May close() return any error code?
On Wed 2015-07-29 12:46:59, Takashi Iwai wrote: Hi, while debugging a problem of X and gdm with the old systemd-210, we encountered a sudden death of systemd-logind, and this turned out to be an unexpected errno from close(). The close() call for input devices returns ENODEV error. The logind in systemd-210 treats this error code as fatal, triggers assert() and eventually kills itself. The details are found in an openSUSE bugzilla thread: https://bugzilla.opensuse.org/show_bug.cgi?id=939571 This seems coming from evdev_flush(). As there is no fd leak, it's no big problem per se. But, now the question is whether returning such an error code is correct behavior at all. At least, it doesn't seem defined in POSIX: http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html Returning an error from close() would imply that file descriptor is not closed seems like bad idea. Just fix the kernel not to do it. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: May close() return any error code?
On Wed, Jul 29, 2015 at 07:45:11AM -0700, Dmitry Torokhov wrote: This seems coming from evdev_flush(). As there is no fd leak, it's no big problem per se. But, now the question is whether returning such an error code is correct behavior at all. At least, it doesn't seem defined in POSIX: http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html Hmm, if I checked the right version of the code close_nointr_nofail() expects only 0 as the return code so even if we change the kernel to use more conforming -EIO instead of -ENODEV systemd will still die... The question is whether we really need to propagate return value from f_op-flush() up to userspace in filp_close(). Why don't we ask Al? That's the whole damn point of having -flush(). And yes, we do need that - things like NFS (not to mention tapes, etc.) do rely on that. Whether it makes sense to do this kind of do something that might have a failure to report on each close() for evdev is up to driver, obviously. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: May close() return any error code?
On Sun, Aug 02, 2015 at 09:42:20AM +0200, Pavel Machek wrote: This seems coming from evdev_flush(). As there is no fd leak, it's no big problem per se. But, now the question is whether returning such an error code is correct behavior at all. At least, it doesn't seem defined in POSIX: http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html Returning an error from close() would imply that file descriptor is not closed seems like bad idea. Just fix the kernel not to do it. The only thing implied here is failure to RTF{M,S}. To quote close(2): NOTES Not checking the return value of close() is a common but nevertheless serious programming error. It is quite possible that errors on a pre‐ vious write(2) operation are first reported at the final close(). Not checking the return value when closing the file may lead to silent loss of data. This can especially be observed with NFS and with disk quota. Note that the return value should only be used for diagnostics. In particular close() should not be retried after an EINTR since this may cause a reused descriptor from another thread to be closed. That's Linux one. FreeBSD one says In case of any error except EBADF, the supplied file descriptor is deal- located and therefore is no longer valid. and that matches behaviour of historical BSD variants as well. POSIX is being its usual charming self and says if a kernel shipped by $VALUED_MEMBER does this and that cretinous thing, far be it from us to call it broken. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: May close() return any error code?
On Wed, 29 Jul 2015 16:45:11 +0200, Dmitry Torokhov wrote: > > HI Takashi, > > On Wed, Jul 29, 2015 at 12:46:59PM +0200, Takashi Iwai wrote: > > Hi, > > > > while debugging a problem of X and gdm with the old systemd-210, we > > encountered a sudden death of systemd-logind, and this turned out to > > be an unexpected errno from close(). The close() call for input > > devices returns ENODEV error. The logind in systemd-210 treats this > > error code as fatal, triggers assert() and eventually kills itself. > > The details are found in an openSUSE bugzilla thread: > > https://bugzilla.opensuse.org/show_bug.cgi?id=939571 > > > > This seems coming from evdev_flush(). As there is no fd leak, it's no > > big problem per se. But, now the question is whether returning such > > an error code is correct behavior at all. At least, it doesn't seem > > defined in POSIX: > > http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html > > Hmm, if I checked the right version of the code close_nointr_nofail() > expects only 0 as the return code so even if we change the kernel to > use more conforming -EIO instead of -ENODEV systemd will still die... Yes, it can be seen rather as a bug in systemd-210, and the later systemd version doesn't seem to trigger assert() with that condition. > The question is whether we really need to propagate return value from > f_op->flush() up to userspace in filp_close(). Why don't we ask Al? Oh yes, of course, Al is the best person to ask :) Looking back at this problem now, I find it even dangerous to propagate an error, especially -EINTR. Then user-space may interpret it as if the close didn't succeed and the fd were alive, although the kernel performed the close itself. thanks, Takashi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: May close() return any error code?
On Wed, 29 Jul 2015 16:45:11 +0200, Dmitry Torokhov wrote: HI Takashi, On Wed, Jul 29, 2015 at 12:46:59PM +0200, Takashi Iwai wrote: Hi, while debugging a problem of X and gdm with the old systemd-210, we encountered a sudden death of systemd-logind, and this turned out to be an unexpected errno from close(). The close() call for input devices returns ENODEV error. The logind in systemd-210 treats this error code as fatal, triggers assert() and eventually kills itself. The details are found in an openSUSE bugzilla thread: https://bugzilla.opensuse.org/show_bug.cgi?id=939571 This seems coming from evdev_flush(). As there is no fd leak, it's no big problem per se. But, now the question is whether returning such an error code is correct behavior at all. At least, it doesn't seem defined in POSIX: http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html Hmm, if I checked the right version of the code close_nointr_nofail() expects only 0 as the return code so even if we change the kernel to use more conforming -EIO instead of -ENODEV systemd will still die... Yes, it can be seen rather as a bug in systemd-210, and the later systemd version doesn't seem to trigger assert() with that condition. The question is whether we really need to propagate return value from f_op-flush() up to userspace in filp_close(). Why don't we ask Al? Oh yes, of course, Al is the best person to ask :) Looking back at this problem now, I find it even dangerous to propagate an error, especially -EINTR. Then user-space may interpret it as if the close didn't succeed and the fd were alive, although the kernel performed the close itself. thanks, Takashi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: May close() return any error code?
HI Takashi, On Wed, Jul 29, 2015 at 12:46:59PM +0200, Takashi Iwai wrote: > Hi, > > while debugging a problem of X and gdm with the old systemd-210, we > encountered a sudden death of systemd-logind, and this turned out to > be an unexpected errno from close(). The close() call for input > devices returns ENODEV error. The logind in systemd-210 treats this > error code as fatal, triggers assert() and eventually kills itself. > The details are found in an openSUSE bugzilla thread: > https://bugzilla.opensuse.org/show_bug.cgi?id=939571 > > This seems coming from evdev_flush(). As there is no fd leak, it's no > big problem per se. But, now the question is whether returning such > an error code is correct behavior at all. At least, it doesn't seem > defined in POSIX: > http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html Hmm, if I checked the right version of the code close_nointr_nofail() expects only 0 as the return code so even if we change the kernel to use more conforming -EIO instead of -ENODEV systemd will still die... The question is whether we really need to propagate return value from f_op->flush() up to userspace in filp_close(). Why don't we ask Al? Thanks. -- Dmitry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
May close() return any error code?
Hi, while debugging a problem of X and gdm with the old systemd-210, we encountered a sudden death of systemd-logind, and this turned out to be an unexpected errno from close(). The close() call for input devices returns ENODEV error. The logind in systemd-210 treats this error code as fatal, triggers assert() and eventually kills itself. The details are found in an openSUSE bugzilla thread: https://bugzilla.opensuse.org/show_bug.cgi?id=939571 This seems coming from evdev_flush(). As there is no fd leak, it's no big problem per se. But, now the question is whether returning such an error code is correct behavior at all. At least, it doesn't seem defined in POSIX: http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html I myself have no strong opinion here, so would like just to ask others suggestions / comments. thanks, Takashi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
May close() return any error code?
Hi, while debugging a problem of X and gdm with the old systemd-210, we encountered a sudden death of systemd-logind, and this turned out to be an unexpected errno from close(). The close() call for input devices returns ENODEV error. The logind in systemd-210 treats this error code as fatal, triggers assert() and eventually kills itself. The details are found in an openSUSE bugzilla thread: https://bugzilla.opensuse.org/show_bug.cgi?id=939571 This seems coming from evdev_flush(). As there is no fd leak, it's no big problem per se. But, now the question is whether returning such an error code is correct behavior at all. At least, it doesn't seem defined in POSIX: http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html I myself have no strong opinion here, so would like just to ask others suggestions / comments. thanks, Takashi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: May close() return any error code?
HI Takashi, On Wed, Jul 29, 2015 at 12:46:59PM +0200, Takashi Iwai wrote: Hi, while debugging a problem of X and gdm with the old systemd-210, we encountered a sudden death of systemd-logind, and this turned out to be an unexpected errno from close(). The close() call for input devices returns ENODEV error. The logind in systemd-210 treats this error code as fatal, triggers assert() and eventually kills itself. The details are found in an openSUSE bugzilla thread: https://bugzilla.opensuse.org/show_bug.cgi?id=939571 This seems coming from evdev_flush(). As there is no fd leak, it's no big problem per se. But, now the question is whether returning such an error code is correct behavior at all. At least, it doesn't seem defined in POSIX: http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html Hmm, if I checked the right version of the code close_nointr_nofail() expects only 0 as the return code so even if we change the kernel to use more conforming -EIO instead of -ENODEV systemd will still die... The question is whether we really need to propagate return value from f_op-flush() up to userspace in filp_close(). Why don't we ask Al? Thanks. -- Dmitry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/