Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.13 at 10:16 -0400, Peter Hurley wrote: > On 06/13/2013 06:39 AM, Markus Trippelsdorf wrote: > > On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote: > >> Peter Hurley writes: > >> > Based on the other reports from Mikael and David, I suspect this > >> problem > >> > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: > >> > > >> >pty: Ignore slave pty close() if never successfully opened > >> > > >> > This commit poisons the pty under certain error conditions that may > >> > occur from parallel open()s (or parallel close() with pending write()). > >> > > >> > It's unclear to me which error condition is triggered and how > >> user-space > >> > got an open file descriptor but that seems the most likely. Is the > >> problem > >> > reproducible enough that a debug patch would likely trigger? > >> > >> In my case the problem occurred frequently enough that I've been forced > >> to change my build procedures to avoid it. I'd welcome a debug patch. > > > > Since apparently no debugging patch is forthcoming, maybe it's time to > > test the simple revert of commit 699390354da. > > I apologize for the delay. > > Here's a debug patch which I hope will narrow down the circumstances of > this error condition. Unfortunately it doesn't apply: markus@x4 linux % git describe v3.10-rc5-167-g26e0446 markus@x4 linux % patch -p1 --dry-run < /home/markus/tty.patch checking file drivers/tty/pty.c Hunk #1 succeeded at 25 with fuzz 2. Hunk #2 FAILED at 256. 1 out of 2 hunks FAILED checking file drivers/tty/tty_io.c Hunk #1 FAILED at 105. Hunk #2 FAILED at 424. Hunk #3 FAILED at 448. Hunk #4 FAILED at 1020. Hunk #5 FAILED at 1197. 5 out of 5 hunks FAILED -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 06/13/2013 06:39 AM, Markus Trippelsdorf wrote: On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote: Peter Hurley writes: > Based on the other reports from Mikael and David, I suspect this problem > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: > >pty: Ignore slave pty close() if never successfully opened > > This commit poisons the pty under certain error conditions that may > occur from parallel open()s (or parallel close() with pending write()). > > It's unclear to me which error condition is triggered and how user-space > got an open file descriptor but that seems the most likely. Is the problem > reproducible enough that a debug patch would likely trigger? In my case the problem occurred frequently enough that I've been forced to change my build procedures to avoid it. I'd welcome a debug patch. Since apparently no debugging patch is forthcoming, maybe it's time to test the simple revert of commit 699390354da. I apologize for the delay. Here's a debug patch which I hope will narrow down the circumstances of this error condition. Regards, Peter Hurley PS - Don't get excited about EIO from read() which you will see during boot. That's expected when the slave closes and the master is parked on a blocking read(). --- >% --- Subject: [PATCH] tty: Debug EIO from write() Signed-off-by: Peter Hurley --- drivers/tty/pty.c| 22 +++--- drivers/tty/tty_io.c | 51 ++- 2 files changed, 61 insertions(+), 12 deletions(-) diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c index 9c2f1bc..2ce2bb2 100644 --- a/drivers/tty/pty.c +++ b/drivers/tty/pty.c @@ -25,6 +25,16 @@ #include #include +#define TTY_DEBUG_EIO 1 + +#ifdef TTY_DEBUG_EIO +#define tty_debug_eio(tty, f, args...) ({ \ +char __b[64]; \ +printk(KERN_DEBUG "%s: %s: " f, __func__, tty_name(tty, __b), ##args); \ +}) +#else +#define tty_debug_eio(tty, f, args...) +#endif #ifdef CONFIG_UNIX98_PTYS static struct tty_driver *ptm_driver; @@ -246,12 +256,18 @@ static int pty_open(struct tty_struct *tty, struct file *filp) set_bit(TTY_IO_ERROR, >flags); retval = -EIO; -if (test_bit(TTY_OTHER_CLOSED, >flags)) +if (test_bit(TTY_OTHER_CLOSED, >flags)) { +tty_debug_eio(tty, "other pty closed (%#lx)\n", tty->flags); goto out; -if (test_bit(TTY_PTY_LOCK, >link->flags)) +} +if (test_bit(TTY_PTY_LOCK, >link->flags)) { +tty_debug_eio(tty, "ptm still locked\n"); goto out; -if (tty->driver->subtype == PTY_TYPE_SLAVE && tty->link->count != 1) +} +if (tty->driver->subtype == PTY_TYPE_SLAVE && tty->link->count != 1) { +tty_debug_eio(tty, "ptm open count (%d)\n", tty->link->count); goto out; +} clear_bit(TTY_IO_ERROR, >flags); clear_bit(TTY_OTHER_CLOSED, >link->flags); diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 62e942d..e71c61f 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -105,11 +105,21 @@ #include #include -#undef TTY_DEBUG_HANGUP +#define TTY_DEBUG_HANGUP 1 +#define TTY_DEBUG_EIO 1 #define TTY_PARANOIA_CHECK 1 #define CHECK_TTY_COUNT 1 +#ifdef TTY_DEBUG_EIO +#define tty_debug_eio(tty, f, args...) ({ \ +char __b[64]; \ +printk(KERN_DEBUG "%s: %s: " f, __func__, tty_name(tty, __b), ##args); \ +}) +#else +#define tty_debug_eio(tty, f, args...) +#endif + struct ktermios tty_std_termios = {/* for the benefit of tty drivers */ .c_iflag = ICRNL | IXON, .c_oflag = OPOST | ONLCR, @@ -424,6 +434,7 @@ int tty_check_change(struct tty_struct *tty) if (is_ignored(SIGTTOU)) goto out; if (is_current_pgrp_orphaned()) { +tty_debug_eio(tty, "pgrp orphaned?? (%#lx)\n", tty->flags); ret = -EIO; goto out; } @@ -448,6 +459,9 @@ static ssize_t hung_up_tty_read(struct file *file, char __user *buf, static ssize_t hung_up_tty_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { +struct tty_struct *tty = file_tty(file); + +tty_debug_eio(tty, "%#lx\n", tty ? tty->flags : -1L); return -EIO; } @@ -1020,16 +1034,22 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count, if (tty_paranoia_check(tty, inode, "tty_read")) return -EIO; -if (!tty || (test_bit(TTY_IO_ERROR, >flags))) +if (!tty || (test_bit(TTY_IO_ERROR, >flags))) { +tty_debug_eio(tty, "%#lx\n", tty ? tty->flags : -1L); return -EIO; +} /* We want to wait for the line discipline to sort out in this situation */ ld = tty_ldisc_ref_wait(tty); -if (ld->ops->read) +if (ld->ops->read) { i = (ld->ops->read)(tty, file, buf, count); -else +if (i == -EIO) +tty_debug_eio(tty, "ldisc
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.11 at 22:14 +, Orion Poplawski wrote: > Peter Hurley hurleysoftware.com> writes: > > Based on the other reports from Mikael and David, I suspect this problem > > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: > > > >pty: Ignore slave pty close() if never successfully opened > > > > This commit poisons the pty under certain error conditions that may > > occur from parallel open()s (or parallel close() with pending write()). > > > > It's unclear to me which error condition is triggered and how user-space > > got an open file descriptor but that seems the most likely. Is the problem > > reproducible enough that a debug patch would likely trigger? > > I get this pretty frequently on my machine with rpmbuild -ba 2>&1 | tee. > I'd be happy to help to debug. Since apparently no debugging patch is forthcoming, maybe it's time to test the simple revert of commit 699390354da. So can you guys please apply the following patch and see if the issue goes away? Thanks. diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c index 59bfaec..f5801a6 100644 --- a/drivers/tty/pty.c +++ b/drivers/tty/pty.c @@ -38,12 +38,9 @@ static void pty_close(struct tty_struct *tty, struct file *filp) if (tty->driver->subtype == PTY_TYPE_MASTER) WARN_ON(tty->count > 1); else { - if (test_bit(TTY_IO_ERROR, >flags)) - return; if (tty->count > 2) return; } - set_bit(TTY_IO_ERROR, >flags); wake_up_interruptible(>read_wait); wake_up_interruptible(>write_wait); tty->packet = 0; @@ -249,8 +246,6 @@ static int pty_open(struct tty_struct *tty, struct file *filp) if (!tty || !tty->link) goto out; - set_bit(TTY_IO_ERROR, >flags); - retval = -EIO; if (test_bit(TTY_OTHER_CLOSED, >flags)) goto out; @@ -259,7 +254,6 @@ static int pty_open(struct tty_struct *tty, struct file *filp) if (tty->driver->subtype == PTY_TYPE_SLAVE && tty->link->count != 1) goto out; - clear_bit(TTY_IO_ERROR, >flags); clear_bit(TTY_OTHER_CLOSED, >link->flags); set_bit(TTY_THROTTLED, >flags); retval = 0; -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote: > Peter Hurley writes: > > Based on the other reports from Mikael and David, I suspect this problem > > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: > > > >pty: Ignore slave pty close() if never successfully opened > > > > This commit poisons the pty under certain error conditions that may > > occur from parallel open()s (or parallel close() with pending write()). > > > > It's unclear to me which error condition is triggered and how user-space > > got an open file descriptor but that seems the most likely. Is the problem > > reproducible enough that a debug patch would likely trigger? > > In my case the problem occurred frequently enough that I've been forced > to change my build procedures to avoid it. I'd welcome a debug patch. Since apparently no debugging patch is forthcoming, maybe it's time to test the simple revert of commit 699390354da. So can you guys please apply the following patch and see if the issue goes away? Thanks. diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c index 59bfaec..f5801a6 100644 --- a/drivers/tty/pty.c +++ b/drivers/tty/pty.c @@ -38,12 +38,9 @@ static void pty_close(struct tty_struct *tty, struct file *filp) if (tty->driver->subtype == PTY_TYPE_MASTER) WARN_ON(tty->count > 1); else { - if (test_bit(TTY_IO_ERROR, >flags)) - return; if (tty->count > 2) return; } - set_bit(TTY_IO_ERROR, >flags); wake_up_interruptible(>read_wait); wake_up_interruptible(>write_wait); tty->packet = 0; @@ -249,8 +246,6 @@ static int pty_open(struct tty_struct *tty, struct file *filp) if (!tty || !tty->link) goto out; - set_bit(TTY_IO_ERROR, >flags); - retval = -EIO; if (test_bit(TTY_OTHER_CLOSED, >flags)) goto out; @@ -259,7 +254,6 @@ static int pty_open(struct tty_struct *tty, struct file *filp) if (tty->driver->subtype == PTY_TYPE_SLAVE && tty->link->count != 1) goto out; - clear_bit(TTY_IO_ERROR, >flags); clear_bit(TTY_OTHER_CLOSED, >link->flags); set_bit(TTY_THROTTLED, >flags); retval = 0; -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote: Peter Hurley writes: Based on the other reports from Mikael and David, I suspect this problem may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: pty: Ignore slave pty close() if never successfully opened This commit poisons the pty under certain error conditions that may occur from parallel open()s (or parallel close() with pending write()). It's unclear to me which error condition is triggered and how user-space got an open file descriptor but that seems the most likely. Is the problem reproducible enough that a debug patch would likely trigger? In my case the problem occurred frequently enough that I've been forced to change my build procedures to avoid it. I'd welcome a debug patch. Since apparently no debugging patch is forthcoming, maybe it's time to test the simple revert of commit 699390354da. So can you guys please apply the following patch and see if the issue goes away? Thanks. diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c index 59bfaec..f5801a6 100644 --- a/drivers/tty/pty.c +++ b/drivers/tty/pty.c @@ -38,12 +38,9 @@ static void pty_close(struct tty_struct *tty, struct file *filp) if (tty-driver-subtype == PTY_TYPE_MASTER) WARN_ON(tty-count 1); else { - if (test_bit(TTY_IO_ERROR, tty-flags)) - return; if (tty-count 2) return; } - set_bit(TTY_IO_ERROR, tty-flags); wake_up_interruptible(tty-read_wait); wake_up_interruptible(tty-write_wait); tty-packet = 0; @@ -249,8 +246,6 @@ static int pty_open(struct tty_struct *tty, struct file *filp) if (!tty || !tty-link) goto out; - set_bit(TTY_IO_ERROR, tty-flags); - retval = -EIO; if (test_bit(TTY_OTHER_CLOSED, tty-flags)) goto out; @@ -259,7 +254,6 @@ static int pty_open(struct tty_struct *tty, struct file *filp) if (tty-driver-subtype == PTY_TYPE_SLAVE tty-link-count != 1) goto out; - clear_bit(TTY_IO_ERROR, tty-flags); clear_bit(TTY_OTHER_CLOSED, tty-link-flags); set_bit(TTY_THROTTLED, tty-flags); retval = 0; -- Markus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.11 at 22:14 +, Orion Poplawski wrote: Peter Hurley peter at hurleysoftware.com writes: Based on the other reports from Mikael and David, I suspect this problem may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: pty: Ignore slave pty close() if never successfully opened This commit poisons the pty under certain error conditions that may occur from parallel open()s (or parallel close() with pending write()). It's unclear to me which error condition is triggered and how user-space got an open file descriptor but that seems the most likely. Is the problem reproducible enough that a debug patch would likely trigger? I get this pretty frequently on my machine with rpmbuild -ba 21 | tee. I'd be happy to help to debug. Since apparently no debugging patch is forthcoming, maybe it's time to test the simple revert of commit 699390354da. So can you guys please apply the following patch and see if the issue goes away? Thanks. diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c index 59bfaec..f5801a6 100644 --- a/drivers/tty/pty.c +++ b/drivers/tty/pty.c @@ -38,12 +38,9 @@ static void pty_close(struct tty_struct *tty, struct file *filp) if (tty-driver-subtype == PTY_TYPE_MASTER) WARN_ON(tty-count 1); else { - if (test_bit(TTY_IO_ERROR, tty-flags)) - return; if (tty-count 2) return; } - set_bit(TTY_IO_ERROR, tty-flags); wake_up_interruptible(tty-read_wait); wake_up_interruptible(tty-write_wait); tty-packet = 0; @@ -249,8 +246,6 @@ static int pty_open(struct tty_struct *tty, struct file *filp) if (!tty || !tty-link) goto out; - set_bit(TTY_IO_ERROR, tty-flags); - retval = -EIO; if (test_bit(TTY_OTHER_CLOSED, tty-flags)) goto out; @@ -259,7 +254,6 @@ static int pty_open(struct tty_struct *tty, struct file *filp) if (tty-driver-subtype == PTY_TYPE_SLAVE tty-link-count != 1) goto out; - clear_bit(TTY_IO_ERROR, tty-flags); clear_bit(TTY_OTHER_CLOSED, tty-link-flags); set_bit(TTY_THROTTLED, tty-flags); retval = 0; -- Markus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 06/13/2013 06:39 AM, Markus Trippelsdorf wrote: On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote: Peter Hurley writes: Based on the other reports from Mikael and David, I suspect this problem may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: pty: Ignore slave pty close() if never successfully opened This commit poisons the pty under certain error conditions that may occur from parallel open()s (or parallel close() with pending write()). It's unclear to me which error condition is triggered and how user-space got an open file descriptor but that seems the most likely. Is the problem reproducible enough that a debug patch would likely trigger? In my case the problem occurred frequently enough that I've been forced to change my build procedures to avoid it. I'd welcome a debug patch. Since apparently no debugging patch is forthcoming, maybe it's time to test the simple revert of commit 699390354da. I apologize for the delay. Here's a debug patch which I hope will narrow down the circumstances of this error condition. Regards, Peter Hurley PS - Don't get excited about EIO from read() which you will see during boot. That's expected when the slave closes and the master is parked on a blocking read(). --- % --- Subject: [PATCH] tty: Debug EIO from write() Signed-off-by: Peter Hurley pe...@hurleysoftware.com --- drivers/tty/pty.c| 22 +++--- drivers/tty/tty_io.c | 51 ++- 2 files changed, 61 insertions(+), 12 deletions(-) diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c index 9c2f1bc..2ce2bb2 100644 --- a/drivers/tty/pty.c +++ b/drivers/tty/pty.c @@ -25,6 +25,16 @@ #include linux/slab.h #include linux/mutex.h +#define TTY_DEBUG_EIO 1 + +#ifdef TTY_DEBUG_EIO +#define tty_debug_eio(tty, f, args...) ({ \ +char __b[64]; \ +printk(KERN_DEBUG %s: %s: f, __func__, tty_name(tty, __b), ##args); \ +}) +#else +#define tty_debug_eio(tty, f, args...) +#endif #ifdef CONFIG_UNIX98_PTYS static struct tty_driver *ptm_driver; @@ -246,12 +256,18 @@ static int pty_open(struct tty_struct *tty, struct file *filp) set_bit(TTY_IO_ERROR, tty-flags); retval = -EIO; -if (test_bit(TTY_OTHER_CLOSED, tty-flags)) +if (test_bit(TTY_OTHER_CLOSED, tty-flags)) { +tty_debug_eio(tty, other pty closed (%#lx)\n, tty-flags); goto out; -if (test_bit(TTY_PTY_LOCK, tty-link-flags)) +} +if (test_bit(TTY_PTY_LOCK, tty-link-flags)) { +tty_debug_eio(tty, ptm still locked\n); goto out; -if (tty-driver-subtype == PTY_TYPE_SLAVE tty-link-count != 1) +} +if (tty-driver-subtype == PTY_TYPE_SLAVE tty-link-count != 1) { +tty_debug_eio(tty, ptm open count (%d)\n, tty-link-count); goto out; +} clear_bit(TTY_IO_ERROR, tty-flags); clear_bit(TTY_OTHER_CLOSED, tty-link-flags); diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 62e942d..e71c61f 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -105,11 +105,21 @@ #include linux/kmod.h #include linux/nsproxy.h -#undef TTY_DEBUG_HANGUP +#define TTY_DEBUG_HANGUP 1 +#define TTY_DEBUG_EIO 1 #define TTY_PARANOIA_CHECK 1 #define CHECK_TTY_COUNT 1 +#ifdef TTY_DEBUG_EIO +#define tty_debug_eio(tty, f, args...) ({ \ +char __b[64]; \ +printk(KERN_DEBUG %s: %s: f, __func__, tty_name(tty, __b), ##args); \ +}) +#else +#define tty_debug_eio(tty, f, args...) +#endif + struct ktermios tty_std_termios = {/* for the benefit of tty drivers */ .c_iflag = ICRNL | IXON, .c_oflag = OPOST | ONLCR, @@ -424,6 +434,7 @@ int tty_check_change(struct tty_struct *tty) if (is_ignored(SIGTTOU)) goto out; if (is_current_pgrp_orphaned()) { +tty_debug_eio(tty, pgrp orphaned?? (%#lx)\n, tty-flags); ret = -EIO; goto out; } @@ -448,6 +459,9 @@ static ssize_t hung_up_tty_read(struct file *file, char __user *buf, static ssize_t hung_up_tty_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { +struct tty_struct *tty = file_tty(file); + +tty_debug_eio(tty, %#lx\n, tty ? tty-flags : -1L); return -EIO; } @@ -1020,16 +1034,22 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count, if (tty_paranoia_check(tty, inode, tty_read)) return -EIO; -if (!tty || (test_bit(TTY_IO_ERROR, tty-flags))) +if (!tty || (test_bit(TTY_IO_ERROR, tty-flags))) { +tty_debug_eio(tty, %#lx\n, tty ? tty-flags : -1L); return -EIO; +} /* We want to wait for the line discipline to sort out in this situation */ ld = tty_ldisc_ref_wait(tty); -if (ld-ops-read) +if (ld-ops-read) { i = (ld-ops-read)(tty, file, buf, count); -else +if (i ==
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.13 at 10:16 -0400, Peter Hurley wrote: On 06/13/2013 06:39 AM, Markus Trippelsdorf wrote: On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote: Peter Hurley writes: Based on the other reports from Mikael and David, I suspect this problem may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: pty: Ignore slave pty close() if never successfully opened This commit poisons the pty under certain error conditions that may occur from parallel open()s (or parallel close() with pending write()). It's unclear to me which error condition is triggered and how user-space got an open file descriptor but that seems the most likely. Is the problem reproducible enough that a debug patch would likely trigger? In my case the problem occurred frequently enough that I've been forced to change my build procedures to avoid it. I'd welcome a debug patch. Since apparently no debugging patch is forthcoming, maybe it's time to test the simple revert of commit 699390354da. I apologize for the delay. Here's a debug patch which I hope will narrow down the circumstances of this error condition. Unfortunately it doesn't apply: markus@x4 linux % git describe v3.10-rc5-167-g26e0446 markus@x4 linux % patch -p1 --dry-run /home/markus/tty.patch checking file drivers/tty/pty.c Hunk #1 succeeded at 25 with fuzz 2. Hunk #2 FAILED at 256. 1 out of 2 hunks FAILED checking file drivers/tty/tty_io.c Hunk #1 FAILED at 105. Hunk #2 FAILED at 424. Hunk #3 FAILED at 448. Hunk #4 FAILED at 1020. Hunk #5 FAILED at 1197. 5 out of 5 hunks FAILED -- Markus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
Peter Hurley hurleysoftware.com> writes: > Based on the other reports from Mikael and David, I suspect this problem > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: > >pty: Ignore slave pty close() if never successfully opened > > This commit poisons the pty under certain error conditions that may > occur from parallel open()s (or parallel close() with pending write()). > > It's unclear to me which error condition is triggered and how user-space > got an open file descriptor but that seems the most likely. Is the problem > reproducible enough that a debug patch would likely trigger? I get this pretty frequently on my machine with rpmbuild -ba 2>&1 | tee. I'd be happy to help to debug. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
Peter Hurley peter at hurleysoftware.com writes: Based on the other reports from Mikael and David, I suspect this problem may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: pty: Ignore slave pty close() if never successfully opened This commit poisons the pty under certain error conditions that may occur from parallel open()s (or parallel close() with pending write()). It's unclear to me which error condition is triggered and how user-space got an open file descriptor but that seems the most likely. Is the problem reproducible enough that a debug patch would likely trigger? I get this pretty frequently on my machine with rpmbuild -ba 21 | tee. I'd be happy to help to debug. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 06/06/2013 06:54:17 AM, Markus Trippelsdorf wrote: Since v3.8.0 several people reported intermittent IO errors that happen during high system load while using "emerge" under Gentoo: ... see: https://bugs.gentoo.org/show_bug.cgi?id=459674 (A similar issue also happens when building Firefox since v3.8.0. But because Firefox's build process doesn't raise an exception it just dies at random points without giving a clue.) Now the question is: Could this be a kernel bug? Maybe in the TTY layer? Unfortunately the issue is not easily reproducible and a git-bisect is out of the question. I tracked down and fixed something like this in the User Mode Linux tty implementation many moons ago. The trick to making it reproducible was to rename a copy of the xterm binary to somethingunique, run a shell in said xterm (because "cat" doesn't exercise the tty logic, that's why), run a thing in there producing test output, and have a loop in another window doing "while true; do killall -STOP somethingunique; sleep 1; killall -START somethingunique; sleep 1; done". This forces the pipe buffer to the pty to fill up and exercise the flow control logic, plus all the fun "short write and retry" stuff... Rob-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 06/06/2013 06:54:17 AM, Markus Trippelsdorf wrote: Since v3.8.0 several people reported intermittent IO errors that happen during high system load while using emerge under Gentoo: ... see: https://bugs.gentoo.org/show_bug.cgi?id=459674 (A similar issue also happens when building Firefox since v3.8.0. But because Firefox's build process doesn't raise an exception it just dies at random points without giving a clue.) Now the question is: Could this be a kernel bug? Maybe in the TTY layer? Unfortunately the issue is not easily reproducible and a git-bisect is out of the question. I tracked down and fixed something like this in the User Mode Linux tty implementation many moons ago. The trick to making it reproducible was to rename a copy of the xterm binary to somethingunique, run a shell in said xterm (because cat doesn't exercise the tty logic, that's why), run a thing in there producing test output, and have a loop in another window doing while true; do killall -STOP somethingunique; sleep 1; killall -START somethingunique; sleep 1; done. This forces the pipe buffer to the pty to fill up and exercise the flow control logic, plus all the fun short write and retry stuff... Rob-- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
Peter Hurley wrote: > Based on the other reports from Mikael and David, I suspect this problem > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: > > pty: Ignore slave pty close() if never successfully opened > > This commit poisons the pty under certain error conditions that may > occur from parallel open()s (or parallel close() with pending write()). In the case that I'm seeing, the tty in question isn't being opened and closed regularly. A gcc build is dumping a load of stuff out through tee and tee writes it to the tty it got as fd 1 from the parent shell. David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
Peter Hurley writes: > Based on the other reports from Mikael and David, I suspect this problem > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: > >pty: Ignore slave pty close() if never successfully opened > > This commit poisons the pty under certain error conditions that may > occur from parallel open()s (or parallel close() with pending write()). > > It's unclear to me which error condition is triggered and how user-space > got an open file descriptor but that seems the most likely. Is the problem > reproducible enough that a debug patch would likely trigger? In my case the problem occurred frequently enough that I've been forced to change my build procedures to avoid it. I'd welcome a debug patch. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.07 at 11:39 -0400, Peter Hurley wrote: > On 06/06/2013 10:37 AM, Markus Trippelsdorf wrote: > > On 2013.06.06 at 10:18 -0400, Peter Hurley wrote: > >> On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote: > >>> Since v3.8.0 several people reported intermittent IO errors that happen > >>> during high system load while using "emerge" under Gentoo: > >>> ... > >>> File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", > >>> line 260, in iteration > >>> if not x.callback(f, event, *x.args): > >>> File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line > >>> 99, in _output_handler > >>> stdout_buf[os.write(stdout_fd, stdout_buf):] > >>> File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in > >>> __call__ > >>> rval = self._func(*wrapped_args, **wrapped_kwargs) > >>> OSError: [Errno 5] Input/output error > > Based on the other reports from Mikael and David, I suspect this problem > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: > >pty: Ignore slave pty close() if never successfully opened > > This commit poisons the pty under certain error conditions that may > occur from parallel open()s (or parallel close() with pending write()). > > It's unclear to me which error condition is triggered and how user-space > got an open file descriptor but that seems the most likely. Is the problem > reproducible enough that a debug patch would likely trigger? A debug patch would be appreciated. In my case the problem occurs very irregularly; sometimes it happens a few times per day, at other times I don't see the bug for three weeks. But if Mikael and David could also run their kernels with the debug patch applied, hopefully that's enough to trigger the issue more regularly. -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 06/06/2013 10:37 AM, Markus Trippelsdorf wrote: On 2013.06.06 at 10:18 -0400, Peter Hurley wrote: On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote: Since v3.8.0 several people reported intermittent IO errors that happen during high system load while using "emerge" under Gentoo: ... File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", line 260, in iteration if not x.callback(f, event, *x.args): File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line 99, in _output_handler stdout_buf[os.write(stdout_fd, stdout_buf):] File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in __call__ rval = self._func(*wrapped_args, **wrapped_kwargs) OSError: [Errno 5] Input/output error Looks to me like a user-space bug: EIO is returned when the other end of the "pipe" has been closed. FWIW, I didn't see where the OP tried to revert 'SpawnProcess: stdout_fd FD_CLOEXEC' The only non-emerge related comment (#21 in the link provided) refers to 'a similar issue sometimes happened when I built Firefox by hand [..snip..] And it would randomly crash during the build. Since I've recompiled Python with gcc-4.6 this issue also never occurred again.' That comment doesn't really corroborate the reported bug. That comment was from me (I use 'octoploid' for blog trolling, etc.) and is wrong. The Firefox build issue happend again today. See also the rest of my mail: (A similar issue also happens when building Firefox since v3.8.0. But because Firefox's build process doesn't raise an exception it just dies at random points without giving a clue.) Please note that both the Firefox build process and Portage (emerge) are implemented in Python. Based on the other reports from Mikael and David, I suspect this problem may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: pty: Ignore slave pty close() if never successfully opened This commit poisons the pty under certain error conditions that may occur from parallel open()s (or parallel close() with pending write()). It's unclear to me which error condition is triggered and how user-space got an open file descriptor but that seems the most likely. Is the problem reproducible enough that a debug patch would likely trigger? Regards, Peter Hurley -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 06/06/2013 10:37 AM, Markus Trippelsdorf wrote: On 2013.06.06 at 10:18 -0400, Peter Hurley wrote: On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote: Since v3.8.0 several people reported intermittent IO errors that happen during high system load while using emerge under Gentoo: ... File /usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py, line 260, in iteration if not x.callback(f, event, *x.args): File /usr/lib64/portage/pym/portage/util/_async/PipeLogger.py, line 99, in _output_handler stdout_buf[os.write(stdout_fd, stdout_buf):] File /usr/lib64/portage/pym/portage/__init__.py, line 246, in __call__ rval = self._func(*wrapped_args, **wrapped_kwargs) OSError: [Errno 5] Input/output error Looks to me like a user-space bug: EIO is returned when the other end of the pipe has been closed. FWIW, I didn't see where the OP tried to revert 'SpawnProcess: stdout_fd FD_CLOEXEC' The only non-emerge related comment (#21 in the link provided) refers to 'a similar issue sometimes happened when I built Firefox by hand [..snip..] And it would randomly crash during the build. Since I've recompiled Python with gcc-4.6 this issue also never occurred again.' That comment doesn't really corroborate the reported bug. That comment was from me (I use 'octoploid' for blog trolling, etc.) and is wrong. The Firefox build issue happend again today. See also the rest of my mail: (A similar issue also happens when building Firefox since v3.8.0. But because Firefox's build process doesn't raise an exception it just dies at random points without giving a clue.) Please note that both the Firefox build process and Portage (emerge) are implemented in Python. Based on the other reports from Mikael and David, I suspect this problem may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: pty: Ignore slave pty close() if never successfully opened This commit poisons the pty under certain error conditions that may occur from parallel open()s (or parallel close() with pending write()). It's unclear to me which error condition is triggered and how user-space got an open file descriptor but that seems the most likely. Is the problem reproducible enough that a debug patch would likely trigger? Regards, Peter Hurley -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.07 at 11:39 -0400, Peter Hurley wrote: On 06/06/2013 10:37 AM, Markus Trippelsdorf wrote: On 2013.06.06 at 10:18 -0400, Peter Hurley wrote: On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote: Since v3.8.0 several people reported intermittent IO errors that happen during high system load while using emerge under Gentoo: ... File /usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py, line 260, in iteration if not x.callback(f, event, *x.args): File /usr/lib64/portage/pym/portage/util/_async/PipeLogger.py, line 99, in _output_handler stdout_buf[os.write(stdout_fd, stdout_buf):] File /usr/lib64/portage/pym/portage/__init__.py, line 246, in __call__ rval = self._func(*wrapped_args, **wrapped_kwargs) OSError: [Errno 5] Input/output error Based on the other reports from Mikael and David, I suspect this problem may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: pty: Ignore slave pty close() if never successfully opened This commit poisons the pty under certain error conditions that may occur from parallel open()s (or parallel close() with pending write()). It's unclear to me which error condition is triggered and how user-space got an open file descriptor but that seems the most likely. Is the problem reproducible enough that a debug patch would likely trigger? A debug patch would be appreciated. In my case the problem occurs very irregularly; sometimes it happens a few times per day, at other times I don't see the bug for three weeks. But if Mikael and David could also run their kernels with the debug patch applied, hopefully that's enough to trigger the issue more regularly. -- Markus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
Peter Hurley writes: Based on the other reports from Mikael and David, I suspect this problem may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: pty: Ignore slave pty close() if never successfully opened This commit poisons the pty under certain error conditions that may occur from parallel open()s (or parallel close() with pending write()). It's unclear to me which error condition is triggered and how user-space got an open file descriptor but that seems the most likely. Is the problem reproducible enough that a debug patch would likely trigger? In my case the problem occurred frequently enough that I've been forced to change my build procedures to avoid it. I'd welcome a debug patch. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
Peter Hurley pe...@hurleysoftware.com wrote: Based on the other reports from Mikael and David, I suspect this problem may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6: pty: Ignore slave pty close() if never successfully opened This commit poisons the pty under certain error conditions that may occur from parallel open()s (or parallel close() with pending write()). In the case that I'm seeing, the tty in question isn't being opened and closed regularly. A gcc build is dumping a load of stuff out through tee and tee writes it to the tty it got as fd 1 from the parent shell. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.06 at 18:31 +0100, David Howells wrote: > Markus Trippelsdorf wrote: > > > OSError: [Errno 5] Input/output error > > > > Basically 'emerge' just writes the build output to stdout in a loop: > > I've just upgraded to kernel-3.9.4-200.fc18.x86_64 and I'm now regularly > seeing something very similar from the tee run by "fedpkg local" to log its > output when building a cross-compiler gcc RPM: > > tee: standard output: Input/output error > tee: write error > > Slapping strace on tee after the "Input/output error" line has occurred shows > that tee is still running and still pushing lines from its stdin to the output > named on its command line, but has ceased writing to its stdout. tee uses the same loop as portage, but keeps going in case of an EIO: /* In the array of NFILES + 1 descriptors, make the first one correspond to standard output. */ descriptors[0] = stdout; files[0] = _("standard output"); setvbuf (stdout, NULL, _IONBF, 0); for (i = 1; i <= nfiles; i++) { descriptors[i] = (STREQ (files[i], "-") ? stdout : fopen (files[i], mode_string)); if (descriptors[i] == NULL) { error (0, errno, "%s", files[i]); ok = false; } else setvbuf (descriptors[i], NULL, _IONBF, 0); } while (1) { bytes_read = read (0, buffer, sizeof buffer); if (bytes_read < 0 && errno == EINTR) continue; if (bytes_read <= 0) break; /* Write to all NFILES + 1 descriptors. Standard output is the first one. */ for (i = 0; i <= nfiles; i++) if (descriptors[i] && fwrite (buffer, bytes_read, 1, descriptors[i]) != 1) { error (0, errno, "%s", files[i]); descriptors[i] = NULL; ok = false; } } -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
Markus Trippelsdorf wrote: > OSError: [Errno 5] Input/output error > > Basically 'emerge' just writes the build output to stdout in a loop: I've just upgraded to kernel-3.9.4-200.fc18.x86_64 and I'm now regularly seeing something very similar from the tee run by "fedpkg local" to log its output when building a cross-compiler gcc RPM: tee: standard output: Input/output error tee: write error Slapping strace on tee after the "Input/output error" line has occurred shows that tee is still running and still pushing lines from its stdin to the output named on its command line, but has ceased writing to its stdout. The stdout is (or was) a TTY that's either the communications channel to an xterm or the channel to an emacs buffer. With the xterm, the process can be suspended (CTRL+Z) stuff can be written to the tty and then the process can be rstarted. Indeed, stderr for tee goes to the same tty - and must still be functional otherwise I wouldn't see the aforementioned error messages... In ps fax, the tee process can be seen at the bottom of the subtree: 5158 ?S 0:06 \_ /usr/bin/xterm 5198 pts/3Ss+0:00 | \_ bash 5386 pts/3Sl 4:50 | \_ emacs -geometry 100x77 10258 pts/8SNs+ 0:00 | \_ /usr/bin/python /usr/bin/fedpkg local 10267 pts/8SN+0:00 | \_ rpmbuild --define _sourcedir /dat 10462 pts/8SN+0:00 | | \_ /bin/sh -e /var/tmp/rpm-tmp.x 16248 pts/8SN+0:00 | | \_ make -j12 16249 pts/8SN+0:00 | | \_ /bin/sh -c fail= fail 16253 pts/8SN+0:00 | | \_ /bin/sh -c fail= 16254 pts/8SN+0:00 | | \_ make all-am 16256 pts/8SN+0:00 | | \_ /bin/sh - 16265 pts/8SN+0:00 | | | \_ /bin/ 16688 pts/8SN+0:00 | | | \_ / 16700 pts/8RN+0:00 | | | 16704 pts/8SN+0:00 | | | 16267 pts/8SN+0:00 | | \_ /bin/sh - 16279 pts/8SN+0:00 | | | \_ /bin/ 16694 pts/8SN+0:00 | | | \_ / 16707 pts/8RN+0:00 | | | 16713 pts/8SN+0:00 | | | 16289 pts/8SN+0:00 | | \_ /bin/sh - 16306 pts/8SN+0:00 | | | \_ /bin/ 17045 pts/8SN+0:00 | | | \_ / 17046 pts/8RN+0:00 | | | 17047 pts/8SN+0:00 | | | 16799 pts/8SN+0:00 | | \_ /bin/sh - 16801 pts/8SN+0:00 | | | \_ /bin/ 17051 pts/8SN+0:00 | | | \_ / 17053 pts/8RN+0:00 | | | 16819 pts/8SN+0:00 | | \_ /bin/sh - 16822 pts/8SN+0:00 | | | \_ /bin/ 16943 pts/8SN+0:00 | | | \_ / 16950 pts/8RN+0:00 | | | 16951 pts/8SN+0:00 | | | 16903 pts/8SN+0:00 | | \_ /bin/sh - 16915 pts/8SN+0:00 | | \_ /bin/ 17024 pts/8SN+0:00 | | \_ / 17026 pts/8RN+0:00 | | 17028 pts/8SN+0:00 | | 10268 pts/8SN+0:00 | \_ tee .build-4.8.1-1.1.fc20.log tee's fds look like: warthog>ls -l /proc/10268/fd/ total 0 lr-x--. 1 dhowells dhowells 64 Jun 6 18:24 0 -> pipe:[1595614] lrwx--. 1 dhowells dhowells 64 Jun 6 18:24 1 -> /dev/pts/8 lrwx--. 1 dhowells dhowells 64 Jun 6 18:23 2 -> /dev/pts/8 l-wx--. 1 dhowells dhowells 64 Jun 6 18:24 3 -> /data/fedora/cros David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
Markus Trippelsdorf writes: > Since v3.8.0 several people reported intermittent IO errors that happen > during high system load while using "emerge" under Gentoo: > ... > File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", line > 260, in iteration > if not x.callback(f, event, *x.args): > File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line 99, > in _output_handler > stdout_buf[os.write(stdout_fd, stdout_buf):] > File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in __call__ > rval = self._func(*wrapped_args, **wrapped_kwargs) > OSError: [Errno 5] Input/output error > > Basically 'emerge' just writes the build output to stdout in a loop: > ... > def _output_handler(self, fd, event): > >background = self.background >stdout_fd = self.stdout_fd >log_file = self._log_file > >while True: >buf = self._read_buf(fd, event) > >if buf is None: ># not a POLLIN event, EAGAIN, etc... >break > >if not buf: ># EOF >self._unregister() >self.wait() >break > >else: >if not background and stdout_fd is not None: >failures = 0 >stdout_buf = buf >while stdout_buf: >try: >stdout_buf = \ > > stdout_buf[os.write(stdout_fd, stdout_buf):] >except OSError as e: >if e.errno != errno.EAGAIN: >raise > ... > > see: https://bugs.gentoo.org/show_bug.cgi?id=459674 > > (A similar issue also happens when building Firefox since v3.8.0. But > because Firefox's build process doesn't raise an exception it just dies > at random points without giving a clue.) > > Now the question is: Could this be a kernel bug? Maybe in the TTY layer? > > Unfortunately the issue is not easily reproducible and a git-bisect is > out of the question. I'm seeing a similar regression. I do a lot of gcc bootstraps and regression test suite runs, and for the bootstraps I do make -jN bootstrap |& tee build-log (tcsh syntax, adjust as appropriate for your preferred shell) to get a complete log for later inspection in case of error. N is typically the number of cores or threads on the machine, e.g. -j8 on my Core-i7 IVB. Up to the 3.7 kernel this never had any problems. Starting with the 3.8 kernel, or possibly 3.9-rc1, this usually dies at some random point with an EIO. I haven't had time to bisect it. /Mikael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.06 at 10:18 -0400, Peter Hurley wrote: > On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote: > > Since v3.8.0 several people reported intermittent IO errors that happen > > during high system load while using "emerge" under Gentoo: > > ... > >File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", line > > 260, in iteration > > if not x.callback(f, event, *x.args): > >File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line > > 99, in _output_handler > > stdout_buf[os.write(stdout_fd, stdout_buf):] > >File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in __call__ > > rval = self._func(*wrapped_args, **wrapped_kwargs) > > OSError: [Errno 5] Input/output error > > Looks to me like a user-space bug: EIO is returned when the other > end of the "pipe" has been closed. > > FWIW, I didn't see where the OP tried to revert > 'SpawnProcess: stdout_fd FD_CLOEXEC' > > The only non-emerge related comment (#21 in the link provided) refers to >'a similar issue sometimes happened when I built Firefox by hand [..snip..] > And it would randomly crash during the build. > > Since I've recompiled Python with gcc-4.6 this issue also never occurred > again.' > > That comment doesn't really corroborate the reported bug. That comment was from me (I use 'octoploid' for blog trolling, etc.) and is wrong. The Firefox build issue happend again today. See also the rest of my mail: > (A similar issue also happens when building Firefox since v3.8.0. But > because Firefox's build process doesn't raise an exception it just > dies at random points without giving a clue.) Please note that both the Firefox build process and Portage (emerge) are implemented in Python. -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote: Since v3.8.0 several people reported intermittent IO errors that happen during high system load while using "emerge" under Gentoo: ... File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", line 260, in iteration if not x.callback(f, event, *x.args): File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line 99, in _output_handler stdout_buf[os.write(stdout_fd, stdout_buf):] File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in __call__ rval = self._func(*wrapped_args, **wrapped_kwargs) OSError: [Errno 5] Input/output error Looks to me like a user-space bug: EIO is returned when the other end of the "pipe" has been closed. FWIW, I didn't see where the OP tried to revert 'SpawnProcess: stdout_fd FD_CLOEXEC' The only non-emerge related comment (#21 in the link provided) refers to 'a similar issue sometimes happened when I built Firefox by hand [..snip..] And it would randomly crash during the build. Since I've recompiled Python with gcc-4.6 this issue also never occurred again.' That comment doesn't really corroborate the reported bug. Regards, Peter Hurley Basically 'emerge' just writes the build output to stdout in a loop: ... def _output_handler(self, fd, event): background = self.background stdout_fd = self.stdout_fd log_file = self._log_file while True: buf = self._read_buf(fd, event) if buf is None: # not a POLLIN event, EAGAIN, etc... break if not buf: # EOF self._unregister() self.wait() break else: if not background and stdout_fd is not None: failures = 0 stdout_buf = buf while stdout_buf: try: stdout_buf = \ stdout_buf[os.write(stdout_fd, stdout_buf):] except OSError as e: if e.errno != errno.EAGAIN: raise ... see: https://bugs.gentoo.org/show_bug.cgi?id=459674 (A similar issue also happens when building Firefox since v3.8.0. But because Firefox's build process doesn't raise an exception it just dies at random points without giving a clue.) Now the question is: Could this be a kernel bug? Maybe in the TTY layer? Unfortunately the issue is not easily reproducible and a git-bisect is out of the question. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote: Since v3.8.0 several people reported intermittent IO errors that happen during high system load while using emerge under Gentoo: ... File /usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py, line 260, in iteration if not x.callback(f, event, *x.args): File /usr/lib64/portage/pym/portage/util/_async/PipeLogger.py, line 99, in _output_handler stdout_buf[os.write(stdout_fd, stdout_buf):] File /usr/lib64/portage/pym/portage/__init__.py, line 246, in __call__ rval = self._func(*wrapped_args, **wrapped_kwargs) OSError: [Errno 5] Input/output error Looks to me like a user-space bug: EIO is returned when the other end of the pipe has been closed. FWIW, I didn't see where the OP tried to revert 'SpawnProcess: stdout_fd FD_CLOEXEC' The only non-emerge related comment (#21 in the link provided) refers to 'a similar issue sometimes happened when I built Firefox by hand [..snip..] And it would randomly crash during the build. Since I've recompiled Python with gcc-4.6 this issue also never occurred again.' That comment doesn't really corroborate the reported bug. Regards, Peter Hurley Basically 'emerge' just writes the build output to stdout in a loop: ... def _output_handler(self, fd, event): background = self.background stdout_fd = self.stdout_fd log_file = self._log_file while True: buf = self._read_buf(fd, event) if buf is None: # not a POLLIN event, EAGAIN, etc... break if not buf: # EOF self._unregister() self.wait() break else: if not background and stdout_fd is not None: failures = 0 stdout_buf = buf while stdout_buf: try: stdout_buf = \ stdout_buf[os.write(stdout_fd, stdout_buf):] except OSError as e: if e.errno != errno.EAGAIN: raise ... see: https://bugs.gentoo.org/show_bug.cgi?id=459674 (A similar issue also happens when building Firefox since v3.8.0. But because Firefox's build process doesn't raise an exception it just dies at random points without giving a clue.) Now the question is: Could this be a kernel bug? Maybe in the TTY layer? Unfortunately the issue is not easily reproducible and a git-bisect is out of the question. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.06 at 10:18 -0400, Peter Hurley wrote: On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote: Since v3.8.0 several people reported intermittent IO errors that happen during high system load while using emerge under Gentoo: ... File /usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py, line 260, in iteration if not x.callback(f, event, *x.args): File /usr/lib64/portage/pym/portage/util/_async/PipeLogger.py, line 99, in _output_handler stdout_buf[os.write(stdout_fd, stdout_buf):] File /usr/lib64/portage/pym/portage/__init__.py, line 246, in __call__ rval = self._func(*wrapped_args, **wrapped_kwargs) OSError: [Errno 5] Input/output error Looks to me like a user-space bug: EIO is returned when the other end of the pipe has been closed. FWIW, I didn't see where the OP tried to revert 'SpawnProcess: stdout_fd FD_CLOEXEC' The only non-emerge related comment (#21 in the link provided) refers to 'a similar issue sometimes happened when I built Firefox by hand [..snip..] And it would randomly crash during the build. Since I've recompiled Python with gcc-4.6 this issue also never occurred again.' That comment doesn't really corroborate the reported bug. That comment was from me (I use 'octoploid' for blog trolling, etc.) and is wrong. The Firefox build issue happend again today. See also the rest of my mail: (A similar issue also happens when building Firefox since v3.8.0. But because Firefox's build process doesn't raise an exception it just dies at random points without giving a clue.) Please note that both the Firefox build process and Portage (emerge) are implemented in Python. -- Markus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
Markus Trippelsdorf writes: Since v3.8.0 several people reported intermittent IO errors that happen during high system load while using emerge under Gentoo: ... File /usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py, line 260, in iteration if not x.callback(f, event, *x.args): File /usr/lib64/portage/pym/portage/util/_async/PipeLogger.py, line 99, in _output_handler stdout_buf[os.write(stdout_fd, stdout_buf):] File /usr/lib64/portage/pym/portage/__init__.py, line 246, in __call__ rval = self._func(*wrapped_args, **wrapped_kwargs) OSError: [Errno 5] Input/output error Basically 'emerge' just writes the build output to stdout in a loop: ... def _output_handler(self, fd, event): background = self.background stdout_fd = self.stdout_fd log_file = self._log_file while True: buf = self._read_buf(fd, event) if buf is None: # not a POLLIN event, EAGAIN, etc... break if not buf: # EOF self._unregister() self.wait() break else: if not background and stdout_fd is not None: failures = 0 stdout_buf = buf while stdout_buf: try: stdout_buf = \ stdout_buf[os.write(stdout_fd, stdout_buf):] except OSError as e: if e.errno != errno.EAGAIN: raise ... see: https://bugs.gentoo.org/show_bug.cgi?id=459674 (A similar issue also happens when building Firefox since v3.8.0. But because Firefox's build process doesn't raise an exception it just dies at random points without giving a clue.) Now the question is: Could this be a kernel bug? Maybe in the TTY layer? Unfortunately the issue is not easily reproducible and a git-bisect is out of the question. I'm seeing a similar regression. I do a lot of gcc bootstraps and regression test suite runs, and for the bootstraps I do make -jN bootstrap | tee build-log (tcsh syntax, adjust as appropriate for your preferred shell) to get a complete log for later inspection in case of error. N is typically the number of cores or threads on the machine, e.g. -j8 on my Core-i7 IVB. Up to the 3.7 kernel this never had any problems. Starting with the 3.8 kernel, or possibly 3.9-rc1, this usually dies at some random point with an EIO. I haven't had time to bisect it. /Mikael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
Markus Trippelsdorf mar...@trippelsdorf.de wrote: OSError: [Errno 5] Input/output error Basically 'emerge' just writes the build output to stdout in a loop: I've just upgraded to kernel-3.9.4-200.fc18.x86_64 and I'm now regularly seeing something very similar from the tee run by fedpkg local to log its output when building a cross-compiler gcc RPM: tee: standard output: Input/output error tee: write error Slapping strace on tee after the Input/output error line has occurred shows that tee is still running and still pushing lines from its stdin to the output named on its command line, but has ceased writing to its stdout. The stdout is (or was) a TTY that's either the communications channel to an xterm or the channel to an emacs buffer. With the xterm, the process can be suspended (CTRL+Z) stuff can be written to the tty and then the process can be rstarted. Indeed, stderr for tee goes to the same tty - and must still be functional otherwise I wouldn't see the aforementioned error messages... In ps fax, the tee process can be seen at the bottom of the subtree: 5158 ?S 0:06 \_ /usr/bin/xterm 5198 pts/3Ss+0:00 | \_ bash 5386 pts/3Sl 4:50 | \_ emacs -geometry 100x77 10258 pts/8SNs+ 0:00 | \_ /usr/bin/python /usr/bin/fedpkg local 10267 pts/8SN+0:00 | \_ rpmbuild --define _sourcedir /dat 10462 pts/8SN+0:00 | | \_ /bin/sh -e /var/tmp/rpm-tmp.x 16248 pts/8SN+0:00 | | \_ make -j12 16249 pts/8SN+0:00 | | \_ /bin/sh -c fail= fail 16253 pts/8SN+0:00 | | \_ /bin/sh -c fail= 16254 pts/8SN+0:00 | | \_ make all-am 16256 pts/8SN+0:00 | | \_ /bin/sh - 16265 pts/8SN+0:00 | | | \_ /bin/ 16688 pts/8SN+0:00 | | | \_ / 16700 pts/8RN+0:00 | | | 16704 pts/8SN+0:00 | | | 16267 pts/8SN+0:00 | | \_ /bin/sh - 16279 pts/8SN+0:00 | | | \_ /bin/ 16694 pts/8SN+0:00 | | | \_ / 16707 pts/8RN+0:00 | | | 16713 pts/8SN+0:00 | | | 16289 pts/8SN+0:00 | | \_ /bin/sh - 16306 pts/8SN+0:00 | | | \_ /bin/ 17045 pts/8SN+0:00 | | | \_ / 17046 pts/8RN+0:00 | | | 17047 pts/8SN+0:00 | | | 16799 pts/8SN+0:00 | | \_ /bin/sh - 16801 pts/8SN+0:00 | | | \_ /bin/ 17051 pts/8SN+0:00 | | | \_ / 17053 pts/8RN+0:00 | | | 16819 pts/8SN+0:00 | | \_ /bin/sh - 16822 pts/8SN+0:00 | | | \_ /bin/ 16943 pts/8SN+0:00 | | | \_ / 16950 pts/8RN+0:00 | | | 16951 pts/8SN+0:00 | | | 16903 pts/8SN+0:00 | | \_ /bin/sh - 16915 pts/8SN+0:00 | | \_ /bin/ 17024 pts/8SN+0:00 | | \_ / 17026 pts/8RN+0:00 | | 17028 pts/8SN+0:00 | | 10268 pts/8SN+0:00 | \_ tee .build-4.8.1-1.1.fc20.log tee's fds look like: warthogls -l /proc/10268/fd/ total 0 lr-x--. 1 dhowells dhowells 64 Jun 6 18:24 0 - pipe:[1595614] lrwx--. 1 dhowells dhowells 64 Jun 6 18:24 1 - /dev/pts/8 lrwx--. 1 dhowells dhowells 64 Jun 6 18:23 2 - /dev/pts/8 l-wx--. 1 dhowells dhowells 64 Jun 6 18:24 3 - /data/fedora/cros David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange intermittent EIO error when writing to stdout since v3.8.0
On 2013.06.06 at 18:31 +0100, David Howells wrote: Markus Trippelsdorf mar...@trippelsdorf.de wrote: OSError: [Errno 5] Input/output error Basically 'emerge' just writes the build output to stdout in a loop: I've just upgraded to kernel-3.9.4-200.fc18.x86_64 and I'm now regularly seeing something very similar from the tee run by fedpkg local to log its output when building a cross-compiler gcc RPM: tee: standard output: Input/output error tee: write error Slapping strace on tee after the Input/output error line has occurred shows that tee is still running and still pushing lines from its stdin to the output named on its command line, but has ceased writing to its stdout. tee uses the same loop as portage, but keeps going in case of an EIO: /* In the array of NFILES + 1 descriptors, make the first one correspond to standard output. */ descriptors[0] = stdout; files[0] = _(standard output); setvbuf (stdout, NULL, _IONBF, 0); for (i = 1; i = nfiles; i++) { descriptors[i] = (STREQ (files[i], -) ? stdout : fopen (files[i], mode_string)); if (descriptors[i] == NULL) { error (0, errno, %s, files[i]); ok = false; } else setvbuf (descriptors[i], NULL, _IONBF, 0); } while (1) { bytes_read = read (0, buffer, sizeof buffer); if (bytes_read 0 errno == EINTR) continue; if (bytes_read = 0) break; /* Write to all NFILES + 1 descriptors. Standard output is the first one. */ for (i = 0; i = nfiles; i++) if (descriptors[i] fwrite (buffer, bytes_read, 1, descriptors[i]) != 1) { error (0, errno, %s, files[i]); descriptors[i] = NULL; ok = false; } } -- Markus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/