Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-13 Thread Markus Trippelsdorf
On 2013.06.13 at 10:16 -0400, Peter Hurley wrote:
> On 06/13/2013 06:39 AM, Markus Trippelsdorf wrote:
> > On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote:
> >> Peter Hurley writes:
> >>   > Based on the other reports from Mikael and David, I suspect this 
> >> problem
> >>   > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
> >>   >
> >>   >pty: Ignore slave pty close() if never successfully opened
> >>   >
> >>   > This commit poisons the pty under certain error conditions that may
> >>   > occur from parallel open()s (or parallel close() with pending write()).
> >>   >
> >>   > It's unclear to me which error condition is triggered and how 
> >> user-space
> >>   > got an open file descriptor but that seems the most likely. Is the 
> >> problem
> >>   > reproducible enough that a debug patch would likely trigger?
> >>
> >> In my case the problem occurred frequently enough that I've been forced
> >> to change my build procedures to avoid it.  I'd welcome a debug patch.
> >
> > Since apparently no debugging patch is forthcoming, maybe it's time to
> > test the simple revert of commit 699390354da.
> 
> I apologize for the delay.
> 
> Here's a debug patch which I hope will narrow down the circumstances of
> this error condition.

Unfortunately it doesn't apply:

markus@x4 linux % git describe
v3.10-rc5-167-g26e0446
markus@x4 linux % patch -p1 --dry-run < /home/markus/tty.patch
checking file drivers/tty/pty.c
Hunk #1 succeeded at 25 with fuzz 2.
Hunk #2 FAILED at 256.
1 out of 2 hunks FAILED
checking file drivers/tty/tty_io.c
Hunk #1 FAILED at 105.
Hunk #2 FAILED at 424.
Hunk #3 FAILED at 448.
Hunk #4 FAILED at 1020.
Hunk #5 FAILED at 1197.
5 out of 5 hunks FAILED

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-13 Thread Peter Hurley

On 06/13/2013 06:39 AM, Markus Trippelsdorf wrote:

On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote:

Peter Hurley writes:
  > Based on the other reports from Mikael and David, I suspect this problem
  > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
  >
  >pty: Ignore slave pty close() if never successfully opened
  >
  > This commit poisons the pty under certain error conditions that may
  > occur from parallel open()s (or parallel close() with pending write()).
  >
  > It's unclear to me which error condition is triggered and how user-space
  > got an open file descriptor but that seems the most likely. Is the problem
  > reproducible enough that a debug patch would likely trigger?

In my case the problem occurred frequently enough that I've been forced
to change my build procedures to avoid it.  I'd welcome a debug patch.


Since apparently no debugging patch is forthcoming, maybe it's time to
test the simple revert of commit 699390354da.


I apologize for the delay.

Here's a debug patch which I hope will narrow down the circumstances of
this error condition.

Regards,
Peter Hurley

PS - Don't get excited about EIO from read() which you will see during
boot. That's expected when the slave closes and the master is parked on
a blocking read().

--- >% ---
Subject: [PATCH] tty: Debug EIO from write()


Signed-off-by: Peter Hurley 
---
 drivers/tty/pty.c| 22 +++---
 drivers/tty/tty_io.c | 51 ++-
 2 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 9c2f1bc..2ce2bb2 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -25,6 +25,16 @@
 #include 
 #include 

+#define TTY_DEBUG_EIO 1
+
+#ifdef TTY_DEBUG_EIO
+#define tty_debug_eio(tty, f, args...) ({   \
+char __b[64];   \
+printk(KERN_DEBUG "%s: %s: " f, __func__, tty_name(tty, __b), ##args); \
+})
+#else
+#define tty_debug_eio(tty, f, args...)
+#endif

 #ifdef CONFIG_UNIX98_PTYS
 static struct tty_driver *ptm_driver;
@@ -246,12 +256,18 @@ static int pty_open(struct tty_struct *tty, struct file 
*filp)
 set_bit(TTY_IO_ERROR, >flags);

 retval = -EIO;
-if (test_bit(TTY_OTHER_CLOSED, >flags))
+if (test_bit(TTY_OTHER_CLOSED, >flags)) {
+tty_debug_eio(tty, "other pty closed (%#lx)\n", tty->flags);
 goto out;
-if (test_bit(TTY_PTY_LOCK, >link->flags))
+}
+if (test_bit(TTY_PTY_LOCK, >link->flags)) {
+tty_debug_eio(tty, "ptm still locked\n");
 goto out;
-if (tty->driver->subtype == PTY_TYPE_SLAVE && tty->link->count != 1)
+}
+if (tty->driver->subtype == PTY_TYPE_SLAVE && tty->link->count != 1) {
+tty_debug_eio(tty, "ptm open count (%d)\n", tty->link->count);
 goto out;
+}

 clear_bit(TTY_IO_ERROR, >flags);
 clear_bit(TTY_OTHER_CLOSED, >link->flags);
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 62e942d..e71c61f 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -105,11 +105,21 @@
 #include 
 #include 

-#undef TTY_DEBUG_HANGUP
+#define TTY_DEBUG_HANGUP 1
+#define TTY_DEBUG_EIO 1

 #define TTY_PARANOIA_CHECK 1
 #define CHECK_TTY_COUNT 1

+#ifdef TTY_DEBUG_EIO
+#define tty_debug_eio(tty, f, args...) ({   \
+char __b[64];   \
+printk(KERN_DEBUG "%s: %s: " f, __func__, tty_name(tty, __b), ##args); \
+})
+#else
+#define tty_debug_eio(tty, f, args...)
+#endif
+
 struct ktermios tty_std_termios = {/* for the benefit of tty drivers  */
 .c_iflag = ICRNL | IXON,
 .c_oflag = OPOST | ONLCR,
@@ -424,6 +434,7 @@ int tty_check_change(struct tty_struct *tty)
 if (is_ignored(SIGTTOU))
 goto out;
 if (is_current_pgrp_orphaned()) {
+tty_debug_eio(tty, "pgrp orphaned?? (%#lx)\n", tty->flags);
 ret = -EIO;
 goto out;
 }
@@ -448,6 +459,9 @@ static ssize_t hung_up_tty_read(struct file *file, char 
__user *buf,
 static ssize_t hung_up_tty_write(struct file *file, const char __user *buf,
  size_t count, loff_t *ppos)
 {
+struct tty_struct *tty = file_tty(file);
+
+tty_debug_eio(tty, "%#lx\n", tty ? tty->flags : -1L);
 return -EIO;
 }

@@ -1020,16 +1034,22 @@ static ssize_t tty_read(struct file *file, char __user 
*buf, size_t count,

 if (tty_paranoia_check(tty, inode, "tty_read"))
 return -EIO;
-if (!tty || (test_bit(TTY_IO_ERROR, >flags)))
+if (!tty || (test_bit(TTY_IO_ERROR, >flags))) {
+tty_debug_eio(tty, "%#lx\n", tty ? tty->flags : -1L);
 return -EIO;
+}

 /* We want to wait for the line discipline to sort out in this
situation */
 ld = tty_ldisc_ref_wait(tty);
-if (ld->ops->read)
+if (ld->ops->read) {
 i = (ld->ops->read)(tty, file, buf, count);
-else
+if (i == -EIO)
+tty_debug_eio(tty, "ldisc 

Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-13 Thread Markus Trippelsdorf
On 2013.06.11 at 22:14 +, Orion Poplawski wrote:
> Peter Hurley  hurleysoftware.com> writes:
> > Based on the other reports from Mikael and David, I suspect this problem
> > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
> > 
> >pty: Ignore slave pty close() if never successfully opened
> > 
> > This commit poisons the pty under certain error conditions that may
> > occur from parallel open()s (or parallel close() with pending write()).
> > 
> > It's unclear to me which error condition is triggered and how user-space
> > got an open file descriptor but that seems the most likely. Is the problem
> > reproducible enough that a debug patch would likely trigger?
> 
> I get this pretty frequently on my machine with rpmbuild -ba 2>&1 | tee. 
> I'd be happy to help to debug.

Since apparently no debugging patch is forthcoming, maybe it's time to
test the simple revert of commit 699390354da.

So can you guys please apply the following patch and see if the issue
goes away?

Thanks.

diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 59bfaec..f5801a6 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -38,12 +38,9 @@ static void pty_close(struct tty_struct *tty, struct file 
*filp)
if (tty->driver->subtype == PTY_TYPE_MASTER)
WARN_ON(tty->count > 1);
else {
-   if (test_bit(TTY_IO_ERROR, >flags))
-   return;
if (tty->count > 2)
return;
}
-   set_bit(TTY_IO_ERROR, >flags);
wake_up_interruptible(>read_wait);
wake_up_interruptible(>write_wait);
tty->packet = 0;
@@ -249,8 +246,6 @@ static int pty_open(struct tty_struct *tty, struct file 
*filp)
if (!tty || !tty->link)
goto out;
 
-   set_bit(TTY_IO_ERROR, >flags);
-
retval = -EIO;
if (test_bit(TTY_OTHER_CLOSED, >flags))
goto out;
@@ -259,7 +254,6 @@ static int pty_open(struct tty_struct *tty, struct file 
*filp)
if (tty->driver->subtype == PTY_TYPE_SLAVE && tty->link->count != 1)
goto out;
 
-   clear_bit(TTY_IO_ERROR, >flags);
clear_bit(TTY_OTHER_CLOSED, >link->flags);
set_bit(TTY_THROTTLED, >flags);
retval = 0;

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-13 Thread Markus Trippelsdorf
On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote:
> Peter Hurley writes:
>  > Based on the other reports from Mikael and David, I suspect this problem
>  > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
>  > 
>  >pty: Ignore slave pty close() if never successfully opened
>  > 
>  > This commit poisons the pty under certain error conditions that may
>  > occur from parallel open()s (or parallel close() with pending write()).
>  > 
>  > It's unclear to me which error condition is triggered and how user-space
>  > got an open file descriptor but that seems the most likely. Is the problem
>  > reproducible enough that a debug patch would likely trigger?
> 
> In my case the problem occurred frequently enough that I've been forced
> to change my build procedures to avoid it.  I'd welcome a debug patch.

Since apparently no debugging patch is forthcoming, maybe it's time to
test the simple revert of commit 699390354da.

So can you guys please apply the following patch and see if the issue
goes away?

Thanks.

diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 59bfaec..f5801a6 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -38,12 +38,9 @@ static void pty_close(struct tty_struct *tty, struct file 
*filp)
if (tty->driver->subtype == PTY_TYPE_MASTER)
WARN_ON(tty->count > 1);
else {
-   if (test_bit(TTY_IO_ERROR, >flags))
-   return;
if (tty->count > 2)
return;
}
-   set_bit(TTY_IO_ERROR, >flags);
wake_up_interruptible(>read_wait);
wake_up_interruptible(>write_wait);
tty->packet = 0;
@@ -249,8 +246,6 @@ static int pty_open(struct tty_struct *tty, struct file 
*filp)
if (!tty || !tty->link)
goto out;
 
-   set_bit(TTY_IO_ERROR, >flags);
-
retval = -EIO;
if (test_bit(TTY_OTHER_CLOSED, >flags))
goto out;
@@ -259,7 +254,6 @@ static int pty_open(struct tty_struct *tty, struct file 
*filp)
if (tty->driver->subtype == PTY_TYPE_SLAVE && tty->link->count != 1)
goto out;
 
-   clear_bit(TTY_IO_ERROR, >flags);
clear_bit(TTY_OTHER_CLOSED, >link->flags);
set_bit(TTY_THROTTLED, >flags);
retval = 0;

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-13 Thread Markus Trippelsdorf
On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote:
 Peter Hurley writes:
   Based on the other reports from Mikael and David, I suspect this problem
   may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
   
  pty: Ignore slave pty close() if never successfully opened
   
   This commit poisons the pty under certain error conditions that may
   occur from parallel open()s (or parallel close() with pending write()).
   
   It's unclear to me which error condition is triggered and how user-space
   got an open file descriptor but that seems the most likely. Is the problem
   reproducible enough that a debug patch would likely trigger?
 
 In my case the problem occurred frequently enough that I've been forced
 to change my build procedures to avoid it.  I'd welcome a debug patch.

Since apparently no debugging patch is forthcoming, maybe it's time to
test the simple revert of commit 699390354da.

So can you guys please apply the following patch and see if the issue
goes away?

Thanks.

diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 59bfaec..f5801a6 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -38,12 +38,9 @@ static void pty_close(struct tty_struct *tty, struct file 
*filp)
if (tty-driver-subtype == PTY_TYPE_MASTER)
WARN_ON(tty-count  1);
else {
-   if (test_bit(TTY_IO_ERROR, tty-flags))
-   return;
if (tty-count  2)
return;
}
-   set_bit(TTY_IO_ERROR, tty-flags);
wake_up_interruptible(tty-read_wait);
wake_up_interruptible(tty-write_wait);
tty-packet = 0;
@@ -249,8 +246,6 @@ static int pty_open(struct tty_struct *tty, struct file 
*filp)
if (!tty || !tty-link)
goto out;
 
-   set_bit(TTY_IO_ERROR, tty-flags);
-
retval = -EIO;
if (test_bit(TTY_OTHER_CLOSED, tty-flags))
goto out;
@@ -259,7 +254,6 @@ static int pty_open(struct tty_struct *tty, struct file 
*filp)
if (tty-driver-subtype == PTY_TYPE_SLAVE  tty-link-count != 1)
goto out;
 
-   clear_bit(TTY_IO_ERROR, tty-flags);
clear_bit(TTY_OTHER_CLOSED, tty-link-flags);
set_bit(TTY_THROTTLED, tty-flags);
retval = 0;

-- 
Markus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-13 Thread Markus Trippelsdorf
On 2013.06.11 at 22:14 +, Orion Poplawski wrote:
 Peter Hurley peter at hurleysoftware.com writes:
  Based on the other reports from Mikael and David, I suspect this problem
  may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
  
 pty: Ignore slave pty close() if never successfully opened
  
  This commit poisons the pty under certain error conditions that may
  occur from parallel open()s (or parallel close() with pending write()).
  
  It's unclear to me which error condition is triggered and how user-space
  got an open file descriptor but that seems the most likely. Is the problem
  reproducible enough that a debug patch would likely trigger?
 
 I get this pretty frequently on my machine with rpmbuild -ba 21 | tee. 
 I'd be happy to help to debug.

Since apparently no debugging patch is forthcoming, maybe it's time to
test the simple revert of commit 699390354da.

So can you guys please apply the following patch and see if the issue
goes away?

Thanks.

diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 59bfaec..f5801a6 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -38,12 +38,9 @@ static void pty_close(struct tty_struct *tty, struct file 
*filp)
if (tty-driver-subtype == PTY_TYPE_MASTER)
WARN_ON(tty-count  1);
else {
-   if (test_bit(TTY_IO_ERROR, tty-flags))
-   return;
if (tty-count  2)
return;
}
-   set_bit(TTY_IO_ERROR, tty-flags);
wake_up_interruptible(tty-read_wait);
wake_up_interruptible(tty-write_wait);
tty-packet = 0;
@@ -249,8 +246,6 @@ static int pty_open(struct tty_struct *tty, struct file 
*filp)
if (!tty || !tty-link)
goto out;
 
-   set_bit(TTY_IO_ERROR, tty-flags);
-
retval = -EIO;
if (test_bit(TTY_OTHER_CLOSED, tty-flags))
goto out;
@@ -259,7 +254,6 @@ static int pty_open(struct tty_struct *tty, struct file 
*filp)
if (tty-driver-subtype == PTY_TYPE_SLAVE  tty-link-count != 1)
goto out;
 
-   clear_bit(TTY_IO_ERROR, tty-flags);
clear_bit(TTY_OTHER_CLOSED, tty-link-flags);
set_bit(TTY_THROTTLED, tty-flags);
retval = 0;

-- 
Markus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-13 Thread Peter Hurley

On 06/13/2013 06:39 AM, Markus Trippelsdorf wrote:

On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote:

Peter Hurley writes:
   Based on the other reports from Mikael and David, I suspect this problem
   may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
  
  pty: Ignore slave pty close() if never successfully opened
  
   This commit poisons the pty under certain error conditions that may
   occur from parallel open()s (or parallel close() with pending write()).
  
   It's unclear to me which error condition is triggered and how user-space
   got an open file descriptor but that seems the most likely. Is the problem
   reproducible enough that a debug patch would likely trigger?

In my case the problem occurred frequently enough that I've been forced
to change my build procedures to avoid it.  I'd welcome a debug patch.


Since apparently no debugging patch is forthcoming, maybe it's time to
test the simple revert of commit 699390354da.


I apologize for the delay.

Here's a debug patch which I hope will narrow down the circumstances of
this error condition.

Regards,
Peter Hurley

PS - Don't get excited about EIO from read() which you will see during
boot. That's expected when the slave closes and the master is parked on
a blocking read().

--- % ---
Subject: [PATCH] tty: Debug EIO from write()


Signed-off-by: Peter Hurley pe...@hurleysoftware.com
---
 drivers/tty/pty.c| 22 +++---
 drivers/tty/tty_io.c | 51 ++-
 2 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 9c2f1bc..2ce2bb2 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -25,6 +25,16 @@
 #include linux/slab.h
 #include linux/mutex.h

+#define TTY_DEBUG_EIO 1
+
+#ifdef TTY_DEBUG_EIO
+#define tty_debug_eio(tty, f, args...) ({   \
+char __b[64];   \
+printk(KERN_DEBUG %s: %s:  f, __func__, tty_name(tty, __b), ##args); \
+})
+#else
+#define tty_debug_eio(tty, f, args...)
+#endif

 #ifdef CONFIG_UNIX98_PTYS
 static struct tty_driver *ptm_driver;
@@ -246,12 +256,18 @@ static int pty_open(struct tty_struct *tty, struct file 
*filp)
 set_bit(TTY_IO_ERROR, tty-flags);

 retval = -EIO;
-if (test_bit(TTY_OTHER_CLOSED, tty-flags))
+if (test_bit(TTY_OTHER_CLOSED, tty-flags)) {
+tty_debug_eio(tty, other pty closed (%#lx)\n, tty-flags);
 goto out;
-if (test_bit(TTY_PTY_LOCK, tty-link-flags))
+}
+if (test_bit(TTY_PTY_LOCK, tty-link-flags)) {
+tty_debug_eio(tty, ptm still locked\n);
 goto out;
-if (tty-driver-subtype == PTY_TYPE_SLAVE  tty-link-count != 1)
+}
+if (tty-driver-subtype == PTY_TYPE_SLAVE  tty-link-count != 1) {
+tty_debug_eio(tty, ptm open count (%d)\n, tty-link-count);
 goto out;
+}

 clear_bit(TTY_IO_ERROR, tty-flags);
 clear_bit(TTY_OTHER_CLOSED, tty-link-flags);
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 62e942d..e71c61f 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -105,11 +105,21 @@
 #include linux/kmod.h
 #include linux/nsproxy.h

-#undef TTY_DEBUG_HANGUP
+#define TTY_DEBUG_HANGUP 1
+#define TTY_DEBUG_EIO 1

 #define TTY_PARANOIA_CHECK 1
 #define CHECK_TTY_COUNT 1

+#ifdef TTY_DEBUG_EIO
+#define tty_debug_eio(tty, f, args...) ({   \
+char __b[64];   \
+printk(KERN_DEBUG %s: %s:  f, __func__, tty_name(tty, __b), ##args); \
+})
+#else
+#define tty_debug_eio(tty, f, args...)
+#endif
+
 struct ktermios tty_std_termios = {/* for the benefit of tty drivers  */
 .c_iflag = ICRNL | IXON,
 .c_oflag = OPOST | ONLCR,
@@ -424,6 +434,7 @@ int tty_check_change(struct tty_struct *tty)
 if (is_ignored(SIGTTOU))
 goto out;
 if (is_current_pgrp_orphaned()) {
+tty_debug_eio(tty, pgrp orphaned?? (%#lx)\n, tty-flags);
 ret = -EIO;
 goto out;
 }
@@ -448,6 +459,9 @@ static ssize_t hung_up_tty_read(struct file *file, char 
__user *buf,
 static ssize_t hung_up_tty_write(struct file *file, const char __user *buf,
  size_t count, loff_t *ppos)
 {
+struct tty_struct *tty = file_tty(file);
+
+tty_debug_eio(tty, %#lx\n, tty ? tty-flags : -1L);
 return -EIO;
 }

@@ -1020,16 +1034,22 @@ static ssize_t tty_read(struct file *file, char __user 
*buf, size_t count,

 if (tty_paranoia_check(tty, inode, tty_read))
 return -EIO;
-if (!tty || (test_bit(TTY_IO_ERROR, tty-flags)))
+if (!tty || (test_bit(TTY_IO_ERROR, tty-flags))) {
+tty_debug_eio(tty, %#lx\n, tty ? tty-flags : -1L);
 return -EIO;
+}

 /* We want to wait for the line discipline to sort out in this
situation */
 ld = tty_ldisc_ref_wait(tty);
-if (ld-ops-read)
+if (ld-ops-read) {
 i = (ld-ops-read)(tty, file, buf, count);
-else
+if (i == 

Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-13 Thread Markus Trippelsdorf
On 2013.06.13 at 10:16 -0400, Peter Hurley wrote:
 On 06/13/2013 06:39 AM, Markus Trippelsdorf wrote:
  On 2013.06.07 at 20:22 +0200, Mikael Pettersson wrote:
  Peter Hurley writes:
 Based on the other reports from Mikael and David, I suspect this 
  problem
 may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:

pty: Ignore slave pty close() if never successfully opened

 This commit poisons the pty under certain error conditions that may
 occur from parallel open()s (or parallel close() with pending write()).

 It's unclear to me which error condition is triggered and how 
  user-space
 got an open file descriptor but that seems the most likely. Is the 
  problem
 reproducible enough that a debug patch would likely trigger?
 
  In my case the problem occurred frequently enough that I've been forced
  to change my build procedures to avoid it.  I'd welcome a debug patch.
 
  Since apparently no debugging patch is forthcoming, maybe it's time to
  test the simple revert of commit 699390354da.
 
 I apologize for the delay.
 
 Here's a debug patch which I hope will narrow down the circumstances of
 this error condition.

Unfortunately it doesn't apply:

markus@x4 linux % git describe
v3.10-rc5-167-g26e0446
markus@x4 linux % patch -p1 --dry-run  /home/markus/tty.patch
checking file drivers/tty/pty.c
Hunk #1 succeeded at 25 with fuzz 2.
Hunk #2 FAILED at 256.
1 out of 2 hunks FAILED
checking file drivers/tty/tty_io.c
Hunk #1 FAILED at 105.
Hunk #2 FAILED at 424.
Hunk #3 FAILED at 448.
Hunk #4 FAILED at 1020.
Hunk #5 FAILED at 1197.
5 out of 5 hunks FAILED

-- 
Markus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-11 Thread Orion Poplawski
Peter Hurley  hurleysoftware.com> writes:
> Based on the other reports from Mikael and David, I suspect this problem
> may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
> 
>pty: Ignore slave pty close() if never successfully opened
> 
> This commit poisons the pty under certain error conditions that may
> occur from parallel open()s (or parallel close() with pending write()).
> 
> It's unclear to me which error condition is triggered and how user-space
> got an open file descriptor but that seems the most likely. Is the problem
> reproducible enough that a debug patch would likely trigger?

I get this pretty frequently on my machine with rpmbuild -ba 2>&1 | tee. 
I'd be happy to help to debug.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-11 Thread Orion Poplawski
Peter Hurley peter at hurleysoftware.com writes:
 Based on the other reports from Mikael and David, I suspect this problem
 may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
 
pty: Ignore slave pty close() if never successfully opened
 
 This commit poisons the pty under certain error conditions that may
 occur from parallel open()s (or parallel close() with pending write()).
 
 It's unclear to me which error condition is triggered and how user-space
 got an open file descriptor but that seems the most likely. Is the problem
 reproducible enough that a debug patch would likely trigger?

I get this pretty frequently on my machine with rpmbuild -ba 21 | tee. 
I'd be happy to help to debug.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-08 Thread Rob Landley

On 06/06/2013 06:54:17 AM, Markus Trippelsdorf wrote:
Since v3.8.0 several people reported intermittent IO errors that  
happen

during high system load while using "emerge" under Gentoo:

...

see: https://bugs.gentoo.org/show_bug.cgi?id=459674

(A similar issue also happens when building Firefox since v3.8.0. But
because Firefox's build process doesn't raise an exception it just  
dies

at random points without giving a clue.)

Now the question is: Could this be a kernel bug? Maybe in the TTY  
layer?


Unfortunately the issue is not easily reproducible and a git-bisect is
out of the question.


I tracked down and fixed something like this in the User Mode Linux tty  
implementation many moons ago.


The trick to making it reproducible was to rename a copy of the xterm  
binary to somethingunique, run a shell in said xterm (because "cat"  
doesn't exercise the tty logic, that's why), run a thing in there  
producing test output, and have a loop in another window doing "while  
true; do killall -STOP somethingunique; sleep 1; killall -START  
somethingunique; sleep 1; done".


This forces the pipe buffer to the pty to fill up and exercise the flow  
control logic, plus all the fun "short write and retry" stuff...


Rob--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-08 Thread Rob Landley

On 06/06/2013 06:54:17 AM, Markus Trippelsdorf wrote:
Since v3.8.0 several people reported intermittent IO errors that  
happen

during high system load while using emerge under Gentoo:

...

see: https://bugs.gentoo.org/show_bug.cgi?id=459674

(A similar issue also happens when building Firefox since v3.8.0. But
because Firefox's build process doesn't raise an exception it just  
dies

at random points without giving a clue.)

Now the question is: Could this be a kernel bug? Maybe in the TTY  
layer?


Unfortunately the issue is not easily reproducible and a git-bisect is
out of the question.


I tracked down and fixed something like this in the User Mode Linux tty  
implementation many moons ago.


The trick to making it reproducible was to rename a copy of the xterm  
binary to somethingunique, run a shell in said xterm (because cat  
doesn't exercise the tty logic, that's why), run a thing in there  
producing test output, and have a loop in another window doing while  
true; do killall -STOP somethingunique; sleep 1; killall -START  
somethingunique; sleep 1; done.


This forces the pipe buffer to the pty to fill up and exercise the flow  
control logic, plus all the fun short write and retry stuff...


Rob--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-07 Thread David Howells
Peter Hurley  wrote:

> Based on the other reports from Mikael and David, I suspect this problem
> may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
> 
>   pty: Ignore slave pty close() if never successfully opened
> 
> This commit poisons the pty under certain error conditions that may
> occur from parallel open()s (or parallel close() with pending write()).

In the case that I'm seeing, the tty in question isn't being opened and closed
regularly.  A gcc build is dumping a load of stuff out through tee and tee
writes it to the tty it got as fd 1 from the parent shell.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-07 Thread Mikael Pettersson
Peter Hurley writes:
 > Based on the other reports from Mikael and David, I suspect this problem
 > may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
 > 
 >pty: Ignore slave pty close() if never successfully opened
 > 
 > This commit poisons the pty under certain error conditions that may
 > occur from parallel open()s (or parallel close() with pending write()).
 > 
 > It's unclear to me which error condition is triggered and how user-space
 > got an open file descriptor but that seems the most likely. Is the problem
 > reproducible enough that a debug patch would likely trigger?

In my case the problem occurred frequently enough that I've been forced
to change my build procedures to avoid it.  I'd welcome a debug patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-07 Thread Markus Trippelsdorf
On 2013.06.07 at 11:39 -0400, Peter Hurley wrote:
> On 06/06/2013 10:37 AM, Markus Trippelsdorf wrote:
> > On 2013.06.06 at 10:18 -0400, Peter Hurley wrote:
> >> On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote:
> >>> Since v3.8.0 several people reported intermittent IO errors that happen
> >>> during high system load while using "emerge" under Gentoo:
> >>> ...
> >>> File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", 
> >>> line 260, in iteration
> >>>   if not x.callback(f, event, *x.args):
> >>> File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line 
> >>> 99, in _output_handler
> >>>   stdout_buf[os.write(stdout_fd, stdout_buf):]
> >>> File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in 
> >>> __call__
> >>>   rval = self._func(*wrapped_args, **wrapped_kwargs)
> >>> OSError: [Errno 5] Input/output error
> 
> Based on the other reports from Mikael and David, I suspect this problem
> may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
> 
>pty: Ignore slave pty close() if never successfully opened
> 
> This commit poisons the pty under certain error conditions that may
> occur from parallel open()s (or parallel close() with pending write()).
> 
> It's unclear to me which error condition is triggered and how user-space
> got an open file descriptor but that seems the most likely. Is the problem
> reproducible enough that a debug patch would likely trigger?

A debug patch would be appreciated.
In my case the problem occurs very irregularly; sometimes it happens a
few times per day, at other times I don't see the bug for three weeks.

But if Mikael and David could also run their kernels with the debug
patch applied, hopefully that's enough to trigger the issue more regularly.

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-07 Thread Peter Hurley

On 06/06/2013 10:37 AM, Markus Trippelsdorf wrote:

On 2013.06.06 at 10:18 -0400, Peter Hurley wrote:

On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote:

Since v3.8.0 several people reported intermittent IO errors that happen
during high system load while using "emerge" under Gentoo:
...
File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", line 
260, in iteration
  if not x.callback(f, event, *x.args):
File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line 99, 
in _output_handler
  stdout_buf[os.write(stdout_fd, stdout_buf):]
File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in __call__
  rval = self._func(*wrapped_args, **wrapped_kwargs)
OSError: [Errno 5] Input/output error


Looks to me like a user-space bug: EIO is returned when the other
end of the "pipe" has been closed.

FWIW, I didn't see where the OP tried to revert
 'SpawnProcess: stdout_fd FD_CLOEXEC'

The only non-emerge related comment (#21 in the link provided) refers to
'a similar issue sometimes happened when I built Firefox by hand [..snip..]
 And it would randomly crash during the build.

 Since I've recompiled Python with gcc-4.6 this issue also never occurred
 again.'

That comment doesn't really corroborate the reported bug.


That comment was from me (I use 'octoploid' for blog trolling, etc.) and
is wrong. The Firefox build issue happend again today. See also the rest
of my mail:


(A similar issue also happens when building Firefox since v3.8.0. But
because Firefox's build process doesn't raise an exception it just
dies at random points without giving a clue.)


Please note that both the Firefox build process and Portage (emerge)
are implemented in Python.


Based on the other reports from Mikael and David, I suspect this problem
may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:

  pty: Ignore slave pty close() if never successfully opened

This commit poisons the pty under certain error conditions that may
occur from parallel open()s (or parallel close() with pending write()).

It's unclear to me which error condition is triggered and how user-space
got an open file descriptor but that seems the most likely. Is the problem
reproducible enough that a debug patch would likely trigger?

Regards,
Peter Hurley


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-07 Thread Peter Hurley

On 06/06/2013 10:37 AM, Markus Trippelsdorf wrote:

On 2013.06.06 at 10:18 -0400, Peter Hurley wrote:

On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote:

Since v3.8.0 several people reported intermittent IO errors that happen
during high system load while using emerge under Gentoo:
...
File /usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py, line 
260, in iteration
  if not x.callback(f, event, *x.args):
File /usr/lib64/portage/pym/portage/util/_async/PipeLogger.py, line 99, 
in _output_handler
  stdout_buf[os.write(stdout_fd, stdout_buf):]
File /usr/lib64/portage/pym/portage/__init__.py, line 246, in __call__
  rval = self._func(*wrapped_args, **wrapped_kwargs)
OSError: [Errno 5] Input/output error


Looks to me like a user-space bug: EIO is returned when the other
end of the pipe has been closed.

FWIW, I didn't see where the OP tried to revert
 'SpawnProcess: stdout_fd FD_CLOEXEC'

The only non-emerge related comment (#21 in the link provided) refers to
'a similar issue sometimes happened when I built Firefox by hand [..snip..]
 And it would randomly crash during the build.

 Since I've recompiled Python with gcc-4.6 this issue also never occurred
 again.'

That comment doesn't really corroborate the reported bug.


That comment was from me (I use 'octoploid' for blog trolling, etc.) and
is wrong. The Firefox build issue happend again today. See also the rest
of my mail:


(A similar issue also happens when building Firefox since v3.8.0. But
because Firefox's build process doesn't raise an exception it just
dies at random points without giving a clue.)


Please note that both the Firefox build process and Portage (emerge)
are implemented in Python.


Based on the other reports from Mikael and David, I suspect this problem
may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:

  pty: Ignore slave pty close() if never successfully opened

This commit poisons the pty under certain error conditions that may
occur from parallel open()s (or parallel close() with pending write()).

It's unclear to me which error condition is triggered and how user-space
got an open file descriptor but that seems the most likely. Is the problem
reproducible enough that a debug patch would likely trigger?

Regards,
Peter Hurley


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-07 Thread Markus Trippelsdorf
On 2013.06.07 at 11:39 -0400, Peter Hurley wrote:
 On 06/06/2013 10:37 AM, Markus Trippelsdorf wrote:
  On 2013.06.06 at 10:18 -0400, Peter Hurley wrote:
  On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote:
  Since v3.8.0 several people reported intermittent IO errors that happen
  during high system load while using emerge under Gentoo:
  ...
  File /usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py, 
  line 260, in iteration
if not x.callback(f, event, *x.args):
  File /usr/lib64/portage/pym/portage/util/_async/PipeLogger.py, line 
  99, in _output_handler
stdout_buf[os.write(stdout_fd, stdout_buf):]
  File /usr/lib64/portage/pym/portage/__init__.py, line 246, in 
  __call__
rval = self._func(*wrapped_args, **wrapped_kwargs)
  OSError: [Errno 5] Input/output error
 
 Based on the other reports from Mikael and David, I suspect this problem
 may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
 
pty: Ignore slave pty close() if never successfully opened
 
 This commit poisons the pty under certain error conditions that may
 occur from parallel open()s (or parallel close() with pending write()).
 
 It's unclear to me which error condition is triggered and how user-space
 got an open file descriptor but that seems the most likely. Is the problem
 reproducible enough that a debug patch would likely trigger?

A debug patch would be appreciated.
In my case the problem occurs very irregularly; sometimes it happens a
few times per day, at other times I don't see the bug for three weeks.

But if Mikael and David could also run their kernels with the debug
patch applied, hopefully that's enough to trigger the issue more regularly.

-- 
Markus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-07 Thread Mikael Pettersson
Peter Hurley writes:
  Based on the other reports from Mikael and David, I suspect this problem
  may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
  
 pty: Ignore slave pty close() if never successfully opened
  
  This commit poisons the pty under certain error conditions that may
  occur from parallel open()s (or parallel close() with pending write()).
  
  It's unclear to me which error condition is triggered and how user-space
  got an open file descriptor but that seems the most likely. Is the problem
  reproducible enough that a debug patch would likely trigger?

In my case the problem occurred frequently enough that I've been forced
to change my build procedures to avoid it.  I'd welcome a debug patch.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-07 Thread David Howells
Peter Hurley pe...@hurleysoftware.com wrote:

 Based on the other reports from Mikael and David, I suspect this problem
 may have to do with my commit 699390354da6c258b65bf8fa79cfd5feaede50b6:
 
   pty: Ignore slave pty close() if never successfully opened
 
 This commit poisons the pty under certain error conditions that may
 occur from parallel open()s (or parallel close() with pending write()).

In the case that I'm seeing, the tty in question isn't being opened and closed
regularly.  A gcc build is dumping a load of stuff out through tee and tee
writes it to the tty it got as fd 1 from the parent shell.

David
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-06 Thread Markus Trippelsdorf
On 2013.06.06 at 18:31 +0100, David Howells wrote:
> Markus Trippelsdorf  wrote:
> 
> > OSError: [Errno 5] Input/output error
> > 
> > Basically 'emerge' just writes the build output to stdout in a loop:
> 
> I've just upgraded to kernel-3.9.4-200.fc18.x86_64 and I'm now regularly
> seeing something very similar from the tee run by "fedpkg local" to log its
> output when building a cross-compiler gcc RPM:
> 
>   tee: standard output: Input/output error
>   tee: write error
> 
> Slapping strace on tee after the "Input/output error" line has occurred shows
> that tee is still running and still pushing lines from its stdin to the output
> named on its command line, but has ceased writing to its stdout.

tee uses the same loop as portage, but keeps going in case of an EIO:

  /* In the array of NFILES + 1 descriptors, make
 the first one correspond to standard output.   */
  descriptors[0] = stdout;
  files[0] = _("standard output");
  setvbuf (stdout, NULL, _IONBF, 0);

  for (i = 1; i <= nfiles; i++)
{
  descriptors[i] = (STREQ (files[i], "-")
? stdout
: fopen (files[i], mode_string));
  if (descriptors[i] == NULL)
{
  error (0, errno, "%s", files[i]);
  ok = false;
}
  else
setvbuf (descriptors[i], NULL, _IONBF, 0);
}

  while (1)
{
  bytes_read = read (0, buffer, sizeof buffer);
  if (bytes_read < 0 && errno == EINTR)
continue;
  if (bytes_read <= 0)
break;

  /* Write to all NFILES + 1 descriptors.
 Standard output is the first one.  */
  for (i = 0; i <= nfiles; i++)
if (descriptors[i]
&& fwrite (buffer, bytes_read, 1, descriptors[i]) != 1)
  {
error (0, errno, "%s", files[i]);
descriptors[i] = NULL;
ok = false;
  }
}

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-06 Thread David Howells
Markus Trippelsdorf  wrote:

> OSError: [Errno 5] Input/output error
> 
> Basically 'emerge' just writes the build output to stdout in a loop:

I've just upgraded to kernel-3.9.4-200.fc18.x86_64 and I'm now regularly
seeing something very similar from the tee run by "fedpkg local" to log its
output when building a cross-compiler gcc RPM:

tee: standard output: Input/output error
tee: write error

Slapping strace on tee after the "Input/output error" line has occurred shows
that tee is still running and still pushing lines from its stdin to the output
named on its command line, but has ceased writing to its stdout.

The stdout is (or was) a TTY that's either the communications channel to an
xterm or the channel to an emacs buffer.  With the xterm, the process can be
suspended (CTRL+Z) stuff can be written to the tty and then the process can be
rstarted.  Indeed, stderr for tee goes to the same tty - and must still be
functional otherwise I wouldn't see the aforementioned error messages...

In ps fax, the tee process can be seen at the bottom of the subtree:

 5158 ?S  0:06  \_ /usr/bin/xterm
 5198 pts/3Ss+0:00  |   \_ bash
 5386 pts/3Sl 4:50  |   \_ emacs -geometry 100x77
10258 pts/8SNs+   0:00  |   \_ /usr/bin/python /usr/bin/fedpkg local
10267 pts/8SN+0:00  |   \_ rpmbuild --define _sourcedir /dat
10462 pts/8SN+0:00  |   |   \_ /bin/sh -e /var/tmp/rpm-tmp.x
16248 pts/8SN+0:00  |   |   \_ make -j12
16249 pts/8SN+0:00  |   |   \_ /bin/sh -c fail= fail
16253 pts/8SN+0:00  |   |   \_ /bin/sh -c fail= 
16254 pts/8SN+0:00  |   |   \_ make all-am
16256 pts/8SN+0:00  |   |   \_ /bin/sh -
16265 pts/8SN+0:00  |   |   |   \_ /bin/
16688 pts/8SN+0:00  |   |   |   \_ /
16700 pts/8RN+0:00  |   |   |   
16704 pts/8SN+0:00  |   |   |   
16267 pts/8SN+0:00  |   |   \_ /bin/sh -
16279 pts/8SN+0:00  |   |   |   \_ /bin/
16694 pts/8SN+0:00  |   |   |   \_ /
16707 pts/8RN+0:00  |   |   |   
16713 pts/8SN+0:00  |   |   |   
16289 pts/8SN+0:00  |   |   \_ /bin/sh -
16306 pts/8SN+0:00  |   |   |   \_ /bin/
17045 pts/8SN+0:00  |   |   |   \_ /
17046 pts/8RN+0:00  |   |   |   
17047 pts/8SN+0:00  |   |   |   
16799 pts/8SN+0:00  |   |   \_ /bin/sh -
16801 pts/8SN+0:00  |   |   |   \_ /bin/
17051 pts/8SN+0:00  |   |   |   \_ /
17053 pts/8RN+0:00  |   |   |   
16819 pts/8SN+0:00  |   |   \_ /bin/sh -
16822 pts/8SN+0:00  |   |   |   \_ /bin/
16943 pts/8SN+0:00  |   |   |   \_ /
16950 pts/8RN+0:00  |   |   |   
16951 pts/8SN+0:00  |   |   |   
16903 pts/8SN+0:00  |   |   \_ /bin/sh -
16915 pts/8SN+0:00  |   |   \_ /bin/
17024 pts/8SN+0:00  |   |   \_ /
17026 pts/8RN+0:00  |   |   
17028 pts/8SN+0:00  |   |   
10268 pts/8SN+0:00  |   \_ tee .build-4.8.1-1.1.fc20.log

tee's fds look like:

warthog>ls -l /proc/10268/fd/
total 0
lr-x--. 1 dhowells dhowells 64 Jun  6 18:24 0 -> pipe:[1595614]
lrwx--. 1 dhowells dhowells 64 Jun  6 18:24 1 -> /dev/pts/8
lrwx--. 1 dhowells dhowells 64 Jun  6 18:23 2 -> /dev/pts/8
l-wx--. 1 dhowells dhowells 64 Jun  6 18:24 3 -> /data/fedora/cros

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-06 Thread Mikael Pettersson
Markus Trippelsdorf writes:
 > Since v3.8.0 several people reported intermittent IO errors that happen
 > during high system load while using "emerge" under Gentoo:
 > ...
 >   File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", line 
 > 260, in iteration
 > if not x.callback(f, event, *x.args):
 >   File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line 99, 
 > in _output_handler
 > stdout_buf[os.write(stdout_fd, stdout_buf):]
 >   File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in __call__
 > rval = self._func(*wrapped_args, **wrapped_kwargs)
 > OSError: [Errno 5] Input/output error
 > 
 > Basically 'emerge' just writes the build output to stdout in a loop:
 > ...
 > def _output_handler(self, fd, event):
 > 
 >background = self.background
 >stdout_fd = self.stdout_fd
 >log_file = self._log_file 
 > 
 >while True:
 >buf = self._read_buf(fd, event)
 > 
 >if buf is None:
 ># not a POLLIN event, EAGAIN, etc...
 >break
 > 
 >if not buf:
 ># EOF
 >self._unregister()
 >self.wait()
 >break
 > 
 >else:
 >if not background and stdout_fd is not None:
 >failures = 0
 >stdout_buf = buf
 >while stdout_buf:
 >try:
 >stdout_buf = \
 >
 > stdout_buf[os.write(stdout_fd, stdout_buf):]
 >except OSError as e:
 >if e.errno != errno.EAGAIN:
 >raise
 >  ...
 > 
 > see: https://bugs.gentoo.org/show_bug.cgi?id=459674
 > 
 > (A similar issue also happens when building Firefox since v3.8.0. But
 > because Firefox's build process doesn't raise an exception it just dies
 > at random points without giving a clue.)
 > 
 > Now the question is: Could this be a kernel bug? Maybe in the TTY layer?
 > 
 > Unfortunately the issue is not easily reproducible and a git-bisect is
 > out of the question.

I'm seeing a similar regression.  I do a lot of gcc bootstraps and regression
test suite runs, and for the bootstraps I do

make -jN bootstrap |& tee build-log

(tcsh syntax, adjust as appropriate for your preferred shell) to get a complete
log for later inspection in case of error.  N is typically the number of cores
or threads on the machine, e.g. -j8 on my Core-i7 IVB.

Up to the 3.7 kernel this never had any problems.  Starting with the 3.8 kernel,
or possibly 3.9-rc1, this usually dies at some random point with an EIO.

I haven't had time to bisect it.

/Mikael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-06 Thread Markus Trippelsdorf
On 2013.06.06 at 10:18 -0400, Peter Hurley wrote:
> On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote:
> > Since v3.8.0 several people reported intermittent IO errors that happen
> > during high system load while using "emerge" under Gentoo:
> > ...
> >File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", line 
> > 260, in iteration
> >  if not x.callback(f, event, *x.args):
> >File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line 
> > 99, in _output_handler
> >  stdout_buf[os.write(stdout_fd, stdout_buf):]
> >File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in __call__
> >  rval = self._func(*wrapped_args, **wrapped_kwargs)
> > OSError: [Errno 5] Input/output error
> 
> Looks to me like a user-space bug: EIO is returned when the other
> end of the "pipe" has been closed.
> 
> FWIW, I didn't see where the OP tried to revert
> 'SpawnProcess: stdout_fd FD_CLOEXEC'
> 
> The only non-emerge related comment (#21 in the link provided) refers to
>'a similar issue sometimes happened when I built Firefox by hand [..snip..]
> And it would randomly crash during the build.
> 
> Since I've recompiled Python with gcc-4.6 this issue also never occurred
> again.'
> 
> That comment doesn't really corroborate the reported bug.

That comment was from me (I use 'octoploid' for blog trolling, etc.) and
is wrong. The Firefox build issue happend again today. See also the rest
of my mail:

> (A similar issue also happens when building Firefox since v3.8.0. But
> because Firefox's build process doesn't raise an exception it just
> dies at random points without giving a clue.)

Please note that both the Firefox build process and Portage (emerge)
are implemented in Python.

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-06 Thread Peter Hurley

On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote:

Since v3.8.0 several people reported intermittent IO errors that happen
during high system load while using "emerge" under Gentoo:
...
   File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", line 
260, in iteration
 if not x.callback(f, event, *x.args):
   File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line 99, in 
_output_handler
 stdout_buf[os.write(stdout_fd, stdout_buf):]
   File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in __call__
 rval = self._func(*wrapped_args, **wrapped_kwargs)
OSError: [Errno 5] Input/output error


Looks to me like a user-space bug: EIO is returned when the other
end of the "pipe" has been closed.

FWIW, I didn't see where the OP tried to revert
   'SpawnProcess: stdout_fd FD_CLOEXEC'

The only non-emerge related comment (#21 in the link provided) refers to
  'a similar issue sometimes happened when I built Firefox by hand [..snip..]
   And it would randomly crash during the build.

   Since I've recompiled Python with gcc-4.6 this issue also never occurred
   again.'

That comment doesn't really corroborate the reported bug.

Regards,
Peter Hurley


Basically 'emerge' just writes the build output to stdout in a loop:
...
def _output_handler(self, fd, event):

background = self.background
stdout_fd = self.stdout_fd
log_file = self._log_file

while True:
buf = self._read_buf(fd, event)

if buf is None:
# not a POLLIN event, EAGAIN, etc...
break

if not buf:
# EOF
self._unregister()
self.wait()
break

else:
if not background and stdout_fd is not None:
failures = 0
stdout_buf = buf
while stdout_buf:
try:
stdout_buf = \

stdout_buf[os.write(stdout_fd, stdout_buf):]
except OSError as e:
if e.errno != errno.EAGAIN:
raise
...

see: https://bugs.gentoo.org/show_bug.cgi?id=459674

(A similar issue also happens when building Firefox since v3.8.0. But
because Firefox's build process doesn't raise an exception it just dies
at random points without giving a clue.)

Now the question is: Could this be a kernel bug? Maybe in the TTY layer?

Unfortunately the issue is not easily reproducible and a git-bisect is
out of the question.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-06 Thread Peter Hurley

On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote:

Since v3.8.0 several people reported intermittent IO errors that happen
during high system load while using emerge under Gentoo:
...
   File /usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py, line 
260, in iteration
 if not x.callback(f, event, *x.args):
   File /usr/lib64/portage/pym/portage/util/_async/PipeLogger.py, line 99, in 
_output_handler
 stdout_buf[os.write(stdout_fd, stdout_buf):]
   File /usr/lib64/portage/pym/portage/__init__.py, line 246, in __call__
 rval = self._func(*wrapped_args, **wrapped_kwargs)
OSError: [Errno 5] Input/output error


Looks to me like a user-space bug: EIO is returned when the other
end of the pipe has been closed.

FWIW, I didn't see where the OP tried to revert
   'SpawnProcess: stdout_fd FD_CLOEXEC'

The only non-emerge related comment (#21 in the link provided) refers to
  'a similar issue sometimes happened when I built Firefox by hand [..snip..]
   And it would randomly crash during the build.

   Since I've recompiled Python with gcc-4.6 this issue also never occurred
   again.'

That comment doesn't really corroborate the reported bug.

Regards,
Peter Hurley


Basically 'emerge' just writes the build output to stdout in a loop:
...
def _output_handler(self, fd, event):

background = self.background
stdout_fd = self.stdout_fd
log_file = self._log_file

while True:
buf = self._read_buf(fd, event)

if buf is None:
# not a POLLIN event, EAGAIN, etc...
break

if not buf:
# EOF
self._unregister()
self.wait()
break

else:
if not background and stdout_fd is not None:
failures = 0
stdout_buf = buf
while stdout_buf:
try:
stdout_buf = \

stdout_buf[os.write(stdout_fd, stdout_buf):]
except OSError as e:
if e.errno != errno.EAGAIN:
raise
...

see: https://bugs.gentoo.org/show_bug.cgi?id=459674

(A similar issue also happens when building Firefox since v3.8.0. But
because Firefox's build process doesn't raise an exception it just dies
at random points without giving a clue.)

Now the question is: Could this be a kernel bug? Maybe in the TTY layer?

Unfortunately the issue is not easily reproducible and a git-bisect is
out of the question.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-06 Thread Markus Trippelsdorf
On 2013.06.06 at 10:18 -0400, Peter Hurley wrote:
 On 06/06/2013 07:54 AM, Markus Trippelsdorf wrote:
  Since v3.8.0 several people reported intermittent IO errors that happen
  during high system load while using emerge under Gentoo:
  ...
 File /usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py, line 
  260, in iteration
   if not x.callback(f, event, *x.args):
 File /usr/lib64/portage/pym/portage/util/_async/PipeLogger.py, line 
  99, in _output_handler
   stdout_buf[os.write(stdout_fd, stdout_buf):]
 File /usr/lib64/portage/pym/portage/__init__.py, line 246, in __call__
   rval = self._func(*wrapped_args, **wrapped_kwargs)
  OSError: [Errno 5] Input/output error
 
 Looks to me like a user-space bug: EIO is returned when the other
 end of the pipe has been closed.
 
 FWIW, I didn't see where the OP tried to revert
 'SpawnProcess: stdout_fd FD_CLOEXEC'
 
 The only non-emerge related comment (#21 in the link provided) refers to
'a similar issue sometimes happened when I built Firefox by hand [..snip..]
 And it would randomly crash during the build.
 
 Since I've recompiled Python with gcc-4.6 this issue also never occurred
 again.'
 
 That comment doesn't really corroborate the reported bug.

That comment was from me (I use 'octoploid' for blog trolling, etc.) and
is wrong. The Firefox build issue happend again today. See also the rest
of my mail:

 (A similar issue also happens when building Firefox since v3.8.0. But
 because Firefox's build process doesn't raise an exception it just
 dies at random points without giving a clue.)

Please note that both the Firefox build process and Portage (emerge)
are implemented in Python.

-- 
Markus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-06 Thread Mikael Pettersson
Markus Trippelsdorf writes:
  Since v3.8.0 several people reported intermittent IO errors that happen
  during high system load while using emerge under Gentoo:
  ...
File /usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py, line 
  260, in iteration
  if not x.callback(f, event, *x.args):
File /usr/lib64/portage/pym/portage/util/_async/PipeLogger.py, line 99, 
  in _output_handler
  stdout_buf[os.write(stdout_fd, stdout_buf):]
File /usr/lib64/portage/pym/portage/__init__.py, line 246, in __call__
  rval = self._func(*wrapped_args, **wrapped_kwargs)
  OSError: [Errno 5] Input/output error
  
  Basically 'emerge' just writes the build output to stdout in a loop:
  ...
  def _output_handler(self, fd, event):
  
 background = self.background
 stdout_fd = self.stdout_fd
 log_file = self._log_file 
  
 while True:
 buf = self._read_buf(fd, event)
  
 if buf is None:
 # not a POLLIN event, EAGAIN, etc...
 break
  
 if not buf:
 # EOF
 self._unregister()
 self.wait()
 break
  
 else:
 if not background and stdout_fd is not None:
 failures = 0
 stdout_buf = buf
 while stdout_buf:
 try:
 stdout_buf = \
 
  stdout_buf[os.write(stdout_fd, stdout_buf):]
 except OSError as e:
 if e.errno != errno.EAGAIN:
 raise
   ...
  
  see: https://bugs.gentoo.org/show_bug.cgi?id=459674
  
  (A similar issue also happens when building Firefox since v3.8.0. But
  because Firefox's build process doesn't raise an exception it just dies
  at random points without giving a clue.)
  
  Now the question is: Could this be a kernel bug? Maybe in the TTY layer?
  
  Unfortunately the issue is not easily reproducible and a git-bisect is
  out of the question.

I'm seeing a similar regression.  I do a lot of gcc bootstraps and regression
test suite runs, and for the bootstraps I do

make -jN bootstrap | tee build-log

(tcsh syntax, adjust as appropriate for your preferred shell) to get a complete
log for later inspection in case of error.  N is typically the number of cores
or threads on the machine, e.g. -j8 on my Core-i7 IVB.

Up to the 3.7 kernel this never had any problems.  Starting with the 3.8 kernel,
or possibly 3.9-rc1, this usually dies at some random point with an EIO.

I haven't had time to bisect it.

/Mikael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-06 Thread David Howells
Markus Trippelsdorf mar...@trippelsdorf.de wrote:

 OSError: [Errno 5] Input/output error
 
 Basically 'emerge' just writes the build output to stdout in a loop:

I've just upgraded to kernel-3.9.4-200.fc18.x86_64 and I'm now regularly
seeing something very similar from the tee run by fedpkg local to log its
output when building a cross-compiler gcc RPM:

tee: standard output: Input/output error
tee: write error

Slapping strace on tee after the Input/output error line has occurred shows
that tee is still running and still pushing lines from its stdin to the output
named on its command line, but has ceased writing to its stdout.

The stdout is (or was) a TTY that's either the communications channel to an
xterm or the channel to an emacs buffer.  With the xterm, the process can be
suspended (CTRL+Z) stuff can be written to the tty and then the process can be
rstarted.  Indeed, stderr for tee goes to the same tty - and must still be
functional otherwise I wouldn't see the aforementioned error messages...

In ps fax, the tee process can be seen at the bottom of the subtree:

 5158 ?S  0:06  \_ /usr/bin/xterm
 5198 pts/3Ss+0:00  |   \_ bash
 5386 pts/3Sl 4:50  |   \_ emacs -geometry 100x77
10258 pts/8SNs+   0:00  |   \_ /usr/bin/python /usr/bin/fedpkg local
10267 pts/8SN+0:00  |   \_ rpmbuild --define _sourcedir /dat
10462 pts/8SN+0:00  |   |   \_ /bin/sh -e /var/tmp/rpm-tmp.x
16248 pts/8SN+0:00  |   |   \_ make -j12
16249 pts/8SN+0:00  |   |   \_ /bin/sh -c fail= fail
16253 pts/8SN+0:00  |   |   \_ /bin/sh -c fail= 
16254 pts/8SN+0:00  |   |   \_ make all-am
16256 pts/8SN+0:00  |   |   \_ /bin/sh -
16265 pts/8SN+0:00  |   |   |   \_ /bin/
16688 pts/8SN+0:00  |   |   |   \_ /
16700 pts/8RN+0:00  |   |   |   
16704 pts/8SN+0:00  |   |   |   
16267 pts/8SN+0:00  |   |   \_ /bin/sh -
16279 pts/8SN+0:00  |   |   |   \_ /bin/
16694 pts/8SN+0:00  |   |   |   \_ /
16707 pts/8RN+0:00  |   |   |   
16713 pts/8SN+0:00  |   |   |   
16289 pts/8SN+0:00  |   |   \_ /bin/sh -
16306 pts/8SN+0:00  |   |   |   \_ /bin/
17045 pts/8SN+0:00  |   |   |   \_ /
17046 pts/8RN+0:00  |   |   |   
17047 pts/8SN+0:00  |   |   |   
16799 pts/8SN+0:00  |   |   \_ /bin/sh -
16801 pts/8SN+0:00  |   |   |   \_ /bin/
17051 pts/8SN+0:00  |   |   |   \_ /
17053 pts/8RN+0:00  |   |   |   
16819 pts/8SN+0:00  |   |   \_ /bin/sh -
16822 pts/8SN+0:00  |   |   |   \_ /bin/
16943 pts/8SN+0:00  |   |   |   \_ /
16950 pts/8RN+0:00  |   |   |   
16951 pts/8SN+0:00  |   |   |   
16903 pts/8SN+0:00  |   |   \_ /bin/sh -
16915 pts/8SN+0:00  |   |   \_ /bin/
17024 pts/8SN+0:00  |   |   \_ /
17026 pts/8RN+0:00  |   |   
17028 pts/8SN+0:00  |   |   
10268 pts/8SN+0:00  |   \_ tee .build-4.8.1-1.1.fc20.log

tee's fds look like:

warthogls -l /proc/10268/fd/
total 0
lr-x--. 1 dhowells dhowells 64 Jun  6 18:24 0 - pipe:[1595614]
lrwx--. 1 dhowells dhowells 64 Jun  6 18:24 1 - /dev/pts/8
lrwx--. 1 dhowells dhowells 64 Jun  6 18:23 2 - /dev/pts/8
l-wx--. 1 dhowells dhowells 64 Jun  6 18:24 3 - /data/fedora/cros

David
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange intermittent EIO error when writing to stdout since v3.8.0

2013-06-06 Thread Markus Trippelsdorf
On 2013.06.06 at 18:31 +0100, David Howells wrote:
 Markus Trippelsdorf mar...@trippelsdorf.de wrote:
 
  OSError: [Errno 5] Input/output error
  
  Basically 'emerge' just writes the build output to stdout in a loop:
 
 I've just upgraded to kernel-3.9.4-200.fc18.x86_64 and I'm now regularly
 seeing something very similar from the tee run by fedpkg local to log its
 output when building a cross-compiler gcc RPM:
 
   tee: standard output: Input/output error
   tee: write error
 
 Slapping strace on tee after the Input/output error line has occurred shows
 that tee is still running and still pushing lines from its stdin to the output
 named on its command line, but has ceased writing to its stdout.

tee uses the same loop as portage, but keeps going in case of an EIO:

  /* In the array of NFILES + 1 descriptors, make
 the first one correspond to standard output.   */
  descriptors[0] = stdout;
  files[0] = _(standard output);
  setvbuf (stdout, NULL, _IONBF, 0);

  for (i = 1; i = nfiles; i++)
{
  descriptors[i] = (STREQ (files[i], -)
? stdout
: fopen (files[i], mode_string));
  if (descriptors[i] == NULL)
{
  error (0, errno, %s, files[i]);
  ok = false;
}
  else
setvbuf (descriptors[i], NULL, _IONBF, 0);
}

  while (1)
{
  bytes_read = read (0, buffer, sizeof buffer);
  if (bytes_read  0  errno == EINTR)
continue;
  if (bytes_read = 0)
break;

  /* Write to all NFILES + 1 descriptors.
 Standard output is the first one.  */
  for (i = 0; i = nfiles; i++)
if (descriptors[i]
 fwrite (buffer, bytes_read, 1, descriptors[i]) != 1)
  {
error (0, errno, %s, files[i]);
descriptors[i] = NULL;
ok = false;
  }
}

-- 
Markus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/