On Mon, Sep 08, 2014 at 02:49:46PM +0100, Steven Chamberlain wrote:
> retitle 760841 pspp: random testsuite failures
> thanks
>
> Affects GNU/Linux too! It's odd the issue has only been seen yet on
> kfreebsd rather than linux buildds; I reproduced it on linux-amd64, and
> with similar ~2% probability:
>
> > ~/pspp-0.8.3$ for i in $(seq 1 1000) ; do printf
> > 'e\0n\0t\0r\0\351\0e\0\n\0' | tests/libpspp/u8-istream-test read - Auto |
> > xxd ; done | sort | uniq -c | sort -bn
> > 21 000: 6500 6e00 7400 7200 c3a9 0065 000a 00e.n.t.re...
> > 979 000: 656e 7472 c3a9 650a entr..e.
>
> For sanity, I've checked that printf is working correctly:
>
> > ~/pspp-0.8.3$ for i in $(seq 1 1000) ; do printf
> > 'e\0n\0t\0r\0\351\0e\0\n\0' | xxd ; done | sort | uniq -c | sort -bn
> >1000 000: 6500 6e00 7400 7200 e900 6500 0a00 e.n.t.r...e...
>
> So it may be related to how the tests/libpspp/u8-istream-test program
> does its reads from stdin? I'm also curious why the read is truncated
> only sometimes - possibly a glibc issue there?
Thanks for the analysis.
This is a PSPP bug in the library that u8-istream-test uses. I pushed
the following fix to the PSPP upstream repository. Friedrich, do you
want to apply that to the PSPP packaging? I'll sponsor the upload for
you.
Thanks,
Ben.
--8<--cut here-->8--
From: Ben Pfaff
Date: Tue, 9 Sep 2014 08:51:44 -0700
Subject: [PATCH] u8-istream: Fix handling of partial reads.
The u8-istream code did not retry upon a partial read, assuming that that
was the end of the file. When the partial read was shorter than
ENCODING_GUESS_MIN, this could cause the encoding guesser, in turn, to
guess the wrong encoding (especially if the encoding was really UTF-16 and
the partial read was an odd number of bytes).
Reported at https://bugs.debian.org/760841.
Reported by Friedrich Beckmann and Steven Chamberlain.
---
src/libpspp/u8-istream.c| 20 ++--
tests/libpspp/u8-istream.at | 4 +++-
2 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/src/libpspp/u8-istream.c b/src/libpspp/u8-istream.c
index 2486ca1..6347a67 100644
--- a/src/libpspp/u8-istream.c
+++ b/src/libpspp/u8-istream.c
@@ -1,5 +1,5 @@
/* PSPP - a program for statistical analysis.
- Copyright (C) 2010, 2011, 2012, 2013 Free Software Foundation, Inc.
+ Copyright (C) 2010, 2011, 2012, 2013, 2014 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -203,14 +203,22 @@ fill_buffer (struct u8_istream *is)
is->head = is->buffer;
/* Read more input. */
+ n = 0;
do
{
- n = read (is->fd, is->buffer + is->length,
-U8_ISTREAM_BUFFER_SIZE - is->length);
+ ssize_t retval = read (is->fd, is->buffer + is->length,
+ U8_ISTREAM_BUFFER_SIZE - is->length);
+ if (retval > 0)
+{
+ n += retval;
+ is->length += retval;
+}
+ else if (retval == 0)
+return n;
+ else if (errno != EINTR)
+return n > 0 ? n : -1;
}
- while (n < 0 && errno == EINTR);
- if (n > 0)
-is->length += n;
+ while (is->length < U8_ISTREAM_BUFFER_SIZE);
return n;
}
diff --git a/tests/libpspp/u8-istream.at b/tests/libpspp/u8-istream.at
index 24af08c..5b5b4e0 100644
--- a/tests/libpspp/u8-istream.at
+++ b/tests/libpspp/u8-istream.at
@@ -140,7 +140,9 @@ AT_CLEANUP
AT_SETUP([read UTF-16 as Auto])
AT_KEYWORDS([u8_istream])
AT_CHECK([i18n-test supports_encodings UTF-16 UTF-16BE UTF-16LE])
-AT_CHECK([printf '\0e\0n\0t\0r\0\351\0e\0\n' | u8-istream-test read - Auto],
+dnl The "sleep 1" checks for a bug in which u8-istream did not properly
+dnl handle receiving data in multiple chunks.
+AT_CHECK([{ printf '\0e\0n\0t\0'; sleep 1; printf 'r\0\351\0e\0\n'; } |
u8-istream-test read - Auto],
[0], [entr??e
])
AT_CHECK([printf 'e\0n\0t\0r\0\351\0e\0\n\0' | u8-istream-test read - Auto],
--
1.9.1
--
To UNSUBSCRIBE, email to debian-bsd-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/20140909155353.ga30...@blp.benpfaff.org