from:"Paul Eggert"

Bug#456943: bug#15444: One character can be lost if colors are enabled

2024-03-26 Thread Paul Eggert


On 3/25/24 08:49, Vincent Lefevre wrote:

This works fine in Xterm, giving on a 80-column terminal:

...
 
However, this triggers the bug in GNOME Terminal (and other
libvte-based terminals):


That's not good. Is there some escape sequence that will work on both 
xterm and libvte? I assume the space+backspace trick you suggested later 
assumes xterm behavior.

Bug#1067022: man2html: Segmentation fault with tzfile(5)

2024-03-17 Thread Paul Eggert


On 2024-03-16 16:06, Alejandro Colomar wrote:


BTW, I noticed that the upstream homepage is dead:
.
Is this project defunct?


Yes it is. It's been defunct for many years.

Attached is a patch for this particular bug. However, a brief code 
inspection suggests there are lots more core dumps where this came from.


man2html should be discontinued from Debian if nobody wants to maintain 
it, as I don't. There are plenty of substitutes for it, starting with 
groff itself.
--- man-1.6g-debian/man2html/man2html.c	2024-03-17 18:14:12.360162014 -0700
+++ man-1.6g-debian-fix/man2html/man2html.c	2024-03-17 21:21:23.145134418 -0700
@@ -1282,9 +1282,8 @@
 return c;
 }
 
-char *scan_expression(char *c, int *result) {
-int value=0,value2,sign=1,opex=0;
-char oper='c';
+static char *scan_if_expression(char *c, int *result) {
+int value=0;
 
 if (*c=='!') {
 	c=scan_expression(c+1, &value);
@@ -1328,6 +1327,16 @@
 	if (tcmp) c=c+3;
 	c++;
 } else {
+	return scan_expression(c, result);
+}
+*result=value;
+return c;
+}
+
+char *scan_expression(char *c, int *result) {
+	int value=0,value2,sign=1,opex=0;
+	char oper='c';
+
 	while (*c && !isspace(*c) && *c!=')') {
 	opex=0;
 	switch (*c) {
@@ -1414,9 +1423,8 @@
 	}
 	}
 	if (*c==')') c++;
-}
-*result=value;
-return c;
+	*result=value;
+	return c;
 }
 
 static void
@@ -1956,7 +1964,7 @@
 	 * .if !'string1'string2' anything
 	 */
 	c=c+j;
-	c=scan_expression(c, &i);
+	c=scan_if_expression(c, &i);
 	ifelseval=!i;
 	if (i) {
 		*c='\n';

Bug#1058752: bug#62572: Bug#1058752: bug#62572: cp --no-clobber behavior has changed

2024-01-31 Thread Paul Eggert


On 1/31/24 06:06, Pádraig Brady wrote:

To my mind the most protective option takes precedence.


That's not how POSIX works with mv -i and mv -f. The last flag wins. I 
assume this is so that people can have aliases or shell scripts that 
make -i the default, but you can override by specifying -f on the 
command line. E.g., in mymv:


   #!/bin/sh
   mv -i "$@"

then "mymv -f a b" works as expected.
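
A minimal sketch of the last-flag-wins rule, assuming a simplified command line where -f and -i appear only as separate arguments. The names overwrite_mode and last_flag_wins are hypothetical, and this is not how mv itself parses options:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
enum overwrite_mode { MODE_DEFAULT, MODE_FORCE, MODE_INTERACTIVE };

/* Scan the arguments left to right; each -f or -i simply overwrites
   whatever an earlier flag selected, so the last one wins.  */
enum overwrite_mode last_flag_wins(int argc, char *const *argv)
{
    assert(argv != NULL);
    enum overwrite_mode mode = MODE_DEFAULT;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            break;                      /* end of options */
        if (strcmp(argv[i], "-f") == 0)
            mode = MODE_FORCE;
        else if (strcmp(argv[i], "-i") == 0)
            mode = MODE_INTERACTIVE;
    }
    return mode;
}
```

With the arguments that the mymv wrapper would produce ("mv -i -f a b"), the scan ends in MODE_FORCE, which is the override behavior the wrapper relies on.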

Wouldn't a similar argument apply to cp's --update options?

Or perhaps we should play it safe, and reject any combination of 
--update etc. options that are incompatible. We can always change our 
mind later and say that later options override earlier ones, or do 
something else that's less conservative.

Bug#1058752: bug#62572: Bug#1058752: bug#62572: cp --no-clobber behavior has changed

2024-01-30 Thread Paul Eggert


On 2024-01-30 03:18, Pádraig Brady wrote:

So we now have the proposed change as:

   - revert -n to old silent success behavior
   - document -n as deprecated
   - Leave --update=none as is (will be synonymous with -n)
   - Provide --update=none-fail to diagnose and exit failure


Thanks, that's a better proposal, but I still see several opportunities 
for confusion.


If I understand things correctly, cp --update=none is not synonymous 
with the proposed (i.e., old-behavior) cp -n, because -n overrides 
previous -i options but --update=none does not. Also, -n overrides 
either previous or following --update=UPDATE options, but --update=none 
overrides only previous --update=UPDATE options. (For what it's worth, 
FreeBSD -n overrides


Some of this complication seems to be for consistency with how mv 
behaves with -f, -i, -n, and --update, and similarly with how rm behaves 
with -f, -i, -I, and --interactive. To be honest I don't quite 
understand the reason for all this complexity, which suggests it should 
be documented somewhere (the manual?) if it isn't already.


This raises more questions:

* If we deprecate cp -n, what about mv -n? FreeBSD mv -n behaves like 
Coreutils mv -n: it silently does nothing and exits successfully. So 
there's no compatibility argument for changing mv -n's behavior. 
However, whatever --update option we add to cp (to output a diagnostic 
and exit with failure) should surely also be added to mv, to aid 
consistency.


* Should cp --update=none be changed so that it really behaves like the 
old cp -n, in that it overrides other options in ways that differ from 
how the other --update=UPDATE options behave? I'm leaning toward "no" as 
this adds complexity that I don't see the use for.


* If we don't change cp --update=none's overriding behavior, is it still 
OK to tell users to substitute --update=none for -n even though the two 
options are not exactly equivalent? I'm leaning towards "yes" but would 
like other opinions.

Bug#1058752: bug#62572: Bug#1058752: bug#62572: cp --no-clobber behavior has changed

2024-01-29 Thread Paul Eggert


On 1/29/24 08:11, Pádraig Brady wrote:


Right, that's why I'm still leaning towards my proposal in the last mail.


Well, I won't insist on doing nothing; however, the proposal needs 
ironing out and now's a good time to do it before installing changes.




   - revert to previous exit success -n behavior
   - document -n as deprecated
   - provide --update=noclobber to give exit failure functionality


So --update=noclobber would differ in meaning from the deprecated-in-9.5 
--no-clobber, but would agree in meaning with 9.4 --no-clobber? That 
sounds pretty confusing for future users. (And a nit: why should one 
spelling have a hyphen but the other doesn't?)



   - BTW, it probably makes sense to print a diagnostic for each skipped
     file here, as it's exceptional behavior for which we're exiting
     with failure.


Coreutils 9.4 cp -n already does that, no? So I'm not sure what's being 
proposed here.


  $ touch a b
  $ cp -n a b; echo $?
  cp: not replacing 'b'
  1



   - the existing --update=none provides the exit success functionality


It seems to me that this proposal conflates two questions:

* What rules should cp use to decide whether to update a destination?

* When cp decides not to update a destination, what should it do? Exit 
with nonzero status? Output a diagnostic? Both? Neither?


Aren't these independent axes? If so, shouldn't they have independent 
options? For example, since we have --update=older, shouldn't there be a 
way to say "I want to copy A to B only if B is older than A, and I want 
the exit status to be zero only if A was copied to B"?
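
To make the two axes concrete, here is a rough sketch in which the update rule and the skip policy are separate parameters. All names (update_rule, skip_policy, should_copy, exit_status) are hypothetical and do not correspond to anything in coreutils; the diagnostic question is left out to keep the sketch short:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Axis 1 (hypothetical): when is an existing destination replaced?  */
enum update_rule { UPDATE_ALL, UPDATE_NONE, UPDATE_OLDER };

/* Axis 2 (hypothetical): what does a skipped copy do to the exit status?  */
enum skip_policy { SKIP_SUCCEED, SKIP_FAIL };

/* Decide whether to copy, looking only at the update rule.  */
bool should_copy(enum update_rule rule, bool dest_exists,
                 time_t src_mtime, time_t dest_mtime)
{
    if (!dest_exists)
        return true;
    switch (rule) {
    case UPDATE_ALL:   return true;
    case UPDATE_NONE:  return false;
    case UPDATE_OLDER: return dest_mtime < src_mtime;
    }
    assert(!"unknown update rule");
    return false;
}

/* Decide the exit status, looking only at the skip policy.  */
int exit_status(enum skip_policy policy, bool copied)
{
    return (copied || policy == SKIP_SUCCEED) ? 0 : 1;
}
```

The case asked about above ("copy A to B only if B is older than A, and exit zero only if A was copied") would then be the pair UPDATE_OLDER plus SKIP_FAIL, without needing a dedicated combined option value.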

Bug#1058752: bug#62572: Bug#1058752: bug#62572: cp --no-clobber behavior has changed

2024-01-28 Thread Paul Eggert


On 2024-01-28 05:22, Pádraig Brady wrote:

At this stage it seems best for us to go back to the original Linux
behavior (use case 3), and to silently deprecate -n in docs to document
the portability issues with it.


I'm not sure reverting would be best. It would introduce more confusion, 
and would make coreutils incompatible with FreeBSD again.


The recent Debian change indicates that their intent is to move to the 
FreeBSD behavior too. This would improve cross-platform portability and 
I don't think we should discourage that.




  $ cp -n /bin/true tmp
  cp: warning: behavior of -n is non-portable and may change in future; use --update=none instead

This is problematic as:

  - It's noisy


Yes that's a problem, and I doubt whether we should mimic Debian.


  - There is no way to get the behavior of indicating failure if existing
    files are present


Yes, it's not a good place to be. Surely current coreutils is better 
than what Debian is doing.



  - The --update=none advice is only portable to newer coreutils


True, but that's not a deal-killer. No advice that we give can be 100% 
portable to all platforms.



We should also provide --update=noclobber for use case 1.
Having the control on the --update option allows us to more clearly
deprecate -n.


Adding an --update=noclobber sounds like a good thing to do.

Another possibility is to add a warning that is emitted only at the end 
of 'cp'. The warning would occur only if the exit code differs because 
of this cp -n business. We could stretch things a bit and have a 
configure-time option --enable-compat-warnings that builders like Debian 
could use if they want such warnings.

Bug#1058752: bug#62572: cp --no-clobber behavior has changed

2023-12-17 Thread Paul Eggert


On 2023-12-16 13:46, Bernhard Voelker wrote:

Whether the implementation is race-prone or not is an internal thing.


I wasn't referring to the internal implementation. I was referring to cp 
users. With the newer Coreutils (FreeBSD) behavior, you can reliably 
write a script to do something if cp -n didn't copy the file because the 
destination already existed. With the older Coreutils behavior you 
cannot do that reliably; there will always be a race condition.
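
The same point can be illustrated at the system-call level. This is only a sketch of the general idea, not a description of how cp is implemented, and create_if_absent is a hypothetical name. When the "did it already exist?" answer comes back from the same atomic operation that creates the file, the caller can act on it without a race; a separate existence check followed by a create cannot offer that:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create PATH only if it does not exist yet.
   Returns 1 if this call created it, 0 if it already existed,
   and -1 on any other error.  O_CREAT | O_EXCL makes the
   existence check and the creation a single atomic step.  */
int create_if_absent(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return errno == EEXIST ? 0 : -1;
    close(fd);
    return 1;
}
```

A caller that gets 0 knows the destination was already there at the moment of the attempt, which is the kind of reliable signal a failing cp -n gives a script.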

Bug#1058752: bug#62572: cp --no-clobber behavior has changed

2023-12-15 Thread Paul Eggert


On 2023-12-15 10:49, Michael Stone wrote:

There's no compelling reason to force this change


Well, certainly nobody compelled us at gunpoint

Still, Pádraig gave a reasonable summary of why the change was made, 
despite its incompatibility with previous behavior. (One thing I'd add 
is that the FreeBSD behavior is inherently less race-prone.) It seemed 
like a good idea at the time all things considered, and to my mind still 
does.




Essentially the current situation is that -n shouldn't be used if you expect a 
certain behavior for this case and you are writing a script for linux systems. 
Maybe in 10 years you'll be able to assume the new behavior. Better to just 
tell people to not use it at all, and leave the historic behavior alone until 
everyone has stopped using -n entirely.


Even if we tell people not to use -n at all, that doesn't mean we should 
revert to the coreutils 9.1 behavior.


The cat is to some extent out of the bag. Unless one insists on (FreeBSD 
| coreutils 9.2-9.4), or insists on coreutils 7.1-9.1, one should not 
rely on cp -n failing or silently succeeding when the destination 
already exists. This will remain true regardless of whether coreutils 
reverts to its 7.1-9.1 behavior.

Bug#1041588: bug#64773: grep 3.11 -r on 100000+ files fails with "Operation not supported"

2023-07-21 Thread Paul Eggert

To fix just this bug (as opposed to the other Gnulib-related bugs that 
may be lurking) try applying the attached Gnulib patch to a grep 3.11 
tarball.

Closing the debbugs.gnu.org bug report, as the bug has been fixed upstream.

From d4d8abb39eb02c555f062b1f83ffcfac999c582f Mon Sep 17 00:00:00 2001
From: Bruno Haible 
Date: Fri, 5 May 2023 12:02:49 +0200
Subject: [PATCH] dirfd: Fix bogus override (regression 2023-04-26).

Reported by Bjarni Ingi Gislason  in
.

* m4/dirfd.m4 (gl_FUNC_DIRFD): Fix mistake in last change.
---
 ChangeLog   | 7 +++
 m4/dirfd.m4 | 6 +-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index aaffe12fc1..5f01a52535 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2023-05-05  Bruno Haible  
+
+	dirfd: Fix bogus override (regression 2023-04-26).
+	Reported by Bjarni Ingi Gislason  in
+	.
+	* m4/dirfd.m4 (gl_FUNC_DIRFD): Fix mistake in last change.
+
 2023-05-04  Bruno Haible  

 	c32swidth: Add tests.
diff --git a/m4/dirfd.m4 b/m4/dirfd.m4
index d1ee2c7f61..7968b1287c 100644
--- a/m4/dirfd.m4
+++ b/m4/dirfd.m4
@@ -1,4 +1,4 @@
-# serial 27   -*- Autoconf -*-
+# serial 28   -*- Autoconf -*-

 dnl Find out how to get the file descriptor associated with an open DIR*.

@@ -40,10 +40,6 @@ AC_DEFUN([gl_FUNC_DIRFD],
 HAVE_DIRFD=0
   else
 HAVE_DIRFD=1
-dnl Replace only if the system declares dirfd already.
-if test $ac_cv_have_decl_dirfd = yes; then
-  REPLACE_DIRFD=1
-fi
 dnl Replace dirfd() on native Windows, to support fdopendir().
 AC_REQUIRE([gl_DIRENT_DIR])
 if test $DIR_HAS_FD_MEMBER = 0; then
-- 
2.39.2

Bug#1032393: [Pkg-shadow-devel] Bug#1032393: [PATCH v2 2/2] debian/control: Add libbsd-dev and pkg-config

2023-03-12 Thread Paul Eggert


On 2023-03-12 08:28, Alejandro Colomar wrote:


I've pushed a signed tag paul1, so you can safely check that the
repo is mine (since I don't have HTTPS).


Thanks, I'm not sure what exactly this means as I don't contribute to 
shadow-devel. As far as the remaining patches go, please use your best 
judgment as I'm running low on time to worry about this.

Bug#1032393: [Pkg-shadow-devel] Bug#1032393: [PATCH v2 2/2] debian/control: Add libbsd-dev and pkg-config

2023-03-11 Thread Paul Eggert


On 2023-03-11 14:02, Alejandro Colomar wrote:

we should use "%s" (if we go the way of snprintf(3)).


Yes, thanks for catching that. However, I came up with a better way that 
avoids snprintf (and strlcpy) entirely both here and the other place I 
used snprintf.


Attached is a revised set of patches that addresses the comments you 
made and embodies my followups.

From 324bb0e914b5470650df02bd7b64e963665d44c1 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 11 Mar 2023 00:01:02 -0800
Subject: [PATCH 1/8] Simplify change_field by using strcpy

* lib/fields.c (change_field): Since we know the string fits,
use strcpy rather than strlcpy.

Signed-off-by: Paul Eggert 
---
 lib/fields.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/fields.c b/lib/fields.c
index 0b5f91b2..8801bce6 100644
--- a/lib/fields.c
+++ b/lib/fields.c
@@ -100,7 +100,6 @@ void change_field (char *buf, size_t maxsize, const char *prompt)
 			cp++;
 		}
 
-		strlcpy (buf, cp, maxsize);
+		strcpy (buf, cp);
 	}
 }
-
-- 
2.37.2

From b13ffb86dcd10e8160eee10bd286fc73937c3e8b Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 11 Mar 2023 13:41:54 -0800
Subject: [PATCH 2/8] Omit unneeded change_field test

* fields.c (change_field): Omit unnecessary test.

Signed-off-by: Paul Eggert 
---
 lib/fields.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/fields.c b/lib/fields.c
index 8801bce6..fa5fd156 100644
--- a/lib/fields.c
+++ b/lib/fields.c
@@ -96,7 +96,7 @@ void change_field (char *buf, size_t maxsize, const char *prompt)
 		*cp = '\0';
 
 		cp = newf;
-		while (('\0' != *cp) && isspace (*cp)) {
+		while (isspace (*cp)) {
 			cp++;
 		}
 
-- 
2.37.2

From 090722a20765cf9a248050524143fce5b68cfe8c Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 11 Mar 2023 13:43:36 -0800
Subject: [PATCH 3/8] Fix change_field buffer underrun

* lib/fields.c (change_field): Don't point
before array start; that has undefined behavior.

Signed-off-by: Paul Eggert 
---
 lib/fields.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/fields.c b/lib/fields.c
index fa5fd156..640be931 100644
--- a/lib/fields.c
+++ b/lib/fields.c
@@ -91,8 +91,9 @@ void change_field (char *buf, size_t maxsize, const char *prompt)
 		 * entering a space.  --marekm
 		 */
 
-		while (--cp >= newf && isspace (*cp));
-		cp++;
+		while (newf < cp && isspace (cp[-1])) {
+			cp--;
+		}
 		*cp = '\0';
 
 		cp = newf;
-- 
2.37.2

From 4982d5f2fe2f2c339568996ebb17a99200d2f106 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 11 Mar 2023 00:02:45 -0800
Subject: [PATCH 4/8] Prefer strcpy to strlcpy when either works

* lib/gshadow.c (sgetsgent): Use strcpy not strlcpy,
since the string is known to fit.

Signed-off-by: Paul Eggert 
---
 lib/gshadow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/gshadow.c b/lib/gshadow.c
index c17af67f..ca14449a 100644
--- a/lib/gshadow.c
+++ b/lib/gshadow.c
@@ -128,7 +128,7 @@ void endsgent (void)
 		sgrbuflen = len;
 	}
 
-	strlcpy (sgrbuf, string, len);
+	strcpy (sgrbuf, string);
 
 	cp = strrchr (sgrbuf, '\n');
 	if (NULL != cp) {
-- 
2.37.2

From 54fac7560f87a134c4d3045ce7048f4819c4e492 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 11 Mar 2023 00:38:24 -0800
Subject: [PATCH 5/8] Avoid silent truncation of console file data

* libmisc/console.c (is_listed): Rework so that there is no
fixed-size buffer, and no need to use fgets or strlcpy or strtok.
Instead, the code works with arbitrary-sized input,
without silently truncating data or mishandling NUL
bytes in the console file.

Signed-off-by: Paul Eggert 
---
 libmisc/console.c | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/libmisc/console.c b/libmisc/console.c
index 7e2132dd..8264e1a3 100644
--- a/libmisc/console.c
+++ b/libmisc/console.c
@@ -24,7 +24,6 @@
 static bool is_listed (const char *cfgin, const char *tty, bool def)
 {
 	FILE *fp;
-	char buf[1024], *s;
 	const char *cons;
 
 	/*
@@ -43,17 +42,17 @@ static bool is_listed (const char *cfgin, const char *tty, bool def)
 	 */
 
 	if (*cons != '/') {
-		char *pbuf;
-		strlcpy (buf, cons, sizeof (buf));
-		pbuf = &buf[0];
-		while ((s = strtok (pbuf, ":")) != NULL) {
-			if (strcmp (s, tty) == 0) {
+		size_t ttylen = strlen (tty);
+		for (;;) {
+			if (strncmp (cons, tty, ttylen) == 0
+			&& (cons[ttylen] == ':' || !cons[ttylen])) {
 return true;
 			}
-
-			pbuf = NULL;
+			cons = strchr (cons, ':');
+			if (!cons)
+return false;
+			cons++;
 		}
-		return false;
 	}
 
 	/*
@@ -70,21 +69,22 @@ static bool is_listed (const char *cfgin, const char *tty, bool def)
 	 * See if this tty is listed in the console file.
 	 */
 
-	while (fgets (buf, sizeof (buf), fp) != NULL) {
-		/* Remove optional trailing '\n'. */
-		buf[strcspn (buf, "\n")] = '\0';
-		if (strcmp (buf, tty) == 0) {
-			(void)

Bug#1032393: [Pkg-shadow-devel] Bug#1032393: [PATCH v2 2/2] debian/control: Add libbsd-dev and pkg-config

2023-03-11 Thread Paul Eggert


On 2023-03-11 14:39, Alejandro Colomar wrote:

I wonder
if the patch is really "simplifying".


It depends on how one measures simplicity. The reader will need to know 
strftime's API regardless; requiring the reader to also know strlcpy's 
API makes the reader's job harder.


Also, it's less machine code to call just the one function (if one cares 
about simplicity of debugging :-).


If you still prefer calling two different functions instead of just one, 
feel free to modify it to use plain strcpy. strlcpy isn't needed here as 
the destination buffers are all big enough. To be honest though I like 
it the way it is (though it could use a comment; I'll add that).

Bug#1032393: [Pkg-shadow-devel] Bug#1032393: [PATCH v2 2/2] debian/control: Add libbsd-dev and pkg-config

2023-03-11 Thread Paul Eggert


On 2023-03-11 13:49, Alejandro Colomar wrote:

+: mempcpy (full_tty, "/dev/", sizeof"/dev/" - 1)),

This is a great use case for stpcpy(3).


I came up with a slightly better approach, that needs neither mempcpy 
nor stpcpy. I plan to send it along soon.

Bug#1032393: [Pkg-shadow-devel] Bug#1032393: [PATCH v2 2/2] debian/control: Add libbsd-dev and pkg-config

2023-03-11 Thread Paul Eggert


On 2023-03-11 13:59, Alejandro Colomar wrote:

If the function is allowed
to dereference, then NULL is not allowed, but if the values are
uninitialized, then reading any of them should also trigger UB, no?


Sure, but the standard says that strftime reads only the struct tm 
members needed to interpret the format. If the format contains no 
conversion specs, strftime reads no struct tm members and thus there is 
no UB even if the struct tm is entirely uninitialized.

Bug#1032393: [Pkg-shadow-devel] Bug#1032393: [PATCH v2 2/2] debian/control: Add libbsd-dev and pkg-config

2023-03-11 Thread Paul Eggert


On 2023-03-11 14:18, Alejandro Colomar wrote:


What I'm not sure is that strftime(3) requires nonnull.


glibc's strftime implementation segfaults if you pass a null pointer, so 
we can't pass NULL regardless of whether the strftime API in time.h uses 
'__attribute__ ((nonnull))'.

Bug#1032393: [Pkg-shadow-devel] Bug#1032393: [PATCH v2 2/2] debian/control: Add libbsd-dev and pkg-config

2023-03-11 Thread Paul Eggert


On 2023-03-11 13:59, Alejandro Colomar wrote:

Unless the standard specifically allows us to do so, but I can't find
anything clear.


It's pretty clear if you're a time nerd like me. :-) The standard for 
strftime says "The appropriate characters are determined using the 
LC_TIME category of the current locale and by the values of zero or more 
members of the broken-down time structure pointed to by timeptr, as 
specified in brackets in the description. If any of the specified values 
are outside the normal range, the characters stored are unspecified."


The "zero" means that if no conversion specs are present in the format 
string, then no struct tm members are examined, and it's therefore OK 
for all members to be uninitialized if no conversion specs are present.
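
A small sketch of the call pattern under discussion, assuming a format string with no conversion specs. The name format_literal is hypothetical. The dummy is zero-initialized here only to keep compiler warnings about uninitialized objects quiet; under the reading of the standard given above, an uninitialized struct tm would also be acceptable in this case, whereas NULL would not:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Copy LITERAL into BUF via strftime.  LITERAL must contain no '%',
   so strftime has no reason to examine any struct tm member.
   Returns the number of bytes stored, not counting the NUL,
   or 0 if the result did not fit.  */
size_t format_literal(char *buf, size_t bufsize, const char *literal)
{
    /* Zero-initialized as a precaution; this is an addition to,
       not a requirement of, the argument in the thread.  */
    struct tm dummy = {0};

    assert(strchr(literal, '%') == NULL);
    return strftime(buf, bufsize, literal, &dummy);
}
```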

Bug#1032393: [Pkg-shadow-devel] Bug#1032393: [PATCH v2 2/2] debian/control: Add libbsd-dev and pkg-config

2023-03-11 Thread Paul Eggert


On 2023-03-11 13:31, Alejandro Colomar wrote:

What's this  exactly for?


It avoids undefined behavior. A call like strftime (buf, sizeof buf, 
"XXX", NULL) has undefined behavior, as near as I can make out. It's OK 
that the dummy is uninitialized.

Bug#1032393: [Pkg-shadow-devel] Bug#1032393: [PATCH v2 2/2] debian/control: Add libbsd-dev and pkg-config

2023-03-11 Thread Paul Eggert

I looked into this, and five of the shadow package's six uses of strlcpy 
are wrong, i.e., they are associated with silent truncation or buffer 
overrun/underrun or dereferencing NULL in nearby code. This isn't 
surprising, as strlcpy is commonly used in code that has been 
slapdashedly hacked to try to make it safer, and in my experience code 
that has been modified in that way is usually wrong.


Proposed patches attached.

Although there is one correct use of strlcpy, the correct use (in 
sgetsgent) is equivalent to memcpy so there is no need for strlcpy there 
(see patch 0002).
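
As a rough illustration of the underrun mentioned above: a trimming loop of the form "while (--cp >= start && ...)" ends up forming a pointer one element before the start of the array, which has undefined behavior in C even if that pointer is never dereferenced. The sketch below compares before decrementing instead. The name trim_spaces is hypothetical, and the unsigned char casts are an addition that is not part of the patches:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Trim trailing and leading white space from S in place.
   Returns a pointer to the first non-space byte inside S.
   The loop never forms a pointer before the start of S.  */
char *trim_spaces(char *s)
{
    assert(s != NULL);
    char *end = s + strlen(s);

    /* Look at end[-1] only while end is still past the start.  */
    while (s < end && isspace((unsigned char) end[-1]))
        end--;
    *end = '\0';

    /* The terminating NUL is not a space, so this stops in bounds.  */
    while (isspace((unsigned char) *s))
        s++;
    return s;
}
```

An all-blank input such as "   " reduces to the empty string, which matches the comment in the patched code about being able to change a field to empty by entering a space.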
From d40e2f92f3e50d13d87393bd30b2b4b20b89a2d6 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 11 Mar 2023 00:01:02 -0800
Subject: [PATCH 1/6] Fix undefined behavior in change_field

* lib/fields.c (change_field): Do not ever compute [-1],
as behavior is undefined.  Since we know that the string fits,
use memcpy rather than strlcpy.

Signed-off-by: Paul Eggert 
---
 lib/fields.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/fields.c b/lib/fields.c
index 0b5f91b2..3b119502 100644
--- a/lib/fields.c
+++ b/lib/fields.c
@@ -90,17 +90,17 @@ void change_field (char *buf, size_t maxsize, const char *prompt)
 		 * makes it possible to change the field to empty, by
 		 * entering a space.  --marekm
 		 */
+		char *bp = newf;
 
-		while (--cp >= newf && isspace (*cp));
-		cp++;
+		while (newf < cp && isspace (cp[-1])) {
+			cp--;
+		}
 		*cp = '\0';
 
-		cp = newf;
-		while (('\0' != *cp) && isspace (*cp)) {
-			cp++;
+		while (isspace (*bp)) {
+			bp++;
 		}
 
-		strlcpy (buf, cp, maxsize);
+		memcpy (buf, bp, cp + 1 - bp);
 	}
 }
-
-- 
2.37.2

From 7e88c5914c1fab6c4d88e1ca39d6b6319e7ee768 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 11 Mar 2023 00:02:45 -0800
Subject: [PATCH 2/6] Prefer memcpy to strlcpy when either works

memcpy is standardized and should be faster here.
* lib/gshadow.c (sgetsgent): Use memcpy not strlcpy,
since the string is known to fit.

Signed-off-by: Paul Eggert 
---
 lib/gshadow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/gshadow.c b/lib/gshadow.c
index c17af67f..1976c9a9 100644
--- a/lib/gshadow.c
+++ b/lib/gshadow.c
@@ -128,7 +128,7 @@ void endsgent (void)
 		sgrbuflen = len;
 	}
 
-	strlcpy (sgrbuf, string, len);
+	memcpy (sgrbuf, string, len);
 
 	cp = strrchr (sgrbuf, '\n');
 	if (NULL != cp) {
-- 
2.37.2

From a1c2e046d52042cf60ff7196a9d9a972573290bd Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 11 Mar 2023 00:38:24 -0800
Subject: [PATCH 3/6] Avoid silent truncation of console file data

* libmisc/console.c (is_listed): Rework so that there is no
fixed-size buffer, and no need to use fgets or strlcpy or strtok.
Instead, the code works with arbitrary-sized input,
without silently truncating data or mishandling NUL
bytes in the console file.

Signed-off-by: Paul Eggert 
---
 libmisc/console.c | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/libmisc/console.c b/libmisc/console.c
index 7e2132dd..8264e1a3 100644
--- a/libmisc/console.c
+++ b/libmisc/console.c
@@ -24,7 +24,6 @@
 static bool is_listed (const char *cfgin, const char *tty, bool def)
 {
 	FILE *fp;
-	char buf[1024], *s;
 	const char *cons;
 
 	/*
@@ -43,17 +42,17 @@ static bool is_listed (const char *cfgin, const char *tty, bool def)
 	 */
 
 	if (*cons != '/') {
-		char *pbuf;
-		strlcpy (buf, cons, sizeof (buf));
-		pbuf = &buf[0];
-		while ((s = strtok (pbuf, ":")) != NULL) {
-			if (strcmp (s, tty) == 0) {
+		size_t ttylen = strlen (tty);
+		for (;;) {
+			if (strncmp (cons, tty, ttylen) == 0
+			&& (cons[ttylen] == ':' || !cons[ttylen])) {
 return true;
 			}
-
-			pbuf = NULL;
+			cons = strchr (cons, ':');
+			if (!cons)
+return false;
+			cons++;
 		}
-		return false;
 	}
 
 	/*
@@ -70,21 +69,22 @@ static bool is_listed (const char *cfgin, const char *tty, bool def)
 	 * See if this tty is listed in the console file.
 	 */
 
-	while (fgets (buf, sizeof (buf), fp) != NULL) {
-		/* Remove optional trailing '\n'. */
-		buf[strcspn (buf, "\n")] = '\0';
-		if (strcmp (buf, tty) == 0) {
-			(void) fclose (fp);
-			return true;
+	const char *tp = tty;
+	bool listed = false;
+	for (int c; 0 <= (c = getc (fp)); ) {
+		if (c == '\n') {
+			if (tp && !*tp) {
+listed = true;
+break;
+			}
+			tp = tty;
+		} else if (tp) {
+			tp = *tp == c && c ? tp + 1 : NULL;
 		}
 	}
 
-	/*
-	 * This tty isn't a console.
-	 */
-
 	(void) fclose (fp);
-	return false;
+	return listed;
 }
 
 /*
@@ -105,4 +105,3 @@ bool console (const char *tty)
 
 	return is_listed ("CONSOLE", tty, true);
 }
-
-- 
2.37.2

From 1c8388d1d1831e976cdaa6e6f27bb08bf31aedc5 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 11 Mar 2023 00:42:29 -0800
Subject: [PATCH 4/6] Fix crash with large timestamps

* libmisc/date_t

Bug#930247: bug#36148: Debian Bug#930247: grep: does not handle backreferences correctly, violating POSIX

2023-01-20 Thread Paul Eggert


On 2023-01-20 01:51, Santiago Ruano Rincón wrote:

I'll clone the bug in Debian (and adjust severities), to make it easier
to follow/differentiate both bugs.

Paul, do you want me to do the same in debbugs.gnu.org?


Please don't bother, since the bug is already fixed upstream.

Bug#930247: bug#36148: Debian Bug#930247: grep: does not handle backreferences correctly, violating POSIX

2022-12-05 Thread Paul Eggert


On 12/1/22 17:21, Thorsten Glaser wrote:

Please fix this bug, it’s really bad and embarrassing.


Thanks for reporting it; I wasn't aware of it.

Although you sent your email to 36...@debbugs.gnu.org / 
930247@bugs.debian.org, your email is reporting a separate bug, and I 
fixed it in the development version of GNU grep by installing the 
attached patch. This patch should appear in the next GNU grep release.
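
For reference, the expected semantics of the test case in the attached patch can be checked with the POSIX regcomp/regexec interface. This is only a sketch: bre_matches is a hypothetical helper, and it uses the C library's matcher rather than grep's own code, so it shows what the right answer is, not where grep went wrong:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if LINE matches the basic regular expression PATTERN,
   0 otherwise.  PATTERN is assumed to be valid.  */
int bre_matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc = regcomp(&re, pattern, 0);   /* 0 selects POSIX BRE syntax */
    assert(rc == 0);
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

The line 'Total failed: 2 (1 ignored)' should match neither '^Total failed: 0$' nor the back-reference pattern, because \1 must repeat the digits captured earlier ("2"), and the line has "1" there.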


I suggest not closing the original bug reports, since the original bug 
remains. Of course fixes are welcome but they are lower priority.

From b061d24916fb9a14da37a3f2a05cb80dc65cfd38 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Mon, 5 Dec 2022 14:16:45 -0800
Subject: [PATCH] grep: bug: backref in last of multiple patterns

* NEWS: Mention this.
* src/dfasearch.c (GEAcompile): Trim trailing newline from
the last pattern, even if it has back-references and follows
a pattern that lacks back-references.
* tests/backref: Add test for this bug.
---
 NEWS|  6 ++
 src/dfasearch.c | 25 -
 tests/backref   |  8 
 3 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/NEWS b/NEWS
index da293a3..6c00b2b 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,12 @@ GNU grep NEWS-*- outline -*-
 
 * Noteworthy changes in release ?.? (-??-??) [?]
 
+** Bug fixes
+
+  When given multiple patterns the last of which has a back-reference,
+  grep no longer sometimes mistakenly matches lines in some cases.
+  [Bug#36148#13 introduced in grep 3.4]
+
 
 * Noteworthy changes in release 3.8 (2022-09-02) [stable]
 
diff --git a/src/dfasearch.c b/src/dfasearch.c
index a71902a..a5b0d90 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -281,20 +281,19 @@ GEAcompile (char *pattern, idx_t size, reg_syntax_t syntax_bits,
   if (compilation_failed)
 exit (EXIT_TROUBLE);
 
-  if (prev <= patlim)
+  if (patlim < prev)
+buflen--;
+  else if (pattern < prev)
 {
-  if (pattern < prev)
-{
-  idx_t prevlen = patlim - prev;
-  buf = xirealloc (buf, buflen + prevlen);
-  memcpy (buf + buflen, prev, prevlen);
-  buflen += prevlen;
-}
-  else
-{
-  buf = pattern;
-  buflen = size;
-}
+  idx_t prevlen = patlim - prev;
+  buf = xirealloc (buf, buflen + prevlen);
+  memcpy (buf + buflen, prev, prevlen);
+  buflen += prevlen;
+}
+  else
+{
+  buf = pattern;
+  buflen = size;
 }
 
   /* In the match_words and match_lines cases, we use a different pattern
diff --git a/tests/backref b/tests/backref
index 510e130..97cb157 100755
--- a/tests/backref
+++ b/tests/backref
@@ -43,4 +43,12 @@ if test $? -ne 2 ; then
 failures=1
 fi
 
+# https://bugs.gnu.org/36148#13
+echo 'Total failed: 2 (1 ignored)' |
+grep -e '^Total failed: 0$' -e '^Total failed: \([0-9]*\) (\1 ignored)$'
+if test $? -ne 1 ; then
+echo "Backref: Multiple -e test, test #5 failed"
+failures=1
+fi
+
 Exit $failures
-- 
2.38.1

Bug#1017711: bug#58956: mark_object, mark_objects(?) crash

2022-11-06 Thread Paul Eggert


On 2022-11-06 11:32, Eli Zaretskii wrote:

My question was whether in this scenario, since the parent Emacs
exits, the child Emacs can get SIGHUP, simply because its parent
exited and the read end of the PTY no longer exists.


Yes, my sense from the few experiments I tried is that it's a plausible 
scenario, though I never observed it actually happening for Emacs doing 
a subprocess compile.

Bug#1017711: bug#58956: mark_object, mark_objects(?) crash

2022-11-06 Thread Paul Eggert


On 2022-11-05 22:51, Eli Zaretskii wrote:


But is it possible for a program like Emacs to get SIGHUP in such a
situation, or is that highly improbable?  We have standard streams of
the inferior Emacs process connected via PTYs to the parent process, I
believe -- does that deliver SIGHUP or SIGPIPE when the parent exits?


It depends on the OS and the app that invokes Emacs and how that app 
itself was invoked. It's a hairy area.


On a POSIX platform it's certainly *possible* for Emacs to get SIGHUP in 
that situation, because a user can invoke the shell command 'kill -s HUP 
P', where P is the process ID of the inferior Emacs. Whether it's 
*likely* is a bit harder to say. I ran a few little experiments on 
Fedora 36 and Ubuntu 22.10 and found SIGHUP being sent in a few 
situations and not others and didn't have the time or patience to suss 
out exactly why or when.

Bug#1017711: bug#58956: mark_object, mark_objects(?) crash

2022-11-05 Thread Paul Eggert


On 2022-11-04 00:00, Eli Zaretskii wrote:

We need to establish what is the
source of SIGHUP in these cases.  "These cases" mean, AFAIU, the
situations where Emacs launched an async subprocess to do native
compilation (which is another Emacs process in a --batch session), and
the parent Emacs session is terminated by the user before the async
compilation runs to completion.  Would the child Emacs process get
SIGHUP in this scenario?


Hard for me to say. It's a messy area, with kernels (and Emacs itself) 
sending SIGHUP on various whims.


Does the attached patch fix things? It builds on your commit 
190a6853708ab22072437f6ebd93beb3ec1a9ce6 dated 2020-12-04; I don't know 
why that earlier patch was installed, but it would seem to apply to 
SIGHUP and SIGTERM as well as it applies to SIGINT.

diff --git a/src/emacs.c b/src/emacs.c
index 1b2aa9442b..92e2299a04 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -432,9 +432,9 @@ terminate_due_to_signal (int sig, int backtrace_limit)
   if (sig == SIGTERM || sig == SIGHUP || sig == SIGINT)
 	{
 	  /* Avoid abort in shut_down_emacs if we were interrupted
-		 by SIGINT in noninteractive usage, as in that case we
+		 in noninteractive usage, as in that case we
 		 don't care about the message stack.  */
-	  if (sig == SIGINT && noninteractive)
+	  if (noninteractive)
 		clear_message_stack ();
 	  Fkill_emacs (make_fixnum (sig), Qnil);
 	}

Bug#1019724: bug#57604: Bug#1019724: warning: stray \ before - causes autopkgtest failure

2022-09-19 Thread Paul Eggert


On 9/19/22 05:32, Santiago Ruano Rincón wrote:


as you can read below, there are 4235 packages including the
warning in their build logs. Funnily, grep is also in the list :-)


Grep is on the list because Debian indirectly requires ucf to build 
Grep, and ucf issues the warning about stray \ because ucf mistakenly 
uses a Perlism in a grep regular expression. This particular warning doesn't break 
anything; it merely alerts installers of a screwup that happens to work 
but relies on undefined results.


We're thinking about adding a configure-time option to Grep to disable 
warnings about egrep/fgrep, to address the original Grep bug report. I'm not so 
sure about disabling warnings 
about bad escapes, as these warnings are so often a win and so rarely a 
loss, as is the case with ucf. Of course there is a tradeoff here 
between (a) having to wade through a bunch of annoying warnings, and (b) 
fixing packages so that they don't rely on undefined results.


Since the main issue here seems to be libtool-related test failures, how 
about patching libtool and letting the affected packages use the patched 
libtool? You can find a patch here:


https://savannah.gnu.org/patch/index.php?10282
https://savannah.gnu.org/patch/download.php?file_id=53720

The libtool test failures are false alarms, so another option would be 
to ignore the failures until libtool gets fixed.



For more on this thorny topic, please see:

https://www.gnu.org/software/grep/manual/html_node/Problematic-Expressions.html

The stray \ issue is the 19th bullet.

Bug#922552: [bug-diffutils] bug#36488: diffutils 3.7 make check failure ppc64le opensuse on colors test

2021-08-29 Thread Paul Eggert

On 8/28/21 8:40 AM, Thiago Jung Bauermann via bug-diffutils via All 
diffutils discussion. wrote:



I believe this is the same problem reported in bug 34519.
The Debian build also fails with "diff: standard output: Broken pipe".


Thanks for tracking that down. Frédéric's analysis in 
<https://bugs.debian.org/922552#19> was helpful.


I found some time to look into this bug, and installed into 
Savannah.gnu.org diffutils the attached patch, which I hope fixes the 
bug although I don't have the relevant platform to test it. Please give 
it a try.


Once this patch is part of a release, Debian shouldn't need any patches 
for diffutils.


For now I am closing the diffutils bug report 
<https://bugs.gnu.org/36488>; if I was too optimistic and the patch 
doesn't fix things we can always reopen it.
From 9b20182d48481c7ca647ff8926feeb8e1da4f7b0 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 28 Aug 2021 23:49:32 -0700
Subject: [PATCH] diff: cleanup signal handling just before exit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This should fix an unlikely signal handling bug with colored
output, and should also fix a Debian FTBFS (Fails To Build From
Source) on powerpc64le-linux.  See Bug#34519 and Frédéric
Bonnard’s report in:
https://bugs.debian.org/922552#19
* bootstrap.conf (gnulib_modules): Add raise, sigprocmask.
* src/diff.c (main): Call cleanup_signal_handlers before exiting.
Don’t bother calling ‘exit’; no longer needed nowadays.
* src/util.c (sigprocmask, siginterrupt) [!SA_NOCLDSTOP]:
Define to 0 instead of empty, since the results are now used.
(sigset_t) [!SA_NOCLDSTOP]: Remove; we now rely on Gnulib.
(xsigaction) [SA_NOCLDSTOP]: New function.
(xsigaddset, xsigismember, xsignal, xsigprocmask): New functions.
(some_signals_caught): New static var.
(process_signals): Omit a conditional branch.
Don’t bother loading interrupt_signal if stop_signal_count is nonzero.
(process_signals, install_signal_handlers):
Check for failures from sigprocmask etc.
(sig, nsig): Now at top level, since multiple functions need them.
(install_signal_handlers): No need for caught_sig array;
just use caught_signals.  However, set some_signals_caught.
(cleanup_signal_handlers): New function.
---
 bootstrap.conf |   2 +
 src/diff.c |   2 +-
 src/diff.h |   1 +
 src/util.c | 187 -
 4 files changed, 127 insertions(+), 65 deletions(-)

diff --git a/bootstrap.conf b/bootstrap.conf
index 6560e9a..e51b2d8 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -65,11 +65,13 @@ mktime
 nstrftime
 progname
 propername
+raise
 rawmemchr
 readme-release
 regex
 sh-quote
 signal
+sigprocmask
 stat
 stat-macros
 stat-time
diff --git a/src/diff.c b/src/diff.c
index a4e5538..3b901aa 100644
--- a/src/diff.c
+++ b/src/diff.c
@@ -853,7 +853,7 @@ main (int argc, char **argv)
   print_message_queue ();
 
   check_stdout ();
-  exit (exit_status);
+  cleanup_signal_handlers ();
   return exit_status;
 }
 
diff --git a/src/diff.h b/src/diff.h
index 03f00a6..f346b43 100644
--- a/src/diff.h
+++ b/src/diff.h
@@ -388,6 +388,7 @@ extern struct change *find_change (struct change *);
 extern struct change *find_reverse_change (struct change *);
 extern enum changes analyze_hunk (struct change *, lin *, lin *, lin *, lin *);
 extern void begin_output (void);
+extern void cleanup_signal_handlers (void);
 extern void debug_script (struct change *);
 extern void fatal (char const *) __attribute__((noreturn));
 extern void finish_output (void);
diff --git a/src/util.c b/src/util.c
index dd6d3bf..8e676c8 100644
--- a/src/util.c
+++ b/src/util.c
@@ -31,10 +31,9 @@
present.  */
 #ifndef SA_NOCLDSTOP
 # define SA_NOCLDSTOP 0
-# define sigprocmask(How, Set, Oset) /* empty */
-# define sigset_t int
+# define sigprocmask(How, Set, Oset) 0
 # if ! HAVE_SIGINTERRUPT
-#  define siginterrupt(sig, flag) /* empty */
+#  define siginterrupt(sig, flag) 0
 # endif
 #endif
 
@@ -160,16 +159,63 @@ print_message_queue (void)
 }
 }
 
-/* The set of signals that are caught.  */
 
+#if SA_NOCLDSTOP
+static void
+xsigaction (int sig, struct sigaction const *restrict act,
+	struct sigaction *restrict oact)
+{
+  if (sigaction (sig, act, oact) != 0)
+pfatal_with_name ("sigaction");
+}
+#endif
+
+static void
+xsigaddset (sigset_t *set, int sig)
+{
+  if (sigaddset (set, sig) != 0)
+pfatal_with_name ("sigaddset");
+}
+
+static bool
+xsigismember (sigset_t const *set, int sig)
+{
+  int mem = sigismember (set, sig);
+  if (mem < 0)
+pfatal_with_name ("sigismember");
+  assume (mem == 1);
+  return mem;
+}
+
+typedef void (*signal_handler) (int);
+static signal_handler
+xsignal (int sig, signal_handler func)
+{
+  signal_handler h = signal (sig, func);
+  if (h == SIG_ERR)
+pfatal_with_name ("signal");
+  return h;
+}
+
+static void
+xsigprocmask (int how, sigset_t const *restrict set, sig

Bug#940852: tzdata should install tzdata.zi file

2019-09-20 Thread Paul Eggert


Package: tzdata
Version: 2019c-1

The upstream tzdata by default installs a file /usr/share/zoneinfo/tzdata.zi 
that contains version info along with the exact source used to generate the TZif 
binary files. This file was introduced in tzdb 2017c but apparently Debian 
hasn't picked it up yet. It is installed by Fedora, and I assume by other 
distributions. Please add it to Debian too. Proposed patch attached.
diff -pru src-1/debian/rules src-2/debian/rules
--- src-1/debian/rules	2019-09-11 14:01:07.0 -0700
+++ src-2/debian/rules	2019-09-20 13:19:23.657688442 -0700
@@ -47,6 +47,9 @@ override_dh_auto_build-indep:
 	# Generate a posixrules file
 	/usr/sbin/zic -d $(TZGEN) -p America/New_York

+	# Generate a tzdata.zi file
+	$(MAKE) VERSION_DEPS= tzdata.zi
+
 	# Replace hardlinks by symlinks
 	rdfind -outputname /dev/null -makesymlinks true -removeidentinode false $(TZGEN)
 	symlinks -r -s -c $(TZGEN)
diff -pru src-1/debian/tzdata.install src-2/debian/tzdata.install
--- src-1/debian/tzdata.install	2018-07-09 15:46:15.0 -0700
+++ src-2/debian/tzdata.install	2019-09-20 13:16:09.977451131 -0700
@@ -1,6 +1,7 @@
 debian/tzconfig /usr/sbin
 tzgen/* usr/share/zoneinfo/
 iso3166.tab usr/share/zoneinfo/
+tzdata.zi usr/share/zoneinfo/
 zone.tab usr/share/zoneinfo/
 zone1970.tab usr/share/zoneinfo/
 leap-seconds.list usr/share/zoneinfo/

Bug#867283: Crash in glibc's mktime in low-memory situations

2018-09-02 Thread Paul Eggert


I proposed a patch here:

https://sourceware.org/bugzilla/show_bug.cgi?id=21716#c1

Please give it a try.

Bug#897653: tar 1.30 breaks pristine-tar

2018-05-14 Thread Paul Eggert


On 05/14/2018 07:56 AM, Antonio Terceiro wrote:

I still need to study the code a bit further to try to come up with a better 
suggestion.


Sorry, the only suggestion I can make is that you should just use the 
new GNU tar. The old one was obviously busted and it generated busted 
tarballs.

Bug#883733: bug#29613: Debian Bug#883733: grep returns 0 even if there is no match

2017-12-08 Thread Paul Eggert


On 12/08/2017 03:11 AM, Santiago R.R. wrote:

$ echo 1 | grep -E '^(11+)\1+$|^1?$' ; echo $?
1
0

Shouldn't the last grep command exit 1 too?


Yes it should. This appears to be due to a longstanding bug in the glibc 
regular expression matcher. See:


https://sourceware.org/bugzilla/show_bug.cgi?id=11053

Bug#721358: bug#28574: cross compilnng, man pages

2017-10-01 Thread Paul Eggert


Pádraig Brady wrote:


+   && : $${TZ=UTC} && export TZ\


That should be UTC0 instead of UTC, as POSIX says that TZ=UTC is not portable. 
Other than that it looks good to me.
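
As a small illustration of the portable spelling, the following C sketch sets TZ to the POSIX-style string "UTC0" (abbreviation "UTC", offset 0) and checks that local time then agrees with UTC. The function name utc0_local_equals_utc is made up for this example. Note that it changes the process-wide TZ setting.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <time.h>

/* Return 1 if, with TZ=UTC0, the local broken-down time for T
   equals the UTC broken-down time, 0 if it differs, -1 on failure.  */
int
utc0_local_equals_utc (time_t t)
{
  struct tm local, utc;
  if (setenv ("TZ", "UTC0", 1) != 0)
    return -1;
  tzset ();
  if (!localtime_r (&t, &local) || !gmtime_r (&t, &utc))
    return -1;
  return (local.tm_year == utc.tm_year && local.tm_yday == utc.tm_yday
          && local.tm_hour == utc.tm_hour && local.tm_min == utc.tm_min
          && local.tm_sec == utc.tm_sec);
}
```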

Bug#498336: bug#28306: grep: option to filter non-printable characters from contents

2017-08-31 Thread Paul Eggert


Santiago R.R. wrote:

What's your position on this?


Sounds like a reasonable option, though I think I might make it another form of 
coloring rather than a separate option.

Bug#532541: bug#27931: grep -o fails to count empty lines (Debain Bug #532541)

2017-08-03 Thread Paul Eggert


On 08/03/2017 06:28 AM, Santiago R.R. wrote:

the -o option, which is supposed to return only the matching
parts of the search, fails:


It's not failing. It's behaving as documented: -o outputs only nonempty 
matches. Otherwise, commands like 'grep -o "a*"' would output a separate 
line for each byte in the input. Although this behavior for -o is 
longstanding and is documented in the manual, it's not in the grep 
--help output so that's an oversight. I installed the attached to fix 
grep --help, and am closing the bug report on the GNU side.


Users who want to match empty lines can use 'grep "^$"', which is what 
I'd expect them to do anyway (-o would be superfluous there even if it 
included empty matches).


From fe06a81c1fdaeda10bfdde82b43e2b18bfd1de5e Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Thu, 3 Aug 2017 13:07:01 -0700
Subject: [PATCH] doc: improve -o help

* src/grep.c (usage): Document that -o outputs only nonempty
matches (Bug#27931).
---
 src/grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/grep.c b/src/grep.c
index 8d22aec..dd338d9 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1949,7 +1949,7 @@ Output control:\n\
   --label=LABEL use LABEL as the standard input file name prefix\n\
 "));
   printf (_("\
-  -o, --only-matching   show only the part of a line matching PATTERN\n\
+  -o, --only-matching   show only nonempty parts of lines matching PATTERN\n\
   -q, --quiet, --silent suppress all normal output\n\
   --binary-files=TYPE   assume that binary files are TYPE;\n\
 TYPE is 'binary', 'text', or 'without-match'\n\
-- 
2.13.3
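
The reason -o skips empty matches can be seen with the POSIX regex API directly. The sketch below is not grep's code. It uses regcomp/regexec, and the helper name first_match_length is invented for the example. It shows that a pattern like "a*" matches successfully with length zero on text containing no 'a' at all, so a tool that printed every match, empty ones included, would emit an empty result at each position.

```c
#include <regex.h>
#include <stddef.h>

/* Return the length of the first match of the extended regular
   expression PATTERN in TEXT, or -1 if there is no match or the
   pattern does not compile.  */
long
first_match_length (char const *pattern, char const *text)
{
  regex_t re;
  regmatch_t m;
  long len = -1;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, 0) == 0)
    len = (long) (m.rm_eo - m.rm_so);
  regfree (&re);
  return len;
}
```

For instance, first_match_length ("a*", "bbb") reports a successful match of length 0, whereas "a+" on the same text does not match at all.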

Bug#870576: Configure Emacs --without-pop or (in Emacs 26) --with-mailutils

2017-08-02 Thread Paul Eggert


Package: emacs25
Version: 25.1+1-4

Debian ships Emacs with the default configuration, which means it installs a 
separate program 'movemail' that retrieves email via the POP3 protocol. When it 
uses POP3, 'movemail' supports only unencrypted mail transfer, which is a 
significant security problem for people reading their email.


To avoid this problem, I suggest that Debian build emacs via './configure 
--without-pop', as this disables POP in movemail. Although this will remove a 
feature, the feature is so insecure that it cannot be recommended.


When Emacs 26 comes out, its ./configure program will have an option 
--with-mailutils, and I suggest that Debian use this option and make the 
'mailutils' package a prerequisite for Emacs. This will add support for 
encrypted POP3 email, thus restoring the POP3 capability lost by using 
--without-pop.


Thanks.

Bug#863002: bug#27763: egrep.sh: grep missing path

2017-07-19 Thread Paul Eggert


Santiago Ruano Rincón wrote:


As suggested by this user, it would be better if egrep/fgrep script
calls grep using its absolute path.


Debian bug 863002 doesn't explain why it would be better, as the original bug 
report is evidently a case of misunderstanding how PATH works.


Although we used to do it the way you're suggesting, the current way is better 
for users who want to specify their own 'grep' command with their own option 
preferences, and to have these preferences also apply to 'egrep' and 'fgrep'. See:


https://debbugs.gnu.org/cgi/bugreport.cgi?bug=19998

Bug#852617: autoconf: AC_SYS_LARGEFILE should output to CPPFLAGS

2017-01-25 Thread Paul Eggert


On 01/25/2017 10:24 AM, Zack Weinberg wrote:

The ChangeLog entry for the addition says "Import AC_SYS_LARGEFILE
from largefile.m4 serial 12", so that sounds like there was an add-on
.m4 file with the same functionality floating around prior to that - I
don't know where to find copies of that file.


It's from gnulib, which in turn got it from coreutils.

Gnulib still has m4/largefile.m4, although now it's merely a copy of 
what's in (the next version of) Autoconf. That is, people use the Gnulib 
largefile.m4 because they don't want to wait for the next release of 
Autoconf to come out.

Bug#851934: bug#23035: date: regression in timezone printing (+%Z)

2017-01-20 Thread Paul Eggert

Thanks for the heads-up. Rather than add that tzset call, which is a bit of a 
hack, I'd rather make parse_datetime2 more reentrant so that it's immune to this 
problem. So I installed this:


http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=4e6e16b3f43ce96302b1e52e48730c1f15e18c14

into Gnulib to improve the parse_datetime2 API, and installed the attached 
patches into coreutils. This uncovered a bug in one of our recently-added test 
cases, which the attached patches also fix.
From 22767d84c2d80a66d2fc886f55872616432b786d Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Fri, 20 Jan 2017 18:08:03 -0800
Subject: [PATCH 1/2] build: update gnulib submodule to latest

---
 gnulib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gnulib b/gnulib
index 0e68c6a..4e6e16b 16
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 0e68c6a37ed08fc553dd6fb097d97d798dcfa40d
+Subproject commit 4e6e16b3f43ce96302b1e52e48730c1f15e18c14
-- 
2.9.3

From 8b1bb0fa4859ff8460c9f7ecb94ce411d9baa9b3 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Fri, 20 Jan 2017 18:24:02 -0800
Subject: [PATCH 2/2] date: fix TZ= regression

Problem reported by Paul Wise for Debian, in:
https://bugs.debian.org/851934
This is fallout from the fix for GNU Bug#23035.
* src/date.c (batch_convert): New args TZ and TZSTRING.
All uses changed.
(batch_convert, main): Adjust to parse_datetime2 API change.
(main): Allocate time zone object.
* tests/misc/date-debug.sh: Fix incorrect test case,
caught by the fix.
* tests/misc/date.pl: Test the fix.
---
 src/date.c   | 14 +-
 tests/misc/date-debug.sh |  4 ++--
 tests/misc/date.pl   |  6 ++
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/src/date.c b/src/date.c
index eed0901..eb7c624 100644
--- a/src/date.c
+++ b/src/date.c
@@ -286,7 +286,8 @@ Show the local time for 9AM next Friday on the west coast of the US\n\
Return true if successful.  */
 
 static bool
-batch_convert (const char *input_filename, const char *format, timezone_t tz)
+batch_convert (const char *input_filename, const char *format,
+   timezone_t tz, char const *tzstring)
 {
   bool ok;
   FILE *in_stream;
@@ -320,7 +321,8 @@ batch_convert (const char *input_filename, const char *format, timezone_t tz)
   break;
 }
 
-  if (! parse_datetime2 (, line, NULL, parse_datetime_flags))
+  if (! parse_datetime2 (, line, NULL,
+ parse_datetime_flags, tz, tzstring))
 {
   if (line[line_length - 1] == '\n')
 line[line_length - 1] = '\0';
@@ -502,10 +504,11 @@ main (int argc, char **argv)
 }
 }
 
-  timezone_t tz = tzalloc (getenv ("TZ"));
+  char const *tzstring = getenv ("TZ");
+  timezone_t tz = tzalloc (tzstring);
 
   if (batch_file != NULL)
-ok = batch_convert (batch_file, format, tz);
+ok = batch_convert (batch_file, format, tz, tzstring);
   else
 {
   bool valid_date = true;
@@ -545,7 +548,8 @@ main (int argc, char **argv)
   if (set_datestr)
 datestr = set_datestr;
   valid_date = parse_datetime2 (, datestr, NULL,
-parse_datetime_flags);
+parse_datetime_flags,
+tz, tzstring);
 }
 }
 
diff --git a/tests/misc/date-debug.sh b/tests/misc/date-debug.sh
index 06de8dd..48f4605 100755
--- a/tests/misc/date-debug.sh
+++ b/tests/misc/date-debug.sh
@@ -48,10 +48,10 @@ date: new date/time = '(Y-M-D) 1990-12-14 00:00:00 TZ=+09:00'
 date: '(Y-M-D) 1990-12-14 00:00:00 TZ=+09:00' = 661100400 epoch-seconds
 date: after time adjustment (+0 hours, -90 minutes, +0 seconds, +0 ns),
 date: new time = 661095000 epoch-seconds
-date: output timezone: -06:00 (set from TZ="America/Belize" environment value)
+date: output timezone: +09:00 (set from TZ="Asia/Tokyo" environment value)
 date: final: 661095000.0 (epoch-seconds)
 date: final: (Y-M-D) 1990-12-13 13:30:00 (UTC0)
-date: final: (Y-M-D) 1990-12-13 07:30:00 (output timezone TZ=-06:00)
+date: final: (Y-M-D) 1990-12-13 22:30:00 (output timezone TZ=+09:00)
 Thu Dec 13 07:30:00 CST 1990
 EOF
 
diff --git a/tests/misc/date.pl b/tests/misc/date.pl
index 519c247..f026909 100755
--- a/tests/misc/date.pl
+++ b/tests/misc/date.pl
@@ -291,6 +291,12 @@ my @Tests =
   {ERR => "date: invalid date 'TZ=\"\"\"'\n"},
   {EXIT => 1},
  ],
+
+ # https://bugs.debian.org/851934#10
+ ['cross-TZ-mishandled', "-d 'TZ=\"EST5\" 1970-01-01 00:00'",
+  {ENV => 'TZ=PST8'},
+  {OUT => 'Wed Dec 31 21:00:00 PST 1969'},
+ ],
 );
 
 # Repeat the cross-dst test, using Jan 1, 2005 and every interval from 1..364.
-- 
2.9.3
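
The arithmetic behind the new 'cross-TZ-mishandled' test case can be checked by hand: 1970-01-01 00:00 in EST5 is 05:00 UTC, which is 18000 seconds after the epoch, and in PST8 that instant is 21:00 on 1969-12-31. Here is a rough C sketch of the same conversion. The helper name local_time_in_zone is hypothetical, and unlike the patch (which passes a time zone object around) this sketch changes the process-wide TZ setting, so it is only an illustration of the expected result, not of the fix.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <time.h>

/* Break down T as local time in the POSIX time zone TZSTRING,
   storing the result in *OUT.  Return 0 on success, -1 on failure.
   This overwrites the TZ environment variable.  */
int
local_time_in_zone (char const *tzstring, time_t t, struct tm *out)
{
  if (setenv ("TZ", tzstring, 1) != 0)
    return -1;
  tzset ();
  return localtime_r (&t, out) ? 0 : -1;
}
```

Calling local_time_in_zone ("PST8", 18000, &tm) should leave tm describing 21:00:00 on December 31, 1969, matching the test's expected output.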

Bug#842339: [Bug-tar] possible fixes for CVE-2016-6321

2016-10-29 Thread Paul Eggert

Thanks for the heads-up. Yes, it appears the 2003 change was not sufficiently 
paranoid about ".." in member names. Luckily, the tar manual still documents the 
pre-2003 behavior, so we can restore that behavior as a simple bug fix. I 
installed the attached patch into Savannah as one way to do that. This patch 
causes 'tar' to issue two diagnostics when given a member name containing "..", 
and I suppose tar should be cleaned up at some point to issue just one 
diagnostic. The main thing, though, is that the patch is simple and fixes the 
security gotcha in question.


I don't view this as a serious bug, as the tar manual has long said that you 
should extract untrusted tarballs only into empty directories, and doing that 
forestalls the attack even without this patch. (There are other reasons for this 
longstanding recommendation.)
From 99ceaad4d0efd8669b373e1f542f7205f2548456 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@penguin.cs.ucla.edu>
Date: Sat, 29 Oct 2016 21:04:40 -0700
Subject: [PATCH] When extracting, skip ".." members

* NEWS: Document this.
* src/extract.c (extract_archive): Skip members whose names
contain "..".
---
 NEWS  | 8 +++-
 src/extract.c | 8 
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index caa77bc..07daaa7 100644
--- a/NEWS
+++ b/NEWS
@@ -1,9 +1,15 @@
-GNU tar NEWS - User visible changes. 2016-05-27
+GNU tar NEWS - User visible changes. 2016-10-29
 Please send GNU tar bug reports to <bug-...@gnu.org>
 
 
 version 1.29.90 (Git)
 
+* Member names containing '..' components are now skipped when extracting.
+
+This fixes tar's behavior to match its documentation, and is a bit
+safer when extracting untrusted archives over old files (an unsafe
+practice that the tar manual has long recommended against).
+
 * Report erroneous use of positional options.
 
 During archive creation or update, tar keeps track of positional
diff --git a/src/extract.c b/src/extract.c
index f982433..7904148 100644
--- a/src/extract.c
+++ b/src/extract.c
@@ -1629,12 +1629,20 @@ extract_archive (void)
 {
   char typeflag;
   tar_extractor_t fun;
+  bool skip_dotdot_name;
 
   fatal_exit_hook = extract_finish;
 
   set_next_block_after (current_header);
 
+  skip_dotdot_name = (!absolute_names_option
+		  && contains_dot_dot (current_stat_info.orig_file_name));
+  if (skip_dotdot_name)
+ERROR ((0, 0, _("%s: Member name contains '..'"),
+	quotearg_colon (current_stat_info.orig_file_name)));
+
   if (!current_stat_info.file_name[0]
+  || skip_dotdot_name
   || (interactive_option
 	  && !confirm ("extract", current_stat_info.file_name)))
 {
-- 
2.7.4
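
The patch relies on an existing helper, contains_dot_dot, whose source is not shown here. For readers who want to see the kind of check involved, below is a minimal sketch of a component-wise ".." test. It is an assumption about the general technique, not tar's implementation, and the name has_dotdot_component is invented for the example. It treats '/' as the only separator.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if NAME has a component that is exactly "..".
   Components are separated by one or more '/' characters.  */
bool
has_dotdot_component (char const *name)
{
  char const *p = name;
  while (*p)
    {
      /* Measure the component that starts at P.  */
      size_t len = strcspn (p, "/");
      if (len == 2 && p[0] == '.' && p[1] == '.')
        return true;
      p += len;
      /* Skip the separators before the next component.  */
      p += strspn (p, "/");
    }
  return false;
}
```

With this check, "a/../b" and "x/.." are flagged, while names such as "a..b/c" or "..." are not, since ".." only matters as a whole component.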

Bug#783122: tzdata: Wrong data for Europe/Minsk

2016-10-01 Thread Paul Eggert

This problem should go away (or at least be different) once Ubuntu updates to tz 
2016g, which no longer uses "MSK" to abbreviate Minsk Time. 2016g uses "+03" 
instead, as part of the push to use numeric time zone abbreviations instead of 
inventing alphabetic ones.

Bug#831673: bug#24024: grep: Mixing "max-count" and "after-context" outputs too few lines

2016-09-07 Thread Paul Eggert


Given after-context=3 it is expected to output at least 4 lines
as documented, but adding max-count=1 makes it stop on the next
matching line.


Thanks for reporting this. Although grep's behavior is documented ("context does 
not include matching lines" in the node General Output Control) the 
documentation could be clearer and I installed the attached patch.
From 90a2dd8b7f93ef0a8f08741e6fcb07220f9549f6 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Wed, 7 Sep 2016 22:22:37 -0700
Subject: [PATCH] doc: define "context lines"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reported by Igor Bogomazov via Santiago Ruano Rincón (Bug#24024).
* doc/grep.texi (Context Line Control): Define "context lines".
---
 doc/grep.texi | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/doc/grep.texi b/doc/grep.texi
index 80768dd..7e51d45 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -338,6 +338,7 @@ do
 done
 @end example
 
+@cindex context lines
 When @command{grep} stops after @var{num} matching lines,
 it outputs any trailing context lines.
 Since context does not include matching lines,
@@ -501,8 +502,11 @@ even those that contain newline characters.
 @node Context Line Control
 @subsection Context Line Control
 
+@cindex context lines
+@dfn{Context lines} are non-matching lines that are near a matching line.
+They are output only if one of the following options are used.
 Regardless of how these options are set,
-@command{grep} will never print any given line more than once.
+@command{grep} never outputs any given line more than once.
 If the @option{-o} (@option{--only-matching}) option is specified,
 these options have no effect and a warning is given upon their use.
 
@@ -530,7 +534,7 @@ Print @var{num} lines of leading context before matching lines.
 @opindex -C
 @opindex --context
 @opindex -@var{num}
-@cindex context
+@cindex context lines
 Print @var{num} lines of leading and trailing output context.
 
 @item --group-separator=@var{string}
-- 
2.7.4

Bug#806331: [Reproducible-builds] [xz-devel] Re: xz-utils: make the selected POSIX shell stable accross build environments

2016-06-15 Thread Paul Eggert


On 06/15/2016 01:44 PM, Ximin Luo wrote:

In such a case, it is a bug to be using $POSIX_SHELL - which only tests for conformance 
with POSIX and not these "other bugs that make it unusable".


Gnulib can't test for all POSIX violations, only for the ones it knows 
about. CONFIG_SHELL lets the user override Gnulib's guess in 
environments where the guess is wrong. This sort of thing has been in 
Gnulib (and Autoconf) for ages, I expect many people have grown used to 
it, and I'm leery of changing this just for the purpose of reproducible 
builds. For reproducible builds, I suggest configuring with 
CONFIG_SHELL=/bin/sh as that should make the build reproducible without 
having to change Autoconf or Gnulib.


More generally, 'configure' and reproducible builds are competing 
objectives. 'configure' aims to guess characteristics of the target 
environment by depending on details of the build environment; in 
contrast, reproducible builds want to suppress details of the build 
environment whenever possible. Probably the best way to marry these two 
is for the reproducible build to start with a reproducible environment, 
and setting CONFIG_SHELL to a known value is one step in that direction.

Bug#186568: bug#22793: grep -E assertion failure with back references

2016-02-25 Thread Paul Eggert


arn...@skeeve.com wrote:

Paul Eggert <egg...@cs.ucla.edu> wrote:


With recent 'grep' you can work around the problem by configuring
--with-included-regex.


Not so. I did a fresh

./bootstrap
./configure --with-included-regex
make

and it still core dumps:

$ echo abc | ./src/grep -E  '(.*)(.*)(.*)\3\2\1'
grep: regexec.c:1413: pop_fail_stack: Assertion `((Idx) (num) < ((Idx) -2))' 
failed.
Aborted (core dumped)

I looked at it in a debugger: fs->num, just before the --fs->num executes, looks to
be -1.


Sorry, you're right. I got confused into thinking that grep Bug#22793 and grep 
Bug#21513 are the same bug, but they're not. I have unmerged them.


This is still a glibc bug, not a grep bug; it's just that we don't have a fix.

grep Bug#21513 is indeed fixed by configuring --with-included-regex.

Bug#186568: bug#22793: grep -E assertion failure with back references

2016-02-24 Thread Paul Eggert

With recent 'grep' you can work around the problem by configuring 
--with-included-regex. That has some other undesirable properties, though.


This is really a glibc bug and the glibc 
patch could be applied to the Debian copy of glibc. Here's the patch:


http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=5513b40999149090987a0341c018d05d3eea1272

In other words, Debian bug #186568 is really a glibc bug, not a grep 
bug. Can you please fix the Debian bug report accordingly? I'll CC: this 
message there.

Bug#809007: [bug-diffutils] bug#22245: [la...@debian.org: Bug#809007: diffutils: FTBFS: FAIL: test-update-copyright.sh]

2015-12-26 Thread Paul Eggert


Santiago Vila wrote:

I find a little bit odd that only the left brace is escaped in
the git commit above. Sure, it will remove the warning about the
left brace, but it looks a little bit inconsistent.


It depends on which sort of consistency one wants. I mildly prefer omitting 
backslashes when not needed, as this helps avoid usages like:


grep '\[.*\]' *.c

which has undefined behavior due to the '\]'. Admittedly that's a POSIX regular 
expression, not Perl.

Bug#801825: autogen: non-free file "doc/gendocs_template" (CC-BY-ND-3.0)

2015-10-16 Thread Paul Eggert


Bruce Korb wrote:

This file comes from gnulib.


Its copyright notice came from Texinfo and I assume that was from its original 
contributor.  Ludovic, do you know what's going on with the copyright notice of 
doc/gendocs_template?

Bug#772901: os-prober wrong output with grep 2.21 or later

2014-12-11 Thread Paul Eggert


Package: os-prober
Version: 1.65
Tags: patch

os-prober uses 'grep' in an unportable way, in that it assumes that the regular 
expression . matches the NUL byte (all zero bits).  POSIX doesn't guarantee 
this, and as of grep 2.21 this might not work.  If os-prober assumes GNU grep 
the fix should be fairly straightforward; please see the attached (untested) 
patch.  If os-prober is intended to be portable to non-GNU grep, more hacking 
will be needed, as POSIX says grep has undefined behavior when given binary 
input data.


For more details about this issue please see:

https://bugzilla.redhat.com/show_bug.cgi?id=1172405
https://bugzilla.redhat.com/show_bug.cgi?id=1172804
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19348
diff -pru os-prober/os-probes/mounted/x86/20microsoft os-prober-fix/os-probes/mounted/x86/20microsoft
--- os-prober/os-probes/mounted/x86/20microsoft	2014-11-12 07:19:18.0 -0800
+++ os-prober-fix/os-probes/mounted/x86/20microsoft	2014-12-11 19:25:06.749770598 -0800
@@ -31,19 +31,19 @@ if item_in_dir -q bootmgr $2; then
 	for boot in $(item_in_dir boot $2); do
 		bcd=$(item_in_dir bcd $2/$boot)
 		if [ -n $bcd ]; then
-			if grep -qs W.i.n.d.o.w.s. .8 $2/$boot/$bcd; then
+			if grep -aqs W.i.n.d.o.w.s. .8 $2/$boot/$bcd; then
 long=Windows 8 (loader)
-			elif grep -qs W.i.n.d.o.w.s. .7 $2/$boot/$bcd; then
+			elif grep -aqs W.i.n.d.o.w.s. .7 $2/$boot/$bcd; then
 long=Windows 7 (loader)
-			elif grep -qs W.i.n.d.o.w.s. .V.i.s.t.a $2/$boot/$bcd; then
+			elif grep -aqs W.i.n.d.o.w.s. .V.i.s.t.a $2/$boot/$bcd; then
 long=Windows Vista (loader)
-			elif grep -qs W.i.n.d.o.w.s. .S.e.r.v.e.r. .2.0.0.8. .R.2. $2/$boot/$bcd; then
+			elif grep -aqs W.i.n.d.o.w.s. .S.e.r.v.e.r. .2.0.0.8. .R.2. $2/$boot/$bcd; then
 long=Windows Server 2008 R2 (loader)
-			elif grep -qs W.i.n.d.o.w.s. .S.e.r.v.e.r. .2.0.0.8. $2/$boot/$bcd; then
+			elif grep -aqs W.i.n.d.o.w.s. .S.e.r.v.e.r. .2.0.0.8. $2/$boot/$bcd; then
 long=Windows Server 2008 (loader)
-			elif grep -qs W.i.n.d.o.w.s. .R.e.c.o.v.e.r.y. .E.n.v.i.r.o.n.m.e.n.t $2/$boot/$bcd; then
+			elif grep -aqs W.i.n.d.o.w.s. .R.e.c.o.v.e.r.y. .E.n.v.i.r.o.n.m.e.n.t $2/$boot/$bcd; then
 long=Windows Recovery Environment (loader)
-			elif grep -qs W.i.n.d.o.w.s. .S.e.t.u.p $2/$boot/$bcd; then
+			elif grep -aqs W.i.n.d.o.w.s. .S.e.t.u.p $2/$boot/$bcd; then
 long=Windows Recovery Environment (loader)
 			else
 long=Windows Vista (loader)
diff -pru os-prober/os-probes/mounted/x86/83haiku os-prober-fix/os-probes/mounted/x86/83haiku
--- os-prober/os-probes/mounted/x86/83haiku	2014-09-28 14:04:17.0 -0700
+++ os-prober-fix/os-probes/mounted/x86/83haiku	2014-12-11 19:32:45.177765083 -0800
@@ -13,7 +13,7 @@ case $type in
 	*) debug $partition is not a BeFS partition: exiting; exit 1 ;;
 esac
 
-if head -c 512 $partition | grep -qs system.haiku_loader; then
+if head -c 512 $partition | grep -aqs system.haiku_loader; then
 	debug Stage 1 bootloader found
 else
 	debug Stage 1 bootloader not found: exiting

Bug#699325: Emacs 24.3 occasionally crashes (segfault) just after starting it

2014-10-12 Thread Paul Eggert

I audited the Emacs trunk source code for getenv-related races that have 
undefined behavior and could have the reported symptoms.  I found some other 
races and installed a fix for them as Emacs trunk bzr 118095.  I expect this 
patch to be harder to backport to older Emacs versions, and less urgent as the 
races appear to be less likely.


Since we have fixes installed in the trunk I'll take the liberty of closing the 
Emacs bug report.  Please let us know if the bug occurs even with the fixes; if 
that happens I plan to reopen the bug report and look into it further.



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org

Bug#699325: Emacs 24.3 occasionally crashes (segfault) just after starting it

2014-10-11 Thread Paul Eggert

The failure scenario described in https://bugs.debian.org/699325#17 was fixed 
in Emacs trunk bzr 111064; see Bug#13054 http://bugs.gnu.org/13054.  This fix 
is in the next Emacs release, and the fix should be easily backportable to older 
Emacs releases.




Bug#758105: bug#18266: handling bytes not part of the charset, and other garbage

2014-09-16 Thread Paul Eggert


Paul Eggert wrote:

Attached are some proposed patches which should improve the performance
of grep -P when applied to binary files, among other things.  I have
some other ideas for boosting performance further but thought I'd
publish these first.


I pushed those patches, along with the attached further patches to fix 
up some porting glitches and bugs I encountered in subsequent testing. 
I plan to follow up soon on Bug#18454 with more performance-related 
patches in this area.
From 53c5d9fd50b6895b886c1d19d0851562fc03e00c Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Tue, 16 Sep 2014 17:29:40 -0700
Subject: [PATCH 07/10] grep: avoid false alarms for mb_clen and to_uchar

* cfg.mk (_gl_TS_unmarked_extern_functions): New var,
to bypass the tight_scope false alarms on mb_clen and to_uchar.
---
 cfg.mk | 4 
 1 file changed, 4 insertions(+)

diff --git a/cfg.mk b/cfg.mk
index 947d184..3316b5d 100644
--- a/cfg.mk
+++ b/cfg.mk
@@ -28,6 +28,10 @@ local-checks-to-skip =   \
 # Tools used to bootstrap this package, used for announcement.
 bootstrap-tools = autoconf,automake,gnulib
 
+# The tight_scope test gets confused about inline functions.
+# like 'to_uchar'.
+_gl_TS_unmarked_extern_functions = main usage mb_clen to_uchar
+
 # Now that we have better tests, make this the default.
 export VERBOSE = yes
 
-- 
1.9.3

From 493ddec2e61d48953600575896a5d3ce1d1a582b Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Mon, 15 Sep 2014 22:25:21 -0700
Subject: [PATCH 08/10] grep: use mbclen cache in one more place

* src/grep.c (fgrep_to_grep_pattern): Use mb_clen here, too.
---
 src/grep.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/grep.c b/src/grep.c
index 72a811e..e4379bc 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1912,8 +1912,7 @@ fgrep_to_grep_pattern (size_t len, char const *keys,
 
   for (; len; keys += n, len -= n)
 {
-  wchar_t wc;
-  n = mbrtowc (&wc, keys, len, &mb_state);
+  n = mb_clen (keys, len, &mb_state);
   switch (n)
 {
 case (size_t) -2:
-- 
1.9.3

From 219f10596c17e38b2716673a140c2b3827549862 Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Mon, 15 Sep 2014 17:27:58 -0700
Subject: [PATCH 09/10] grep: port -P speedup to hosts lacking
 PCRE_STUDY_JIT_COMPILE

* src/pcresearch.c (Pcompile): Do not assume that
PCRE_STUDY_JIT_COMPILE is defined.
(empty_match): Define on all platforms.
---
 src/pcresearch.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 95877e3..ce65758 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -33,10 +33,6 @@ static pcre *cre;
 /* Additional information about the pattern.  */
 static pcre_extra *extra;
 
-/* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty
-   string matches when that flag is used.  */
-static int empty_match[2];
-
 # ifdef PCRE_STUDY_JIT_COMPILE
 static pcre_jit_stack *jit_stack;
 # else
@@ -44,6 +40,10 @@ static pcre_jit_stack *jit_stack;
 # endif
 #endif
 
+/* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty
+   string matches when that flag is used.  */
+static int empty_match[2];
+
 void
 Pcompile (char const *pattern, size_t size)
 {
@@ -129,11 +129,11 @@ Pcompile (char const *pattern, size_t size)
   pcre_assign_jit_stack (extra, NULL, jit_stack);
 }
 
-  empty_match[false] = pcre_exec (cre, extra, "", 0, 0, PCRE_NOTBOL, NULL, 0);
-  empty_match[true] = pcre_exec (cre, extra, "", 0, 0, 0, NULL, 0);
-
 # endif
   free (re);
+
+  empty_match[false] = pcre_exec (cre, extra, "", 0, 0, PCRE_NOTBOL, NULL, 0);
+  empty_match[true] = pcre_exec (cre, extra, "", 0, 0, 0, NULL, 0);
 #endif /* HAVE_LIBPCRE */
 }
 
-- 
1.9.3

From 530fd765922b16643c78652ef036024fc4dd72eb Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Mon, 15 Sep 2014 18:33:19 -0700
Subject: [PATCH 10/10] grep: fix -P speedup bug with empty match

* src/pcresearch.c (NSUB): New top-level constant, replacing
'nsub' within Pexecute.
(Pcompile, Pexecute): Use it.
(Pexecute): Don't assume sub[1] is zero after a PCRE_ERROR_BADUTF8
match failure.
* tests/pcre-invalid-utf8-input: Test for this bug.
---
 src/pcresearch.c  | 32 +++-
 tests/pcre-invalid-utf8-input |  5 +
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index ce65758..c41f7ef 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -44,6 +44,10 @@ static pcre_jit_stack *jit_stack;
string matches when that flag is used.  */
 static int empty_match[2];
 
+/* This must be at least 2; everything after that is for performance
+   in pcre_exec.  */
+enum { NSUB = 300 };
+
 void
 Pcompile (char const *pattern, size_t size)
 {
@@ -132,8 +136,10 @@ Pcompile (char const *pattern, size_t size)
 # endif
   free (re);
 
-  empty_match[false] = pcre_exec (cre, extra, "", 0, 0, PCRE_NOTBOL

Bug#758105: bug#18266: handling bytes not part of the charset, and other garbage

2014-09-14 Thread Paul Eggert

Attached are some proposed patches which should improve the performance 
of grep -P when applied to binary files, among other things.  I have 
some other ideas for boosting performance further but thought I'd 
publish these first.  Please give them a try if you have the time.  I 
doubt whether this will solve the performance problem entirely with -P 
and encoding errors but at least it should be heading in the right 
direction.
From ad34b7d8556e9fc274690666ac6ded2b6576feb3 Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Sun, 14 Sep 2014 11:42:08 -0700
Subject: [PROPOSED PATCH 1/6] grep: remove/refactor unnecessary code about
 line splitting

* src/grep.c (do_execute): Remove.  Caller now uses 'execute'.
* src/pcresearch.c (Pexecute): Improve comment about this.
---
 src/grep.c   | 45 +
 src/pcresearch.c |  7 +--
 2 files changed, 6 insertions(+), 46 deletions(-)

diff --git a/src/grep.c b/src/grep.c
index 1f801e9..719dff1 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1048,49 +1048,6 @@ prtext (char const *beg, char const *lim)
   outleft -= n;
 }
 
-/* Invoke the matcher, EXECUTE, on buffer BUF of SIZE bytes.  If there
-   is no match, return (size_t) -1.  Otherwise, set *MATCH_SIZE to the
-   length of the match and return the offset of the start of the match.  */
-static size_t
-do_execute (char const *buf, size_t size, size_t *match_size)
-{
-  size_t result;
-  const char *line_next;
-
-  /* With the current implementation, using --ignore-case with a multi-byte
- character set is very inefficient when applied to a large buffer
- containing many matches.  We can avoid much of the wasted effort
- by matching line-by-line.
-
- FIXME: this is just an ugly workaround, and it doesn't really
- belong here.  Also, PCRE is always using this same per-line
- matching algorithm.  Either we fix -i, or we should refactor
- this code---for example, we could add another function pointer
- to struct matcher to split the buffer passed to execute.  It would
- perform the memchr if line-by-line matching is necessary, or just
- return buf + size otherwise.  */
-  if (! (execute == Fexecute || execute == Pexecute)
-  || MB_CUR_MAX == 1 || !match_icase)
-return execute (buf, size, match_size, NULL);
-
-  for (line_next = buf; line_next < buf + size; )
-{
-  const char *line_buf = line_next;
-  const char *line_end = memchr (line_buf, eolbyte,
- (buf + size) - line_buf);
-  if (line_end == NULL)
-line_next = line_end = buf + size;
-  else
-line_next = line_end + 1;
-
-  result = execute (line_buf, line_next - line_buf, match_size, NULL);
-  if (result != (size_t) -1)
-return (line_buf - buf) + result;
-}
-
-  return (size_t) -1;
-}
-
 /* Scan the specified portion of the buffer, matching lines (or
between matching lines if OUT_INVERT is true).  Return a count of
lines printed. */
@@ -1104,7 +1061,7 @@ grepbuf (char const *beg, char const *lim)
   for (p = beg; p < lim; p = endp)
 {
   size_t match_size;
-  size_t match_offset = do_execute (p, lim - p, &match_size);
+  size_t match_offset = execute (p, lim - p, &match_size, NULL);
   if (match_offset == (size_t) -1)
 {
   if (!out_invert)
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 3475d4a..0c5220d 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -149,8 +149,11 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
   int e = PCRE_ERROR_NOMATCH;
   char const *line_end;
 
-  /* PCRE can't limit the matching to single lines, therefore we have to
- match each line in the buffer separately.  */
+  /* pcre_exec mishandles matches that cross line boundaries.
+ PCRE_MULTILINE isn't a win, partly because it's incompatible with
+ -z, and partly because it checks the entire input buffer and is
+ therefore slow on a large buffer containing many matches.
+ Avoid these problems by matching line-by-line.  */
   for (; p < buf + size; p = line_start = line_end + 1)
 {
   line_end = memchr (p, eolbyte, buf + size - p);
-- 
1.9.3

From b7b7711dd072c335a45dbf09115b1597fed2ae76 Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Sun, 14 Sep 2014 11:44:12 -0700
Subject: [PROPOSED PATCH 2/6] grep: speed up -P on files containing many
 multibyte errors

* src/pcresearch.c (empty_match): New var.
(Pcompile): Set it.
(Pexecute): Use it.
---
 src/pcresearch.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 0c5220d..95877e3 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -33,6 +33,10 @@ static pcre *cre;
 /* Additional information about the pattern.  */
 static pcre_extra *extra;
 
+/* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty
+   string matches when that flag is used

Bug#758105: bug#18266: handling bytes not part of the charset, and other garbage

2014-09-12 Thread Paul Eggert


Vincent Lefevre wrote:

Glibc regards it as ASCII:


You're right.  Sorry, I was confused.  FreeBSD, Solaris, and AIX work
the way that I thought, though.  Plus, in GNU regular expressions the
pattern '.' works the way that I thought with LC_ALL=C; my guess
(without investigating this) is that this is because whoever wrote the
regex code assumed the BSDish behavior.  Arguably this is a glitch in
the GNU regex code, in that for consistency '.' should not match
encoding errors in unibyte locales.


Here's a pair of test cases to illustrate the glitch:

$ printf '\200\n' | LC_ALL=en_US.utf8 grep '.' | wc
  0   0   0
$ printf '\200\n' | LC_ALL=C grep '.' | wc
  1   0   2


I just mean that "grep ." is a method given by some people, that
was working before UTF-8.


And it still works, if by "." one means "match one character".

Unfortunately there is no POSIX regular expression that does what you're 
looking for (match either one character, or a single byte that is an 
encoding error).  This is because POSIX says the behavior is undefined 
on encoding errors.  The GNU syntax for regular expressions extends 
POSIX and does not dump core, but it still provides no way to write the 
pattern you're asking for, and the behavior is unspecified on encoding 
errors.  Perhaps this should be improved by fixing the abovementioned 
glitch and by providing a syntax extension for matching encoding errors, 
though we'd need a volunteer to do that.


The situation with libpcre is weirder: there's a pattern '\C' for 
matching a single byte even if it's an encoding error, but as far as I 
can tell there's no way to use regular expressions safely on arbitrary 
data containing encoding errors unless you're in unibyte mode (in which 
case '\C' provides no extra power).  I.e., \C appears to be useless in 
any program for which undefined behavior is unacceptable.




Bug#758105: bug#18266: handling bytes not part of the charset, and other garbage

2014-09-12 Thread Paul Eggert


On 09/12/2014 02:29 PM, Vincent Lefevre wrote:

an option to control what happens on encoding errors would be better 
and sufficient.


It might suffice for your use cases, but it's more complicated and less 
flexible than being able to match bytes within the regular expression.  
(Plus, someone would have to implement it, which is perhaps the biggest 
objection to either approach.)  But I take your point that \C is
best avoided.  This whole area is pretty hairy, I'm afraid.


Speaking of hairy, why doesn't grep use PCRE_MULTILINE?  Using 
PCRE_MULTILINE shouldn't be that hard, and should boost performance 
quite a bit in typical usage.  Or am I being too optimistic here?




Bug#758105: bug#18266: handling bytes not part of the charset, and other garbage

2014-09-12 Thread Paul Eggert


Vincent Lefevre wrote:

I wonder whether anyone is interested in matching individual bytes
in a file regarded as UTF-8 encoded. This seems weird.


It's not weird at all.  For example, suppose we invent the notation 
[[:error:]] to match encoding errors.  Then the pattern '[[:error:]]' 
would match all encoding errors in a file, which could well be a useful 
thing.


Currently, for example, the tz package http://www.iana.org/time-zones 
has a Make rule 'check_character_set' that verifies that the source 
files are all properly encoded.  It executes this shell command:


! grep -nv '^.*$' file names

This relies on GNU grep's behavior that '.' does not match an encoding
error.  But it's a command that is not obvious.  It'd be simpler and 
clearer to write this:


! grep -n '[[:error:]]' file names

if such a feature were available.
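
To make concrete what such a pattern would have to find, here is a minimal sketch that locates the first encoding error in a buffer, assuming the encoding is UTF-8.  It is a hand-written check following the usual well-formedness table (no overlong forms, no surrogates, nothing above U+10FFFF); it is not how grep does it, and the function names are made up for illustration.

```c
#include <stddef.h>

/* Return the length (1 to 4) of the well-formed UTF-8 sequence at the
   start of S, which has N > 0 bytes, or 0 if S starts with an
   encoding error.  */
static size_t
utf8_seq_len (unsigned char const *s, size_t n)
{
  unsigned char b = s[0];
  unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of byte 2 */
  size_t len;

  if (b < 0x80)
    return 1;
  else if (0xC2 <= b && b <= 0xDF)
    len = 2;
  else if (0xE0 <= b && b <= 0xEF)
    {
      len = 3;
      if (b == 0xE0)
        lo = 0xA0;              /* reject overlong forms */
      if (b == 0xED)
        hi = 0x9F;              /* reject surrogates */
    }
  else if (0xF0 <= b && b <= 0xF4)
    {
      len = 4;
      if (b == 0xF0)
        lo = 0x90;              /* reject overlong forms */
      if (b == 0xF4)
        hi = 0x8F;              /* reject values above U+10FFFF */
    }
  else
    return 0;                   /* stray continuation or invalid lead */

  if (n < len || s[1] < lo || hi < s[1])
    return 0;
  for (size_t i = 2; i < len; i++)
    if (s[i] < 0x80 || 0xBF < s[i])
      return 0;
  return len;
}

/* Return the offset of the first encoding error in BUF[0..N), or N if
   the whole buffer is well-formed UTF-8.  */
size_t
utf8_first_error (char const *buf, size_t n)
{
  unsigned char const *s = (unsigned char const *) buf;
  size_t i = 0;
  while (i < n)
    {
      size_t len = utf8_seq_len (s + i, n - i);
      if (len == 0)
        return i;
      i += len;
    }
  return n;
}
```

A line would count as containing an error when utf8_first_error returns less than the line's length, which is roughly the set of lines the hypothetical '[[:error:]]' pattern would select in a UTF-8 locale.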



Bug#758105: bug#18266: handling bytes not part of the charset, and other garbage

2014-09-12 Thread Paul Eggert


Vincent Lefevre wrote:


But both of these solutions have the drawback of working only in
UTF-8 locales.


Not at all; '[[:error:]]' would match a single-byte encoding error in 
the current locale.  The tz database is interested in UTF-8 so it sets 
the LC_ALL environment variable to a UTF-8 locale, but that setting 
shouldn't be required in general.


Also, the tz database needs grep patterns that iconv doesn't support. 
For example, one rule is that commentary (which starts with #) can 
contain UTF-8 characters, but the ordinary data (before the #) is 
limited to a smaller set.  This is captured by the command:


grep -Env '^[ordinarycharset]*(#.*)?$'

where 'ordinarycharset' is the set of ASCII characters in ordinary tz 
data.  Here it's useful that '.' does not match encoding errors on 
GNU/Linux.
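
The same kind of rule can be sketched with the POSIX regcomp and regexec functions.  The bracket expression below is a made-up stand-in for 'ordinarycharset', not the tz project's real set, and line_ok is a hypothetical name.  The program never calls setlocale, so it runs in the default "C" locale and does not reproduce the UTF-8 behavior discussed above; it only shows the shape of the pattern.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for the message's 'ordinarycharset':
   ASCII letters, digits, tab, space and a few punctuation marks.  */
static char const line_pattern[] = "^[A-Za-z0-9\t _:/+-]*(#.*)?$";

/* Return 1 if LINE (without its trailing newline) consists of
   ordinary characters optionally followed by a '#' comment, 0 if it
   does not, and -1 if the pattern cannot be compiled.  */
int
line_ok (char const *line)
{
  regex_t re;
  if (regcomp (&re, line_pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int result = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return result;
}
```

The grep command above uses -v, so it prints the lines for which a check like line_ok would return 0.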




Bug#758105: bug#18266: handling bytes not part of the charset, and other garbage

2014-09-12 Thread Paul Eggert

Come to think of it, grep -P misbehaves badly in multibyte locales that 
are not UTF-8.  It should report an error and exit rather than output 
gibberish.  I installed the attached patch to catch that.


From cac91e3e233b769d60d7b5d6bc0e8afc67c0c713 Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Fri, 12 Sep 2014 19:06:27 -0700
Subject: [PATCH] grep: diagnose -P in non-UTF-8 multibyte locale

* src/pcresearch.c (Pcompile):
libpcre supports only unibyte and UTF-8 locales,
so report an error and exit if used in other locales.
* NEWS: Mention this.
* tests/euc-mb: Test this.
---
 NEWS | 3 +++
 src/pcresearch.c | 8 ++--
 tests/euc-mb | 4 
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index 3624b76..36bb48f 100644
--- a/NEWS
+++ b/NEWS
@@ -19,6 +19,9 @@ GNU grep NEWS-*- outline -*-
   The GREP_OPTIONS environment variable is now obsolescent, and grep
   now warns if it is used.  Please use an alias or script instead.
 
+  In locales with multibyte character encodings other than UTF-8,
+  grep -P now reports an error and exits instead of misbehaving.
+
 * Noteworthy changes in release 2.20 (2014-06-03) [stable]
 
 ** Bug fixes
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 17e0e32..3475d4a 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -52,13 +52,17 @@ Pcompile (char const *pattern, size_t size)
   char const *ep;
   char *re = xnmalloc (4, size + 7);
   int flags = (PCRE_MULTILINE
-   | (match_icase ? PCRE_CASELESS : 0)
-   | (using_utf8 () ? PCRE_UTF8 : 0));
+   | (match_icase ? PCRE_CASELESS : 0));
   char const *patlim = pattern + size;
   char *n = re;
   char const *p;
   char const *pnul;
 
+  if (using_utf8 ())
+flags |= PCRE_UTF8;
+  else if (MB_CUR_MAX != 1)
+error (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 locales"));
+
   /* FIXME: Remove these restrictions.  */
   if (memchr (pattern, '\n', size))
 error (EXIT_TROUBLE, 0, _("the -P option only supports a single pattern"));
diff --git a/tests/euc-mb b/tests/euc-mb
index aa254ca..6a9a845 100755
--- a/tests/euc-mb
+++ b/tests/euc-mb
@@ -40,4 +40,8 @@ make_input BABAAB > exp || framework_failure_
 compare exp out || fail=1
 make_input BABABA |euc_grep AB; test $? = 1 || fail=1
 
+# -P supports only unibyte and UTF-8 locales.
+LC_ALL=$locale grep -P x /dev/null
+test $? = 2 || fail=1
+
 Exit $fail
-- 
1.9.3
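
The three-way decision made by the Pcompile hunk above can be isolated as a small pure function.  This is only a sketch: the enum and function names are hypothetical, and it compares a codeset name (as nl_langinfo (CODESET) might return after setlocale) where grep calls its own using_utf8 (), whose implementation is not shown in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Possible outcomes of the locale check (hypothetical names).  */
enum p_locale
{
  P_LOCALE_UNIBYTE,      /* -P works, without PCRE_UTF8 */
  P_LOCALE_UTF8,         /* -P works, with PCRE_UTF8 */
  P_LOCALE_UNSUPPORTED   /* multibyte but not UTF-8: report an error */
};

/* Classify a locale from its codeset name and its maximum number of
   bytes per character (what MB_CUR_MAX reports).  The order of the
   tests mirrors the patch: UTF-8 first, then "multibyte but not
   UTF-8", and otherwise unibyte.  */
enum p_locale
classify_locale (char const *codeset, size_t max_bytes)
{
  if (strcmp (codeset, "UTF-8") == 0)
    return P_LOCALE_UTF8;
  if (max_bytes != 1)
    return P_LOCALE_UNSUPPORTED;
  return P_LOCALE_UNIBYTE;
}
```

In this sketch an EUC-JP locale, where a character can take up to three bytes, lands in P_LOCALE_UNSUPPORTED, which corresponds to the case the new tests/euc-mb lines expect to exit with status 2.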

Bug#758105: handling bytes not part of the charset, and other garbage

2014-09-11 Thread Paul Eggert


Vincent Lefevre wrote:


There's no reason that '.' matches something that doesn't belong to
the charset in C locale, but doesn't match in a UTF-8 locale.


In the C locale on GNU/Linux, all byte values are members of the 
charset.  That is why it's OK for '.' to accept that byte in the C 
locale but reject it in a UTF-8 locale.



It's annoying that now in UTF-8, one can no longer match ISO-8859-1 text


This has been true for quite some time in 'grep', at least with the 
standard matchers.  It may not have been true for -P but that relied on 
undefined behavior that could crash grep, and we can't have that.


It would make sense to add a notation to mean "match any character or
invalid byte", as an extension.  That'd take some work, though.




Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Paul Eggert


Vincent Lefevre wrote:


I've just reported a new Debian concerning the performance problem.


It's not clear from http://bugs.debian.org/761157 that the performance 
problem occurs only with -P, but I assume that's what is meant.


Since this is a performance bug with PCRE, I suggest moving the Debian 
bug report to the Debian libpcre3 package.  Grep cannot go back to the 
old way, which could cause grep to crash, and the bug cannot be fixed in 
grep because libpcre3 does not provide a fast way to search arbitrary 
data that may include encoding errors.  It really is a problem that 
requires changes to libpcre3 to fix; grep cannot fix it.


In the meantime, in order to use 'grep' to search for strings in 
arbitrary data, I suggest omitting the '-P'.  Also, I suggest using the 
C locale.


As the GNU bug 18266 "grep -P and invalid exits with error" has been
fixed, I'm closing that bug report.  Please feel free to open a separate 
GNU bug report for the performance issue.


PS.  While composing this email I noticed another bug in grep -P and 
encoding errors, which I fixed by installing the attached patch.
From fb39b32b12be0c6114f09d51818cd703161b104e Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Thu, 11 Sep 2014 09:52:01 -0700
Subject: [PATCH] grep: fix false matches with -P '...$' and invalid UTF-8

* src/pcresearch.c (Pexecute): Use PCRE_NOTEOL when matching
initial substrings of a line.
---
 src/pcresearch.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 4e2ccf8..17e0e32 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -163,7 +163,8 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
 break;
   valid_bytes = sub[0];
   e = pcre_exec (cre, extra, p, valid_bytes, 0,
- options | PCRE_NO_UTF8_CHECK, sub, nsub);
+ options | PCRE_NO_UTF8_CHECK | PCRE_NOTEOL,
+ sub, nsub);
   if (e != PCRE_ERROR_NOMATCH)
 break;
   p += valid_bytes + 1;
-- 
1.9.3

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Paul Eggert


On 09/11/2014 11:37 AM, Jim Meyering wrote:

Would you mind adding a test to trigger that one?


Ordinarily I would have done that already but this -P stuff is so buggy 
and slow that I got discouraged.  (If we keep having trouble with -P I 
may start lobbying to remove it.)  Anyway, I gave it a shot with the
attached further patch.
From 266b8d4485053a6733e11d43a66c09d080c520fa Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Thu, 11 Sep 2014 12:05:19 -0700
Subject: [PATCH] grep: fix false matches with -P '...$' and invalid UTF-8

* tests/pcre-invalid-utf8-input: Add a test for that.
---
 tests/pcre-invalid-utf8-input | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/pcre-invalid-utf8-input b/tests/pcre-invalid-utf8-input
index f42e0dd..9da4b18 100755
--- a/tests/pcre-invalid-utf8-input
+++ b/tests/pcre-invalid-utf8-input
@@ -13,9 +13,12 @@ require_en_utf8_locale_
 
 fail=0
 
-printf 'j\202\nj\n' > in || framework_failure_
+printf 'j\202j\nj\nk\202\n' > in || framework_failure_
 
 LC_ALL=en_US.UTF-8 grep -P j in
 test $? -eq 0 || fail=1
 
+LC_ALL=en_US.UTF-8 grep -P 'k$' in
+test $? -eq 1 || fail=1
+
 Exit $fail
-- 
1.9.3

Bug#758105: bug#18266: handling bytes not part of the charset, and other garbage

2014-09-11 Thread Paul Eggert


Vincent Lefevre wrote:

the C locale corresponds to ANSI_X3.4-1968,


No it doesn't, at least not on any current platform I'm aware of.  And 
POSIX does not require that.  POSIX even allows the C locale to be 
multibyte, e.g., UTF-8.



I would say that this should be the same for invalid
byte sequences in a UTF-8 locale.


One *could* design an encoding with that property, but it wouldn't be 
UTF-8; it would be something else.  I don't know of any C library that 
does that to UTF-8.  There are good arguments against doing it, e.g., 
one loses the property that one can concatenate character strings by 
concatenating their byte representations.


Anyway I'm afraid we may be going off the deep end here.  After all, 
grep can't impose its coding system design onto the operating system; 
it's more the other way around.




Bug#758105: bug#18266: handling bytes not part of the charset, and other garbage

2014-09-11 Thread Paul Eggert


Vincent Lefevre wrote:


ypig% LC_ALL=C locale charmap
ANSI_X3.4-1968


That may be what the 'locale' command says, but bytes with the top bit 
on are considered to be valid single-byte characters.  There are no 
encoding errors.  So, in that sense it's not strict ASCII.



the current behavior breaks the sometimes used "grep ." solution
to match non-empty lines.


"grep ." matches lines containing one or more characters.  Encoding
errors are not characters, at least, not as far as plain grep is concerned.


Perhaps PCRE is different, and if libpcre worked with encoding errors we 
could simply use its way of matching them.  But there doesn't seem to be 
a safe way to do that.




Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-10 Thread Paul Eggert


Paul Eggert wrote:

perhaps there's a PCRE version dependency here?


I found a PCRE-version-dependent problem that may be relevant, and 
installed the attached further patch to fix it.
From dc7d532d16dec740d11b6817c9b558543aca0136 Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Wed, 10 Sep 2014 00:04:58 -0700
Subject: [PATCH] grep: port recent fix to older pcre version

* src/pcresearch.c (Pexecute): Don't assume that a pcre_exec
that returns PCRE_ERROR_NOMATCH leaves its sub argument alone.
This assumption is false for libpcre-3 version 8.31-2ubuntu2.
---
 src/pcresearch.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 2a01e6d..4e2ccf8 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -157,14 +157,16 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
   /* Treat encoding-error bytes as data that cannot match.  */
   for (;;)
 {
+  int valid_bytes;
   e = pcre_exec (cre, extra, p, line_end - p, 0, options, sub, nsub);
   if (e != PCRE_ERROR_BADUTF8)
 break;
-  e = pcre_exec (cre, extra, p, sub[0], 0,
+  valid_bytes = sub[0];
+  e = pcre_exec (cre, extra, p, valid_bytes, 0,
  options | PCRE_NO_UTF8_CHECK, sub, nsub);
   if (e != PCRE_ERROR_NOMATCH)
 break;
-  p += sub[0] + 1;
+  p += valid_bytes + 1;
   options = PCRE_NOTBOL;
 }
 
-- 
1.9.3

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-09 Thread Paul Eggert


Norihiro Tanaka wrote:

I'm worried that to re-run for invalid UTF-8 makes slowness for searching
of the large number of binary files.


Yes, that could be a problem, but even so it's better for grep to report 
matches than to give up and fail.  Perhaps someone could optimize this 
better later, but to be honest given how flaky libpcre is we're probably 
better off spending our scarce development resources elsewhere.


Santiago's latest patch still had some troubles, unfortunately.  It 
could mishandle '^' by having it match just past an encoding error.  It 
was less efficient than it could be, as it checked all valid bytes for 
UTF-8-edness twice.  If I understand PCRE correctly (which quite 
possibly I don't), it also appeared to mishandle matches that contain 
nested subexpressions.  But the worst part was that the code was too 
complicated (and this was true even before Santiago's patch was 
applied).  So I rewrote it and installed the attached patch instead. 
Please give it a try.
From 29855e7bbe47b91680ae0cba5729c5becfaa3216 Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Tue, 9 Sep 2014 12:41:54 -0700
Subject: [PATCH] grep: -P now treats invalid UTF-8 input as non-matching

Problem reported by Santiago Vila in: http://bugs.gnu.org/18266
* NEWS: Mention this.
* src/pcresearch.c (Pexecute): Treat UTF-8 encoding errors
as non-matching data, instead of exiting 'grep'.
* tests/pcre-infloop: grep now exits with status 1, not 2.
* tests/pcre-invalid-utf8-input: grep now exits with status 0, not 2.
---
 NEWS  |  3 ++
 src/pcresearch.c  | 70 +--
 tests/pcre-infloop|  2 +-
 tests/pcre-invalid-utf8-input |  2 +-
 4 files changed, 33 insertions(+), 44 deletions(-)

diff --git a/NEWS b/NEWS
index 550bf4c..ca79525 100644
--- a/NEWS
+++ b/NEWS
@@ -6,6 +6,9 @@ GNU grep NEWS-*- outline -*-
 
   Performance has improved for very long strings in patterns.
 
+  grep -P no longer reports an error and exits when given invalid UTF-8 data.
+  Instead, it considers the data to be non-matching.
+
 ** Bug fixes
 
   grep -E rejected unmatched ')', instead of treating it like '\)'.
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 820dd00..2a01e6d 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -136,34 +136,41 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
 #else
   /* This array must have at least two elements; everything after that
  is just for performance improvement in pcre_exec.  */
-  int sub[300];
+  enum { nsub = 300 };
+  int sub[nsub];
 
-  const char *line_buf, *line_end, *line_next;
+  char const *p = start_ptr ? start_ptr : buf;
+  int options = p == buf || p[-1] == eolbyte ? 0 : PCRE_NOTBOL;
+  char const *line_start = buf;
   int e = PCRE_ERROR_NOMATCH;
-  ptrdiff_t start_ofs = start_ptr ? start_ptr - buf : 0;
+  char const *line_end;
 
   /* PCRE can't limit the matching to single lines, therefore we have to
  match each line in the buffer separately.  */
-  for (line_next = buf;
-   e == PCRE_ERROR_NOMATCH && line_next < buf + size;
-   start_ofs -= line_next - line_buf)
+  for (; p < buf + size; p = line_start = line_end + 1)
 {
-  line_buf = line_next;
-  line_end = memchr (line_buf, eolbyte, (buf + size) - line_buf);
-  if (line_end == NULL)
-line_next = line_end = buf + size;
-  else
-line_next = line_end + 1;
-
-  if (start_ptr && start_ptr >= line_end)
-continue;
+  line_end = memchr (p, eolbyte, buf + size - p);
 
-  if (INT_MAX < line_end - line_buf)
+  if (INT_MAX < line_end - p)
 error (EXIT_TROUBLE, 0, _("exceeded PCRE's line length limit"));
 
-  e = pcre_exec (cre, extra, line_buf, line_end - line_buf,
- start_ofs < 0 ? 0 : start_ofs, 0,
- sub, sizeof sub / sizeof *sub);
+  /* Treat encoding-error bytes as data that cannot match.  */
+  for (;;)
+{
+  e = pcre_exec (cre, extra, p, line_end - p, 0, options, sub, nsub);
+  if (e != PCRE_ERROR_BADUTF8)
+break;
+  e = pcre_exec (cre, extra, p, sub[0], 0,
+ options | PCRE_NO_UTF8_CHECK, sub, nsub);
+  if (e != PCRE_ERROR_NOMATCH)
+break;
+  p += sub[0] + 1;
+  options = PCRE_NOTBOL;
+}
+
+  if (e != PCRE_ERROR_NOMATCH)
+break;
+  options = 0;
 }
 
   if (e <= 0)
@@ -180,10 +187,6 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
   error (EXIT_TROUBLE, 0,
  _("exceeded PCRE's backtracking limit"));
 
-case PCRE_ERROR_BADUTF8:
-  error (EXIT_TROUBLE, 0,
- _("invalid UTF-8 byte sequence in input"));
-
 default:
   /* For now, we lump all remaining PCRE failures into this basket.
  If anyone cares to provide sample grep usage that can

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-09 Thread Paul Eggert


Norihiro Tanaka wrote:

I see that new version has no response for following test which was used
previously.

 printf '\x80ab\n' | env LC_ALL=en_US.utf8 src/grep -P '.?b'



Thanks for reporting that.  The test case works for me (Fedora 20 
x86-64, GCC 4.9.1):


$ printf '\x80ab\n' | env LC_ALL=en_US.utf8 src/grep -P '.?b' | od -c
0000000 200   a   b  \n
0000004

Fedora 20 is using pcre version 8.33-6.fc20; perhaps there's a PCRE 
version dependency here?  Can you use GDB to put a breakpoint on 
pcre_exec and see what values it's returning, and what it's storing into 
sub[0] and sub[1]?  Here's what I see (I compiled grep with '-g3 -O0'):


$ printf '\x80ab\n' >in
$ gdb src/grep
...
(gdb) b pcre_exec
...
(gdb) r -P '.?b' in
...
(gdb) fin
...
(gdb) n
...
(gdb) p e
$1 = -10
(gdb) c
...
(gdb) fin
...
(gdb) n
...
(gdb) p e
$2 = -1
(gdb) c
...
(gdb) fin
...
(gdb) n
...
(gdb) p e
$3 = 1
(gdb) p sub[0]
$4 = 0
(gdb) p sub[1]
$5 = 2
(gdb) p p
$6 = 0x62f001 "ab\n"
(gdb) p buf
$7 = 0x62f000 "\200ab\n"


That is, the first call to pcre_exec reports the encoding error, the
second one (on the empty string) reports no match, and the third one (on
"ab") finds the match.
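
The retry sequence seen in that trace can be sketched without libpcre.  In the sketch below a fixed-string search stands in for pcre_exec, and a hand-written UTF-8 check stands in for the PCRE_ERROR_BADUTF8 report and its sub[0] offset; all function names are hypothetical.  It leaves out the PCRE_NOTBOL handling that the real loop needs for anchors, since fixed strings have no anchors.

```c
#include <stddef.h>
#include <string.h>

/* Length of the well-formed UTF-8 sequence at S (N > 0 bytes
   available), or 0 if S starts with an encoding error.  */
static size_t
seq_len (unsigned char const *s, size_t n)
{
  unsigned char b = s[0], lo = 0x80, hi = 0xBF;
  size_t len;
  if (b < 0x80)
    return 1;
  else if (0xC2 <= b && b <= 0xDF)
    len = 2;
  else if (0xE0 <= b && b <= 0xEF)
    len = 3, lo = b == 0xE0 ? 0xA0 : 0x80, hi = b == 0xED ? 0x9F : 0xBF;
  else if (0xF0 <= b && b <= 0xF4)
    len = 4, lo = b == 0xF0 ? 0x90 : 0x80, hi = b == 0xF4 ? 0x8F : 0xBF;
  else
    return 0;
  if (n < len || s[1] < lo || hi < s[1])
    return 0;
  for (size_t i = 2; i < len; i++)
    if (s[i] < 0x80 || 0xBF < s[i])
      return 0;
  return len;
}

/* Number of bytes at the start of S[0..N) that are valid UTF-8.  */
static size_t
valid_prefix (char const *s, size_t n)
{
  size_t i = 0, len;
  while (i < n
         && (len = seq_len ((unsigned char const *) s + i, n - i)) != 0)
    i += len;
  return i;
}

/* Stand-in matcher: offset of NEEDLE in S[0..N), or -1 if absent.  */
static ptrdiff_t
find_fixed (char const *s, size_t n, char const *needle)
{
  size_t m = strlen (needle);
  for (size_t i = 0; i + m <= n; i++)
    if (memcmp (s + i, needle, m) == 0)
      return (ptrdiff_t) i;
  return -1;
}

/* Search LINE[0..N) for NEEDLE, treating each encoding-error byte as
   data that cannot match.  Return the match offset, or -1.  */
ptrdiff_t
search_line (char const *line, size_t n, char const *needle)
{
  size_t start = 0;
  for (;;)
    {
      size_t valid = valid_prefix (line + start, n - start);
      ptrdiff_t found = find_fixed (line + start, valid, needle);
      if (found >= 0)
        return (ptrdiff_t) start + found;
      if (start + valid == n)
        return -1;              /* end of line reached, no match */
      start += valid + 1;       /* skip the valid run and the bad byte */
    }
}
```

For the input used in the gdb session, searching the three bytes \200 a b for "b" first examines the empty valid prefix before the bad byte, finds nothing, skips the bad byte, and then finds the match in "ab".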




Bug#760861: bug#18428: Bug#760861: bug#18428: coreutils binary breaks coreutils documentation

2014-09-08 Thread Paul Eggert


Pádraig Brady wrote:

So we'll stick with the longer form
(which is likely to be cut n pasted in any case)


While this sounds like a win, I still like the idea of renaming the 
troublesome info node, as there is a lot of advice out there to use the 
old forms for 'info' and it's probably better to support that advice, at 
least for a while, than to make it immediately stop working.


I noticed other problems that are at least somewhat related to the 
recent coreutils multi-binary executable changes, and fixed some of 
these problems with the attached patches.  (I ran out of energy before 
fixing the rest.  :-)  Patch 2 renames the troublesome node.


Come to think of it, how about removing the 'coreutils' command 
entirely?  Why should users invoke 'coreutils' directly?  We could move 
it to libexec and remove it from the documentation.
From 2f40bf03ecb3637625cec578371f23dcae8fc1af Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Mon, 8 Sep 2014 11:40:39 -0700
Subject: [PATCH 1/4] doc: mention which commands are optional

* doc/coreutils.texi (coreutils invocation, df invocation)
(stty invocation, whoami invocation, nproc invocation)
(arch invocation, hostname invocation, hostid invocation)
(uptime invocation, chroot invocation, nice invocation)
(stdbuf invocation): Document that the command is installed
optionally.
---
 doc/coreutils.texi | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 0178f60..14ee3b0 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -1516,6 +1516,9 @@ or by explicitly calling @command{coreutils} with the
 coreutils @option{--coreutils-prog=PROGRAM} @dots{}
 @end example
 
+The @command{coreutils} command is not installed by default, so
+portable scripts should not rely on its existence.
+
 @node Output of entire files
 @chapter Output of entire files
 
@@ -11434,6 +11437,9 @@ Ignored; for compatibility with System V versions of @command{df}.
 
 @end table
 
+@command{df} is installed only on systems that have usable mount tables,
+so portable scripts should not rely on its existence.
+
 @exitstatus
 Failure includes the case where no output is generated, so you can
 inspect the exit status of a command like @samp{df -t ext3 -t reiserfs
@@ -13841,6 +13847,10 @@ systems, those or other settings also may not
 be available, but it's not feasible to document all the variations: just
 try it and see.
 
+@command{stty} is installed only on platforms with the POSIX terminal
+interface, so portable scripts should not rely on its existence on
+non-POSIX platforms.
+
 @exitstatus
 
 @menu
@@ -14760,6 +14770,10 @@ that file instead.  A common choice is @file{/var/log/wtmp}.
 The only options are @option{--help} and @option{--version}.  @xref{Common
 options}.
 
+The @command{users} command is installed only on platforms with the
+POSIX @code{utmpx.h} include file or equivalent, so portable scripts
+should not rely on its existence on non-POSIX platforms.
+
 @exitstatus
 
 
@@ -14908,6 +14922,10 @@ After each login name print a character indicating the user's message status:
 
 @end table
 
+The @command{who} command is installed only on platforms with the
+POSIX @code{utmpx.h} include file or equivalent, so portable scripts
+should not rely on its existence on non-POSIX platforms.
+
 @exitstatus
 
 
@@ -15641,6 +15659,9 @@ arch [@var{option}]
 
 The program accepts the @ref{Common options} only.
 
+@command{arch} is not installed by default, so portable scripts should
+not rely on its existence.
+
 @exitstatus
 
 
@@ -15832,6 +15853,10 @@ hostname [@var{name}]
 The only options are @option{--help} and @option{--version}.  @xref{Common
 options}.
 
+@command{hostname} is not installed by default, and other packages
+also supply a @command{hostname} command, so portable scripts should
+not rely on its existence or on the exact behavior documented above.
+
 @exitstatus
 
 
@@ -15857,6 +15882,10 @@ On that system, the 32-bit quantity happens to be closely
 related to the system's Internet address, but that isn't always
 the case.
 
+@command{hostid} is installed only on systems that have the
+@code{gethostid} function, so portable scripts should not rely on its
+existence.
+
 @exitstatus
 
 @node uptime invocation
@@ -15890,6 +15919,13 @@ also include processes in the uninterruptible sleep state (that is,
 those processes which are waiting for disk I/O).  The Linux kernel
 includes uninterruptible processes.
 
+@command{uptime} is installed only on platforms with infrastructure
+for obtaining the boot time, and other packages also supply an
+@command{uptime} command, so portable scripts should not rely on its
+existence or on the exact behavior documented above.
+
+@exitstatus
+
 @node SELinux context
 @chapter SELinux context
 
@@ -16203,6 +16239,10 @@ files to the required positions under your intended new root directory.
 Finally, if the executable requires any

Bug#758105: grep -P and invalid exits with error

2014-09-01 Thread Paul Eggert


Vincent Lefevre wrote:


   [...] Note that this option can also be passed to pcre_exec()
   and pcre_dfa_exec(), to suppress the validity checking of
   subject strings only. If the same string is being matched
   many times, the option can be safely set for the second and
   subsequent matchings to improve performance.

The last sentence would imply that the UTF8 checking is done on the
whole input buffer before matching is done.


That's pretty subtle, and perhaps too subtle.  A plausible 
interpretation of the phrase "same string is being matched" is that 
libpcre checks only the matched string, and that bytes after the match 
(which did not need to be examined to do the match) are not checked. 
Can you confirm with the libpcre authors that this plausible 
interpretation is incorrect, i.e., that the entire input string is 
checked, even the unmatched part?  If that's what is intended, the 
documentation should state so clearly, so at least there's a 
documentation bug there.



If there are many invalid UTF8 bytes, this would be slow, IMHO


That's OK.  We don't need grep -P to be fast on invalid input.


But is the copy of the buffer really needed? Couldn't the invalid
UTF8 sequences just be replaced by null bytes?


I'd rather not, because that changes the semantics of matching.  The 
null byte is valid input data that might get matched.



in case of invalid UTF8 bytes, in some (many?) cases, the
cause is a binary file (possibly with some text in it), where lines
can be very long. So, wouldn't it mean that it can take significantly
more memory?


Sure.  But that's the same for -P as it is for plain grep.



Bug#758105: grep -P and invalid exits with error

2014-08-29 Thread Paul Eggert

Thanks, but that patch seems to depend on libpcre internals, in that it 
knows that pcre_exec cannot possibly succeed without first checking 
its entire input buffer for invalid UTF-8 bytes.  Even if that's true 
now, it reflects a performance bug that might be fixed in a future 
libpcre version.


Also, I don't see why grep needs to copy the buffer when there's an 
encoding error.  Why not simply rerun the matcher on the initial prefix 
that doesn't have an encoding-error byte, and then (if that doesn't find 
a match), try matching the suffix after the encoding-error byte?  This 
approach would not only avoid the buffer copy, it would avoid knowledge 
of libpcre internals.
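
A minimal sketch of the retry approach described above, under stated assumptions: the matcher is passed in as a callback (chunk_has_b is a hypothetical stand-in, not pcre_exec), and utf8_seq_len is a hand-written validity check rather than anything taken from grep or libpcre. The buffer is never copied; each maximal run of valid UTF-8 is handed to the matcher, and one offending byte is skipped between runs.

```c
#include <stddef.h>
#include <string.h>

/* Length (1..4) of the valid UTF-8 sequence at s, or 0 on an encoding error. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned long cp, min;
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
    else
        return 0;                   /* stray continuation or invalid lead byte */
    if (n < len)
        return 0;                   /* sequence truncated by end of buffer */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return len;
}

/* Length of the longest prefix of s[0..n) that is valid UTF-8. */
static size_t valid_prefix_len(const unsigned char *s, size_t n)
{
    size_t i = 0, k;
    while (i < n && (k = utf8_seq_len(s + i, n - i)) != 0)
        i += k;
    return i;
}

/* Hypothetical stand-in for the real matcher: nonzero if the chunk contains 'b'. */
static int chunk_has_b(const unsigned char *s, size_t n)
{
    return n != 0 && memchr(s, 'b', n) != NULL;
}

/* Try the matcher on each valid run in turn; return 1 on the first match. */
static int search_around_errors(const unsigned char *buf, size_t n,
                                int (*match)(const unsigned char *, size_t))
{
    size_t start = 0;
    for (;;) {
        size_t run = valid_prefix_len(buf + start, n - start);
        if (match(buf + start, run))
            return 1;
        if (start + run >= n)
            return 0;               /* no more input after this run */
        start += run + 1;           /* skip the one encoding-error byte */
    }
}
```

On the input '\x80ab\n' this tries the empty prefix first and then "ab\n". One consequence of the design is that a pattern can never match across an encoding-error byte, and anchors see each run as if it were the whole subject; whether that is acceptable is a separate question.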




Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-16 Thread Paul Eggert


Santiago wrote:

Another solution would be to don't check if binary files are valid
(passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd
avoid security holes


It wouldn't.  (We already tried it.)



Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert


Santiago wrote:

Please, revert ca7868cc27db3d9deafaa2e0ac5a2bb0aa8ef373


That commit was necessary to avoid undefined behavior in libpcre.  We 
can't simply undo the commit (unless you want to reintroduce security 
holes into grep :-).  The current behavior is the best we can do, unless 
someone fixes libpcre (which doesn't appear to be likely), or unless 
someone takes the time to write code in grep to work around the problem.


One way forward is suggested in http://bugs.gnu.org/17245#43.  No 
doubt there are others.  Can you suggest a volunteer to take this on?




Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert


Vincent Lefevre wrote:


it would be better to replace invalid UTF-8 sequences by
zero bytes before passing them to libpcre. Is it allowed to do
that in Pexecute()?


Sorry, I don't know.  I was hoping that the volunteer (whoever it is) 
could figure all this stuff out.


grep should work correctly even if the input contains NUL bytes, so 
perhaps it would be better to replace an invalid byte by the UTF-8 
sequence for U+FFFD REPLACEMENT CHARACTER, as that's one standard way to 
deal with this problem.  Or perhaps the volunteer will have a better idea.




Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert


Vincent Lefevre wrote:

The problem with this solution is that it would change the length
of the text, while replacing invalid bytes by zero bytes could be
done in place (if allowed), with very little change of the code,
I think.


True.  Though it might be more user-friendly to use '?' as the 
replacement byte.




Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert


Vincent Lefevre wrote:

On input, using null bytes may be better if one wants to be able to
match real replacement characters without false positives.


Maybe, though this is no place to get fancy.  It's simple to tell users 
an invalid byte acts like '?'.  Simple is good.


Anyway, this is a matter for the implementing volunteer to decide, 
whoever that happens to be.




Bug#602162: /usr/share/zoneinfo/Australia/Sydney: DST indistinguishable

2014-08-07 Thread Paul Eggert

This bug is fixed in the recently-released 2014f release of the tz 
database, and when that release propagates into Debian you should be 
able to close this bug report.




Bug#736919: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences

2014-04-16 Thread Paul Eggert


Jim Meyering wrote:

This bug is still present in upstream libpcre version 8.35.


Ah, sorry, I thought it was Debian-specific.  I've reopened grep bug 
16586 <http://bugs.gnu.org/16586>, and have forwarded it to Philip 
Hazel, who currently has the PCRE bug assigned, according to 
<http://bugs.exim.org/1468>.




Bug#736919: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences

2014-04-15 Thread Paul Eggert


Santiago wrote:

it was a debian-pcre-specific bug.


Thanks, closing the bug upstream.



Bug#738546: bug#16713: [Jakub Wilk] Bug#738546: typo in gzip(1) manpage: syncronizing -> synchronizing

2014-02-10 Thread Paul Eggert


On 02/10/2014 07:37 AM, Bdale Garbee wrote:

A user of my Debian packaging of gzip points to a typo in the man page.


That's not in the upstream version, so I'm taking the liberty of closing 
the upstream bug report.




Bug#720352: bug#15147: tr: crash upon failed write(2)

2013-08-20 Thread Paul Eggert

I can reproduce the problem without coreutils
on Ubuntu 13.04 x86-64.  Compile the following
program with plain "gcc foo.c" and then run
"./a.out >/dev/full"; it'll dump core the same way.
So it appears that this is a bug in the C library,
not in coreutils.

It's a fairly serious bug, I'd say.

#include <stdio.h>
int
main (void)
{
  static char io_buf[BUFSIZ];
  if (fwrite (io_buf, 1, sizeof io_buf, stdout) != sizeof io_buf)
    {
      perror ("write error");
      return 1;
    }
  return 0;
}



Bug#452365: [Bug-tar] symlink-eating bug: the reproducer

2013-05-30 Thread Paul Eggert

Thanks, I just now reproduced the problem with tar 1.26 but could
not reproduce it with the latest git version.

This appears to be similar to this bug:

http://lists.gnu.org/archive/html/bug-tar/2011-06/msg0.html

which was fixed in tar upstream, here:

http://lists.gnu.org/archive/html/bug-tar/2011-06/msg1.html

Maybe you could backport that fix to Debian?



Bug#598018: bug#12947: [brl...@debian.org: Bug#598018: install: temporary insecure file permissions]

2012-11-20 Thread Paul Eggert

Thanks, I installed this patch into the coreutils master branch,
and I'm marking the upstream coreutils bug as done.

From 7ee71d9ddad1435bbea00779bcd4c62482ea3473 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Tue, 20 Nov 2012 13:15:34 -0800
Subject: [PATCH] install: fix security race

* src/copy.c (copy_internal): Use DST_MODE_BITS, not SRC_MODE.
See Bernhard R. Link in http://bugs.gnu.org/12947 and in
http://bugs.debian.org/598018.
---
 src/copy.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 16aed03..7a35414 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -2394,8 +2394,13 @@ copy_internal (char const *src_name, char const *dst_name,
   /* POSIX says the permission bits of the source file must be
  used as the 3rd argument in the open call.  Historical
  practice passed all the source mode bits to 'open', but the extra
- bits were ignored, so it should be the same either way.  */
-  if (! copy_reg (src_name, dst_name, x, src_mode & S_IRWXUGO,
+ bits were ignored, so it should be the same either way.
+
+ This call uses DST_MODE_BITS, not SRC_MODE.  These are
+ normally the same, and the exception (where x->set_mode) is
+ used only by 'install', which POSIX does not specify and
+ where DST_MODE_BITS is what's wanted.  */
+  if (! copy_reg (src_name, dst_name, x, dst_mode_bits & S_IRWXUGO,
   omitted_permissions, &new_dst, &src_sb))
 goto un_backup;
 }
-- 
1.7.11.7



Bug#693463: zdiff man page contains incorrect info about /tmp

2012-11-16 Thread Paul Eggert

Package: gzip
Version: 1.5-1.1

Debian has added the following text to zdiff.1, but
this addition is no longer correct, as /tmp is not used
in this case on Debian in gzip 1.5.  Can you please remove this
from the zdiff man page?  Thanks.

.P
When both files must be uncompressed before comparison, the second is
uncompressed to
.IR /tmp .
In all other cases,
.IR zdiff " and " zcmp
use only a pipe.



Bug#647522: non-deterministic compression results with gzip -n9

2012-03-18 Thread Paul Eggert

Cyril, thanks for the test case.  When I used 'valgrind' on it
I found where gzip is accessing uninitialized data.  I pushed
into gzip master the patch at the end of this message; it fixed
things for me.

The Debian patch, which zeros out a lot of buffers, should
work if gzip is compressing regular files, but may have
problems in unusual cases if gzip compresses data from
pipes, devices, or other non-regular files, because in that
case short reads may later cause garbage to be put into the
dictionary.  So I suggest using the following patch instead.

http://git.savannah.gnu.org/cgit/gzip.git/commit/?id=0a284baeaedca68017f46d2646e4c921aa98a90d

From b9de47462b1b487cf4024b4c157ee5ac6c5849c3 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sun, 18 Mar 2012 11:07:02 -0700
Subject: [PATCH] gzip: fix nondeterministic compression results

Reported by Jakub Wilk in http://bugs.debian.org/647522.
* deflate.c (fill_window): Don't let garbage pollute the dictionary.
---
 deflate.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/deflate.c b/deflate.c
index 6c19552..5405f10 100644
--- a/deflate.c
+++ b/deflate.c
@@ -571,6 +571,8 @@ local void fill_window()
 n = read_buf((char*)window+strstart+lookahead, more);
 if (n == 0 || n == (unsigned)EOF) {
 eofile = 1;
+/* Don't let garbage pollute the dictionary.  */
+memzero (window + strstart + lookahead, MIN_MATCH - 1);
 } else {
 lookahead += n;
 }
-- 
1.7.6.5






Bug#647522: non-deterministic compression results with gzip -n9

2012-02-08 Thread Paul Eggert

Thanks very much for the patch.  But can someone who's looked into it
please explain why 'window' needs to be zeroed out?  This will save
me time in reviewing the patch.  Thanks.




Bug#647522: non-deterministic compression for CREDITS.gz in libppl9 amd64 armel

2012-02-06 Thread Paul Eggert

I can't reproduce the problem on x86-64 with vanilla
gzip 1.4 and vanilla gzip 1.3.12.  So the problem appears to be
either architecture-dependent, or it's a property of
the Debian patches to gzip, or something like that, and
I expect we'll need more information about how to
reproduce the problem.  It looks like the problem is with
1.3.12-9 on armel so you might want to focus your attention
there.




Bug#655118: bug#10592: Bug#655118: Please enabled hardened build flags

2012-01-30 Thread Paul Eggert

I am not observing this problem in the Emacs trunk, with either
GCC 4.6.2 or GCC 4.7.0 20120127 (experimental), when I compile
with -Wformat -Wformat-security.  I suspect the problem has
already been fixed in the trunk in a different way, by
rewriting movemail to use prototypes.  I'm therefore taking
the liberty of marking this bug as fixed in the Emacs bug
database; please feel free to reopen it if I've misunderstood
the situation.




Bug#653073: [Pkg-sysvinit-devel] Bug#653073: bug#10363: /etc/mtab - /proc/mounts symlink affects df(1) output for

2012-01-19 Thread Paul Eggert

On 01/19/12 07:29, Henrique de Moraes Holschuh wrote:
> Note: there is no reason why the kernel could not return the mount
> information with shadowed paths removed in a separate procfs node, as
> that would cause no security/troubleshooting problems.

That's what I was thinking of, and it'd be a much better fix,
as it would fix things for all applications.

The current approach expects all app developers to modify
their applications in order to deal with a feature that app
developers typically don't know about and don't understand;
this isn't a good way to introduce a new feature.




Bug#653073: bug#10363: [Pkg-sysvinit-devel] Bug#653073: bug#10363: /etc/mtab - /proc/mounts symlink affects df(1) output for

2012-01-19 Thread Paul Eggert

On 01/19/12 08:30, Henrique de Moraes Holschuh wrote:
> On the app side, I will tell you what you're likely to get back from the
> crowd on LKML:  write a proper BSD/MIT/LGPL library

This argument would have stronger force if there were real code in
a real application, code that solved the overall problem -- code
that we could read and run.  I don't know of any such code.

> the kernel is not in any better position to remove shadowed paths
> than userspace, both are perfectly capable of doing it.

This seems to contradict an earlier comment made by someone else,
"So at the moment is a bit of a guess which entries are real and which
are obscured." <http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10363#53>

I don't know who's right, nor do I understand what all the underlying
issues are.  I expect most other app developers are in a similar boat.
It's not a good situation to be in.




Bug#653073: bug#10363: /etc/mtab - /proc/mounts symlink affects df(1) output for

2012-01-18 Thread Paul Eggert

On 01/18/12 06:25, Goswin von Brederlow wrote:
> What df should do is automatically skip the entries that are obscured or
> generally inaccessible.

Isn't this missing some of the larger context?  df is just doing what
lots of other programs do: finding out what file systems one has,
and reporting statistics on them.  It sounds suboptimal to require
the maintainers of all these programs (coreutils, nautilus, etc.)
to rewrite their apps to deal with obscured entries.  Surely it would
be better to have the kernel ordinarily return just the ordinary entries,
and to return obscured entries only when they are specially requested.
That way, this issue would be isolated to the few bits of code that really
want to see obscured entries.




Bug#647522: non-deterministic compression results with gzip -n9

2011-12-09 Thread Paul Eggert

On 12/08/11 02:27, Riku Voipio wrote:
> According to gzip RFC, the last 4 bytes are ISIZE, which should be
> uncompressed input size. Which leaves me rather baffled how that can
> differ on same input files - and how gunzip is completly happy with
> both versions of compressed file, producing the same output.

I looked only at NEWS.amd64.gz and NEWS.armel.gz.  For those two files
your diagnosis does not seem to be right, as these two files do not
differ in the last 4 bytes, but in bytes before then:

$ od -tx1 NEWS.amd64.gz >NEWS.amd64.gz.od
$ od -tx1 NEWS.armel.gz >NEWS.armel.gz.od
$ diff -u NEWS.*.gz.od
--- NEWS.amd64.gz.od	2011-12-09 12:03:41.090594754 -0800
+++ NEWS.armel.gz.od	2011-12-09 12:03:57.298663371 -0800
@@ -317,6 +317,6 @@
 0011700 fa 9f da 9b 92 ad 57 44 19 45 c5 42 e5 b6 d9 c2
 0011720 7e 80 02 bd 58 94 33 74 ba 0a 62 24 52 7b 35 33
 0011740 b2 87 51 76 b7 af cc 7f 09 b0 2d 14 d6 8d f9 4d
-0011760 94 51 39 49 cd 87 2e ff 0b 5a 6c 8d f6 80 35 00
+0011760 94 51 39 49 cd f7 50 ff 17 5a 6c 8d f6 80 35 00
 0012000 00
 0012001

So the differences are not in ISIZE.  Can you track down
what's actually differing and why?  (That would save some
time for me when debugging)  Thanks.
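
The same check can be done programmatically. The following is only a sketch: gzip_isize and first_difference are hypothetical helper names, and it assumes a single-member gzip file already read into memory. Per RFC 1952 the trailer is a 4-byte CRC32 followed by the 4-byte ISIZE, both least-significant byte first, so any difference at an offset below len - 8 lies in the header or the compressed data rather than in the trailer.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the ISIZE field (uncompressed size modulo 2^32) of the gzip
   member that ends at gz[len - 1].  Return 0 on success, or -1 if the
   buffer is shorter than the 10-byte fixed header plus 8-byte trailer.  */
static int gzip_isize(const unsigned char *gz, size_t len, uint32_t *isize)
{
    const unsigned char *p;

    if (len < 18)
        return -1;
    p = gz + len - 4;               /* ISIZE is the last four bytes */
    *isize = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
           | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return 0;
}

/* Offset of the first byte at which a and b differ, or -1 if they are
   identical.  A pure length mismatch is reported at the shorter length.  */
static long first_difference(const unsigned char *a, size_t alen,
                             const unsigned char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    size_t i;

    for (i = 0; i < n; i++)
        if (a[i] != b[i])
            return (long)i;
    return alen == blen ? -1 : (long)n;
}
```

In the dump above both files end in the bytes 80 35 00 00, which this reads as an ISIZE of 13696, while the first differing byte sits several bytes before the 8-byte trailer.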




Bug#647522: non-deterministic compression results with gzip -n9

2011-12-09 Thread Paul Eggert

I should add that it's OK (from the point of view of
the RFCs) if gzip produces different outputs given the same
inputs when compressing.  The RFCs allow that and presumably
other gzip implementations do that.  All that's required is
that compress+decompress result in a copy of the original.

That being said, it's nicer if gzip is deterministic and it'd
be helpful to get to the bottom of this and make
the nondeterminism go away in future versions of gzip.




Bug#136231: [Bug-tar] Failure with --owner and --group when names cannot be mapped to IDs

2011-08-13 Thread Paul Eggert

On 08/08/2011 03:28 AM, Thayne Harbaugh wrote:

> Attached is a patch that allows archives to be created with
> arbitrary owner or group names.

Thanks for the bug report and patch; I was unaware of the problem.
This runs into another area that I'd been meaning to enhance for some
time: tar doesn't let you specify both user name and number (only one
or the other), and similarly for groups.  I wrote and installed the
following patch into GNU tar, to address both enhancement requests
simultaneously.  Please give it a try when you have the time.
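
To make the operand syntax concrete, here is a rough sketch of the parsing rules as the documentation in the patch states them. It is not the parse_owner_group function from src/tar.c: struct owner_spec, all_digits and parse_owner_spec are hypothetical names, the 64-byte name limit is arbitrary, and no user or group database is consulted.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct owner_spec {
    char name[64];          /* empty string if no name was given */
    int has_id;             /* nonzero if a numeric ID was given */
    unsigned long id;
};

/* Nonzero if s consists of one or more decimal digits. */
static int all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Parse NAME, NUM, or NAME:NUM into *out.  Return 0 on success, -1 on error. */
static int parse_owner_spec(const char *arg, struct owner_spec *out)
{
    const char *colon = strchr(arg, ':');
    const char *num;
    size_t name_len;

    memset(out, 0, sizeof *out);
    if (colon) {                    /* NAME:NUM, either part may be empty */
        name_len = (size_t)(colon - arg);
        num = colon + 1;
    } else if (all_digits(arg)) {   /* a bare number */
        name_len = 0;
        num = arg;
    } else {                        /* a bare name */
        name_len = strlen(arg);
        num = "";
    }
    if (name_len >= sizeof out->name)
        return -1;
    memcpy(out->name, arg, name_len);   /* out->name is already NUL-filled */
    if (*num != '\0') {
        if (!all_digits(num))
            return -1;
        errno = 0;
        out->id = strtoul(num, NULL, 10);
        if (errno == ERANGE)
            return -1;
        out->has_id = 1;
    }
    /* Reject operands that supply neither a name nor a number. */
    return (name_len != 0 || out->has_id) ? 0 : -1;
}
```

With this sketch, "root:0" yields both a name and an ID, "1000" yields only an ID, and "alice" yields only a name; filling in the missing half from the host's databases is left to the caller.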

tar: --owner and --group names and numbers
The --owner and --group options now accept operands of the form
NAME:NUM, so that you can specify both symbolic name and numeric
ID for owner and group.  Also, in these options, NAME no longer
needs to be present in the current host's user and group
databases; this implements Debian enhancement request 136231
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=136231 reported
by Mark W. Eichin, communicated by Thayne Harbaugh to bug-tar in
http://lists.gnu.org/archive/html/bug-tar/2011-08/msg1.html.
* NEWS, doc/tar.texi (Option Summary, override): Document enhancement.
* src/common.h (group_name_option, owner_name_option): New decls.
* src/create.c (start_header): Don't assume owner and group names
are in current host database.
* src/tar.c (parse_owner_group): New function, for parsing NAME:NUM.
(parse_opt): Use it.
(decode_options): Initialize owner_name_option, group_name_option.
* tests/owner.at: New file, to test this enhancement.
* tests/Makefile.am (TESTSUITE_AT): Add it.
* tests/testsuite.at: Include it.
diff --git a/NEWS b/NEWS
index 12c1dd6..06955f4 100644
--- a/NEWS
+++ b/NEWS
@@ -1,7 +1,18 @@
-GNU tar NEWS - User visible changes. 2011-03-12
+GNU tar NEWS - User visible changes. 2011-08-13
 Please send GNU tar bug reports to bug-...@gnu.org

 
+version ?.? - ?, 201?-??-??
+
+* New features
+
+** --owner and --group names and numbers
+
+The --owner and --group options now accept operands of the form
+NAME:NUM, so that you can specify both symbolic name and numeric ID
+for owner and group.  In these options, NAME no longer needs to be
+present in the current host's user and group databases.
+
 version 1.26 - Sergey Poznyakoff, 2011-03-12

 * Bugfixes
diff --git a/doc/tar.texi b/doc/tar.texi
index 357c8c1..48e8c3c 100644
--- a/doc/tar.texi
+++ b/doc/tar.texi
@@ -2713,9 +2713,9 @@ tutorial}).
 @item --group=@var{group}

 Files added to the @command{tar} archive will have a group @acronym{ID} of @var{group},
-rather than the group from the source file.  @var{group} is first decoded
-as a group symbolic name, but if this interpretation fails, it has to be
-a decimal numeric group @acronym{ID}.  @xref{override}.
+rather than the group from the source file.  @var{group} can specify a
+symbolic name, or a numeric @acronym{ID}, or both as
+@var{name}:@var{id}.  @xref{override}.

 Also see the comments for the @option{--owner=@var{user}} option.

@@ -3082,8 +3082,8 @@ from an archive.  @xref{Overwrite Old Files}.

 Specifies that @command{tar} should use @var{user} as the owner of members
 when creating archives, instead of the user associated with the source
-file.  @var{user} is first decoded as a user symbolic name, but if
-this interpretation fails, it has to be a decimal numeric user @acronym{ID}.
+file.  @var{user} can specify a symbolic name, or a numeric
+@acronym{ID}, or both as @var{name}:@var{id}.
 @xref{override}.

 This option does not affect extraction from archives.
@@ -4947,8 +4947,22 @@ tar: Option --mtime: Treating date `yesterday' as 2006-06-20

 Specifies that @command{tar} should use @var{user} as the owner of members
 when creating archives, instead of the user associated with the source
-file.  The argument @var{user} can be either an existing user symbolic
-name, or a decimal numeric user @acronym{ID}.
+file.
+
+If @var{user} contains a colon, it is taken to be of the form
+@var{name}:@var{id} where a nonempty @var{name} specifies the user
+name and a nonempty @var{id} specifies the decimal numeric user
+@acronym{ID}.  If @var{user} does not contain a colon, it is taken to
+be a user number if it is one or more decimal digits; otherwise it is
+taken to be a user name.
+
+If a name is given but no number, the number is inferred from the
+current host's user database if possible, and the file's user number
+is used otherwise.  If a number is given but no name, the name is
+inferred from the number if possible, and an empty name is used
+otherwise.  If both name and number are given, the user database is
+not consulted, and the name and number need not be valid on the
+current host.

 There is no value indicating a missing number, and @samp{0} usually means
 @code{root}.  Some people like to force @samp{0} as the value to offer in
@@ -4971,8 +4985,9 @@ $ @kbd{tar -c -f archive.tar --owner=root .}
 @opindex group

 Files added to the @command{tar} archive will have a group @acronym{ID} of @var{group},

Bug#605639: bug#7529: Bug#605639: deal better with different filesystem timestamp resolutions

2010-12-03 Thread Paul Eggert

On 12/03/10 02:03, Jim Meyering wrote:

> Would you mind adding a Bug fixes entry for this
> in coreutils' NEWS file?  It'd be nice to commit that
> along with an update of the gnulib submodule to the latest.

Sure, done, with this notice:

  cp -u no longer does unnecessary copying merely because the source
  has finer-grained time stamps than the destination.

> As for a test, it shouldn't be too hard to create a root-only test
> on linux/gnu systems, since _PC_TIMESTAMP_RESOLUTION is not defined.
> Create two loop-mounted file systems of types that have the desired
> difference in time stamp resolution, and run commands like Dan did.

Hmm, well, I can see a lot going wrong with that, such as garbage in the
mount table if the test is interrupted.  (Also, there's the little problem
that I lack root access on the hosts that I do builds on these days: does
that get me off the hook?  :-)





Bug#605639: bug#7529: Bug#605639: deal better with different filesystem timestamp resolutions

2010-12-01 Thread Paul Eggert

Good eye!  Thanks for the bug report and example.  I installed
the following one-byte patch into gnulib; please give it a try.
It should propagate into coreutils the next time coreutils
updates from gnulib.

A test case for this would require two file systems, one with
finer-grained time stamps than the other, where we can create
files in the latter.  I suspect this goes beyond what coreutils's
test cases can easily do.

From 409c6b774c25afce33f8b67fbf7af3eb3304f6cf Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Wed, 1 Dec 2010 21:25:56 -0800
Subject: [PATCH] utimecmp: fine-grained src to nearby coarse-grained dest

* lib/utimecmp.c (utimecmp): When UTIMECMP_TRUNCATE_SOURCE is set,
and the source is on a file system with higher-resolution time
stamps, than the destination, and _PC_TIMESTAMP_RESOLUTION does
not work, and the time stamps are close together, the algorithm to
determine the exact resolution from the read-back mtime was buggy:
it had a != where it should have had an ==.  This bug has been
in the code ever since it was introduced to gnulib.
Problem reported by Dan Jacobson in
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=7529.
---
 ChangeLog  |   14 ++
 lib/utimecmp.c |2 +-
 2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index d4eb684..67e2977 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2010-12-01  Paul Eggert  egg...@cs.ucla.edu
+
+   utimecmp: fine-grained src to nearby coarse-grained dest
+
+   * lib/utimecmp.c (utimecmp): When UTIMECMP_TRUNCATE_SOURCE is set,
+   and the source is on a file system with higher-resolution time
+   stamps, than the destination, and _PC_TIMESTAMP_RESOLUTION does
+   not work, and the time stamps are close together, the algorithm to
+   determine the exact resolution from the read-back mtime was buggy:
+   it had a != where it should have had an ==.  This bug has been
+   in the code ever since it was introduced to gnulib.
+   Problem reported by Dan Jacobson in
+   http://debbugs.gnu.org/cgi/bugreport.cgi?bug=7529.
+
 2010-11-30  Bruno Haible  br...@clisp.org
 
strerror_r-posix: Fix autoconf test.
diff --git a/lib/utimecmp.c b/lib/utimecmp.c
index 63a0c9a..8c3ca65 100644
--- a/lib/utimecmp.c
+++ b/lib/utimecmp.c
@@ -325,7 +325,7 @@ utimecmp (char const *dst_name,
 
 res = SYSCALL_RESOLUTION;
 
-for (a /= res; a % 10 != 0; a /= 10)
+for (a /= res; a % 10 == 0; a /= 10)
   {
 if (res == BILLION)
   {
-- 
1.7.2
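
For readers following the one-byte fix, here is a minimal sketch of the underlying idea; it is not gnulib's utimecmp code. It assumes that a file system's time stamp resolution can be guessed from the trailing decimal zeros of a read-back nanosecond count. The helper name guess_resolution_ns is hypothetical. The loop has to continue while the last digit is zero, which is the == that the patch restores.

```c
#include <assert.h>

enum { BILLION = 1000000000 };

/* Hypothetical helper, not part of gnulib: return the coarsest
   power-of-ten resolution (in nanoseconds) that is consistent with
   one observed nanosecond count NSEC.  */
static int
guess_resolution_ns (long int nsec)
{
  assert (0 <= nsec && nsec < BILLION);

  int res = 1;

  /* Each trailing decimal zero permits a resolution ten times coarser.
     Testing "!= 0" here instead would stop at once on coarse stamps
     and keep going on fine-grained ones, which is backwards.  */
  while (res < BILLION && nsec % 10 == 0)
    {
      nsec /= 10;
      res *= 10;
    }

  return res;
}
```

With this sketch a count of 123456000 suggests microsecond resolution, and a count of 0 suggests whole seconds. The real function has more to consider, such as what the system calls themselves can represent.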





Bug#602907: [Bug-tar] need help with tar output stability problem

2010-11-10 Thread Paul Eggert

I cannot reproduce the problem that you observe.
With tar 1.24 and 1.25, I get the
same 60753920-byte file that you generate with tar 1.23.
I also get the same file if I use tar 1.15.1. I am building
tar on RHEL 5.5 (x86-64) with GCC 4.5.1; except that the 1.15.1
tar is that of RHEL 5.5 itself.

But between tar 1.23 and 1.24, tar began to
output smaller tarballs for some inputs than it had before, which
while surely a good thing in general unfortunately breaks pristine-tar.

-rw-r--r-- 1 joey joey 60753920 Nov 10 03:03 1.23/recreatetarball
-rw-r--r-- 1 joey joey 60856320 Nov 10 17:38 1.24/recreatetarball

There seems to be some kind of confusion here, as the 1.24
tarball that you generate is larger than the 1.23 tarball you
generate. But you write that it is smaller.

You might want to look at your setup to see why your results differ
from mine. I do notice that the tarballs differ in their representation
of this file:

icedove-l10n-3.1.6/upstream/af/chrome/locale/af/messenger/addressbook/replicationProgress.properties

This file name is exactly 100 bytes long, and if I recall, that used
to be a problem area in GNU tar. The 1.24 tarball contains a longlink
representation of the file (which isn't right), whereas the 1.23 tarball
is right.
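
To make the 100-byte boundary concrete, here is a small sketch. It assumes the POSIX ustar rule that the header's name field is 100 bytes and needs no terminating NUL when every byte is used. The names USTAR_NAME_SIZE and fits_in_name_field are hypothetical and are not taken from GNU tar, and the sketch ignores the ustar prefix field.

```c
#include <stdbool.h>
#include <string.h>

/* Size of the name field in a ustar header (assumed from POSIX).  */
enum { USTAR_NAME_SIZE = 100 };

/* Hypothetical helper, not GNU tar's code: return true if NAME can be
   stored directly in the header's name field.  A name of exactly
   USTAR_NAME_SIZE bytes still fits, because the field need not be
   NUL-terminated when it is full.  */
static bool
fits_in_name_field (char const *name)
{
  return strlen (name) <= USTAR_NAME_SIZE;
}
```

Under that assumption the file name above fits without a longlink entry, and an off-by-one test such as "<" in place of "<=" would produce the kind of difference seen between the two tarballs.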

Perhaps you got a prerelease version of 1.24, with bugs? Did you build
1.24 from sources yourself? Have you tried with 1.25? What platform
are you running GNU tar on, and how do you build GNU tar?


Bug#602907: [Bug-tar] need help with tar output stability problem

2010-11-10 Thread Paul Eggert

On 11/10/2010 10:14 PM, Bdale Garbee wrote:
 On Wed, 10 Nov 2010 21:59:03 -0800, Paul Eggert egg...@cs.ucla.edu wrote:
 This file name is exactly 100 bytes long, and if I recall, that used
 to be a problem area in GNU tar.  The 1.24 tarball contains a longlink
 representation of the file (which isn't right), whereas the 1.23 tarball
 is right.
 
 Ugh.  Smoking gun.  I was carrying a patch around for ages to try and
 work around this bug, which tripped a bug in dpkg for a while.  Finally
 realized it was no longer needed and removed it from my build of 1.24.

Ah, sorry, I'm a bit confused.  Is your theory that this age-old patch
broke 1.24?  If so, we don't need to do anything upstream.  If not, then
please let us know (for example, what patch is it, and what test case
illustrates the need for the patch).




Bug#587702: tar (rmt) hangs since update in lenny when using --rsh-command=\/usr\/bin\/ssh

2010-09-14 Thread Paul Eggert

This bug was also reported to bug-tar in
http://lists.gnu.org/archive/html/bug-tar/2010-09/msg00033.html
and a fix has been applied to the upstream sources
http://git.savannah.gnu.org/cgit/paxutils.git/commit/?id=3098be8a03bded997cfd3b43e92f1784eaeb4322
so the bug should be fixed in the next upstream release.




Bug#586301: [bug-diffutils] Re: Bug#586301: diffutils: doubled slashes in --recursive output (path//path) (fwd)

2010-08-14 Thread Paul Eggert

On 08/14/10 23:46, Jim Meyering wrote:
 [PATCH] diff -r: avoid printing excess slashes in concatenated file names

Thanks, that looks good to me.  Hmm, at some point we should
replace zalloc with xzalloc too, I suppose, and maybe get
rid of diffutils' 'concat' function.




Bug#497514: coreutils: chmod, chown, and chgrp change ctime even when no change was necessary

2008-09-11 Thread Paul Eggert

Erik Rossen [EMAIL PROTECTED] writes:

 And, if one wants to be REALLY pedantic, it looks like the file node is
 supposed to be changed each time.  For example, here is an extract:

As I read the spec, chown and chgrp are explicitly required to make
the equivalent of a chown() call, which in turn is required to change
the ctime.  However, chmod is not required to make the equivalent of a
chmod() call, and there is no requirement in the 'chmod' spec that it
change the ctime.  So POSIX allows the optimization for the 'chmod'
command, but not for the 'chown' and 'chgrp' commands.




Bug#474436: coreutils: ls --time-style=locale no longer works

2008-04-07 Thread Paul Eggert

Bo Borgerson [EMAIL PROTECTED] writes:

 Is it
 possible to maintain an English translation with only LC_TIME info?

Yes.  I maintain an English translation by hand for GNU diffutils.
This is for LC_MESSAGES but a similar thing could be done for LC_TIME.

The diffutils message translations are to display appropriate symbols
for copyright (© rather than (C)) and for authors' names (Torbjörn
Granlund).  Also, it rewrites "illegal" to "unrecognized" in a context
where POSIX requires the word "illegal" even though the GNU project
prefers to reserve "illegal" for things that are actually illegal.
Perhaps coreutils could do something similar there, too, after the
next stable release goes out.

Bug#290727: coreutils: dd: Please support /dev/stdin as argument to if=

2008-03-31 Thread Paul Eggert

How about this patch?  It fixes the bug that was reported.  It does have
the downside of possibly failing with EMFILE when the current version would
not fail, but that is a minor drawback.

2008-03-30  Paul Eggert  [EMAIL PROTECTED]

* lib/fd-reopen.c: Work even if FILE is /dev/stdin.
Problem reported by Geoffrey Lee in http://bugs.debian.org/290727.
* tests/dd/misc: Check for this bug.

diff --git a/lib/fd-reopen.c b/lib/fd-reopen.c
index 2ce4678..c12fef6 100644
--- a/lib/fd-reopen.c
+++ b/lib/fd-reopen.c
@@ -1,6 +1,6 @@
 /* Invoke open, but return either a desired file descriptor or -1.
 
-   Copyright (C) 2005, 2006 Free Software Foundation, Inc.
+   Copyright (C) 2005, 2006, 2008 Free Software Foundation, Inc.
 
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -31,15 +31,13 @@
 int
 fd_reopen (int desired_fd, char const *file, int flags, mode_t mode)
 {
-  int fd;
+  int fd = open (file, flags, mode);
 
-  close (desired_fd);
-  fd = open (file, flags, mode);
   if (fd == desired_fd || fd < 0)
 return fd;
   else
 {
-  int fd2 = fcntl (fd, F_DUPFD, desired_fd);
+  int fd2 = dup2 (fd, desired_fd);
   int saved_errno = errno;
   close (fd);
   errno = saved_errno;
diff --git a/tests/dd/misc b/tests/dd/misc
index 9172582..2b54cfb 100755
--- a/tests/dd/misc
+++ b/tests/dd/misc
@@ -46,6 +46,13 @@ if dd oflag=append if=$tmp_in of=$tmp_out 2> /dev/null; then
   compare $tmp_in $tmp_out || fail=1
 fi
 
+case $(cat /dev/stdin <$tmp_in 2>/dev/null) in
+(data)
+  rm -f $tmp_out
+  dd if=/dev/stdin of=$tmp_out <$tmp_in || fail=1
+  compare $tmp_in $tmp_out || fail=1
+esac
+
 if dd iflag=nofollow if=$tmp_in count=0 2> /dev/null; then
   dd iflag=nofollow if=$tmp_sym count=0 2> /dev/null && fail=1
 fi
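
For reference, here is roughly what the function looks like with the hunk above applied. This is a sketch reconstructed from the diff, renamed fd_reopen_sketch to make clear it is not the installed source; the final "return fd2;" is not visible in the hunk and is an assumption. The point of opening first is that a FILE such as /dev/stdin may refer to DESIRED_FD itself, so DESIRED_FD must still be open when open is called. dup2 then replaces it in one step.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open FILE with FLAGS and MODE, arranging for the result to be
   DESIRED_FD.  Return DESIRED_FD on success, -1 (setting errno) on
   failure.  Sketch based on the patch; not the installed code.  */
int
fd_reopen_sketch (int desired_fd, char const *file, int flags, mode_t mode)
{
  /* Open before touching DESIRED_FD, so that FILE may name it.  */
  int fd = open (file, flags, mode);

  if (fd == desired_fd || fd < 0)
    return fd;
  else
    {
      /* dup2 closes DESIRED_FD if needed and reuses its number.  */
      int fd2 = dup2 (fd, desired_fd);
      int saved_errno = errno;
      close (fd);
      errno = saved_errno;
      return fd2;  /* Assumed; this line lies outside the hunk shown.  */
    }
}
```

The EMFILE drawback mentioned above follows from this ordering: for a moment both the old DESIRED_FD and the new descriptor are open, so one more descriptor is needed than before.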




Bug#467378: coreutils: Please include a program to truncate files

2008-02-25 Thread Paul Eggert

Jim Meyering [EMAIL PROTECTED] writes:

 If you don't mind truncating first, how about this?

 true > /var/spool/whatever/foo
 dd bs=1 seek=2G of=/var/spool/whatever/foo < /dev/null

Also, the latter command works even if the former command is omitted.
That is, by itself, that invocation of dd resizes
/var/spool/whatever/foo to 2 GiB, discarding or extending the file as
needed, which is what the original request asked for.
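
For comparison, the same resize can be sketched directly in C. This assumes, as I understand dd's behavior without conv=notrunc, that the effect comes from opening the output without truncating it and then calling ftruncate at the seek offset. The helper name resize_file is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make FILE exactly SIZE bytes long, creating it
   if necessary.  Data past SIZE is discarded; a shorter file is
   extended, typically with a hole that reads as zeros.  Return 0 on
   success, -1 (setting errno) on failure.  */
int
resize_file (char const *file, off_t size)
{
  /* No O_TRUNC: the existing contents up to SIZE must survive.  */
  int fd = open (file, O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
  if (fd < 0)
    return -1;

  int result = ftruncate (fd, size);

  if (close (fd) != 0)
    result = -1;
  return result;
}
```

A call such as resize_file ("/var/spool/whatever/foo", (off_t) 2 << 30) would correspond to the 2 GiB case, on a platform where off_t is wide enough.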




Bug#461049: dd's reaction to a close of the output. (debian Bug#461049)

2008-01-31 Thread Paul Eggert

Following up on Debian bug 461049
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=461049,
Nick Stoughton [EMAIL PROTECTED] writes:

 Note that last paragraph ... I believe that this does give dd permission
 to report its statistics on SIGPIPE.

It certainly does.  Wow.  That "perform some additional processing"
loophole is big enough to drive a truck through, though; as worded it
would let dd (say) execute "rm -fr $HOME" on receipt of SIGPIPE.

Surely there was intended to be _some_ limit on the additional
processing that utilities can do when they receive a random signal.
I would think that the intent was that this additional processing be
limited to cleanup actions (e.g., remove a temp file, or perhaps
restore the terminal state).  Printing statistics goes a bit beyond
that, and one could easily argue that it goes beyond what the standard
was intended to allow.

 Few of the systems that I have tried this on do produce statistics under
 these conditions. Those that do are running some old core-utils
 implementation of dd!

In 2005 I submitted the patch to coreutils dd to make it treat SIGPIPE
like all other known dd implementations do.  This was partly motivated
by my interpretation of POSIX, but it was also partly because I
couldn't see a good reason why coreutils dd would be incompatible with
all other dd implementations I knew of.

There is a similar issue with SIGQUIT, by the way.  Pre-2005 coreutils
'dd' treated SIGQUIT like SIGPIPE: that is, it printed statistics
before killing itself with SIGQUIT.  I don't view this as being
standard behavior either.




Bug#432941: [Leo Moisio] Bug#432941: autoconf: autoreconf can't handle multi-line assignment to ACLOCAL_AMFLAGS

2007-07-13 Thread Paul Eggert

Thanks for reporting the problem.  I installed this doc fix for now.

* doc/autoconf.texi (autoreconf Invocation): Document ACLOCAL_AMFLAGS
limitation reported by Leo Moisio in
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=432941.
Index: doc/autoconf.texi
===
RCS file: /cvsroot/autoconf/autoconf/doc/autoconf.texi,v
retrieving revision 1.1162
diff -u -p -r1.1162 autoconf.texi
--- doc/autoconf.texi   26 Jun 2007 17:42:27 -  1.1162
+++ doc/autoconf.texi   13 Jul 2007 17:38:09 -
@@ -1665,6 +1665,9 @@ none,obsolete}.

 If you want @command{autoreconf} to pass flags that are not listed here
 on to @command{aclocal}, set @code{ACLOCAL_AMFLAGS} in your @file{Makefile.am}.
+Due to a limitation in the Autoconf implementation these flags currently
+must be set on a single line in @file{Makefile.am}, without any
+backslash-newlines.

 @c = Initialization and Output Files.



