from:"Jim Meyering"

Re: [coreutils] [PATCH] fix a typo in sort --parallel help message

2010-12-20 Thread Jim Meyering

Pádraig Brady wrote:
> Drats, I made a typo.
> OK to push this?
>
> commit 9ccc49f2a7b5daf86a0a8683c159798890ef1dcc
> Author: Pádraig Brady 
> Date:   Mon Dec 20 07:49:07 2010 +
>
> maint: fix a typo in sort --parallel help message
>
> Also fix up Chen Guo's emails
> * src/sort.c (usage): Add a missing "of"
> * THANKS: Add Chen Guo
> * .mailmap: Add Chen Guo's UCLA address

Sure.  Thanks!

[coreutils] [PATCH] build: update gnulib: ACL-vs-Solaris 8 portability etc.

2010-12-20 Thread Jim Meyering

FYI, this pulls in Paul Eggert's ACL-vs-Solaris-8 fix
as well as some portability fixes.

>From c00fb912ee1c53ef60374dba124053a4fe057acf Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Sun, 19 Dec 2010 21:23:19 +0100
Subject: [PATCH] build: update gnulib: ACL-vs-Solaris 8 portability etc.

---
 gnulib |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gnulib b/gnulib
index c96bccb..d534aa8 16
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit c96bccbbaa28b6888e487ff86c8b43bd594d2de4
+Subproject commit d534aa814510c8cd4ac70ffee3e0abd58daba356
--
1.7.3.4

[coreutils] [PATCH] tests: avoid new false-positive failure on at least FreeBSD 8.1

2010-12-21 Thread Jim Meyering

FYI, without this, the new part of the mv/trailing-slash test
would fail on FreeBSD 8.1:

>From 7ca55ef3d1ac512aaa99871adcdd8f21eb5af775 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 20 Dec 2010 14:27:08 +0100
Subject: [PATCH] tests: avoid new false-positive failure on at least FreeBSD 8.1

* tests/mv/trailing-slash: Accommodate different diagnostic
on FreeBSD 8.1.
---
 tests/mv/trailing-slash |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/tests/mv/trailing-slash b/tests/mv/trailing-slash
index 906174a..78b1042 100755
--- a/tests/mv/trailing-slash
+++ b/tests/mv/trailing-slash
@@ -56,6 +56,10 @@ printf '%s\n' \
 > expected-err
 touch b
 cp b no-such/ 2> err && fail=1
+
+# Map "No such file..." diagnostic to the expected "Not a directory"
+sed 's/No such file or directory/Not a directory/' err > k && mv k err
+
 compare err expected-err || fail=1

 Exit $fail
--
1.7.3.3

Re: [coreutils] [PATCH] tests: do not assume compiler knows -Wxxx flags

2010-12-22 Thread Jim Meyering

Paul Eggert wrote:
> * gnulib-tests/Makefile.am (test_xvasprintf_CFLAGS):
> (test_lock_CFLAGS, test_tls_CFLAGS): Do not append GCC-specific
> flags like -Wno-format-security unless the GCC-specific flag
> -Werror is also specified.  This avoids a "make check" failure on
> Solaris when using Sun C 5.8.
> ---
...
> -test_xvasprintf_CFLAGS = $(AM_CFLAGS) -Wno-format-security
> +test_xvasprintf_CFLAGS = $(AM_CFLAGS) \
> +  `test X$(WERROR_CFLAGS) = X || echo ' -Wno-format-security'`

Thanks for the fix.
Here's a minor adjustment.
While I've never defined WERROR_CFLAGS to anything
other than the empty string or -Werror, it's easy to
accommodate, and using a positive test like  test -n '...' && ...
is more readable to me than the negative one test X... = X || ...

>From 51f60a62372581fc7260f4c0eae52f06c39201ec Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Wed, 22 Dec 2010 11:10:23 +0100
Subject: [PATCH] tests: adjust preceding change to handle general WERROR_CFLAGS 
values

* gnulib-tests/Makefile.am (test_xvasprintf_CFLAGS):
(test_lock_CFLAGS, test_tls_CFLAGS): Avoid a syntax error when
$(WERROR_CFLAGS) expands to more than one token.
---
 gnulib-tests/Makefile.am |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gnulib-tests/Makefile.am b/gnulib-tests/Makefile.am
index 311baf9..edaa0a7 100644
--- a/gnulib-tests/Makefile.am
+++ b/gnulib-tests/Makefile.am
@@ -9,12 +9,12 @@ AM_CFLAGS = $(GNULIB_TEST_WARN_CFLAGS) $(WERROR_CFLAGS)
 # test-xvasprintf.c:100: error: format not a string literal and no \
 #   format arguments [-Wformat-security]
 test_xvasprintf_CFLAGS = $(AM_CFLAGS) \
-  `test X$(WERROR_CFLAGS) = X || echo ' -Wno-format-security'`
+  `test -n '$(WERROR_CFLAGS)' && echo ' -Wno-format-security'`

 # test-lock.c: In function 'lock_mutator_thread':
 # test-lock.c:148: error: cast from function call of type 'pthread_t' to \
 #   non-matching type 'void *' [-Wbad-function-cast]
 test_lock_CFLAGS = $(AM_CFLAGS) \
-  `test X$(WERROR_CFLAGS) = X || echo ' -Wno-bad-function-cast'`
+  `test -n '$(WERROR_CFLAGS)' && echo ' -Wno-bad-function-cast'`
 test_tls_CFLAGS = $(AM_CFLAGS) \
-  `test X$(WERROR_CFLAGS) = X || echo ' -Wno-bad-function-cast'`
+  `test -n '$(WERROR_CFLAGS)' && echo ' -Wno-bad-function-cast'`
--
1.7.3.3

[coreutils] coreutils-8.8 released [stable]

2010-12-22 Thread Jim Meyering

This is to announce coreutils-8.8, a stable, bug-fix release.
There are numerous fixes for our newly-parallelized sort.

The only significant non-bug-fix change was to add a useful set of
features to split that lets you split input into N roughly-equal pieces,
with options to split on line boundaries or not, and, when honoring line
boundaries, to distribute lines in a round-robin fashion or not.
See "info split" for details and examples, or the on-line manual:
http://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html

See NEWS below for a summary.

Here's the GNU Coreutils home page, in case you're wondering what it is:
http://www.gnu.org/software/coreutils/

Thanks to Paul Eggert, Chen Guo and Pádraig Brady for the many hours
they spent contributing to this release and to everyone else who has
been contributing, helping to manage the mailing list and reporting bugs.

Jim [on behalf of the coreutils maintainers]


For a summary of changes and contributors, see:
  http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=shortlog;h=v8.8
or run this command from a git-cloned coreutils directory:
  git shortlog v8.7..v8.8

To summarize the many gnulib-related changes, run these commands from
a git-cloned coreutils directory:
  git checkout v8.8
  git submodule summary v8.7


Here are the compressed sources:
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.8.tar.gz   (11MB)
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.8.tar.xz   (4.6MB)

Here are the GPG detached signatures[*]:
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.8.tar.gz.sig
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.8.tar.xz.sig

To reduce load on the main server, use a mirror listed at:
  http://www.gnu.org/order/ftp.html

[*] You can use either of the above signature files to verify that
the corresponding file (without the .sig suffix) is intact.  First,
be sure to download both the .sig file and the corresponding tarball.
Then, run a command like this:

  gpg --verify coreutils-8.8.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 000B

and rerun the `gpg --verify' command.

This release was bootstrapped with the following tools:
  Autoconf 2.67.65-9144
  Automake 1.11a
  Gnulib v0.0-4551-gfe2a230
  Bison 2.4.3

NEWS
==
* Noteworthy changes in release 8.8 (2010-12-22) [stable]

** Bug fixes

  cp -u no longer does unnecessary copying merely because the source
  has finer-grained time stamps than the destination.

  od now prints floating-point numbers without losing information, and
  it no longer omits spaces between floating-point columns in some cases.

  sort -u with at least two threads could attempt to read through a
  corrupted pointer. [bug introduced in coreutils-8.6]

  sort with at least two threads and with blocked output would busy-loop
  (spinlock) all threads, often using 100% of available CPU cycles to
  do no work.  I.e., "sort < big-file | less" could waste a lot of power.
  [bug introduced in coreutils-8.6]

  sort with at least two threads no longer segfaults due to use of pointers
  into the stack of an expired thread. [bug introduced in coreutils-8.6]

  sort --compress no longer mishandles subprocesses' exit statuses,
  no longer hangs indefinitely due to a bug in waiting for subprocesses,
  and no longer generates many more than NMERGE subprocesses.

  sort -m -o f f ... f no longer dumps core when file descriptors are limited.

** Changes in behavior

  sort will not create more than 8 threads by default due to diminishing
  performance gains.  Also the --parallel option is no longer restricted
  to the number of available processors.

** New features

  split accepts the --number option to generate a specific number of files.


Also announced here:
  https://savannah.gnu.org/forum/forum.php?forum_id=6662


pgplNsjnapUbJ.pgp
Description: PGP signature

Re: [coreutils] RE: cp command performance

2010-12-24 Thread Jim Meyering

Pádraig Brady wrote:
> Didn't anyone get my previous reply on this.
> I'm wondering about my email server now.

Hi Pádraig,
This is the first reply I've seen from you in this thread.

[coreutils] [PATCH] maint: avoid syntax-check failure due to unused #include

2010-12-24 Thread Jim Meyering

I noticed that "make syntax-check" was failing:

...
src/getlimits.c
maint.mk: the above files include [<"]c-ctype.h[">] but don't use it
make[3]: *** [sc_prohibit_c_ctype_without_use] Error 1

Here's the fix:

>From 01211e9af728a5123b9cced19b7dfdc02b019f5b Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Fri, 24 Dec 2010 14:59:06 +0100
Subject: [PATCH] maint: avoid syntax-check failure due to unused #include

* src/getlimits.c: Don't include "c-ctype.h"; no longer used.
---
 src/getlimits.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/src/getlimits.c b/src/getlimits.c
index 986a5d7..1eab822 100644
--- a/src/getlimits.c
+++ b/src/getlimits.c
@@ -22,7 +22,6 @@
 #include 

 #include "system.h"
-#include "c-ctype.h"
 #include "long-options.h"

 #define PROGRAM_NAME "getlimits"
--
1.7.3.3

[coreutils] coreutils-8.9 next week

2010-12-30 Thread Jim Meyering

Given today's bug fix, I expect to release coreutils-8.9 next
week, so if you know of any other changes that would be appropriate
for a bug-fix-only release, please tell us now.

Re: [coreutils] testsuite worthless on cygwin

2010-12-30 Thread Jim Meyering

Eric Blake wrote:

> Unfortunately, 'make check' is failing almost all of the tests on
> cygwin.  Why?  Because init.sh uses find_exe_basenames_ to see whether
> all .exe files are safe to install a shim function around, but [.exe
> fails the test.  Meanwhile, create_exe_shims_ returns 1 rather than 0
> (or 77) when skipping a directory, which makes path_prepend_ fail almost
> every single test with the message 'something failed (above):
> /path/to/coreutils/src)'.
>
> I'm thinking that init.sh should skip problematic files (we're less
> likely to be testing [ directly to have to worry about conflicts with

Skipping sounds best to me.

> [.exe) rather than giving up on the entire directory.  Any other
> opinions, or should I go ahead and propose a patch?

Please do.

Re: [coreutils] [PATCH] maint: allow gettext 0.17 again

2010-12-30 Thread Jim Meyering

Eric Blake wrote:
> Commit 041c9c47 traded the 'gettext' module for the lighter 'gettext-h'
> module, so as to not require the latest gettext release (we only need
> the latest release if we ship gettext as a dependent library, but
> coreutils has long preferred to use it as an external library).
> But that commit overlooked two places necessary to allow the use of
> gettext 0.17.

IMHO, we should do our best to make 0.18 work as a build-from-clone
(not build from tarball) condition.
However, since you say that 0.18.x still does not build on cygwin,
relaxing that requirement to 0.17 makes it easier for more than a
few people to build coreutils, and I won't object.

> * bootstrap.conf (buildreq): Relax prerequisite.
> * configure.ac (AM_GNU_GETTEXT_VERSION): Likewise.
> ---
>
> I need this patch in order to bootstrap on cygwin (at the moment, all
> versions of gettext 0.18 are unbuildable on cygwin, due to problems
> between gcc's need for implicit symbol imports from libgcc1
> vs. Bruno's explicit avoidance of implicit symbols in gettext).

I confirmed that this works for me, with 0.18.1,
so you're welcome to push it.

[coreutils] [PATCH] maint: generate much of the THANKS file

2010-12-31 Thread Jim Meyering

Resurrecting a patch I started work on over two years ago...
I'll push this next year.

>From 886f1b958a06c62a8b4a3579d7e330ac2f255d12 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 17 Nov 2008 12:05:27 +0100
Subject: [PATCH] maint: generate much of the THANKS file

Before this change, we had a tendency to manually list each
contributor's name in THANKS.  Now, each commit "Author" is
included in the generated THANKS file automatically, and most
of the old THANKS file is now a template, THANKS.in.
We'll still have to manually list the names of people who report
problems without a usable patch.

* THANKS.in: New file, derived from THANKS, but removing names of
those who are listed as git log 'Author:'s.
* THANKS: Remove file.
* thanks-gen: New file.
* Makefile.am (THANKS): New rule.
(EXTRA_DIST): Add .mailmap, THANKS.in and thanks-gen.
* .gitignore: Add THANKS and THANKS-to-translators.
* .mailmap: Unify on single address and name-spelling per contributor.
---
 .gitignore  |2 +
 .mailmap|   25 ++--
 Makefile.am |   27 +-
 THANKS => THANKS.in |   62 +-
 thanks-gen  |   16 +
 5 files changed, 77 insertions(+), 55 deletions(-)
 rename THANKS => THANKS.in (92%)
 create mode 100755 thanks-gen

diff --git a/.gitignore b/.gitignore
index cd73b9a..7fead3d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -81,3 +81,5 @@ ID
 Makefile
 Makefile.in
 TAGS
+THANKS
+THANKS-to-translators
diff --git a/.mailmap b/.mailmap
index d4fcef5..e3d7a27 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1,12 +1,31 @@
 # Map git author names and email addresses to canonical/preferred form.
  
+ 
  
  
-Paul R. Eggert  
-Paul R. Eggert  
+Paul Eggert  
+Paul Eggert  
+ 
+
 # Evan's two changes listed my email address.
 Evan Hunt  Evan Hunt 
+
  
-Pádraig Brady  
+Pádraig Brady  
  
  
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+
+# Prefer spelled-out middle name and its address.
+Arne Henrik Juul  Arne H. Juul 
+
+# Had email as name.
+Dan Jacobson  jida...@jidanni.org 
diff --git a/Makefile.am b/Makefile.am
index b61229d..4fc6e8e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -68,9 +68,11 @@ syntax_check_exceptions =\
 EXTRA_DIST =   \
   $(changelog_etc) \
   $(syntax_check_exceptions)   \
+  .mailmap \
   .prev-version\
   .version \
   .vg-suppressions \
+  THANKS.in\
   THANKS-to-translators\
   THANKStt.in  \
   bootstrap\
@@ -79,7 +81,8 @@ EXTRA_DIST =  \
   cfg.mk   \
   dist-check.mk\
   gl/modules/getloadavg.diff   \
-  maint.mk
+  maint.mk \
+  thanks-gen

 ALL_RECURSIVE_TARGETS += install-root
 install-root:
@@ -152,3 +155,25 @@ check-ls-dircolors:
  |sed -n '/^  *"/p'|tr , '\n'|sed 's/^  *//'   \
  |sed -n 's/^"\(..\)"/\1/p'|sort -u);  \
test "$$dc" = "$$ls"
+
+# Sort in traditional ASCII order, regardless of the current locale;
+# otherwise we may get into trouble with distinct strings that the
+# current locale considers to be equal.
+ASSORT = LC_ALL=C sort
+
+# Extract all lines up to the first one starting with "##".
+prologue = perl -ne '/^\#\#/ and exit; print' $(srcdir)/THANKS.in
+
+THANKS: THANKS.in Makefile.am .mailmap thanks-gen .version
+   $(AM_V_GEN) \
+   {   \
+ $(prologue); echo;\
+ { perl -ne '/^$$/.../^$$/ and print' $(srcdir)/THANKS.in  \
+ | grep -v '^$$' | perl -pe 's/  +/\0/';   \
+   git log --pretty=format:'%aN%x00%aE'\
+ | $(ASSORT) -u;   \
+ } | $(srcdir)/thanks-gen  \
+   | LC_ALL=en_US.UTF-8 sort -f;   \
+ echo; \
+ printf ';; %s\n' 'Local Variables:' 'coding: utf-8' End:; \
+   } > $...@-t && mv $...@-t $@
diff --git a/THANKS b/THANKS.in
similarity index 92%
rename from THANKS
rename to THANKS.in
index 311fa5b..b475eef 100644
--- a/THANKS
+++ b/THANKS.in
@@ -1,14 +1,19 @@
 These people have contributed to the GNU coreutils (formerly, the fileutils,
 textutils, and/or sh-utils

Re: [coreutils] [PATCH] maint: update to latest gnulib, for testsuite improvement

2010-12-31 Thread Jim Meyering

Eric Blake wrote:
> * gnulib: Update to latest for init.sh fix.
> * bootstrap: Resync from gnulib.
> * tests/init.sh: Likewise.
> ---
>
> There are enough gnulib changes that I thought I'd ask for review
> before pushing this.

Thanks.  Please do.
I was getting ready to do the same thing and had already tested
on x86_64 and i686 with Fedora 14, using gnulib commit 57505a67
"mountlist: tweak previous commit".

> * gnulib fe2a230...9dd4319 (88):
>   > tests: avoid failing coreutils tests on cygwin
>   > sys_select: Avoid warning about missing memset declaration on HP-UX 11.
>   > waitpid: Fix link error in C++ mode.
>   > isnan: Use GCC built-ins when possible.
>   > isnand: Fix mistake.
>   > open: Avoid C++ error on HP-UX 11.
>   > time_r: Add missing declarations on HP-UX 11.
>   > mountlist: tweak previous commit
>   > mountlist: fix local drive detection on cygwin
>   > ftoastr, snprintf: ftoastr + snprintf module
...

[coreutils] THANKS-generating and copyright-updating changes pushed

2011-01-01 Thread Jim Meyering

FYI, I've just pushed these administrative changes:

commit 9d6231ef2a74671fa08bc263519edfe46e8d1805
Author: Jim Meyering 
Date:   Sat Jan 1 11:37:32 2011 +0100

maint: update all copyright year number ranges

Run "make update-copyright".

commit 257909013ef559418f612e8592f55b29dafda154
Author: Jim Meyering 
Date:   Sat Jan 1 11:10:00 2011 +0100

build: update gnulib for version-etc copyright year update

* tests/sample-test: Update copyright to 2011, to appease syntax-check.

commit 9a008a9e24166375cf512457155d38c760c89258
Author: Jim Meyering 
Date:   Mon Nov 17 12:05:27 2008 +0100

maint: generate much of the THANKS file

Before this change, we had a tendency to manually list each
contributor's name in THANKS.  Now, each commit "Author" is
included in the generated THANKS file automatically, and most
of the old THANKS file is now a template, THANKS.in.
We'll still have to manually list the names of people who report
problems without a usable patch.

* THANKS.in: New file, derived from THANKS, but removing names of
those who are listed as git log 'Author:'s.
* THANKS: Remove file.
* thanks-gen: New file.
* Makefile.am (THANKS): New rule.
(EXTRA_DIST): Add .mailmap, THANKS.in and thanks-gen.
* .gitignore: Add THANKS and THANKS-to-translators.
* .mailmap: Unify on single address and name-spelling per contributor.

Re: [coreutils] coreutils-8.9 next week

2011-01-03 Thread Jim Meyering

Jim Meyering wrote:
> Given today's bug fix, I expect to release coreutils-8.9 next
> week, so if you know of any other changes that would be appropriate
> for a bug-fix-only release, please tell us now.

Unless I hear otherwise, I'll release coreutils-8.9 tomorrow
with the current changes:

http://git.savannah.gnu.org/cgit/coreutils.git/

plus an update to the latest from gnulib.

[coreutils] coreutils-8.9 released [stable]

2011-01-04 Thread Jim Meyering

This is to announce coreutils-8.9, a stable, bug-fix release.
The sole bug was in split, introduced in 8.8.

See NEWS below for a summary.

Here's the GNU Coreutils home page:
http://www.gnu.org/software/coreutils/

Jim [on behalf of the coreutils maintainers]

Here are the compressed sources:
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.9.tar.xz(4.6 MB)
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.9.tar.gz (11 MB)

Here are the GPG detached signatures[*]:
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.9.tar.xz.sig
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.9.tar.gz.sig

If you prefer a mirror, you can use links like these:
  http://ftpmirror.gnu.org/coreutils/coreutils-8.9.tar.xz
  http://ftpmirror.gnu.org/coreutils/coreutils-8.9.tar.gz

[*] You can use either of the above signature files to verify that
the corresponding file (without the .sig suffix) is intact.  First,
be sure to download both the .sig file and the corresponding tarball.
Then, run a command like this:

  gpg --verify coreutils-8.9.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 000B

and rerun the `gpg --verify' command.

This release was bootstrapped with the following tools:
  Autoconf 2.68.11-45b92
  Automake 1.11a
  Gnulib v0.0-4673-gfa6be5b
  Bison 2.4.3

=
NEWS

* Noteworthy changes in release 8.9 (2011-01-04) [stable]

** Bug fixes

  split no longer creates files with a suffix length that
  is dependent on the number of bytes or lines per file.
  [bug introduced in coreutils-8.8]


Also announced at http://savannah.gnu.org/forum/forum.php?forum_id=6679


pgpUd2gtQhEbL.pgp
Description: PGP signature

[coreutils] new branch in gnulib: coreutils-8.9

2011-01-04 Thread Jim Meyering

In releasing coreutils-8.9, I left its gnulib submodule pointing at a
private commit that was going to be pushed, pending an ACK...  Since a
slightly different commit was pushed, coreutils' submodule SHA1 was
invalid for a short interval.  However, I have pushed my local commit
to the new gnulib branch named "coreutils-8.9".  With that, anyone
cloning coreutils and trying to build from the 8.9 tag *will* get a
usable gnulib hierarchy.

Thanks to Eric Blake for noticing so quickly and for the suggestion
to push my commit to a branch in gnulib.

[coreutils] Re: new branch in gnulib: coreutils-8.9

2011-01-04 Thread Jim Meyering

Ralf Wildenhues wrote:
> Hi Jim,
>
> * Jim Meyering wrote on Tue, Jan 04, 2011 at 07:14:49PM CET:
>> In releasing coreutils-8.9, I left its gnulib submodule pointing at a
>> private commit that was going to be pushed, pending an ACK...  Since a
>> slightly different commit was pushed, coreutils' submodule SHA1 was
>> invalid for a short interval.  However, I have pushed my local commit
>> to the new gnulib branch named "coreutils-8.9".  With that, anyone
>> cloning coreutils and trying to build from the 8.9 tag *will* get a
>> usable gnulib hierarchy.
>
> FYI, this would have also been a good point at which you could have
> merged the commit used for coreutils into the gnulib master branch.
> Even a fake merge 'git merge -s ours coreutils-8.9' would be
> appropriate, if the end result is semantically that the effects of your
> patch are incorporated in the tree.
>
> You could still do this.
>
> That way, the next update of gnulib as coreutils submodule could still
> be a fast-forward.

Hi Ralf,

I've done as you suggest (note that it required temporarily
disabling the hook that prohibits pushing merge commits to master).
That has the added benefit that the branch is no longer needed.

Thanks for the suggestion,

Jim

[coreutils] Re: new branch in gnulib: coreutils-8.9

2011-01-04 Thread Jim Meyering

Eric Blake wrote:
> On 01/04/2011 01:05 PM, Jim Meyering wrote:
>>> That way, the next update of gnulib as coreutils submodule could still
>>> be a fast-forward.
>>
>> Hi Ralf,
>>
>> I've done as you suggest (note that it required temporarily
>> disabling the hook that prohibits pushing merge commits to master).
>> That has the added benefit that the branch is no longer needed.
>
> Except that you --amend'ed the commit before merging, so that the
> gnulib.git master branch refers to commit a77937c rather than the
> intended fa6be5b that is current on the gnulib.git coreutils-8.9 branch,
> so right now, one _still_ needs the coreutils-8.9 branch to exist.

sigh.  Thanks, Eric.
My first merge used an old, invalid branch.
This one does the right thing:

  git merge -s ours origin/coreutils-8.9

coming up...

[coreutils] Re: new branch in gnulib: coreutils-8.9

2011-01-04 Thread Jim Meyering

Jim Meyering wrote:
> In releasing coreutils-8.9, I left its gnulib submodule pointing at a
> private commit that was going to be pushed, pending an ACK...  Since a
> slightly different commit was pushed, coreutils' submodule SHA1 was
> invalid for a short interval.  However, I have pushed my local commit
> to the new gnulib branch named "coreutils-8.9".  With that, anyone
> cloning coreutils and trying to build from the 8.9 tag *will* get a
> usable gnulib hierarchy.
>
> Thanks to Eric Blake for noticing so quickly and for the suggestion
> to push my commit to a branch in gnulib.

I've included a patch, below, that should prevent recurrence.
I've made the new rule a dependent of "alpha beta stable",
so that it's only run at release time.

Obviously it'd be nice to do this for each and every submodule,
but for that, you'd need a URL for each of them, so I'm
keeping it simple for now.

Eventually I'll move it into maint.mk, where it will perform the test
only when gnulib is one of the submodules.

diff --git a/cfg.mk b/cfg.mk
index 4df74f0..0517201 100644
--- a/cfg.mk
+++ b/cfg.mk
@@ -337,3 +337,14 @@ include $(srcdir)/dist-check.mk
 update-copyright-env = \
   UPDATE_COPYRIGHT_USE_INTERVALS=1 \
   UPDATE_COPYRIGHT_MAX_LINE_LENGTH=79
+
+# Just before tagging a release, ensure that the gnulib submodule
+# commit we're using is public.
+alpha beta stable: gnulib-commit-check
+.PHONY: gnulib-commit-check
+gnulib-commit-check:
+   submod=gnulib; \
+   st=$$(git submodule status $$submod) || exit 1; \
+   commit=$$(echo $$st | awk '{ print $$1 }') || exit 1; \
+   wget -q -O /dev/null \
+ "http://git.savannah.gnu.org/gitweb/?p=$$submod.git;a=tree;h=$$commit";

Re: [coreutils] Re: new branch in gnulib: coreutils-8.9

2011-01-05 Thread Jim Meyering

Eric Blake wrote:

> On 01/04/2011 03:01 PM, Eric Blake wrote:
>> Found one, and it even works on libvirt (where the gnulib submodule is
>> named .gnulib instead of gnulib).  It should also work on a repository
>> with multiple submodules, although I have not yet tested it on bison.git.
>>
>> gnulib-commit-check:
>>  git submodule foreach test '$$(git merge-base --independent \
>>origin $$sha1 | wc -w)' = 1
>>
>> This effectively runs the shell command:
>>
>> git merge-base --independent origin $sha1
>>
>> with $sha1 set to the value recorded in the superproject (identical to
>> your computation of $commit), and outputs the minimal set of revisions
>> not reachable from any other in the set of arguments.  If origin
>> contains $sha1, the output is a single hash (it happens to be that of
>> origin); if $sha1 is a local-only revision, the output is two hashes, so
>> the test fails, which in turn lets 'make alpha' fail.
>
> Not quite robust enough - if your superproject contains commits not
> pushed upstream yet, then merge-base would still output a single
> reference.  This should close that gap, by requiring that the least
> reachable commit be exactly the upstream origin.
>
> gnulib-commit-check:
>   git submodule foreach test '$$(git rev-parse origin)' \
> = '"$$(git merge-base --independent origin $$sha1)"'

Sounds nice.  Thanks!
I'll take a look this evening.

Re: [coreutils] Re: new branch in gnulib: coreutils-8.9

2011-01-06 Thread Jim Meyering

Eric Blake wrote:
> On 01/04/2011 03:01 PM, Eric Blake wrote:
>> Found one, and it even works on libvirt (where the gnulib submodule is
>> named .gnulib instead of gnulib).  It should also work on a repository
>> with multiple submodules, although I have not yet tested it on bison.git.
>>
>> gnulib-commit-check:
>>  git submodule foreach test '$$(git merge-base --independent \
>>origin $$sha1 | wc -w)' = 1
>>
>> This effectively runs the shell command:
>>
>> git merge-base --independent origin $sha1
>>
>> with $sha1 set to the value recorded in the superproject (identical to
>> your computation of $commit), and outputs the minimal set of revisions
>> not reachable from any other in the set of arguments.  If origin
>> contains $sha1, the output is a single hash (it happens to be that of
>> origin); if $sha1 is a local-only revision, the output is two hashes, so
>> the test fails, which in turn lets 'make alpha' fail.
>
> Not quite robust enough - if your superproject contains commits not
> pushed upstream yet, then merge-base would still output a single
> reference.  This should close that gap, by requiring that the least
> reachable commit be exactly the upstream origin.
>
> gnulib-commit-check:
>   git submodule foreach test '$$(git rev-parse origin)' \
> = '"$$(git merge-base --independent origin $$sha1)"'

Thanks again for the fine test.
It passed my tests, so I propose this in your name:

>From 8ba2dc9163f753c4953e8686f2b611d4e2a3ae84 Mon Sep 17 00:00:00 2001
From: Eric Blake 
Date: Thu, 6 Jan 2011 10:35:18 +0100
Subject: [PATCH] maint.mk: add pre-release check to ensure submodule commits 
are public

* top/maint.mk (public-submodule-commit): New rule.
(submodule-checks): New variable.
(alpha beta stable): Depend on the variable.
---
 ChangeLog|7 +++
 top/maint.mk |   12 +++-
 2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 6b99e92..5e319f9 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2011-01-06  Eric Blake  
+
+   maint.mk: add pre-release check to ensure submodule commits are public
+   * top/maint.mk (public-submodule-commit): New rule.
+   (submodule-checks): New variable.
+   (alpha beta stable): Depend on the variable.
+
 2011-01-05  Pádraig Brady 
and Jim Meyering  

diff --git a/top/maint.mk b/top/maint.mk
index 5545e69..3a3ede6 100644
--- a/top/maint.mk
+++ b/top/maint.mk
@@ -1118,9 +1118,19 @@ no-submodule-changes:
  : ;   \
fi

+submodule-checks = no-submodule-changes public-submodule-commit
+
+# Ensure that each sub-module commit we're using is public.
+# Without this, it is too easy to tag and release code that
+# cannot be built from a fresh clone.
+.PHONY: public-submodule-commit
+public-submodule-commit:
+   git submodule foreach 'test $$(git rev-parse origin)' \
+ = '"$$(git merge-base --independent origin $$sha1)"'
+
 .PHONY: alpha beta stable
 ALL_RECURSIVE_TARGETS += alpha beta stable
-alpha beta stable: $(local-check) writable-files no-submodule-changes
+alpha beta stable: $(local-check) writable-files $(submodule-checks)
test $@ = stable\
  && { echo $(VERSION) | grep -E '^[0-9]+(\.[0-9]+)+$$' \
   || { echo "invalid version string: $(VERSION)" 1>&2; exit 1;};}\
--
1.7.3.4

Re: [coreutils] Re: new branch in gnulib: coreutils-8.9

2011-01-06 Thread Jim Meyering

Eric Blake wrote:

> On 01/06/2011 02:44 AM, Jim Meyering wrote:
>>> gnulib-commit-check:
>>> git submodule foreach test '$$(git rev-parse origin)' \
>>>   = '"$$(git merge-base --independent origin $$sha1)"'
>>
>> Thanks again for the fine test.
>> It passed my tests, so I propose this in your name:
>>
>>>From 8ba2dc9163f753c4953e8686f2b611d4e2a3ae84 Mon Sep 17 00:00:00 2001
>> From: Eric Blake 
>> Date: Thu, 6 Jan 2011 10:35:18 +0100
>> Subject: [PATCH] maint.mk: add pre-release check to ensure submodule
>> commits are public
>
> Yes, feel free to push that in my name to gnulib.
>
>> +submodule-checks = no-submodule-changes public-submodule-commit
>
> If someone complains about these two tests being git-centric, a solution
> for a project using maint.mk with a different VCS would be to make
> submodule-checks defined via ?=, so that it can be overridden in cfg.mk
> to an empty macro to skip git submodule consistency checks.  But I'm
> okay using = instead of ?= and waiting until someone requests that extra
> flexibility, if you don't want to change it now.

Good idea.
I've also made the test silently skip when there is no .git/ directory,
just like no-submodule-changes rule does.

One more iteration:

>From 807533b8c9cec5320db9cf7e0746d229bd0700a6 Mon Sep 17 00:00:00 2001
From: Eric Blake 
Date: Thu, 6 Jan 2011 10:35:18 +0100
Subject: [PATCH] maint.mk: add pre-release check to ensure submodule commits 
are public

* top/maint.mk (public-submodule-commit): New rule.
(submodule-checks): New variable.
(alpha beta stable): Depend on the variable.
---
 ChangeLog|7 +++
 top/maint.mk |   18 +-
 2 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 6b99e92..5e319f9 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2011-01-06  Eric Blake  
+
+   maint.mk: add pre-release check to ensure submodule commits are public
+   * top/maint.mk (public-submodule-commit): New rule.
+   (submodule-checks): New variable.
+   (alpha beta stable): Depend on the variable.
+
 2011-01-05  Pádraig Brady 
and Jim Meyering  

diff --git a/top/maint.mk b/top/maint.mk
index 5545e69..f892304 100644
--- a/top/maint.mk
+++ b/top/maint.mk
@@ -1118,9 +1118,25 @@ no-submodule-changes:
  : ;   \
fi

+submodule-checks ?= no-submodule-changes public-submodule-commit
+
+# Ensure that each sub-module commit we're using is public.
+# Without this, it is too easy to tag and release code that
+# cannot be built from a fresh clone.
+.PHONY: public-submodule-commit
+public-submodule-commit:
+   if test -d $(srcdir)/.git; then \
+ git submodule foreach 'test $$(git rev-parse origin)' \
+ = '"$$(git merge-base --independent origin $$sha1)"'  \
+   || { echo '$(ME): found non-public submodule commit' >&2;   \
+exit 1; }; \
+   else\
+ : ;   \
+   fi
+
 .PHONY: alpha beta stable
 ALL_RECURSIVE_TARGETS += alpha beta stable
-alpha beta stable: $(local-check) writable-files no-submodule-changes
+alpha beta stable: $(local-check) writable-files $(submodule-checks)
test $@ = stable\
  && { echo $(VERSION) | grep -E '^[0-9]+(\.[0-9]+)+$$' \
   || { echo "invalid version string: $(VERSION)" 1>&2; exit 1;};}\
--
1.7.3.4

Re: [coreutils] [PATCH] maint: suppress some clang scan-build warnings

2011-01-06 Thread Jim Meyering

Pádraig Brady wrote:
>
> maint: suppress some clang scan-build warnings
>
> * src/pr.c (char_to_clump): Remove a dead store.
> * src/remove.c (fts_skip_tree): Likewise.
> * src/sort.c (key_warnings): Likewise.
> (sort): Suppress an uninitialized pointer warning.

Thanks!  That is fine, with one correction below.

I hope clang's analysis improves to the point
that we can make coreutils warning-free without too much effort.
I noticed that this warning about ln.c is bogus:

203 && (backup_type == no_backups || !symbolic_link)
204 && (!symbolic_link || stat (source, &source_stats) == 0)
205 && SAME_INODE (source_stats, dest_stats)

  Within the expansion of the macro 'SAME_INODE':
  The left operand of '==' is a garbage value

Here's the expansion:
 ((source_stats).st_ino == (dest_stats).st_ino &&
  (source_stats).st_dev == (dest_stats).st_dev)

The problem is that clang made the assumption, a few lines above,
that "symbolic_link" was true, yet doesn't deduce the consequence:
whenever SAME_INODE is reached, the just-prior call to stat has
succeeded, and hence has defined "source_stats".

...
> diff --git a/src/sort.c b/src/sort.c
...
> @@ -3770,6 +3770,7 @@ sort (char *const *files, size_t nfiles, char const 
> *output_file,
>size_t nthreads)
>  {
>struct buffer buf;
> +  IF_LINT (buf.buf == NULL);

ETOOLATE ;-)  You must mean this:

IF_LINT (buf.buf = NULL);

Re: [coreutils] [PATCH] coreutils new feature: split --filter

2011-01-07 Thread Jim Meyering

Karl Heuer wrote:
> For some years now I've been bemoaning the lack of a
> `split --filter=COMMAND' feature, which would write to
> pipes instead of files, instead of having to split to
> a set of files and then run the filter on each of them.
> The environment variable FILE is set to the name of the
> file that would have been written; it's the command's
> responsibility to use that string appropriately.
> Sample usage (note proper quoting of $FILE):
> $ split --filter 'gzip >$FILE.gz'
>
> The details turned out to be rather trickier than I'd
> expected, but I think it's ready to be shared with the
> folks in charge of coreutils.  Source patch attached.
> I haven't touched the documentation or tests.

Hi Karl,
It's been a long time.  Hope you're doing well.

This sounds like a feature I'd be happy to use.
I think we're ok on the copyright front, since split
used to be part of the textutils and you have an assignment
on file for that project.

Do you feel like updating NEWS and doc/coreutils.texi and/or
writing a test or two?

Regardless, someone should be able to review your patch
in the next few days.

Thanks!

Re: [coreutils] join feature: auto-format

2011-01-07 Thread Jim Meyering

Pádraig Brady wrote:
...
>> While -e without -o was previously a noop, and so could safely be extended 
>> IMHO,
>> this will also change the behavior when with -e and -j are specified.
>> Previously if -j > 1 was specified, and that field was missing,
>> then -e would be used in its place, rather than the empty string.
>> This still does that, but also does the padding.
>> Without the -j issue I'd be 80:20 for just extending -e to auto pad,
>> but given -j I'm 50:50.  The alternative it to select this with
>> say '-o padded', but that's less discoverable, and complicates
>> the interface somewhat.
>
> Considering this more, I think it's safer to auto pad only
> when '-o padded' is specified. I notice the plan9 `join` man page
> has an example that uses -e '' to explicitly specify the NUL string as filler,
> which would have triggered our auto pad if we left it as above.

I'm glad you found that, confirming that
the conservative approach is better.

Re: [coreutils] RFC for adding file creation mode feature into touch utility.

2011-01-07 Thread Jim Meyering

Eric Blake wrote:
> On 01/07/2011 09:23 AM, Pádraig Brady wrote:
>>> Hmm - mkdir already supports -m/--mode, so it does sound like making
>>> touch support the same option might be a useful enhancement.  Of course,
>>> if the file exists, the --mode option would be ignored.  And POSIX
>>> requires that if the --mode option is not provided but a file is
>>> created, that the file have mode 0666 as modified by umask.  We can't
>>> use -m, though (POSIX already requires it for mtime).
>>
>> Why does mkdir have this option rather than relying on umask?
>> My guess is to close security races with sticky bits etc.
>> and that creating a file will not have the same issues?
>
> mkdir has -m because POSIX requires it; also, you are right that the
> setup of 'mkdir -p -m' creates a hierarchy where you want the
> permissions to be set correctly for the entire set for race avoidance.
> But touch does not create hierarchies.
>
>>
>> ...
>>> I'm 70:30 for adding this idea, and it's rather simple to do, as well
>>> (would still need doc, NEWS, and test additions).
>>
>> I'm a little less enthused to diverge from POSIX for this.
>
> It's not diverging from POSIX, so much as adding an extension.  It all
> boils down to whether the extension would prove useful, and whether it
> can optimize work flows faster than any existing alternative that gets
> the same end result but with more steps.  But thinking about it more,
> I'd rather use just the long-option --mode rather than burn a short
> option -M (or any other).
>
> As for creating a file with +x or with 07000 permissions already added,
> why?  Creating an empty file as already executable or setuid doesn't
> make much sense (the empty file will be a nop); why not wait until the
> file has final executable contents before adding +x or special
> permissions later?  So, that leaves only permissions matching of a
> reference file (and recent GNU cp already supports that), or restricting
> permissions less than touch's default of 0666 (but touch honors umask,
> so just modify your umask before touch), neither of which need a chmod
> after the fact, nor a --mode argument to touch.
>
> What would really clinch the deal is if any other touch implementation
> out there already has something like this.  But off the top of my head,
> I don't know of any (neither FreeBSD nor Solaris touch provides anything
> like this).  So I guess I'm now 50:50 (undecided on whether the
> extension makes sense without more evidence of its usefulness).

I confess that my first reaction was "why bother?"
Touch is nominally a very simple tool, and adding --mode support
feels like adding unwarranted bloat -- and not in line with
the one-job-one-tool philosophy.

Why would one prefer the GNU-specific touch --mode ...
over the portable combination of touch and chmod?

Re: [coreutils] [PATCH] coreutils new feature: split --filter

2011-01-07 Thread Jim Meyering

Karl Heuer wrote:
>>I think we're ok on the copyright front, since split
>>used to be part of the textutils and you have an assignment
>>on file for that project.
>
> Great.  I am, of course, OK with assigning copyright in any case.
>
>>Do you feel like updating NEWS and doc/coreutils.texi and/or
>>writing a test or two?
>
> Sure.  I haven't looked at the format for new tests; it might save
> some time if I just say that input FOO should produce output BAR,
> and let someone else translate that into what's needed.  Meanwhile,
> if someone else thinks of tests that should be included (or finds
> a problem with my code as written), keep me in the loop to avoid
> duplication of effort.

You may want to use tests/sample-test as a starting point
and to take inspiration from the others in tests/misc/split-*.

Re: [coreutils] RFC for adding file creation mode feature into touch utility.

2011-01-08 Thread Jim Meyering

Rakib Mullick wrote:
>> I confess that my first reaction was "why bother?"
>> Touch is nominally a very simple tool, and adding --mode support
>> feels like adding unwarranted bloat -- and not in line with
>> the one-job-one-tool philosophy.
>
> Yes, right. Then, touch only supposed to change file's timestamp. No?
> But, touch also creates a file for us. Not only it creates a file for
> us, when it creates a file it takes reference from other files. So,
> althrough we're in philosophy of one-tool-do-one-job-well, there are
> something else that we can get.
>
> And if adding --mode option to touch gets really out of philosophy,
> then I think we need a new tool to for creating file with all the
> options that we might have want. That will truly help us to keep the
> philosophy intact.

?

When the utility of a new feature is not obvious, the proposal for
that new feature must be accompanied by significant justification,
and so far, the argument that "adding this option to touch saves us the
trouble of invoking chmod" is insufficient.

Typical "good" justification usually goes like this:

I want to do X using POSIX tools, and the portable way to do it is
too cumbersome, but if I add this option to one of the coreutils,
it becomes much easier.  Here is how to do X using current tools:

...list of actual commands (inefficient, cumbersome, etc)...

and now, do X using the program with the proposed change:

...list of commands demonstrating benefits of proposed change...

Then, if there is no reasonably simple/concise way to solve the problem
with some other tool, we evaluate whether some coreutils program
should be extended to do the job.

>> Why would one prefer the GNU-specific touch --mode ...
>> over the portable combination of touch and chmod?
>>
> Cause its easier to do 'touch --mode xxx filename' than 'touch
> filename' and 'chmod xxx'.

As mentioned above, that is not enough justification.

> And Eric also pointed out that we will be
> able to do 'touch -r a -M b' to create a referenced file with
> different mode.

Let's back up a step...
Why do you want an option for this when you can already use touch
to create a file with selected permissions via your umask?
If you say that you require an execute bit or some special bit to
be set, please explain why.  I do not see how having such bits set
on an empty file would be useful.  If you're about to add content,
then the time stamps set by touch will be overwritten, and you might as
well have created the file via the shell, (if there was only one) say
with ": > FILE".

[coreutils] Re: Bug#609049: du: process_file: Assertion `level == prev_level - 1' failed

2011-01-08 Thread Jim Meyering

Jonathan Nieder wrote:
> Package: coreutils
> Version: 8.5-1
>
> Happy new year.  Today I ran
>
>  $ du -sk ~/*
>
> to see where all the space is going.  The traversal almost certainly
> was in ~/src, and being bored, I suspended it to examine that
> directory:
>
>  ^Z
>  [1]+  Stopped du -sk ~/*
>
> In another terminal I ran "cd ~/src && du -sk *".  ~/src/netbsd was
> very big, so I moved it (using chmod +w first) to ~/.cache on the same
> filesystem for future reference.  And now time to resume the toplevel
> search:
>
>  $ fg
>  du -sk ~/*
>  du: fts_read failed: No such file or directory
>  du: du.c:583: process_file: Assertion `level == prev_level - 1' failed.
>  Aborted (core dumped)
>
> Lacking debugging symbols, the core dump does not seem so useful, but
> I still have it.  libc6 is 2.11.2-7, filesystem is ext4, and the
> kernel is 2.6.37-rc7-686.  If there is additional information I could
> provide, just ask.

Thank you for the fine bug report.
Here's a fix:

While I do have a reproducer that will become a test suite addition
(coming soon), it relies on python (a first) and the python-inotify
package, so I'll have to be careful to skip the test when those
prerequisites aren't installed.  Also, there's an inherent race condition,
so I'll have to find the right compromise between absolute test-robustness
(too expensive in time and inodes) and reasonableness.

>From 92bbec0f697575a8bcbbe3fe0cb54b468c1022a4 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Sat, 8 Jan 2011 17:44:55 +0100
Subject: [PATCH] du: don't abort when a subdir is renamed out from under us

* NEWS (Bug fixes): Mention it.
* src/du.c (prev_level): Move declaration "up" to file-scope global.
(du_files): Reset prev_level to 0 upon abnormal fts_read termination.
Also improve a diagnostic.
Reported by Johathan Nieder in http://bugs.debian.org/609049
---
 NEWS |7 +++
 src/du.c |   15 +++
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/NEWS b/NEWS
index 2a71ca6..793a8e6 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,13 @@ GNU coreutils NEWS-*- 
outline -*-

 * Noteworthy changes in release ?.? (-??-??) [?]

+** Bug fixes
+
+  du could abort with a failed assertion when part of the hierarchy being
+  traversed was moved to a higher level in the directory tree, and when
+  there was at least one more command line directory argument following
+  the one containing the moved sub-tree.
+

 * Noteworthy changes in release 8.9 (2011-01-04) [stable]

diff --git a/src/du.c b/src/du.c
index 77deb0c..4dc5456 100644
--- a/src/du.c
+++ b/src/du.c
@@ -63,8 +63,11 @@ extern bool fts_debug;
 /* A set of dev/ino pairs.  */
 static struct di_set *di_set;

-/* Define a class for collecting directory information. */
+/* Keep track of the preceding "level" (depth in hierarchy)
+   from one call of process_file to the next.  */
+static size_t prev_level;

+/* Define a class for collecting directory information. */
 struct duinfo
 {
   /* Size of files in directory.  */
@@ -399,7 +402,6 @@ process_file (FTS *fts, FTSENT *ent)
   struct duinfo dui;
   struct duinfo dui_to_print;
   size_t level;
-  static size_t prev_level;
   static size_t n_alloc;
   /* First element of the structure contains:
  The sum of the st_size values of all entries in the single directory
@@ -582,10 +584,15 @@ du_files (char **files, int bit_flags)
 {
   if (errno != 0)
 {
-  /* FIXME: try to give a better message  */
-  error (0, errno, _("fts_read failed"));
+  error (0, errno, _("%s: fts_read failed"),
+ quotearg_colon (fts->fts_path));
   ok = false;
 }
+
+  /* When exiting this loop early, be careful to reset the
+ global, prev_level, used in process_file.  Otherwise, its
+ (level == prev_level - 1) assertion could fail.  */
+  prev_level = 0;
   break;
 }
   FTS_CROSS_CHECK (fts);
--
1.7.3.5

Re: [coreutils] [PATCH] coreutils buglets, useless-if-before-free

2011-01-09 Thread Jim Meyering

Karl Heuer wrote:
> `useless-if-before-free --help' prints a literal
> NUL character instead of the backslash-zero that
> was intended.  Also, the "must lie within the
> first 8 lines" line is on line 9, and hence not
> getting automatically updated.  This patch fixes
> the former by adding a backslash, and the latter
> by condensing the three lines of what-it-does to
> a single line, leaving one line of slack for the
> future.  (An alternative would be to move the
> what-it-does text to appear below the VERSION.)

Thanks, Karl!
FYI, I've pushed that to the gnulib.git repository,
since that's the canonical source for that particular file.
I massaged your summary into a ChangeLog entry and
"git commit"ted it.  Once I'd done that, I ran:

  git format-patch --stdout -1 > FILE

to produce the patch you see below, which includes authorship
information (your name and email) and a commit log message.

We prefer to receive patches of the form below because then
we don't have to write as much and can simply run

  git am FILE

to commit the patch to our local repository.

If you're new to git, here's a quick intro:
  http://git.sv.gnu.org/cgit/coreutils.git/tree/README-hacking

Your patch counts as trivial, so I don't need a gnulib copyright
assignment just yet, but if you think you will be submitting other
changes to gnulib, please fill out the copyright assignment form for
gnulib, and you might as well add coreutils (you can do 4 at once):

  
git.sv.gnu.org/gitweb/?p=gnulib.git;f=doc/Copyright/request-assign.future;hb=HEAD
  http://www.gnu.org/prep/maintain/html_node/Copyright-Papers.html


>From 1b58d734924189cf963936e202f361dd19bd109c Mon Sep 17 00:00:00 2001
From: Karl Heuer 
Date: Sun, 9 Jan 2011 10:16:08 +0100
Subject: [PATCH] useless-if-before-free: fix typo in --help and make the 
internal,

automatic version date update process work once again.
--help output contained a NUL character instead of the
backslash-zero that was intended.  Also, the "must lie within
the first 8 lines" line is on line 9, and hence not getting
automatically updated.
* build-aux/useless-if-before-free: Fix the former by adding a
backslash, and the latter by condensing the three lines of what-it-does
to a single line, leaving one line of slack for the future.
---
 ChangeLog|   12 
 build-aux/useless-if-before-free |8 +++-
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index eb0d8bd..a2070af 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2011-01-09  Karl Heuer  
+
+   useless-if-before-free: fix typo in --help and make the internal,
+   automatic version date update process work once again.
+   --help output contained a NUL character instead of the
+   backslash-zero that was intended.  Also, the "must lie within
+   the first 8 lines" line is on line 9, and hence not getting
+   automatically updated.
+   * build-aux/useless-if-before-free: Fix the former by adding a
+   backslash, and the latter by condensing the three lines of what-it-does
+   to a single line, leaving one line of slack for the future.
+
 2011-01-08  Bruno Haible  

uninorm tests: Preserve copyright of Unicode data file.
diff --git a/build-aux/useless-if-before-free b/build-aux/useless-if-before-free
index 01a6d3d..a6c228b 100755
--- a/build-aux/useless-if-before-free
+++ b/build-aux/useless-if-before-free
@@ -2,11 +2,9 @@ eval '(exit $?0)' && eval 'exec perl -wST "$0" ${1+"$@"}'
   & eval 'exec perl -wST "$0" $argv:q'
 if 0;
 # Detect instances of "if (p) free (p);".
-# Likewise for "if (p != NULL) free (p);".  And with braces.
-# Also detect "if (NULL != p) free (p);".
-# And with 0 in place of NULL.
+# Likewise "if (p != 0)", "if (0 != p)", or with NULL; and with braces.

-my $VERSION = '2009-04-16 15:57'; # UTC
+my $VERSION = '2011-01-09 01:39'; # UTC
 # The definition above must lie within the first 8 lines in order
 # for the Emacs time-stamp write hook (at end) to update it.
 # If you change this file with Emacs, please let the write hook
@@ -63,7 +61,7 @@ detect free-like functions named FOO and BAR.

 OPTIONS:

-   --list   print only the name of each matching FILE (\0-terminated)
+   --list   print only the name of each matching FILE (\\0-terminated)
--name=N add name N to the list of \`free\'-like functions to detect;
   may be repeated

--
1.7.3.5

Re: [coreutils] Re: Bug#609049: du: process_file: Assertion `level == prev_level - 1' failed

2011-01-09 Thread Jim Meyering

Jim Meyering wrote:
...
> While I do have a reproducer that will become a test suite addition
> (coming soon), it relies on python (a first) and the python-inotify
> package, so I'll have to be careful to skip the test when those
> prerequisites aren't installed.  Also, there's an inherent race condition,
> so I'll have to find the right compromise between absolute test-robustness
> (too expensive in time and inodes) and reasonableness.

Here's a complete patch, including the test.
I would have preferred to avoid the one-second sleep.
Maybe someone here can see a better way.

>From 9bf47055f64efb56d022feca01ad901e85e21bc1 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Sat, 8 Jan 2011 17:44:55 +0100
Subject: [PATCH] du: don't abort when a subdir is renamed during traversal

* NEWS (Bug fixes): Mention it.
* src/du.c (prev_level): Move declaration "up" to file-scope global.
(du_files): Reset prev_level to 0 upon abnormal fts_read termination.
Reported by Johathan Nieder in http://bugs.debian.org/609049
Also, improve a diagnostic.
* tests/du/move-dir-while-traversing: Test for the above.
* tests/Makefile.am (TESTS): Add it.
---
 NEWS   |8 +++
 src/du.c   |   15 +--
 tests/Makefile.am  |1 +
 tests/du/move-dir-while-traversing |   85 
 4 files changed, 105 insertions(+), 4 deletions(-)
 create mode 100755 tests/du/move-dir-while-traversing

diff --git a/NEWS b/NEWS
index 2a71ca6..5a70243 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,14 @@ GNU coreutils NEWS-*- 
outline -*-

 * Noteworthy changes in release ?.? (-??-??) [?]

+** Bug fixes
+
+  du would abort with a failed assertion when two conditions are met:
+  part of the hierarchy being traversed is moved to a higher level in the
+  directory tree, and there is at least one more command line directory
+  argument following the one containing the moved sub-tree.
+  [bug introduced in coreutils-5.1.0]
+

 * Noteworthy changes in release 8.9 (2011-01-04) [stable]

diff --git a/src/du.c b/src/du.c
index 77deb0c..671cac7 100644
--- a/src/du.c
+++ b/src/du.c
@@ -63,8 +63,11 @@ extern bool fts_debug;
 /* A set of dev/ino pairs.  */
 static struct di_set *di_set;

-/* Define a class for collecting directory information. */
+/* Keep track of the preceding "level" (depth in hierarchy)
+   from one call of process_file to the next.  */
+static size_t prev_level;

+/* Define a class for collecting directory information. */
 struct duinfo
 {
   /* Size of files in directory.  */
@@ -399,7 +402,6 @@ process_file (FTS *fts, FTSENT *ent)
   struct duinfo dui;
   struct duinfo dui_to_print;
   size_t level;
-  static size_t prev_level;
   static size_t n_alloc;
   /* First element of the structure contains:
  The sum of the st_size values of all entries in the single directory
@@ -582,10 +584,15 @@ du_files (char **files, int bit_flags)
 {
   if (errno != 0)
 {
-  /* FIXME: try to give a better message  */
-  error (0, errno, _("fts_read failed"));
+  error (0, errno, _("fts_read failed: %s"),
+ quotearg_colon (fts->fts_path));
   ok = false;
 }
+
+  /* When exiting this loop early, be careful to reset the
+ global, prev_level, used in process_file.  Otherwise, its
+ (level == prev_level - 1) assertion could fail.  */
+  prev_level = 0;
   break;
 }
   FTS_CROSS_CHECK (fts);
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 06d81f0..a5dbd3e 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -370,6 +370,7 @@ TESTS = \
   du/long-from-unreadable  \
   du/long-sloop\
   du/max-depth \
+  du/move-dir-while-traversing \
   du/no-deref  \
   du/no-x  \
   du/one-file-system   \
diff --git a/tests/du/move-dir-while-traversing 
b/tests/du/move-dir-while-traversing
new file mode 100755
index 000..e42bc93
--- /dev/null
+++ b/tests/du/move-dir-while-traversing
@@ -0,0 +1,85 @@
+#!/bin/sh
+# Trigger a failed assertion in coreutils-8.9 and earlier.
+
+# Copyright (C) 2011 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without

Re: [coreutils] Re: Bug#609049: du: process_file: Assertion `level == prev_level - 1' failed

2011-01-09 Thread Jim Meyering

Pádraig Brady wrote:

> On 09/01/11 22:05, Jim Meyering wrote:
>> diff --git a/tests/du/move-dir-while-traversing
>> b/tests/du/move-dir-while-traversing
>
>> +# We use a python-inotify script, so...
>> +python -m pyinotify -h > /dev/null \
>> +  || skip_ 'python-inotify package not installed'
>
> A small point is that error is a bit fedora specific.
> The package is python-pyinotify on debian/ubuntu for example.

Good point.  I've dropped the "-":

  || skip_ 'python inotify package not installed'

>> +from pyinotify import *
>
> I generally don't include everything from a module,
> to keep the namespace clean.
> I'd do instead:
>
> import pyinotify as pn
> import os,sys
> ...

I prefer that, too.

>> +dir = sys.argv[1]
...
>> +print 'started'
>
> The above print is buffered by default.
> As we're using it for synchronization I'd
>
> sys.stdout.write('started\n')
> sys.stdout.flush()

Good catch!

...
>> +nonempty() { test -s start-msg && return 0; sleep $1; }
>> +retry_delay_ nonempty .1 5
>
> I think the above may iterate only once?
> How about:
>
> nonempty() {
>   test -s start-msg || { sleep $1; return 1; }
> }
> retry_delay_ nonempty .1 5 || framework_failure

Yes, that fixes another bug.

>> +# The above delay is insufficient in ~50% of my trials.
>> +# Sometimes, when under heavy load, a parallel "make check" would
>> +# fail this test when sleeping just 0.1 seconds here.
>> +sleep 1
>
> Hopefully this extra sleep is not required now

Bingo.  It does, so far.
Thanks for the advice and fixes.
Here's the incremental I'm testing with:

diff --git a/tests/du/move-dir-while-traversing 
b/tests/du/move-dir-while-traversing
index e42bc93..fe1615c 100755
--- a/tests/du/move-dir-while-traversing
+++ b/tests/du/move-dir-while-traversing
@@ -21,7 +21,7 @@ print_ver_ du

 # We use a python-inotify script, so...
 python -m pyinotify -h > /dev/null \
-  || skip_ 'python-inotify package not installed'
+  || skip_ 'python inotify package not installed'

 # Move a directory "up" while du is processing its sub-directories.
 # While du is processing a hierarchy .../B/C/D/... this script
@@ -33,12 +33,14 @@ python -m pyinotify -h > /dev/null \

 cat <<'EOF' > inotify-watch-for-dir-access.py
 #!/usr/bin/env python
-from pyinotify import *
+import pyinotify as pn
+import os,sys
+
 dir = sys.argv[1]
 dest_parent = os.path.dirname(os.path.dirname(dir))
 dest = os.path.join(dest_parent, os.path.basename(dir))

-class ProcessDir(ProcessEvent):
+class ProcessDir(pn.ProcessEvent):

 def process_IN_OPEN(self, event):
 os.rename(dir, dest)
@@ -47,10 +49,11 @@ class ProcessDir(ProcessEvent):
 def process_default(self, event):
 pass

-wm = WatchManager()
-notifier = Notifier(wm)
-wm.watch_transient_file(dir, IN_OPEN, ProcessDir)
-print 'started'
+wm = pn.WatchManager()
+notifier = pn.Notifier(wm)
+wm.watch_transient_file(dir, pn.IN_OPEN, ProcessDir)
+sys.stdout.write('started\n')
+sys.stdout.flush()
 notifier.loop()
 EOF
 chmod a+x inotify-watch-for-dir-access.py
@@ -61,14 +64,9 @@ mkdir -p $t/3/a/b/c/$long d2 || framework_failure
 timeout 6 ./inotify-watch-for-dir-access.py $t/3/a/b > start-msg &

 # Wait for the watcher to start...
-nonempty() { test -s start-msg && return 0; sleep $1; }
+nonempty() { test -s start-msg || { sleep $1; return 1; }; }
 retry_delay_ nonempty .1 5

-# The above delay is insufficient in ~50% of my trials.
-# Sometimes, when under heavy load, a parallel "make check" would
-# fail this test when sleeping just 0.1 seconds here.
-sleep 1
-
 # The above watches for an IN_OPEN event on $t/3/a/b,
 # and when it triggers, moves the parent, $t/3/a, up one level
 # so it's directly under $t.

Re: [coreutils] Re: Bug#609049: du: process_file: Assertion `level == prev_level - 1' failed

2011-01-10 Thread Jim Meyering

Jim Meyering wrote:
...
> Here's the incremental I'm testing with:
>
> diff --git a/tests/du/move-dir-while-traversing 
> b/tests/du/move-dir-while-traversing
> index e42bc93..fe1615c 100755

Thanks again for the quick feedback.
I squashed in the changes above and pushed the result.
I also updated to the latest gnulib in a separate commit,
since it passed the tests I ran.

Re: [coreutils] [PATCH] maint: refactor to use read-file from gnulib

2011-01-12 Thread Jim Meyering

Pádraig Brady wrote:
> Removes 100 lines of tricky code.
...
> Subject: [PATCH] maint: refactor to use read-file from gnulib
>
> * bootstrap.conf: Add the read-file module
> * ptx.c: Replace the original code which would
> needlessly read SIZE_MAX bytes of files larger than this.
> * shuf.c: Replace the original code.
> ---
>  bootstrap.conf |1 +
>  src/ptx.c  |   80 +++
>  src/shuf.c |   48 +++--
>  3 files changed, 11 insertions(+), 118 deletions(-)

Nice.  It's not often we can remove so much code like that.

I glanced through it (looks fine), applied it, built
and ran a few smoke tests.

Push at will ;-)

Thanks!

Re: [coreutils] basename/dirname can't handle stdin?

2011-01-12 Thread Jim Meyering

> On 1/12/2011 2:02 PM, Eric Blake wrote:
...
>> But someone has to contribute the patch - are you willing?

Jeff Blaine wrote:
> Sure, I can try at least!

Great!
Here are some coding guidelines:

http://git.sv.gnu.org/cgit/coreutils.git/tree/HACKING

To start with, please look through the copyright assignment
section and start the process (assuming you're game), since
it can take a few weeks:

http://git.sv.gnu.org/cgit/coreutils.git/tree/HACKING#n429

The arg-handling (and --files0-from=FILE handling) loop you'll use
should be very similar to the ones in wc.c, du.c and sort.c.
You'll probably want to use tests that are very similar to
the ones that test the behavior in those tools.

Note this bit at the top of tests/misc/wc-files0-from:

#!/usr/bin/perl
# Exercise wc's --files0-from option.
# FIXME: keep this file in sync with tests/du/files0-from.

bug#7832: [PATCH] doc: minor clean-up

2011-01-12 Thread Jim Meyering

A few bits of HACKING were outdated:

>From d95e3c8cebb98ea26a99a64c6bb1ad1024c38662 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Wed, 12 Jan 2011 21:20:34 +0100
Subject: [PATCH] doc: clean up HACKING guidelines

* HACKING: Remove mention of "indent-tabs-mode: nil", since
we've remove all of those directives.  No longer needed.
Remove dated (pre-emacs-23) reference regarding WhiteSpace mode.
---
 HACKING |   12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/HACKING b/HACKING
index 2565d15..8933e9a 100644
--- a/HACKING
+++ b/HACKING
@@ -468,10 +468,8 @@ versions of gcc and the linux kernel, and modern GNU tools.
 Ensure that your changes are indented properly.
 ===
 Format the code the way GNU indent does.
-In a file with the "indent-tabs-mode: nil" directive at the end,
-running "indent --no-tabs" should induce no change.
-With other files, there will be some existing differences.
-Try not to add any more.
+Filtering most source files through "indent --no-tabs" should
+induce no change in indentation.  Try not to add any more.


 Avoid trailing white space
@@ -484,13 +482,11 @@ Do not add any more trailing blanks anywhere.  While 
"make syntax-check"
 will alert you if you slip up, it's better to nip any problem in the
 bud, as you're typing.  A good way to help you adapt to this rule is
 to configure your editor to highlight any offending characters in the
-files you edit.  If you use Emacs, customize its font-lock mode (FIXME:
-provide more detail) or try one of its whitespace packages.  This appears
-to be the one that will end up in emacs 23:
+files you edit.  If you use Emacs, customize its font-lock mode
+or use its WhiteSpace mode:

 http://www.emacswiki.org/emacs/WhiteSpace

-[that page says its version also works with emacs 21 and 22]
 If you use vim, add this to ~/.vimrc:

 let c_space_errors=1
--
1.7.3.5

Re: [coreutils] basename/dirname can't handle stdin?

2011-01-12 Thread Jim Meyering

Jeff Blaine wrote:
> Eric, now that I've got the prereqs in place,
> built coreutils from git successfully, and read the
> coreutils HACKING file, let's make sure I am on your
> page:
>
> In no order:
>
> * Snarf getopt_long arg processing from wc.c or other
>   you mentioned

getopt_long?  There will be some of that (for the new options),
but most of the new code will be to process stdin, and you can
use code nearly identical to what wc uses:

  int i;
  ok = true;
  for (i = 0; /* */; i++)
{
  bool skip_file = false;
  enum argv_iter_err ai_err;
  char *file_name = argv_iter (ai, &ai_err);
  ...

The preceding set-up code can be mostly-reused, too,

  bool read_tokens = false;
  struct argv_iterator *ai;
  if (files_from)
{

with the exception that the "else"-block that sets up
to get names from argv.  I.e., while wc, du, sort can take
many arguments from argv, basename cannot.

> * Add --filter (combined with the lack of file args) to handle
>   processing of stdin
> * Add --files-from0 per wc.c code (found it there)

it's --files0-from

> * Add --suffix [suffix] to enable suffix killing (using
>   SUFFIX as the 2nd arg will no longer work going forward)
> * Update usage() obviously
>
> Correct?  Anything else?

  * Update NEWS to mention the new functionality.
  * Update doc/coreutils.texi
  * add tests to exercise the new behavior (as I mentioned earlier)

Re: [coreutils] join feature: auto-format

2011-01-12 Thread Jim Meyering

Pádraig Brady wrote:

> On 12/01/11 18:04, Assaf Gordon wrote:
>> Pádraig Brady wrote, On 01/11/2011 07:35 AM:
>>>
>>> Spending another few minutes on this, I realized
>>> that we should not be trying to homogenize the number
>>> of fields from each file, but rather the fields used
>>> for a particular file in each line. The only sensible
>>> basis for that is the first line as previously suggested.
>>>
>>> The interface would be a little different for that.
>>> I was thinking of:
>>>
>>>   -o 'header'  Infer the format from the first line of each file
>>>
>> I second the idea of using the first line as the basis for the 
>> auto-formatting,
>> but have reservation about the wording: '-o header' somewhat implies
>> that the first line has to be an actual header line (with column
>> names or similar), while it can just be the first line of actual
>> data if the file doesn't have a header line.
>>
>> Something like '-o auto' might be less confusing.
>
> But less descriptive.
> I'll go with "auto" as I can't think of anything better.

-o auto is indeed more apt.
Thanks for the suggestion.

Re: [coreutils] basename/dirname can't handle stdin?

2011-01-13 Thread Jim Meyering

Philipp Thomas wrote:

> * Jim Meyering (j...@meyering.net) [20110113 08:53]:
>
>> > * Add --suffix [suffix] to enable suffix killing (using
>> >   SUFFIX as the 2nd arg will no longer work going forward)
>
> Using SUFFIX as the second argument IMO has to keep working as otherwise
> you'll break tons of scripts!

What I think he meant is that we'll need a --suffix=SUFF
option to specify a suffix, since when in --filter mode,
each input will be treated as a file/dir name.

Whereas, on the command line, we'll still accept only 1 or 2 args.

Re: [coreutils] basename/dirname can't handle stdin?

2011-01-13 Thread Jim Meyering

Eric Blake wrote:
> On 01/13/2011 08:03 AM, Jim Meyering wrote:
>> Philipp Thomas wrote:
>>
>>> * Jim Meyering (j...@meyering.net) [20110113 08:53]:
>>>
>>>>> * Add --suffix [suffix] to enable suffix killing (using
>>>>>   SUFFIX as the 2nd arg will no longer work going forward)
>>>
>>> Using SUFFIX as the second argument IMO has to keep working as otherwise
>>> you'll break tons of scripts!
>>
>> What I think he meant is that we'll need a --suffix=SUFF
>> option to specify a suffix, since when in --filter mode,
>> each input will be treated as a file/dir name.
>>
>> Whereas, on the command line, we'll still accept only 1 or 2 args.
>
> BSD basename accepts more than 2 args.

I missed the addition of -a to the scope, but agree that it is worth adding.
The patch to add that new option belongs in a separate change-set, of course.

> basename arg1 arg2 arg3 - no suffix, and print out three lines
>
> That is, the suffix is available if either:
> you pass exactly two args, and -a is not used
> you pass the -s/--suffix option
>
> (well, the long option spelling of --suffix would be new to GNU, but
> makes sense).
>
> Likewise, BSD basename accepts -a (how about --all as the long option
> spelling) to override 2-argument becoming a suffix situation.

Sure.

Re: [coreutils] [PATCH] maint: trivial system header file cleanups

2011-01-13 Thread Jim Meyering

Pádraig Brady wrote:
>>From e1aaf8903db97f3240b1551fd6936ccdc652dfc8 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
> Date: Thu, 13 Jan 2011 09:36:38 +
> Subject: [PATCH] maint: trivial system header file cleanups
>
> * src/system.h: Note where it should be included, and
> make ordering check portable to GLIBC > 2
> * src/copy.c: Move  along with other system headers
> as is done elsewhere.
> * src/install.c: Move  along with other system headers
> as is done elsewhere.
> * src/ptx.c: Include  rather than "regex.h" as
> is done elsewhere.  Note  is kept after "system.h"
> as per commit dba300a0.

Those look fine.  Thanks.

...
> diff --git a/src/system.h b/src/system.h
> index 3b32cbd..b86e570 100644
> --- a/src/system.h
> +++ b/src/system.h
> @@ -14,11 +14,15 @@
> You should have received a copy of the GNU General Public License
> along with this program.  If not, see .  */
>
> +/* Include this file _after_ system headers if possible.  */
> +
>  #include 
>
> -/* Include sys/types.h before this file.  */
> +/* Include  before this file.
> +   Note this doesn't warn if we're included
> +   before all system headers.  */
>
> -#if 2 <= __GLIBC__ && 2 <= __GLIBC_MINOR__
> +#if 2 < __GLIBC__ || ( 2 == ___GLIBC__ && 2 <= __GLIBC_MINOR__ )

Good one.

[coreutils] might as well release yet again

2011-01-14 Thread Jim Meyering

I'm thinking of making yet another coreutils release.
The du bug isn't that big a deal, but still.
I prefer not to mix bug fixes and feature additions,
and a few of the latter are coming.

Does anyone have patches that would fit in a bug-fix release?

Re: [coreutils] might as well release yet again

2011-01-14 Thread Jim Meyering

Pádraig Brady wrote:
> On 14/01/11 11:22, Jim Meyering wrote:
>> I'm thinking of making yet another coreutils release.
>> The du bug isn't that big a deal, but still.
>> I prefer not to mix bug fixes and feature additions,
>> and a few of the latter are coming.
>
> I'd be inclined not to bother with a new release
> unless we can have more consequential fixes.

Ok.  I'm happy to put off the release for a couple weeks,
and to include features like your join auto-format.
Besides, there's Eric's sort vs multi-thread safety point.

>> Does anyone have patches that would fit in a bug-fix release?
...
> I noticed a small issue this morning.
> Assaf, the following warning shouldn't be printed, right?
> The attached patch addresses that.
>
> printf 'test\n1\n' | join --header -a1 - /dev/null
> test
> join: file 1 is not in sorted order
> 1

Good catch.

> Subject: [PATCH] join: ensure --header skips the order check with empty files
>
> * src/join.c: Skip the header even if one of the files is empty.
> * tests/misc/join: Add a test case.
> * NEWS: Mention the fix
> ---
>  NEWS|3 +++
>  src/join.c  |   12 
>  tests/misc/join |6 ++
>  3 files changed, 17 insertions(+), 4 deletions(-)

That looks perfect.  Thanks!

[coreutils] [PATCH] doc: show how to shred using a single zero-writing pass

2011-01-17 Thread Jim Meyering

I have just wiped a 250GB drive and didn't want to wait for
the default 3-pass process.  One pass is slow enough.
Having to use -n0 --zero seemed sufficiently un-obvious that
I added this example to the manual:

>From 7dc6335653afcdad9a3ffa327877571734644285 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 17 Jan 2011 11:32:35 +0100
Subject: [PATCH] doc: show how to shred using a single zero-writing pass

* doc/coreutils.texi (shred invocation): Give an example showing how
to invoke shred in its most basic (fastest) write-only-zeros mode.
---
 doc/coreutils.texi |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 9c3e2ed..8fb9f0c 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -8892,6 +8892,15 @@ shred invocation
 shred --verbose /dev/sda5
 @end example

+On modern disks, a single pass that writes only zeros may be enough,
+and it will be much faster than the default.
+Use a command like this to tell @command{shred} to skip all random
+passes and to perform only a final zero-writing pass:
+
+@example
+shred --verbose -n0 --zero /dev/sda5
+@end example
+
 A @var{file} of @samp{-} denotes standard output.
 The intended use of this is to shred a removed temporary file.
 For example:
--
1.7.3.5

Re: [coreutils] [PATCH] uniq: don't continue field processing after end of line

2011-01-17 Thread Jim Meyering

Pádraig Brady wrote:

> On 16/01/11 23:53, Sami Kerola wrote:
>> Hi,
>>
>> I notice uniq -f 'insane_large_number' will make process to be busy
>> long time without good reason. Attached patch should fix this symptom.
>
> I'd slightly amend that to the following,
> to match the other limit checks in the function.
>
> diff --git a/src/uniq.c b/src/uniq.c
> index 7bdbc4f..9c7e37c 100644
> --- a/src/uniq.c
> +++ b/src/uniq.c
> @@ -214,7 +214,7 @@ find_field (struct linebuffer const *line)
>size_t size = line->length - 1;
>size_t i = 0;
>
> -  for (count = 0; count < skip_fields; count++)
> +  for (count = 0; count < skip_fields && i < size; count++)
>  {
>while (i < size && isblank (to_uchar (lp[i])))

Thank you!
I've also adjusted NEWS.
Here's your adjusted patch, followed by another to add a test
to exercise the bug/fix:

>From bf0ed321332b01fc38eb892d1deac16629aea07c Mon Sep 17 00:00:00 2001
From: Sami Kerola 
Date: Mon, 17 Jan 2011 00:27:06 +0100
Subject: [PATCH] uniq: don't continue field processing after end of line

* NEWS (Bug fixes): Mention it.
* src/uniq.c (find_field): Stop processing loop when end of line
is reached.  Before this fix, 'uniq -f 100 /etc/passwd'
would run for a very long time.
---
 NEWS   |2 ++
 src/uniq.c |2 +-
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/NEWS b/NEWS
index 9ccad63..3ec35c7 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,8 @@ GNU coreutils NEWS-*- 
outline -*-
   rm -f no longer fails for EINVAL or EILSEQ on file systems that
   reject file names invalid for that file system.

+  uniq -f NUM no longer tries to process fields after end of line.
+

 * Noteworthy changes in release 8.9 (2011-01-04) [stable]

diff --git a/src/uniq.c b/src/uniq.c
index 7bdbc4f..9c7e37c 100644
--- a/src/uniq.c
+++ b/src/uniq.c
@@ -214,7 +214,7 @@ find_field (struct linebuffer const *line)
   size_t size = line->length - 1;
   size_t i = 0;

-  for (count = 0; count < skip_fields; count++)
+  for (count = 0; count < skip_fields && i < size; count++)
 {
   while (i < size && isblank (to_uchar (lp[i])))
 i++;
--
1.7.3.5


>From 660d57085140c86387f84eb508f4e15caa972bee Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 17 Jan 2011 12:27:55 +0100
Subject: [PATCH] tests: add a test for today's uniq bug

* tests/misc/uniq-perf: New file.
* tests/Makefile.am (TESTS): Add it.
---
 tests/Makefile.am|1 +
 tests/misc/uniq-perf |   25 +
 2 files changed, 26 insertions(+), 0 deletions(-)
 create mode 100755 tests/misc/uniq-perf

diff --git a/tests/Makefile.am b/tests/Makefile.am
index a5dbd3e..1e4e300 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -286,6 +286,7 @@ TESTS = \
   misc/tty-eof \
   misc/unexpand\
   misc/uniq\
+  misc/uniq-perf   \
   misc/xattr   \
   tail-2/wait  \
   chmod/c-option   \
diff --git a/tests/misc/uniq-perf b/tests/misc/uniq-perf
new file mode 100755
index 000..f000e76
--- /dev/null
+++ b/tests/misc/uniq-perf
@@ -0,0 +1,25 @@
+#!/bin/sh
+# before coreutils-8.10, seq 10|uniq -f 100 would run for days
+
+# Copyright (C) 2011 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+print_ver_ uniq
+
+seq 100 > in || fail=1
+timeout 1 uniq -f 100 in || fail=1
+
+Exit $fail
--
1.7.3.5

[coreutils] uniq: minor code/perf improvement

2011-01-17 Thread Jim Meyering

I noticed another wasteful loop in that same function.
The loop can be replaced with a simple "MIN(...)" expression, and that gives
a 10% performance improvement for very large --skip-chars offsets:

$ (n=10; printf "a%0${n}d0\n" 0;printf "b%0${n}d1\n" 0) \
  | env time --f=%e ./uniq -s$n > /dev/null
6.03

$ (n=10; printf "a%0${n}d0\n" 0;printf "b%0${n}d1\n" 0) \
  | env time --f=%e uniq -s$n > /dev/null
6.80


>From 9aa02d21f75c3c0dd14b4e406ebd0dff0c75504e Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 17 Jan 2011 12:36:58 +0100
Subject: [PATCH] uniq: replace a wasteful loop with simple calculation

* src/uniq.c (find_field): Remove the byte-skipping loop altogether.
Instead, perform the simple calculation.  This results in a 10%
performance improvement for large byte offsets.
---
 src/uniq.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/src/uniq.c b/src/uniq.c
index 9c7e37c..b35938a 100644
--- a/src/uniq.c
+++ b/src/uniq.c
@@ -222,8 +222,7 @@ find_field (struct linebuffer const *line)
 i++;
 }

-  for (count = 0; count < skip_chars && i < size; count++)
-i++;
+  i += MIN (skip_chars, size - i);

   return line->buffer + i;
 }
--
1.7.3.5

Re: [coreutils] [PATCH] doc: show how to shred using a single zero-writing pass

2011-01-17 Thread Jim Meyering

Pádraig Brady wrote:
> On 17/01/11 10:36, Jim Meyering wrote:
>>>From 7dc6335653afcdad9a3ffa327877571734644285 Mon Sep 17 00:00:00 2001
>> From: Jim Meyering 
>> Date: Mon, 17 Jan 2011 11:32:35 +0100
>> Subject: [PATCH] doc: show how to shred using a single zero-writing pass
>>
>> * doc/coreutils.texi (shred invocation): Give an example showing how
>> to invoke shred in its most basic (fastest) write-only-zeros mode.
>> ---
>>  doc/coreutils.texi |9 +
>>  1 files changed, 9 insertions(+), 0 deletions(-)
>>
>> diff --git a/doc/coreutils.texi b/doc/coreutils.texi
>> index 9c3e2ed..8fb9f0c 100644
>> --- a/doc/coreutils.texi
>> +++ b/doc/coreutils.texi
>> @@ -8892,6 +8892,15 @@ shred invocation
>>  shred --verbose /dev/sda5
>>  @end example
>>
>> +On modern disks, a single pass that writes only zeros may be enough,
>> +and it will be much faster than the default.
>
> Well only 3 times, due to the disk being the bottleneck
> (since we changed to the fast internal PRNG by default).
> Also for security, writing random data would probably be more effective.
> So I'd reword the above sentence to:
>
> "To simply clear a disk"

Regarding "only 3x" I agree that 3x is better than the 25x difference
when comparing with the default from coreutils-7.0 and earlier.
However, when it takes two hours with "-n0 --zero", using
the default would add 4 hours.

Interestingly, the difference is sometimes far greater than the expected 3x:
(this is with F14/ext4 on an i7-960 and an 60GB OCZ vertex-2 SSD)

$ seq 1000 > k
$ env time --f=%e shred -n0 --zero k
0.36
$ env time --f=%e shred k
3.75

Surprisingly, even with just -n1, it's still nearly 3x:
(I expected this to have minimal performance difference with
the -n0 --zero run):

$ env time --f=%e shred -n1 k
0.95

Let's compare syscalls (the only difference should be PRNG-related)

$ strace -c shred -n1 k
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 96.340.052992   52992 1   fdatasync
  2.720.001496   0  6421 1 write
  0.940.000518 259 2   read
  0.000.00   0 4   open
  0.000.00   0 6   close
  0.000.00   0 3   fstat
  0.000.00   0 1   lseek
  0.000.00   0 7   mmap
  0.000.00   0 3   mprotect
  0.000.00   0 1   munmap
  0.000.00   0 4   brk
  0.000.00   0 1 1 access
  0.000.00   0 1   execve
  0.000.00   0 4   fcntl
  0.000.00   0 1   arch_prctl
-- --- --- - - 
100.000.055006  6460 2 total

$ strace -c shred -n0 --zero k
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 96.570.053991   53991 1   fdatasync
  3.430.001916   0  6421 1 write
  0.000.00   0 2   read
  0.000.00   0 4   open
  0.000.00   0 6   close
  0.000.00   0 3   fstat
  0.000.00   0 1   lseek
  0.000.00   0 7   mmap
  0.000.00   0 3   mprotect
  0.000.00   0 1   munmap
  0.000.00   0 4   brk
  0.000.00   0 1 1 access
  0.000.00   0 1   execve
  0.000.00   0 4   fcntl
  0.000.00   0 1   arch_prctl
-- --- --- - - 
100.000.055907  6460 2 total

Odd... no difference in CPU time or syscall counts.

I wonder if the SSD is doing something special with blocks of all zeros,
which is reminiscent of The Reg's article:
http://www.theregister.co.uk/2011/01/14/ocz_and_ddrdrive_performance_row/

>> +Use a command like this to tell @command{shred} to skip all random
>> +passes and to perform only a final zero-writing pass:
>> +
>> +@example
>> +shred --verbose -n0 --zero /dev/sda5
>> +@end example
>
> It's probably not worth noting the equivalent:
> dd conv=fdatasync bs=2M < /dev/zero > /dev/sda5

I like shred's --verbose %-completion indicator, though even with dd,
you might be able to get a similar progress indicator using
the phantom-progress.bash script here:

  http://blog.ksplice.com/2011/01/solving-problems-with-proc/

Re: [coreutils] [PATCH] doc: show how to shred using a single zero-writing pass

2011-01-18 Thread Jim Meyering

Jim Meyering wrote:
...
> Odd... no difference in CPU time or syscall counts.
>
> I wonder if the SSD is doing something special with blocks of all zeros,
> which is reminiscent of The Reg's article:
> http://www.theregister.co.uk/2011/01/14/ocz_and_ddrdrive_performance_row/

I've redone the doc change accordingly:

>From 0f9cf6b545aa07d552538735182007ae240d6071 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 17 Jan 2011 11:32:35 +0100
Subject: [PATCH] doc: show how to shred more efficiently

* doc/coreutils.texi (shred invocation): Give an example showing how
to invoke shred in single-pass mode, and warn that -n0 --zero may
be inadequate.
---
 doc/coreutils.texi |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 9c3e2ed..8a1b3b6 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -8892,6 +8892,20 @@ shred invocation
 shred --verbose /dev/sda5
 @end example

+On modern disks, a single pass should be adequate,
+and it will take one third the time of the default three-pass approach.
+
+@example
+# 1 pass, write pseudo-random data; 3x faster than the default
+shred --verbose -n1 /dev/sda5
+@end example
+
+To be on the safe side, use at least one pass that overwrites using
+pseudo-random data.  I.e., don't be tempted to use @samp{-n0 --zero},
+in case some disk controller optimizes the process of writing blocks
+of all zeros, and thereby does not clear all bytes in a block.
+Some SSDs may do just that.
+
 A @var{file} of @samp{-} denotes standard output.
 The intended use of this is to shred a removed temporary file.
 For example:
--
1.7.3.5

Re: [coreutils] [PATCH] uniq: don't continue field processing after end of line

2011-01-18 Thread Jim Meyering

Jim Meyering wrote:
> Pádraig Brady wrote:
>
>> On 16/01/11 23:53, Sami Kerola wrote:
>>> Hi,
>>>
>>> I notice uniq -f 'insane_large_number' will make process to be busy
>>> long time without good reason. Attached patch should fix this symptom.
>>
>> I'd slightly amend that to the following,
>> to match the other limit checks in the function.
>>
>> diff --git a/src/uniq.c b/src/uniq.c
>> index 7bdbc4f..9c7e37c 100644
>> --- a/src/uniq.c
>> +++ b/src/uniq.c
>> @@ -214,7 +214,7 @@ find_field (struct linebuffer const *line)
>>size_t size = line->length - 1;
>>size_t i = 0;
>>
>> -  for (count = 0; count < skip_fields; count++)
>> +  for (count = 0; count < skip_fields && i < size; count++)
>>  {
>>while (i < size && isblank (to_uchar (lp[i])))
>
> Thank you!
> I've also adjusted NEWS.
> Here's your adjusted patch, followed by another to add a test
> to exercise the bug/fix:
>
> Subject: [PATCH] uniq: don't continue field processing after end of line
>
> * NEWS (Bug fixes): Mention it.
> * src/uniq.c (find_field): Stop processing loop when end of line
> is reached.  Before this fix, 'uniq -f 100 /etc/passwd'
> would run for a very long time.
> ---
>  NEWS   |2 ++
>  src/uniq.c |2 +-
>  2 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 9ccad63..3ec35c7 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -13,6 +13,8 @@ GNU coreutils NEWS-*- 
> outline -*-
>rm -f no longer fails for EINVAL or EILSEQ on file systems that
>reject file names invalid for that file system.
>
> +  uniq -f NUM no longer tries to process fields after end of line.

I pushed that, then realized that we hadn't noted the origin
of the bug:

>From 34ece0b0a1e19f1faef8edfe3d59c5a2f5209369 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Tue, 18 Jan 2011 09:45:00 +0100
Subject: [PATCH] doc: update NEWS

* NEWS: Note when the uniq bug was introduced.
It was mine, commit 1d9b3de9, "uniq: remove redundant test".
---
 NEWS |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index 3ec35c7..e420cd9 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,7 @@ GNU coreutils NEWS-*- 
outline -*-
   reject file names invalid for that file system.

   uniq -f NUM no longer tries to process fields after end of line.
+  [bug introduced in coreutils-7.0]


 * Noteworthy changes in release 8.9 (2011-01-04) [stable]
--
1.7.3.5

[coreutils] [PATCH] tests: avoid FP failure in new du test

2011-01-18 Thread Jim Meyering

I noticed that the new du test would occasionally fail
because du would complete too quickly.
Making the tree about ten times larger solved the problem:
(5x was not enough; see comments below)

>From 7e72d4fe9da159d42efe0845b0f36e2ea9d7c3b1 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Tue, 18 Jan 2011 22:32:33 +0100
Subject: [PATCH] tests: avoid FP failure in new du test

* tests/du/move-dir-while-traversing: Create a larger tree to
avoid a false-positive failure due to du terminating before
the rename is triggered.
---
 tests/du/move-dir-while-traversing |   11 +--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tests/du/move-dir-while-traversing 
b/tests/du/move-dir-while-traversing
index fe1615c..9605831 100755
--- a/tests/du/move-dir-while-traversing
+++ b/tests/du/move-dir-while-traversing
@@ -58,9 +58,16 @@ notifier.loop()
 EOF
 chmod a+x inotify-watch-for-dir-access.py

-long=d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
 t=T/U
-mkdir -p $t/3/a/b/c/$long d2 || framework_failure
+mkdir d2 || framework_failure
+long=d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
+# One iteration of this loop creates a tree with which
+# du sometimes completes its traversal before the above rename.
+# Even 5 iterations was not enough in 2 of 7 "make -j20 check" runs on a
+# 6/12-core system.  However, using "10", I saw no failure in 20 trials.
+for i in $(seq 10); do
+  mkdir -p $t/3/a/b/c/$i/$long || framework_failure
+done
 timeout 6 ./inotify-watch-for-dir-access.py $t/3/a/b > start-msg &

 # Wait for the watcher to start...
--
1.7.3.5

[coreutils] [PATCH] maint: use slightly more efficient process in README-release

2011-01-18 Thread Jim Meyering

This makes the release process more efficient on a multi-core system:

>From 948828b208e4cd59ceb903afbb46f55c09d51e18 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Tue, 4 Jan 2011 12:47:24 +0100
Subject: [PATCH] maint: use slightly more efficient process in README-release

* README-release: Run cheaper root-only tests first.
Use half of processing units (not just 1) for the expensive tests.
---
 README-release |   16 +---
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/README-release b/README-release
index b2bd34c..729ab89 100644
--- a/README-release
+++ b/README-release
@@ -29,13 +29,15 @@ FIXME: enable excluded programs like arch? to get their 
manual pages?
   Run the following on at least one SELinux-enabled (enforcing) and
   one non-SELinux system:

-make distcheck
-make -j1 check RUN_VERY_EXPENSIVE_TESTS=yes RUN_EXPENSIVE_TESTS=yes
-sudo env PATH="$PATH" NON_ROOT_USERNAME=$USER make -k check-root
-
-  Note the -j1 above.  If you use -jN, for larger N, some of the expensive
-  tests are likely to interfere with concurrent performance-measuring or
-  timing-sensitive tests, resulting in spurious failures.
+n=$(( ($(nproc) + 1) / 2 ))
+sudo env PATH="$PATH" NON_ROOT_USERNAME=$USER make -k -j$(nproc) 
check-root\
+  && make distcheck \
+  && make -j$n check RUN_VERY_EXPENSIVE_TESTS=yes RUN_EXPENSIVE_TESTS=yes
+
+  Note that the use of -j$n tells make to use approximately half of the
+  available processing units.  If you use -jN, for larger N, some of the
+  expensive tests are likely to interfere with concurrent performance-measuring
+  or timing-sensitive tests, resulting in spurious failures.

   If "make distcheck" doesn't run "make syntax-check" for you, then run
   it manually:
--
1.7.3.5

[coreutils] Re: gnulib/coreutils: Error on coreutils git submodule update

2011-01-20 Thread Jim Meyering

Rob Vermaas wrote:
> Hi Jim,
>
> sorry for mailing you directly, but I'm having trouble getting the
> latest coreutils code. I get this error on 'git submodule update':
>
> fatal: reference is not a tree: 568779402439589832419c90e3f8193496a39613
> Unable to checkout '568779402439589832419c90e3f8193496a39613' in
> submodule path 'gnulib'
>
> This has to do with this commit:
> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blobdiff;f=gnulib;h=568779402439589832419c90e3f8193496a39613;hp=104aca8300d01a80889f70f41a4dce7e044f9476;hb=5c75d1da85d8da86e7a0083d4fbbd8d4de50305b;hpb=34ece0b0a1e19f1faef8edfe3d59c5a2f5209369
>
> Somehow the commit is not in the gnulib git repository. Any idea
> what's going on there?

Hi Rob,

Thanks for the heads-up.
That is indeed my mistake.

I ran "git syncsub" vs. a local gnulib repository
in which I had committed-but-not-pushed a change.

We now have a public-submodule-commit make target in the gnulib-provided
maint.mk, and it checks for just that condition.  However, it is currently
run only at release time.  This makes it apparent that we should check
this all the time...  At least via "make distcheck", and probably via a
server-side hook to prevent the offending "push" as well.

I've just updated (careful to use a public commit,
not the mountlist pair that is waiting for an ACK ;-)
and pushed the fix:

>From 634440238940bd4b216af62c88658073a84e69ec Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Thu, 20 Jan 2011 10:00:42 +0100
Subject: [PATCH] build: update gnulib submodule to latest

The previous gnulib submodule reference was to a non-public commit.
Reported by Rob Vermaas.
---
 gnulib |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gnulib b/gnulib
index 5687794..680c7ff 16
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 568779402439589832419c90e3f8193496a39613
+Subproject commit 680c7ff005a2eb96b0903e5200cc28de273555c7
--
1.7.3.5

[coreutils] Re: gnulib/coreutils: Error on coreutils git submodule update

2011-01-20 Thread Jim Meyering

Rob Vermaas wrote:

> Hi Jim,
>
>> -Subproject commit 568779402439589832419c90e3f8193496a39613
>> +Subproject commit 680c7ff005a2eb96b0903e5200cc28de273555c7
>
> Did you make the same mistake again? or is it me this time?
>
> fatal: reference is not a tree: 680c7ff005a2eb96b0903e5200cc28de273555c7
> Unable to checkout '680c7ff005a2eb96b0903e5200cc28de273555c7' in
> submodule path 'gnulib'

Hmm...

I was careful to put my local gnulib repo into the right state,
then ran "git syncsub" in coreutils, committed, and pushed.

However, I now see that is not always adequate:
"git syncsub" did not do what I expected, and I did
not rerun "make public-submodule-commit" to verify.
It still fails.

The problem is that the gnulib submodule under my coreutils
build directory had a stray commit, and that was causing each syncsub
to do a non-ff update and thus to create a merge commit.

Thanks for the quick feedback!

I've just confirmed that this does solve the problem,
and *then* pushed it.

>From e0c6272ac38a8eaba49b31adb4415e31159b6dd4 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Thu, 20 Jan 2011 10:50:11 +0100
Subject: [PATCH] build: update gnulib submodule to latest

The previous gnulib submodule reference was *still* to a
non-public commit.  My submodule had a stray commit, so
the reference was always to a local merge commit.
Reported by Rob Vermaas.
---
 gnulib |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gnulib b/gnulib
index 680c7ff..ff4bb04 16
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 680c7ff005a2eb96b0903e5200cc28de273555c7
+Subproject commit ff4bb04bec0c8d7834bfb0590d6de08fa0ceefd0
--
1.7.3.5

Re: [coreutils] Re: gnulib/coreutils: Error on coreutils git submodule update

2011-01-20 Thread Jim Meyering

Eric Blake wrote:

> On 01/20/2011 02:56 AM, Jim Meyering wrote:
>> However, I now see that is not always adequate:
>> "git syncsub" did not do what I expected, and I did
>> not rerun "make public-submodule-commit" to verify.
>> It still fails.
>
> Can you configure branch.master.rebase = true in your gnulib submodule
> to avoid creating merge commits?

Thanks.
I'll try that.

> Meanwhile, I've just modified my ~/.gitconfig to alter my syncsub alias
> to make it more apparent to me if this situation happens:
>
> [alias]
> syncsub = git submodule foreach git --ff-only origin master
>
> Note the addition of --ff-only.

You must mean this (i.e., don't remove the "pull"):

syncsub = git submodule foreach git pull --ff-only origin master

[coreutils] two patches required for fedora rawhide

2011-01-24 Thread Jim Meyering

I built on a just-updated rawhide system and encountered two problems.
First, split got a new warning about a used-uninitialized "files" variable.

  split.c: In function 'main':
  split.c:663:9: error: 'files' may be used uninitialized in this function \
[-Werror=uninitialized]

That's spurious, so I added an IF_LINT initializer.

The next was a test failure due to fstatfs now failing with ENOSYS
(presumably due to the fact this is with linux-2.3.38 RC)
when applied to an input file descriptor that's open on a pipe.
This failure caused a diagnostic that made a test fail.
Since the fstatfs call is solely to determine whether the input
file is remote (not appropriate for inotify), we can treat this
particular failure (ENOSYS) just like the absence of inotify support.

>From 15ea577af7f04c0e9ddbd88fa72768e79a933b60 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 24 Jan 2011 09:38:40 +0100
Subject: [PATCH 2/2] split: avoid a new, spurious warning from gcc-4.6.0

* src/split.c (lines_rr) [IF_LINT]: Initialize files, now that
rawhide's gcc-4.6.0 would otherwise warn about use-uninitialized.
---
 src/split.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/split.c b/src/split.c
index c11c032..364576a 100644
--- a/src/split.c
+++ b/src/split.c
@@ -660,7 +660,7 @@ lines_rr (uintmax_t k, uintmax_t n, char *buf, size_t 
bufsize)
 {
   bool file_limit;
   size_t i_file;
-  of_t *files;
+  of_t *files IF_LINT (= NULL);
   uintmax_t line_no;

   if (k)
--
1.7.3.5.38.gb312b

>From 8264fc615a1beb82e062949e5cf17c852fcfaaaf Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 24 Jan 2011 09:37:10 +0100
Subject: [PATCH 1/2] tail: avoid new diagnostic when applying -f to a pipe on 
linux-2.3.38

* src/tail.c (fremote): Do not print a diagnostic when
fstatfs (pipe_FD, &buf) fails, as it now does on linux-2.3.38.
This avoids the spurious failure of tests/misc/tail's f-pipe-1
test, when running in input-from-pipe mode.
---
 src/tail.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/tail.c b/src/tail.c
index 9682a53..bcd2d99 100644
--- a/src/tail.c
+++ b/src/tail.c
@@ -887,8 +887,11 @@ fremote (int fd, const char *name)
   int err = fstatfs (fd, &buf);
   if (err != 0)
 {
-  error (0, errno, _("cannot determine location of %s. "
- "reverting to polling"), quote (name));
+  /* On at least linux-2.6.38, fstatfs fails with ENOSYS when FD
+ is open on a pipe.  Treat that like a remote file.  */
+  if (errno != ENOSYS)
+error (0, errno, _("cannot determine location of %s. "
+   "reverting to polling"), quote (name));
 }
   else
 {
--
1.7.3.5.38.gb312b

[coreutils] [PATCH 1/2] tests: avoid rare FP failure in new du test

2011-01-25 Thread Jim Meyering

I saw this test fail again.
Tripling the size of the tree seems to be enough...

While waiting for 200 trials to complete I suspended the job,
and upon resumption, one iteration failed.  The second patch
prevents that sort of problem by making the test script ignore
TSTP and STOP signals.

>From 5c501f6fba5c1e2c4b45ce3f44b2da58d559ebe7 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Tue, 25 Jan 2011 11:09:27 +0100
Subject: [PATCH 1/2] tests: avoid rare FP failure in new du test

* tests/du/move-dir-while-traversing: Create an even larger tree
to avoid a false-positive failure due to du terminating before
the rename is triggered.
---
 tests/du/move-dir-while-traversing |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/du/move-dir-while-traversing 
b/tests/du/move-dir-while-traversing
index 9605831..c711e02 100755
--- a/tests/du/move-dir-while-traversing
+++ b/tests/du/move-dir-while-traversing
@@ -63,9 +63,11 @@ mkdir d2 || framework_failure
 long=d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
 # One iteration of this loop creates a tree with which
 # du sometimes completes its traversal before the above rename.
-# Even 5 iterations was not enough in 2 of 7 "make -j20 check" runs on a
+# Five iterations was not enough in 2 of 7 "make -j20 check" runs on a
 # 6/12-core system.  However, using "10", I saw no failure in 20 trials.
-for i in $(seq 10); do
+# Using 10 iterations was not enough, either.
+# Using 30, I saw no failure in 200 trials.
+for i in $(seq 30); do
   mkdir -p $t/3/a/b/c/$i/$long || framework_failure
 done
 timeout 6 ./inotify-watch-for-dir-access.py $t/3/a/b > start-msg &
--
1.7.3.5.38.gb312b


>From 56699864f63fe69259d4a5c1f94498762d8df78d Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Tue, 25 Jan 2011 12:36:22 +0100
Subject: [PATCH 2/2] tests: avoid FP failure due to suspension

* tests/du/move-dir-while-traversing: Prohibit suspension,
to avoid false-positive failure.
---
 tests/du/move-dir-while-traversing |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/tests/du/move-dir-while-traversing 
b/tests/du/move-dir-while-traversing
index c711e02..2a60d6e 100755
--- a/tests/du/move-dir-while-traversing
+++ b/tests/du/move-dir-while-traversing
@@ -70,6 +70,10 @@ long=d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
 for i in $(seq 30); do
   mkdir -p $t/3/a/b/c/$i/$long || framework_failure
 done
+
+# Prohibit suspension, which could otherwise cause a timeout-induced FP 
failure.
+trap '' STOP TSTP
+
 timeout 6 ./inotify-watch-for-dir-access.py $t/3/a/b > start-msg &

 # Wait for the watcher to start...
--
1.7.3.5.38.gb312b

bug#6131: [PATCH]: fiemap support for efficient sparse file copy

2011-01-25 Thread Jim Meyering

jeff.liu wrote:
> Jim Meyering wrote:
>> jeff.liu wrote:
>>> AFAICS, the tests passed on all filesystems except ext4,
>>
>> Really?
>> The vast majority of my testing is with ext4 on Fedora 14, and I have seen
>> no failure -- otherwise I would have mentioned that as a known problem.
>
> I have mentioned this issue at:
> http://osdir.com/ml/bug-coreutils-gnu/2010-09/msg00092.html
>
> "make test against cp/sparse-fiemap failed at the extent compare
> stage, but the file content is
> identical to each other by comparing those two files "j1/j2" manually.
> Is it make sense if we verify them through diff(1) since the testing
> file is in small size?"

No.  The whole point of the test is to verify that the extents have
been preserved in the copy.  Diff doesn't know about extents.

>> What type of system/kernel are you using?
> 2.6.33-RC3 && 2.6.36
>> Was your ext4 partition created long ago?  With what options?
> fiemap copy works well if run `cp' against physical ext4 partition.
>> Did "make check" fail?  If so, please provide details.
> Yeah, I will show the detail of 'make check' at below.

What version of filefrag are you using?
Mine comes from e2fsprogs-1.41.12-6.fc14.x86_64

> btw, I just checked out the new branch and tried to compile it but ran
> into an error:
> date.c:30:28: error: parse-datetime.h: No such file or directory
> date.c: In function 'batch_convert':
> date.c:284: warning: implicit declaration of function 'parse_datetime'
>
> I guess 'parse-datetime.h' is shipped with gnulib? For now, I can not
> pull the latest gnulib code
> since the remote host does not response.

Did you run ./bootstrap ?
That is a requirement whenever the coreutils
pulls in a change to the gnulib submodule.

> For a quick reply, I ran 'make check' against the previous code base
> before your latest commit.
>
> sudo make check TESTS=cp/sparse-fiemap VERBOSE=yes
...
> + filefrag -v j2
...
> + awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
> @a and @b have different lengths, even after adjustment
> + fail=1

[coreutils] Re: [PATCH 1/2] tests: avoid rare FP failure in new du test

2011-01-25 Thread Jim Meyering

Andreas Schwab wrote:

>> +# Prohibit suspension, which could otherwise cause a
>> timeout-induced FP failure.
>> +trap '' STOP TSTP
>
> SIGSTOP cannot be caught, blocked, or ignored.

Right, of course.  Thanks:

>From d255e40c371d3d132ac6529a7e3468dab6fdabfd Mon Sep 17 00:00:00 2001
From: Andreas Schwab 
Date: Tue, 25 Jan 2011 18:29:07 +0100
Subject: [PATCH] tests: minor correction

* tests/du/move-dir-while-traversing: Ignoring SIGTSTP is enough;
don't also attempt to ignore SIGSTOP, it cannot be handled or ignored.
Spotted by Andreas Schwab.
---
 tests/du/move-dir-while-traversing |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tests/du/move-dir-while-traversing 
b/tests/du/move-dir-while-traversing
index 2a60d6e..68302b8 100755
--- a/tests/du/move-dir-while-traversing
+++ b/tests/du/move-dir-while-traversing
@@ -72,7 +72,7 @@ for i in $(seq 30); do
 done

 # Prohibit suspension, which could otherwise cause a timeout-induced FP 
failure.
-trap '' STOP TSTP
+trap '' TSTP

 timeout 6 ./inotify-watch-for-dir-access.py $t/3/a/b > start-msg &

--
1.7.3.5.38.gb312b

[coreutils] [PATCH] tests: don't hide all trace of the vc_exe_in_TESTS test

2011-01-25 Thread Jim Meyering

FYI,

I was tempted to remove this rule altogether, since
test scripts no longer have to be executable.
However, it's not that expensive and does serve to
ensure consistency, and as a useful cross-check.

>From 034e496e983d286011fa5f6058b541b163ba9afd Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Tue, 25 Jan 2011 14:29:07 +0100
Subject: [PATCH] tests: don't hide all trace of the vc_exe_in_TESTS test

There was a non-negligible delay after running a single test.
Now, you'll know why when you see this test's name.
* tests/check.mk (vc_exe_in_TESTS): Don't @-hide commands.
Use $(AM_V_GEN) instead.
---
 tests/check.mk |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/check.mk b/tests/check.mk
index cddfe89..f931806 100644
--- a/tests/check.mk
+++ b/tests/check.mk
@@ -20,8 +20,8 @@
 _v = TESTS
 _w = root_tests
 vc_exe_in_TESTS: Makefile
-   @rm -f t1 t2
-   @if test -d $(top_srcdir)/.git && test $(srcdir) = .; then  \
+   $(AM_V_GEN)rm -f t1 t2; \
+   if test -d $(top_srcdir)/.git && test $(srcdir) = .; then   \
  { sed -n '/^$(_v) =[   ]*\\$$/,/[^\]$$/p' \
$(srcdir)/Makefile.am   \
| sed 's/^  *//;/^\$$.*/d;/^$(_v) =/d'; \
--
1.7.3.5.38.gb312b

[coreutils] coreutils-8.10 next week

2011-01-28 Thread Jim Meyering

I would like to release coreutils-8.10 next week, including the changes
on the fiemap-copy-2 branch (shortly to be extended).  I'll make a test
release as soon as the FIEMAP changes have been merged to master.

If anyone has additional changes that they would like to see included,
please let us know.

bug#6131: [PATCH]: fiemap support for efficient sparse file copy

2011-01-28 Thread Jim Meyering

Jeff liu wrote:
> Now  make check passed against the following combination:
> 1. Refresh installed host in Ubuntu10.0.4,
> filefrag comes from E2fsprogs 1.41.11 && Kernel: 2.6.32-16
> 2. filefrag in e2fsprogs-1.4.12 && kernel-2.6.36.
[passes]

Glad to here it passes for you, now.
FYI, I have spent pretty much time on cp over the last
couple days, factoring out the hole-inducing code and
making extent_copy use it.  Part of the motivation was
to fix cp --sparse=always, which was broken on the branch.
It would not induce holes when going through extent_copy.
I've added a couple more tests and will post the series as
soon I've cleaned things up a little more.

[coreutils] [PATCH] tests: remove obsolete uses of "$$" in temporary file names

2011-01-28 Thread Jim Meyering

FYI, here's some clean-up I pushed along the way...

>From 68a734d66bbe72d5d4133c386fa77069dd77fb8c Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Fri, 28 Jan 2011 19:23:21 +0100
Subject: [PATCH] tests: remove obsolete uses of "$$" in temporary file names

Those were useful when tests might have been run in the same
directory and in parallel.  Now, each test is run in a newly-
created empty directory.
* tests/cp/backup-1: Remove obsolete uses of "$$".
* tests/cp/same-file: Likewise.
* tests/dd/misc: Likewise.
* tests/mv/part-symlink: Likewise.
* tests/mv/to-symlink: Likewise.
* tests/touch/fail-diag: Likewise.
---
 tests/cp/backup-1 |4 +---
 tests/cp/same-file|9 +++--
 tests/dd/misc |8 
 tests/mv/part-symlink |9 +++--
 tests/mv/to-symlink   |4 +---
 tests/touch/fail-diag |   10 +++---
 6 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/tests/cp/backup-1 b/tests/cp/backup-1
index a3ed9c3..34670bd 100755
--- a/tests/cp/backup-1
+++ b/tests/cp/backup-1
@@ -21,10 +21,8 @@
 print_ver_ cp

 suffix=.b
-file=b1.$$
+file=F
 file_backup="$file$suffix"
-temp_files="$file $file_backup"
-rm -f $temp_files

 echo test > $file || fail=1

diff --git a/tests/cp/same-file b/tests/cp/same-file
index 7bb605a..da4fce3 100755
--- a/tests/cp/same-file
+++ b/tests/cp/same-file
@@ -39,10 +39,7 @@ test $hard_link_to_symlink_does_the_deref = yes \
 && remove_these_sed='/^0 -[bf]*l .*sl1 ->/d' \
 || remove_these_sed='/^ELIDE NO TEST OUTPUT/d'

-actual=actual-$$
-expected=expected-$$
-
-exec 3>&1 1> $actual
+exec 3>&1 1> actual

 # FIXME: This should be bigger: like more than 8k
 contents=XYZ
@@ -130,7 +127,7 @@ for args in 'foo symlink' 'symlink foo' 'foo foo' 'sl1 sl2' 
'foo hardlink'; do
   echo
 done

-cat <<\EOF | sed "$remove_these_sed" > $expected
+cat <<\EOF | sed "$remove_these_sed" > expected
 1 [cp: `foo' and `symlink' are the same file] (foo symlink -> foo)
 1 -d [cp: `foo' and `symlink' are the same file] (foo symlink -> foo)
 1 -f [cp: `foo' and `symlink' are the same file] (foo symlink -> foo)
@@ -218,6 +215,6 @@ EOF

 exec 1>&3 3>&-

-compare $expected $actual 1>&2 || fail=1
+compare expected actual 1>&2 || fail=1

 Exit $fail
diff --git a/tests/dd/misc b/tests/dd/misc
index f6efde2..820984c 100755
--- a/tests/dd/misc
+++ b/tests/dd/misc
@@ -20,10 +20,10 @@
 . "${srcdir=.}/init.sh"; path_prepend_ ../src
 print_ver_ dd

-tmp_in=dd-in.$$
-tmp_in2=dd-in2.$$
-tmp_sym=dd-sym.$$
-tmp_out=dd-out.$$
+tmp_in=dd-in
+tmp_in2=dd-in2
+tmp_sym=dd-sym
+tmp_out=dd-out

 warn=0
 echo data > $tmp_in || framework_failure
diff --git a/tests/mv/part-symlink b/tests/mv/part-symlink
index de8ece0..ec523b8 100755
--- a/tests/mv/part-symlink
+++ b/tests/mv/part-symlink
@@ -45,10 +45,7 @@ pwd_tmp=`pwd`
 # Exercise those four cases for each of
 # cp and mv, with lots of combinations of options.

-actual=actual-$$
-expected=expected-$$
-
-exec 1> $actual
+exec 1> actual

 # FIXME: This should be bigger: like more than 8k
 contents=XYZ
@@ -153,7 +150,7 @@ done
 test $fail = 1 &&
   { (exit 1); exit; }

-cat <<\EOF > $expected
+cat <<\EOF > expected
 1 cp loc_reg rem_sl
  [cp: `loc_reg' and `rem_sl' are the same file]
  (loc_reg) (rem_sl -> dir/loc_reg)
@@ -259,6 +256,6 @@ cat <<\EOF > $expected
 EOF

 # Redirect to stderr, since stdout is already taken.
-compare $expected $actual 1>&2 || fail=1
+compare expected actual 1>&2 || fail=1

 Exit $fail
diff --git a/tests/mv/to-symlink b/tests/mv/to-symlink
index ffc291c..600fde3 100755
--- a/tests/mv/to-symlink
+++ b/tests/mv/to-symlink
@@ -24,10 +24,8 @@ cleanup_() { rm -rf "$other_partition_tmpdir"; }

 rem_file="$other_partition_tmpdir/file"
 rem_symlink="$other_partition_tmpdir/symlink"
-file=to-sym-$$
+file=to-sym

-
-rm -f $file || framework_failure
 echo local > $file || framework_failure
 echo remote > $rem_file || framework_failure
 ln -s $rem_file $rem_symlink || framework_failure
diff --git a/tests/touch/fail-diag b/tests/touch/fail-diag
index a4c81c6..a2613fe 100755
--- a/tests/touch/fail-diag
+++ b/tests/touch/fail-diag
@@ -20,15 +20,11 @@
 print_ver_ touch
 skip_if_root_

-d1=no-$$
-dir=/$d1/such-dir
-# Ensure that $d1 doesn't already exist.
-ls -d $d1 2> /dev/null && framework_failure
+file=/no-such-dir/file

-
-touch $dir > out 2>&1 && fail=1
+touch $file > out 2>&1 && fail=1
 cat < exp
-touch: cannot touch \`$dir': No such file or directory
+touch: cannot touch \`$file': No such file or directory
 EOF

 compare out exp || fail=1
--
1.7.3.5.44.g960a

[coreutils] [PATCH] copy, tee: assume EINTR is always defined: remove #ifdefs

2011-01-30 Thread Jim Meyering

FYI, I'm pushing this to the fiemap-copy branch
that will shortly be pulled to master.

>From 372479c4afb7f3da2cd72e200a2afe65d718cb42 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Sun, 30 Jan 2011 16:12:56 +0100
Subject: [PATCH] copy, tee: assume EINTR is always defined: remove #ifdefs

Don't use "#ifdef EINTR".  dd.c has been doing that since 2004.
* src/copy.c (sparse_copy): Remove #ifdef...#endif around EINTR use.
* src/tee.c (tee_files): Remove #ifdef...#endif around EINTR use.
If we need it, add something like this in system.h:
/* When EINTR is not defined, define it to an improbable value
   so that each use does not have to be #ifdef'd.  */
 #ifndef EINTR
 # define EINTR 88
 #endif
---
 src/copy.c |2 --
 src/tee.c  |2 --
 2 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 96bb35b..4e73e1b 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -160,10 +160,8 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t 
buf_size,
   ssize_t n_read = read (src_fd, buf, MIN (max_n_read, buf_size));
   if (n_read < 0)
 {
-#ifdef EINTR
   if (errno == EINTR)
 continue;
-#endif
   error (0, errno, _("reading %s"), quote (src_name));
   return false;
 }
diff --git a/src/tee.c b/src/tee.c
index 0499442..0518b07 100644
--- a/src/tee.c
+++ b/src/tee.c
@@ -183,10 +183,8 @@ tee_files (int nfiles, const char **files)
   while (1)
 {
   bytes_read = read (0, buffer, sizeof buffer);
-#ifdef EINTR
   if (bytes_read < 0 && errno == EINTR)
 continue;
-#endif
   if (bytes_read <= 0)
 break;

--
1.7.3.5.44.g960a

Re: [PATCH] copy, tee: assume EINTR is always defined: remove #ifdefs

2011-01-30 Thread Jim Meyering

FYI, I've just adjusted the mailman's subject_prefix setting
so that it no longer prepends "[coreutils] " to each Subject:
on this list.

Re: [coreutils] coreutils-8.10 next week

2011-01-30 Thread Jim Meyering

Pádraig Brady wrote:
> On 28/01/11 18:10, Jim Meyering wrote:
>> I would like to release coreutils-8.10 next week, including the changes
>> on the fiemap-copy-2 branch (shortly to be extended).  I'll make a test
>> release as soon as the FIEMAP changes have been merged to master.
>>
>> If anyone has additional changes that they would like to see included,
>> please let us know.
>
> I intend to apply the attached:
>
> join: ensure --header skips the order check with empty files
> join: don't report disorder against an empty file
> join: add -o 'auto' to output a constant number of fields per line
> doc: add alternatives for field processing not supported by cut

Thanks!  Those all look good.

I've pulled the fiemap-copy series onto master and pushed the result,
but have not done any portability testing yet.

Re: [coreutils] [PATCH] copy, tee: assume EINTR is always defined: remove #ifdefs

2011-01-31 Thread Jim Meyering

Eric Blake wrote:

> On 01/30/2011 08:40 AM, Jim Meyering wrote:
>> FYI, I'm pushing this to the fiemap-copy branch
>> that will shortly be pulled to master.
>>
>>>From 372479c4afb7f3da2cd72e200a2afe65d718cb42 Mon Sep 17 00:00:00 2001
>> From: Jim Meyering 
>> Date: Sun, 30 Jan 2011 16:12:56 +0100
>> Subject: [PATCH] copy, tee: assume EINTR is always defined: remove #ifdefs
>>
>> Don't use "#ifdef EINTR".  dd.c has been doing that since 2004.
>> * src/copy.c (sparse_copy): Remove #ifdef...#endif around EINTR use.
>> * src/tee.c (tee_files): Remove #ifdef...#endif around EINTR use.
>> If we need it, add something like this in system.h:
>> /* When EINTR is not defined, define it to an improbable value
>>so that each use does not have to be #ifdef'd.  */
>
> Actually, the gnulib errno module (which we already use) guarantees
> pretty much ALL posix-defined errno values.

As you probably know, EINTR is not on gnulib's list.

> There's probably other
> #ifdef E* that you could scrub.

Actually no.  At least not that I could see:

  $ git grep -E 'ifn?def E' src|sort -u
  src/expr.c:#ifdef EVAL_TRACE
  src/extent-scan.h:#ifndef EXTENT_SCAN_H
  src/install.c:#ifdef ENABLE_MATCHPATHCON
  src/stty.c:#ifdef ECHOCTL
  src/stty.c:#ifdef ECHOKE
  src/stty.c:#ifdef ECHOPRT
  src/system.h:#ifndef ENODATA
  src/true.c:#ifndef EXIT_STATUS

minor test improvements, clean-up

2011-01-31 Thread Jim Meyering

FYI,

>From 040b00ef19fb1bbc0f7d3e5bec6ff82e19a92d26 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 31 Jan 2011 13:35:55 +0100
Subject: [PATCH 1/8] tests: remove duplicate fiemap-perf test

* tests/cp/fiemap-perf: Copy block-comparing code from sparse-fiemap.
* tests/cp/sparse-fiemap: The same test was here, alongside a much
more involved test.  Remove it, now that it is in its own file.
---
 tests/cp/fiemap-perf   |4 
 tests/cp/sparse-fiemap |   13 +
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/tests/cp/fiemap-perf b/tests/cp/fiemap-perf
index 6c588cb..bd8fab8 100755
--- a/tests/cp/fiemap-perf
+++ b/tests/cp/fiemap-perf
@@ -29,4 +29,8 @@ timeout 10 truncate -s1T f || framework_failure_
 # Nothing can read (much less write) that many bytes in so little time.
 timeout 10 cp f f2 || fail=1

+# Ensure that the sparse file copied through fiemap has the same size
+# in bytes as the original.
+test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
+
 Exit $fail
diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index b6b1103..aecf9d2 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -1,7 +1,7 @@
 #!/bin/sh
 # Test cp --sparse=always through fiemap copy

-# Copyright (C) 2010 Free Software Foundation, Inc.
+# Copyright (C) 2010, 2011 Free Software Foundation, Inc.

 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -47,17 +47,6 @@ else
 skip_test_ "insufficient mount/ext4 support"
 fi

-# Create a 1TiB sparse file
-dd if=/dev/zero of=sparse bs=1k count=1 seek=1G || framework_failure
-
-# It takes many minutes to copy this sparse file using the old method.
-# By contrast, it takes far less than 1 second using FIEMAP-copy.
-timeout 10 cp --sparse=always sparse fiemap || fail=1
-
-# Ensure that the sparse file copied through fiemap has the same size
-# in bytes as the original.
-test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
-
 # =
 # Ensure that we exercise the FIEMAP-copying code enough
 # to provoke at least two iterations of the do...while loop
--
1.7.3.5.44.g960a


>From 08b7bce540f12595f9995851338bb765783d264c Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 31 Jan 2011 13:40:26 +0100
Subject: [PATCH 2/8] tests: modernize sparse-fiemap test

* tests/cp/sparse-fiemap: Use print_ver_, not open-coded VERBOSE test.
---
 tests/cp/sparse-fiemap |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index aecf9d2..c873645 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -16,12 +16,8 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.

-if test "$VERBOSE" = yes; then
-  set -x
-  cp --version
-fi
-
 . "${srcdir=.}/init.sh"; path_prepend_ ../src
+print_ver_ cp

 if df -T -t btrfs -t xfs -t ext4 -t ocfs2 . ; then
   : # Current dir is on a partition with working extents.  Good!
--
1.7.3.5.44.g960a


>From 31dc21f693b5dc9c267be89934ea4cb306806e39 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 31 Jan 2011 13:42:49 +0100
Subject: [PATCH 3/8] maint: update copyright year lists in new files

* src/extent-scan.h: Update copyright year list.
* src/extent-scan.c: Likewise.
* tests/cp/sparse-fiemap: Likewise.
---
 src/extent-scan.c  |2 +-
 src/extent-scan.h  |2 +-
 tests/cp/sparse-fiemap |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/extent-scan.c b/src/extent-scan.c
index 3bb0d53..9f38877 100644
--- a/src/extent-scan.c
+++ b/src/extent-scan.c
@@ -1,5 +1,5 @@
 /* extent-scan.c -- core functions for scanning extents
-   Copyright (C) 2010 Free Software Foundation, Inc.
+   Copyright (C) 2010-2011 Free Software Foundation, Inc.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
diff --git a/src/extent-scan.h b/src/extent-scan.h
index ac9e500..4724b25 100644
--- a/src/extent-scan.h
+++ b/src/extent-scan.h
@@ -1,5 +1,5 @@
 /* core functions for efficient reading sparse files
-   Copyright (C) 2010 Free Software Foundation, Inc.
+   Copyright (C) 2010-2011 Free Software Foundation, Inc.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index c873645..7e1932b 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -1,7 +1,7 @@
 #!/bin/sh
 # Test cp --sparse=always through fiemap copy

-# Copyright (C) 2010, 2011 Free Software Foundation, Inc.
+# Copyright (C) 2010-2011 Free Software Foundation, Inc.

 # This program is free so

[PATCH 7/8] cp: fix copying a sparse file to a pipe

2011-01-31 Thread Jim Meyering

I discovered that my recent changes had broken a corner of cp.
Here's a fix and a test to exercise the bug.

>From cc0645fc643d3ea5a06f12f5fad784a2b8097888 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 31 Jan 2011 18:25:58 +0100
Subject: [PATCH 7/8] cp: fix copying a sparse file to a pipe

The recent FIEMAP-related changes made it so the unusual case of
copying a sparse file to a non-regular destination (e.g., a pipe)
would erroneously write one byte too many to that destination.
That happened because extent_copy assumed that it could use lseek
to obtain the number of bytes written to the output file descriptor.
That was valid only for regular files.
* src/copy.c (sparse_copy): Add a parameter, to be used by extent_copy,
but not by reg_copy.  Adjust callers.
(extent_copy): Maintain new local, dest_pos, using new arg, n_read.
Don't call lseek on dest_fd; use new var, dest_pos, instead.
(copy_reg): Add unused arg.
---
 src/copy.c |   27 ---
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index d32a11c..8ba09e0 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -139,15 +139,18 @@ utimens_symlink (char const *file, struct timespec const 
*timespec)
BUF must have sizeof(uintptr_t)-1 bytes of additional space
beyond BUF[BUF_SIZE-1].
Set *LAST_WRITE_MADE_HOLE to true if the final operation on
-   DEST_FD introduced a hole.  */
+   DEST_FD introduced a hole.  Set *TOTAL_N_READ to the number of
+   bytes read.  */
 static bool
 sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
  bool make_holes,
  char const *src_name, char const *dst_name,
- uintmax_t max_n_read, bool *last_write_made_hole)
+ uintmax_t max_n_read, off_t *total_n_read,
+ bool *last_write_made_hole)
 {
   typedef uintptr_t word;
   *last_write_made_hole = false;
+  *total_n_read = 0;

   while (max_n_read)
 {
@@ -164,6 +167,7 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t 
buf_size,
   if (n_read == 0)
 break;
   max_n_read -= n_read;
+  *total_n_read += n_read;

   if (make_holes)
 {
@@ -313,6 +317,10 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t 
buf_size,
   off_t last_ext_start = 0;
   uint64_t last_ext_len = 0;

+  /* Keep track of the output position.
+ We may need this at the end, for a final ftruncate.  */
+  off_t dest_pos = 0;
+
   extent_scan_init (src_fd, &scan);

   bool wrote_hole_at_eof = true;
@@ -378,10 +386,14 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t 
buf_size,
   last_ext_start = ext_start;
   last_ext_len = ext_len;

+  off_t n_read;
   if ( ! sparse_copy (src_fd, dest_fd, buf, buf_size,
-  make_holes, src_name, dst_name, ext_len,
+  make_holes, src_name, dst_name,
+  ext_len, &n_read,
   &wrote_hole_at_eof))
 return false;
+
+  dest_pos = ext_start + n_read;
 }

   /* Release the space allocated to scan->ext_info.  */
@@ -398,11 +410,10 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t 
buf_size,
  In addition, if the final extent was a block of zeros at EOF and we've
  just converted them to a hole in the destination, we must call ftruncate
  here in order to record the proper length in the destination.  */
-  off_t dest_len = lseek (dest_fd, 0, SEEK_CUR);
-  if ((dest_len < src_total_size || wrote_hole_at_eof)
+  if ((dest_pos < src_total_size || wrote_hole_at_eof)
   && (make_holes
   ? ftruncate (dest_fd, src_total_size)
-  : ! write_zeros (dest_fd, src_total_size - dest_len)))
+  : ! write_zeros (dest_fd, src_total_size - dest_pos)))
 {
   error (0, errno, _("failed to extend %s"), quote (dst_name));
   return false;
@@ -981,9 +992,11 @@ copy_reg (char const *src_name, char const *dst_name,
   goto close_src_and_dst_desc;
 }

+  off_t n_read;
   bool wrote_hole_at_eof;
   if ( ! sparse_copy (source_desc, dest_desc, buf, buf_size,
-  make_holes, src_name, dst_name, UINTMAX_MAX,
+  make_holes, src_name, dst_name,
+  UINTMAX_MAX, &n_read,
   &wrote_hole_at_eof)
|| (wrote_hole_at_eof &&
! sparse_copy_finalize (dest_desc, dst_name)))
--
1.7.3.5.44.g960a


>From 43d739e99c508a7c0fdfabc3c1aae844b029766f Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 31 Jan 2011 17:18:13 +0100
Subject: [PATCH 8/8] tests: exercise a rarely-used corner of copy.c

* tests/cp/sparse-to-pipe: New test.
* tests/Makefile.am (TESTS): Add it.
---
 tests/Makefile.am   |1 +
 tests/cp/sparse-to-pipe |   31 +++
 2 files chan

RFC: new cp option: --efficient-sparse=HOW

2011-01-31 Thread Jim Meyering

Now that we have can read sparse files efficiently,
what if I want to copy a 20PiB sparse file, and yet I want to
be sure that it does so efficiently.  Few people can afford
to wait around while a normal processor and storage system process
that much raw data.  But if it's a sparse file and the src and dest
file systems have the right support (FIEMAP ioctl), then it'll be
copied in the time it takes to make a few syscalls.

Currently, when the efficient sparse copy fails, cp falls back
on the regular, expensive, read-every-byte approach.

This proposal adds an option, --efficient-sparse=required,
to make cp fail if the initial attempt to read the sparse file fails,
rather than resorting to the regular (very slow in the above case) copy
procedure.

The default is --efficient-sparse=auto, and for symmetry,
I've provided --efficient-sparse=never, in case someone finds
a reason to want to skip the ioctl.

You can demonstrate this new feature on a tmpfs file system,
since it supports sparse files, but not the FIEMAP ioctl:

$ cd /dev/shm
$ truncate -s128K k
$ cp --efficient=required k kk
cp: unable to read sparse `k' efficiently
[Exit 1]

Here's a preliminary patch
(not including texinfo changes)
I'll add tests, too, of course.

Feedback on the option name or anything else would be most welcome.
The sooner the better.


diff --git a/src/copy.c b/src/copy.c
index 04c678d..72425af 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -305,8 +305,8 @@ write_zeros (int fd, uint64_t n_bytes)
copy, and thus makes copying sparse files much more efficient.
Upon a successful copy, return true.  If the initial extent scan
fails, set *NORMAL_COPY_REQUIRED to true and return false.
-   Upon any other failure, set *NORMAL_COPY_REQUIRED to false and
-   return false.  */
+   Upon any other failure, give a diagnostic, set *NORMAL_COPY_REQUIRED
+   to false and return false.  */
 static bool
 extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
  off_t src_total_size, bool make_holes,
@@ -931,6 +931,7 @@ copy_reg (char const *src_name, char const *dst_name,
   /* Deal with sparse files.  */
   bool make_holes = false;

+  bool src_is_sparse = false;
   if (S_ISREG (sb.st_mode))
 {
   /* Even with --sparse=always, try to create holes only
@@ -943,9 +944,13 @@ copy_reg (char const *src_name, char const *dst_name,
  blocks.  If the file has fewer blocks than would normally be
  needed for a file of its size, then at least one of the blocks in
  the file is a hole.  */
+
   if (x->sparse_mode == SPARSE_AUTO && S_ISREG (src_open_sb.st_mode)
   && ST_NBLOCKS (src_open_sb) < src_open_sb.st_size / 
ST_NBLOCKSIZE)
-make_holes = true;
+{
+  make_holes = true;
+  src_is_sparse = true;
+}
 #endif
 }

@@ -977,18 +982,30 @@ copy_reg (char const *src_name, char const *dst_name,
   buf_alloc = xmalloc (buf_size + buf_alignment_slop);
   buf = ptr_align (buf_alloc, buf_alignment);

-  bool normal_copy_required;
+  bool normal_copy_required = true;
   /* Perform an efficient extent-based copy, falling back to the
  standard copy only if the initial extent scan fails.  If the
  '--sparse=never' option is specified, write all data but use
  any extents to read more efficiently.  */
-  if (extent_copy (source_desc, dest_desc, buf, buf_size,
-   src_open_sb.st_size, make_holes,
-   src_name, dst_name, &normal_copy_required))
+
+  if (x->sparse_efficiency != SPARSE_EFF_NEVER
+  && extent_copy (source_desc, dest_desc, buf, buf_size,
+  src_open_sb.st_size, make_holes,
+  src_name, dst_name, &normal_copy_required))
 goto preserve_metadata;

   if (! normal_copy_required)
 {
+  /* extent_copy already diagnosed the failure */
+  return_val = false;
+  goto close_src_and_dst_desc;
+}
+
+  /* extent_copy failed, and we are instructed not to fall-back */
+  if (src_is_sparse && x->sparse_efficiency == SPARSE_EFF_REQUIRED)
+{
+  error (0, 0, _("unable to read sparse %s efficiently"),
+ quote (src_name));
   return_val = false;
   goto close_src_and_dst_desc;
 }
@@ -2519,6 +2536,7 @@ cp_options_default (struct cp_options *x)
 #else
   x->chown_privileges = x->owner_privileges = (geteuid () == 0);
 #endif
+  x->sparse_efficiency = SPARSE_EFF_AUTO;
 }

 /* Return true if it's OK for chown to fail, where errno is
diff --git a/src/copy.h b/src/copy.h
index 5014ea9..fab131b 100644
--- a/src/copy.h
+++ b/src/copy.h
@@ -22,6 +22,22 @@
 # include 
 # include "hash.h"

+/* Control efficient reading of sparse files.  On some systems, you can
+   use the FIEMAP ioctl to read only the non-sparse parts of a file.

Re: RFC: new cp option: --efficient-sparse=HOW

2011-01-31 Thread Jim Meyering

Jim Meyering wrote:

> Now that we have can read sparse files efficiently,
> what if I want to copy a 20PiB sparse file, and yet I want to
> be sure that it does so efficiently.  Few people can afford
> to wait around while a normal processor and storage system process
> that much raw data.  But if it's a sparse file and the src and dest
> file systems have the right support (FIEMAP ioctl), then it'll be
> copied in the time it takes to make a few syscalls.
>
> Currently, when the efficient sparse copy fails, cp falls back
> on the regular, expensive, read-every-byte approach.
>
> This proposal adds an option, --efficient-sparse=required,
> to make cp fail if the initial attempt to read the sparse file fails,
> rather than resorting to the regular (very slow in the above case) copy
> procedure.
>
> The default is --efficient-sparse=auto, and for symmetry,
> I've provided --efficient-sparse=never, in case someone finds
> a reason to want to skip the ioctl.
>
> You can demonstrate this new feature on a tmpfs file system,
> since it supports sparse files, but not the FIEMAP ioctl:
>
> $ cd /dev/shm
> $ truncate -s128K k
> $ cp --efficient=required k kk
> cp: unable to read sparse `k' efficiently
> [Exit 1]
>
> Here's a preliminary patch
> (not including texinfo changes)
> I'll add tests, too, of course.

And NEWS.

Here's that same patch, but now with a proper ChangeLog:

>From c83ea420c64169a7db58189cce6d3e755eb7b717 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 31 Jan 2011 23:13:36 +0100
Subject: [PATCH] cp: support new option: --efficient-sparse=HOW

Now that we have can read sparse files efficiently,
what if I want to copy a 20PiB sparse file, and yet I want to
be sure that it does so efficiently?  Few people can afford
to wait around while a normal processor and storage system process
that much raw data.  But if it's a sparse file and the src and dest
file systems have the right support (FIEMAP ioctl), then it'll be
copied in the time it takes to make a few syscalls.

Currently, when the efficient sparse copy fails, cp falls back
on the regular, expensive, read-every-byte approach.

This proposal adds an option, --efficient-sparse=required,
to make cp fail if the initial attempt to read the sparse file fails,
rather than resorting to the regular (very slow in the above case) copy
procedure.

The default is --efficient-sparse=auto, and for symmetry,
I've provided --efficient-sparse=never, in case someone finds
a reason to want to skip the ioctl.

You can demonstrate this new feature on a tmpfs file system,
since it supports sparse files, but not the FIEMAP ioctl:

$ cd /dev/shm
$ truncate -s128K k
$ cp --efficient=required k kk
cp: unable to read sparse `k' efficiently
[Exit 1]

* src/copy.h (enum Sparse_efficiency): Declare.
(struct cp_options) [sparse_efficiency]: New member.
* src/copy.c (word, cp_options_default):
(extent_copy): Add description for a parameter.
(copy_reg): Remember result of src_is_sparse heuristic...
and test that when extent_copy fails.
Don't call extent_copy for SPARSE_EFF_NEVER.
(cp_options_default): Initialize new member.
* src/cp.c (eff_sparse_type, long_opts, main): Support new options.
(usage): Document them.
---
 src/copy.c |   32 +---
 src/copy.h |   19 +++
 src/cp.c   |   36 
 3 files changed, 80 insertions(+), 7 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 04c678d..72425af 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -305,8 +305,8 @@ write_zeros (int fd, uint64_t n_bytes)
copy, and thus makes copying sparse files much more efficient.
Upon a successful copy, return true.  If the initial extent scan
fails, set *NORMAL_COPY_REQUIRED to true and return false.
-   Upon any other failure, set *NORMAL_COPY_REQUIRED to false and
-   return false.  */
+   Upon any other failure, give a diagnostic, set *NORMAL_COPY_REQUIRED
+   to false and return false.  */
 static bool
 extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
  off_t src_total_size, bool make_holes,
@@ -931,6 +931,7 @@ copy_reg (char const *src_name, char const *dst_name,
   /* Deal with sparse files.  */
   bool make_holes = false;

+  bool src_is_sparse = false;
   if (S_ISREG (sb.st_mode))
 {
   /* Even with --sparse=always, try to create holes only
@@ -943,9 +944,13 @@ copy_reg (char const *src_name, char const *dst_name,
  blocks.  If the file has fewer blocks than would normally be
  needed for a file of its size, then at least one of the blocks in
  the file is a hole.  */
+
   if (x->sparse_mode == SPARSE_AUTO && S_ISREG (src_open_sb.st_mode)
   && ST_

Re: RFC: new cp option: --efficient-sparse=HOW

2011-01-31 Thread Jim Meyering

Eric Blake wrote:

> On 01/31/2011 02:46 PM, Jim Meyering wrote:
>> Now that we have can read sparse files efficiently,
>> what if I want to copy a 20PiB sparse file, and yet I want to
>> be sure that it does so efficiently.  Few people can afford
>> to wait around while a normal processor and storage system process
>> that much raw data.  But if it's a sparse file and the src and dest
>> file systems have the right support (FIEMAP ioctl), then it'll be
>> copied in the time it takes to make a few syscalls.
>>
>> Currently, when the efficient sparse copy fails, cp falls back
>> on the regular, expensive, read-every-byte approach.
>>
>> This proposal adds an option, --efficient-sparse=required,
>> to make cp fail if the initial attempt to read the sparse file fails,
>> rather than resorting to the regular (very slow in the above case) copy
>> procedure.
>>
>> The default is --efficient-sparse=auto, and for symmetry,
>> I've provided --efficient-sparse=never, in case someone finds
>> a reason to want to skip the ioctl.
>
> Conversely, what happens if I have a file that contains large blocks of
> zeros but is NOT fully sparse (plausible, since we're still facing the
> fact that it is still not easy to punch holes into existing files when
> data in that portion of the file is no longer needed)?  Does all the new
> fiemap code still have the ability for me to request that the cp code
> specifically look for large blocks of zero in the source, rather than
> trusting the fiemap, so that I can create a copy that is more sparse
> than the original?  Does that also need a tunable; and if so, should we
> try to combine it into this tunable or is it orthogonal?

It's orthogonal.

--sparse=always still does the hole-punching, independently
of whether we're copying normally or via the efficient FIEMAP-based
code.

E.g., if you have a sparse file, where one non-sparse chunk
contains all-zero blocks (currently 32KiB minimum), then --sparse=always
will convert those blocks to holes, with or without
--efficient-sparse=never.

--efficient-sparse=... controls efficiency while reading
--sparse=...   controls hole-punching (or preserving)

BTW, that the existing hole-punching behavior works for no
sequence shorter than 32KiB is a bug that I will fix very soon.
I think that was introduced as an unwanted side-effect when
increasing buffer size for efficiency.

Thanks for the feedback.

[PATCH] cp: always initialize extent_copy's output parameter

2011-01-31 Thread Jim Meyering

Here's another bug fix:

>From f8f57dd76ba98793e131a88beb138a32d54e0aa9 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 31 Jan 2011 20:55:34 +0100
Subject: [PATCH] cp: always initialize extent_copy's output parameter

* src/copy.c (extent_copy): Otherwise it would be used uninitialized.
---
 src/copy.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 8ba09e0..04c678d 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -323,6 +323,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t 
buf_size,

   extent_scan_init (src_fd, &scan);

+  *require_normal_copy = false;
   bool wrote_hole_at_eof = true;
   do
 {
--
1.7.3.5.44.g960a

[PATCH] tests: correct part of fiemap-perf

2011-02-02 Thread Jim Meyering

I noticed this while looking at the log of an unrelated test failure:

>From 4e9ab7d3d87c24ff0671fc7c3be8c0a0873009ab Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Wed, 2 Feb 2011 12:02:22 +0100
Subject: [PATCH] tests: correct part of fiemap-perf

* tests/cp/fiemap-perf: Correct erroneous added test.
Since nonexistent names were used, the final test ended up
being "test =", which would always "succeed".
---
 tests/cp/fiemap-perf |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tests/cp/fiemap-perf b/tests/cp/fiemap-perf
index 6227243..7369a7d 100755
--- a/tests/cp/fiemap-perf
+++ b/tests/cp/fiemap-perf
@@ -31,6 +31,6 @@ timeout 10 cp f f2 || fail=1

 # Ensure that the sparse file copied through fiemap has the same size
 # in bytes as the original.
-test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
+test "$(stat --printf %s f)" = "$(stat --printf %s f2)" || fail=1

 Exit $fail
--
1.7.3.5.44.g960a

new snapshot available: coreutils-8.9.77-4e9ab7

2011-02-02 Thread Jim Meyering

Here's a snapshot of what should soon be tagged as coreutils-8.10.
I expect to release 8.10 on Friday, so any testing you can do
between now and then would be most welcome.

Since this changes how "cp" handles sparse files, it is particularly
useful to build and run the tests on different types of file systems.
You'll see significant benefit when copying sparse files on recent linux
kernels (2.6.27 or newer) when using a source file system that supports
the FIEMAP ioctl, mainly ext4, btrfs, xfs, ocfs2, gfs2.  ext2 and ext3
are supposed to have FIEMAP support, too, but I haven't tested those.
If you test with btrfs, you should expect one or two test failure unless
you're using the very latest from fedora rawhide, in which case you'll
see only 1 test failure (cp/sparse-fiemap; I'll work around that tomorrow).

coreutils snapshot: (.gz files are here, too)
  http://meyering.net/cu/coreutils-ss.tar.xz  4.6 MB
  http://meyering.net/cu/coreutils-ss.tar.xz.sig
  http://meyering.net/cu/coreutils-8.9.77-4e9ab7.tar.xz

There are .gz and .sig files here, too:
  http://people.redhat.com/meyering/cu/coreutils-ss.tar.xz
  http://people.redhat.com/meyering/cu/coreutils-8.9.77-4e9ab7.tar.xz

Here's the NEWS so far:

---
** Bug fixes

  du would abort with a failed assertion when two conditions are met:
  part of the hierarchy being traversed is moved to a higher level in the
  directory tree, and there is at least one more command line directory
  argument following the one containing the moved sub-tree.
  [bug introduced in coreutils-5.1.0]

  join --header now skips the ordering check for the first line
  even if the other file is empty.  [bug introduced in coreutils-8.5]

  rm -f no longer fails for EINVAL or EILSEQ on file systems that
  reject file names invalid for that file system.

  uniq -f NUM no longer tries to process fields after end of line.
  [bug introduced in coreutils-7.0]

** New features

  cp now copies sparse files efficiently on file systems with FIEMAP
  support (ext4, btrfs, xfs, ocfs2).  Before, it had to read 2^20 bytes
  when copying a 1MiB sparse file.  Now, it copies bytes only for the
  non-sparse sections of a file.  Similarly, to induce a hole in the
  output file, it had to detect a long sequence of zero bytes.  Now,
  it knows precisely where each hole in an input file is, and can
  reproduce them efficiently in the output file.  mv also benefits
  when it resorts to copying, e.g., between file systems.

  join now supports -o 'auto' which will automatically infer the
  output format from the first line in each file, to ensure
  the same number of fields are output for each line.

** Changes in behavior

  join no longer reports disorder when one of the files is empty.
  This allows one to use join as a field extractor like:
  join -a1 -o 1.3,1.1 - /dev/null

---

Changes in coreutils since v8.9:

Andreas Schwab (1):
  tests: minor correction

Jie Liu (3):
  cp: copy sparse files efficiently using the FIEMAP ioctl
  tests: add a new test for FIEMAP-copy
  copy.c: add FIEMAP_FLAG_SYNC to fiemap ioctl

Jim Meyering (56):
  post-release administrivia
  du: don't abort when a subdir is renamed during traversal
  build: update gnulib submodule to latest
  doc: clean up HACKING guidelines
  tests: add a test for today's uniq bug
  uniq: replace a wasteful loop with simple calculation
  doc: show how to shred more efficiently
  doc: update NEWS
  build: update gnulib submodule to latest
  tests: avoid FP failure in new du test
  maint: use slightly more efficient process in README-release
  build: update gnulib submodule to latest
  build: update gnulib submodule to latest
  doc: fix wording in warning about potential conflict with built-in
  tail: avoid new diagnostic when applying -f to a pipe on linux-2.3.38
  split: avoid a new, spurious warning from gcc-4.6.0
  tests: avoid rare FP failure in new du test
  tests: avoid FP failure due to suspension
  tests: don't hide all trace of the vc_exe_in_TESTS test
  tests: remove obsolete uses of "$$" in temporary file names
  tests: exercise more of the new FIEMAP copying code
  tests: accommodate varying filefrag -v "flags" output
  tests: relax the root-tests cross-check
  copy.c: adjust comments, tweak semantics
  fiemap.h: include , not 
  build: distribute new file, fiemap.h
  build: distribute new test script, filefrag-extent-compare
  fiemap copy: don't let write failure go unreported; adjust style, etc.
  extent-scan: adjust naming and formatting
  copy: remove else-after-goto and adjust indentation
  copy: call extent_copy also when make_holes is false, ...

Re: RFC: new cp option: --efficient-sparse=HOW

2011-02-03 Thread Jim Meyering

Pádraig Brady wrote:
> On 31/01/11 21:46, Jim Meyering wrote:
>> Now that we have can read sparse files efficiently,
>> what if I want to copy a 20PiB sparse file, and yet I want to
>> be sure that it does so efficiently.  Few people can afford
>> to wait around while a normal processor and storage system process
>> that much raw data.  But if it's a sparse file and the src and dest
>> file systems have the right support (FIEMAP ioctl), then it'll be
>> copied in the time it takes to make a few syscalls.
>>
>> Currently, when the efficient sparse copy fails, cp falls back
>> on the regular, expensive, read-every-byte approach.
>>
>> This proposal adds an option, --efficient-sparse=required,
>> to make cp fail if the initial attempt to read the sparse file fails,
>> rather than resorting to the regular (very slow in the above case) copy
>> procedure.
>>
>> The default is --efficient-sparse=auto, and for symmetry,
>> I've provided --efficient-sparse=never, in case someone finds
>> a reason to want to skip the ioctl.
>
> I'm 50:50 on whether we should add this option.
> If a user doesn't want to copy the file if it's going
> to be too slow, then what other options do they have?
> Also the sparseness of a file can change over time.
>
> Given the fairly specialized need for such an option,
> perhaps it's best to foist to other tools, like:
>
> test $(du f) -lt $limit && df -T -t myfs f && cp f f2
>
> or perhaps
>
> timeout $time_limit cp f f2 || { rm f2; echo "oops" >&2; }

[Rats!  I thought I had sent this at 5pm two days ago
 and happen upon the "unsent" buffer only now.  ]

Hi Pádraig,

Thanks for the feedback.
I think we'll want *some* option, eventually, if only to tell
cp to choose how to treat a sparse file:

  1) copy the file (using extents, if possible) and introducing holes
 wherever possible.  This produces essentially the same result as
 cp in coreutils-8.9 and earlier.

  2) copy the file as precisely as possible.  This is possible now,
 when the FIEMAP ioctl works and when we can induce holes in
 a destination file.

For the imminent coreutils-8.10 release, I'll refrain from adding any
new command-line option, and let the default be #1, as it is with the
code now on master.

Does anyone know how to determine if a file system (say the one with ".")
supports the FIEMAP ioctl, but without compiling/running a C program?
I.e., via perl or python?  I've written a tiny C program that works
and a Perl one that is supposed to be identical (at least the buffer
arg has identical content), yet does not work.
The goal is to make the tests more robust by replacing the kludgey
FS-name-based test with an actual syscall-based test.

Jim

two portability fixes

2011-02-04 Thread Jim Meyering

The first is just to fix a failing test.
For mv/i-3, the output file, "out" is empty on FreeBSD 8.1
Also, (and this is telling), it seems that no one is testing
on non-linux kernels.  On the first one I tried, just before
I was planning to release, "make check" reported 30+ failing tests,
all having to do with the new copying code.

>From ee74c7d36ddfc3758b2cdec540aab478eb724572 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Fri, 4 Feb 2011 14:59:03 +0100
Subject: [PATCH 1/2] tests: skip mv/i-3 on FreeBSD to avoid spurious failure

* tests/mv/i-3: Skip when uname -s reports FreeBSD.
---
 tests/mv/i-3 |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tests/mv/i-3 b/tests/mv/i-3
index 80db5e8..7fba139 100755
--- a/tests/mv/i-3
+++ b/tests/mv/i-3
@@ -23,6 +23,8 @@ require_controlling_input_terminal_
 skip_if_root_
 trap '' TTIN # Ignore SIGTTIN

+test "$(uname -s)" = FreeBSD && skip_ "known spurious failure on FreeBSD"
+
 touch f g h i || framework_failure
 chmod 0 g i || framework_failure

--
1.7.4.2.g597a6


>From d5fa424a9c25db98619c47007f548aaef6155fce Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Fri, 4 Feb 2011 15:01:39 +0100
Subject: [PATCH 2/2] cp: avoid spurious failure on any non-linux kernel

* src/extent-scan.c (extent_scan_read) [!linux]: Always set
scan->initial_scan_failed so caller knows not to report the failure.
---
 src/extent-scan.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/src/extent-scan.c b/src/extent-scan.c
index 9f38877..1ba59db 100644
--- a/src/extent-scan.c
+++ b/src/extent-scan.c
@@ -110,6 +110,7 @@ extent_scan_read (struct extent_scan *scan)
 extern bool
 extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED)
 {
+  scan->initial_scan_failed = true;
   errno = ENOTSUP;
   return false;
 }
--
1.7.4.2.g597a6

Re: { mv test test; } should not fail

2011-02-04 Thread Jim Meyering

Krzysztof Żelechowski wrote:
>> At any rate, I don't see any reason for coreutils to change its
>> behavior; although I might be persuaded otherwise if someone writes up a
>> patch (including testsuite and documentation),
>
> Whereas I hoped to be persuaded that failing makes sense and is *not*
> counter-intuitive.  Of course, changing something just because is
> harder than doing it right from the onset (example: Polish
> orthography).
>
>> and in particular, you
>> MUST ensure that coreutils does NOT regress on:
>>
>> ln a b
>> mv a b

In the implementation of mv, it is very easy to make
the subtle, but fatal (to those afflicted) mistake of
allowing the above to result in data loss.  But that is incidental,
because we're talking about the case in which the two arguments
refer not just to the same file, but that are identical strings.

There is also a lot of variation in how systems implement the
rename syscall.  With GNU mv, you get a consistent, and imho,
important failure, insulating applications from the underlying
syscall.  Consistency is a small part of the answer.

Here's the real reason:

I have seen no reason to make "mv a a" a successful no-op,
yet I have seen numerous occasions where GNU mv's implementation
exposed a problem in a script that does e.g., "mv $a $b" or
when someone does "mv foo !$" on the command line.  Not reporting
an error for something so unusual can be fatal if it makes you
think you've just moved a precious file to a safe place and you're
all clear to nuke the containing directory.

> I have observed that in the case { mv a a; } mv does not call rename
> at all, as if trying to be smarter than the operating system.  I am
> curious why it was implemented like this.

coreutils-8.10 released [stable]

2011-02-04 Thread Jim Meyering

This is to announce coreutils-8.10, a stable release.

There have been some minor bug fixes, along with two new features.  The
join feature is enabled via a new option, "-o auto".  The cp feature makes
copying sparse files much more efficient on several common file systems.
It takes advantage of a feature that was introduced in linux-2.6.27.
The improvement affects the default code path, so if you're looking
for risk potential, this is it.  It uses the feature if available,
and otherwise resorts to using the old, less-efficient copying code.

See NEWS below for a brief summary.

For a summary of changes and contributors, see:
  http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=shortlog;h=v8.10
or run this command from a git-cloned coreutils directory:
  git shortlog v8.9..v8.10

To summarize the many gnulib-related changes, run these commands from
a git-cloned coreutils directory:
  git checkout v8.10
  git submodule summary v8.9

Jim [on behalf of the coreutils maintainers]


Here's the GNU Coreutils home page:
http://www.gnu.org/software/coreutils/

Here are the compressed sources:
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.10.tar.gz   (11MB)
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.10.tar.xz   (4.6MB)

Here are the GPG detached signatures[*]:
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.10.tar.gz.sig
  http://ftp.gnu.org/gnu/coreutils/coreutils-8.10.tar.xz.sig

If you prefer a mirror, you can use links like these:
  http://ftpmirror.gnu.org/coreutils/coreutils-8.10.tar.xz
  http://ftpmirror.gnu.org/coreutils/coreutils-8.10.tar.gz

[*] You can use either of the above signature files to verify that
the corresponding file (without the .sig suffix) is intact.  First,
be sure to download both the .sig file and the corresponding tarball.
Then, run a command like this:

  gpg --verify coreutils-8.10.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 000B

and rerun the `gpg --verify' command.

This release was bootstrapped with the following tools:
  Autoconf 2.68.24-5f61
  Automake 1.11a
  Gnulib v0.0-4800-ga036b76
  Bison 2.4.3

=
NEWS

* Noteworthy changes in release 8.10 (2011-02-04) [stable]

** Bug fixes

  du would abort with a failed assertion when two conditions are met:
  part of the hierarchy being traversed is moved to a higher level in the
  directory tree, and there is at least one more command line directory
  argument following the one containing the moved sub-tree.
  [bug introduced in coreutils-5.1.0]

  join --header now skips the ordering check for the first line
  even if the other file is empty.  [bug introduced in coreutils-8.5]

  rm -f no longer fails for EINVAL or EILSEQ on file systems that
  reject file names invalid for that file system.

  uniq -f NUM no longer tries to process fields after end of line.
  [bug introduced in coreutils-7.0]

** New features

  cp now copies sparse files efficiently on file systems with FIEMAP
  support (ext4, btrfs, xfs, ocfs2).  Before, it had to read 2^20 bytes
  when copying a 1MiB sparse file.  Now, it copies bytes only for the
  non-sparse sections of a file.  Similarly, to induce a hole in the
  output file, it had to detect a long sequence of zero bytes.  Now,
  it knows precisely where each hole in an input file is, and can
  reproduce them efficiently in the output file.  mv also benefits
  when it resorts to copying, e.g., between file systems.

  join now supports -o 'auto' which will automatically infer the
  output format from the first line in each file, to ensure
  the same number of fields are output for each line.

** Changes in behavior

  join no longer reports disorder when one of the files is empty.
  This allows one to use join as a field extractor like:
  join -a1 -o 1.3,1.1 - /dev/null

-
also posted as: http://savannah.gnu.org/forum/forum.php?forum_id=6711


pgpHB3c8ClZZI.pgp
Description: PGP signature

Re: Generating pseudo-random integers

2011-02-05 Thread Jim Meyering

Melikamp T. Medley wrote:

> Thanks!
>
> I think I want to write a utility that prints pseudo-random
> integers (I have in CL, but I like it fast, so this time in C),
> and link it with coreutils. I looked at Paul Eggert's work,
> and I won't be able to do any better than that if I spend
> years on it :) I want to print uniformly distributed integers
> fast, formatted, in bulk if needed. And I want to call it roll.
> Let me know if you have a word of advice.
>
> On 02/04/2011 11:35 PM, Eric Blake wrote:
>> Using just coreutils, though, I'm afraid your options are slim.  You
>> could use 'seq 0 9 | sort -R | head -n1' (or use shuf instead of sort
>> -R) to generate one random number at a time, but that gets expensive.
>> Or, if you don't mind limiting yourself to 62**6 values (instead of the
>> more traditional 2**32), you could do 'mkdir tmp; mktemp -u -p tmp
>> XX'.  But that's the extent of coreutils' randomness exposed to the
>> user.

It's hard to justify writing a new program in C
when you can do the job with a Perl one-liner:

  $ perl -le 'foreach (1..20) { print int rand 1000 }'
  423
  619
  917
  396
  957
  345
  892
  13
  424
  519
  898
  689
  208
  383
  605
  196
  384
  893
  618
  344

However, adding an *option* to shuf might make sense,
since you can already give shuf both a range and a count
of how many numbers to print.

Re: coreutils-8.10 released [stable]

2011-02-05 Thread Jim Meyering

Dmitry V. Levin wrote:
> On Fri, Feb 04, 2011 at 08:20:39PM +0100, Jim Meyering wrote:
>> This is to announce coreutils-8.10, a stable release.
>>
>> There have been some minor bug fixes, along with two new features.  The
>> join feature is enabled via a new option, "-o auto".  The cp feature makes
>> copying sparse files much more efficient on several common file systems.
>> It takes advantage of a feature that was introduced in linux-2.6.27.
>> The improvement affects the default code path, so if you're looking
>> for risk potential, this is it.  It uses the feature if available,
>> and otherwise resorts to using the old, less-efficient copying code.
>
> tests/cp/fiemap-perf fails with EFBIG on tmpfs:
> $ truncate -s1T f
> truncate: failed to truncate `f' at 1099511627776 bytes: File too large
>
> Reducing the size from 1T to 256G makes the test pass.

Thanks for the report.
The current df-based guard for the cp/FIEMAP tests
is a kludge^W heuristic that will soon be replaced by
Pádraig's python-based test:

http://thread.gmane.org/gmane.comp.gnu.coreutils.general/821/focus=839

Re: coreutils-8.10 released [stable]

2011-02-05 Thread Jim Meyering

Pádraig Brady wrote:
> On 05/02/11 07:30, Jim Meyering wrote:
>> Dmitry V. Levin wrote:
>>> On Fri, Feb 04, 2011 at 08:20:39PM +0100, Jim Meyering wrote:
>>>> This is to announce coreutils-8.10, a stable release.
>>>>
>>>> There have been some minor bug fixes, along with two new features.  The
>>>> join feature is enabled via a new option, "-o auto".  The cp feature makes
>>>> copying sparse files much more efficient on several common file systems.
>>>> It takes advantage of a feature that was introduced in linux-2.6.27.
>>>> The improvement affects the default code path, so if you're looking
>>>> for risk potential, this is it.  It uses the feature if available,
>>>> and otherwise resorts to using the old, less-efficient copying code.
>>>
>>> tests/cp/fiemap-perf fails with EFBIG on tmpfs:
>>> $ truncate -s1T f
>>> truncate: failed to truncate `f' at 1099511627776 bytes: File too large
>>>
>>> Reducing the size from 1T to 256G makes the test pass.
>>
>> Thanks for the report.
>> The current df-based guard for the cp/FIEMAP tests
>> is a kludge^W heuristic that will soon be replaced by
>> Pádraig's python-based test:
>>
>> http://thread.gmane.org/gmane.comp.gnu.coreutils.general/821/focus=839
>
> Yep, just did that, and got a couple of failures for sparse-fiemap
> on and ext3 and loopback ext4 file systems.
> On a very quick glance, I think cp is OK and that the filefrag
> matching is a bit brittle.
> Attached are filefrag outputs.

I saw similar differences, but I think they were due to the fact that
the kernel had not yet forced cp's metadata update to disk when filefrag
does its FIEMAP ioctl

Uncommenting the "sync" just after the "cp" in the sparse-fiemap test
solved the problem (for me it arose only on rawhide's btrfs) but made
the test take a lot more time, as mentioned in the comment.

Instead of that sync, using filefrag's -s option may
have the same result without the unwanted overhead.

> Filesystem type is: ef53
> File size of j2 is 2048 (2 blocks, blocksize 1024)
>  ext logical physical expected length flags
>0   00   2 unknown,delalloc,eof
> j2: 1 extent found
>
> Filesystem type is: ef53
> File size of j1 is 2048 (2 blocks, blocksize 1024)
>  ext logical physical expected length flags
> j1: 1 extent found

Re: Generating pseudo-random integers

2011-02-05 Thread Jim Meyering

Pádraig Brady wrote:
> On 05/02/11 07:51, Jim Meyering wrote:
>> Melikamp T. Medley wrote:
>>
>>> Thanks!
>>>
>>> I think I want to write a utility that prints pseudo-random
>>> integers (I have in CL, but I like it fast, so this time in C),
>>> and link it with coreutils. I looked at Paul Eggert's work,
>>> and I won't be able to do any better than that if I spend
>>> years on it :) I want to print uniformly distributed integers
>>> fast, formatted, in bulk if needed. And I want to call it roll.
>>> Let me know if you have a word of advice.
>>>
>>> On 02/04/2011 11:35 PM, Eric Blake wrote:
>>>> Using just coreutils, though, I'm afraid your options are slim.  You
>>>> could use 'seq 0 9 | sort -R | head -n1' (or use shuf instead of sort
>>>> -R) to generate one random number at a time, but that gets expensive.
>>>> Or, if you don't mind limiting yourself to 62**6 values (instead of the
>>>> more traditional 2**32), you could do 'mkdir tmp; mktemp -u -p tmp
>>>> XX'.  But that's the extent of coreutils' randomness exposed to the
>>>> user.
>>
>> It's hard to justify writing a new program in C
>> when you can do the job with a Perl one-liner:
>>
>>   $ perl -le 'foreach (1..20) { print int rand 1000 }'
...
>> However, adding an *option* to shuf might make sense,
>> since you can already give shuf both a range and a count
>> of how many numbers to print.
>
> Sorry for being dense, but what option shuf is needed? formatting?

shuf -i A-B prints a permutation of a set of integers.
You can make it print a subset of that permutation, but it's
still a subset of a permutation.  I.e., when the input is a range,
the same number cannot appear more than once.

A new option would enable shuf to print N randomly selected
elements, but with repetition.  Hence N could be larger than B-A+1,
(in general, N could be larger than the number of inputs)
and the output is no longer a permutation or subset of one.
Since it doesn't need to save state, it can be more efficient.

> Though on trying shuf to generate the numbers I notice:
>
> $ shuf -i0-$((2**31)) -n1
> 1241475845
> $ shuf -i0-$((2**31)) -n2
> shuf: memory exhausted

For small N and large B-A, that code should be improved.
It certainly doesn't need to allocate so much space.

[PATCH] copy: don't let a failed lseek go undiagnosed

2011-02-05 Thread Jim Meyering

FYI,

>From 9f618068755b51d19b22c52bc4a2f8084946948e Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Sat, 5 Feb 2011 22:40:57 +0100
Subject: [PATCH] copy: don't let a failed lseek go undiagnosed

Upon failed lseek, sparse_copy_finalize would mistakenly return true.
Admittedly, that is very unlikely, since that particular lseek
is attempted only if the preceding call to sparse_copy induced
a hole at EOF (via lseek on the destination FD).  However, now
that sparse_copy has an output parameter, N_READ, there is no
longer any reason to call lseek (fd, 0, SEEK_CUR), so...
* src/copy.c (sparse_copy_finalize): Remove the function.
(copy_reg): Call ftruncate with n_read, rather than
sparse_copy_finalize with its now-unnecessary lseek.
Lasse Collin spotted the bug in sparse_copy_finalize.
---
 src/copy.c |   18 ++
 1 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index f429c00..9182c16 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -233,21 +233,6 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t 
buf_size,
   return true;
 }

-/* If the file ends with a `hole' (i.e., if sparse_copy set wrote_hole_at_eof),
-   call this function to record the length of the output file.  */
-static bool
-sparse_copy_finalize (int dest_fd, char const *dst_name)
-{
-  off_t len = lseek (dest_fd, 0, SEEK_CUR);
-  if (0 <= len && ftruncate (dest_fd, len) < 0)
-{
-  error (0, errno, _("truncating %s"), quote (dst_name));
-  return false;
-}
-
-  return true;
-}
-
 /* Perform the O(1) btrfs clone operation, if possible.
Upon success, return 0.  Otherwise, return -1 and set errno.  */
 static inline int
@@ -1000,8 +985,9 @@ copy_reg (char const *src_name, char const *dst_name,
   UINTMAX_MAX, &n_read,
   &wrote_hole_at_eof)
|| (wrote_hole_at_eof &&
-   ! sparse_copy_finalize (dest_desc, dst_name)))
+   ftruncate (dest_desc, n_read) < 0))
 {
+  error (0, errno, _("failed to extend %s"), quote (dst_name));
   return_val = false;
   goto close_src_and_dst_desc;
 }
--
1.7.4.2.g597a6

[PATCH] maint: move di-set and ino-map modules from ./gl to gnulib

2011-02-07 Thread Jim Meyering

FYI, this removes the modules from gl/ and updates gnulib
to the latest (which now has these two modules added):

>From beaf631292bd74050a57b86aa3042ebaa12cf840 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 7 Feb 2011 16:24:10 +0100
Subject: [PATCH] maint: move di-set and ino-map modules from ./gl to gnulib

* gl/lib/di-set.c: Remove file.
* gl/lib/di-set.h: Likewise.
* gl/lib/ino-map.c: Likewise.
* gl/lib/ino-map.h: Likewise.
* gl/modules/di-set: Likewise.
* gl/modules/di-set-tests: Likewise.
* gl/modules/ino-map: Likewise.
* gl/modules/ino-map-tests: Likewise.
* gl/tests/test-di-set.c: Likewise.
* gl/tests/test-ino-map.c: Likewise.
* gnulib: Update to latest, now that these two modules are there.
---
 gl/lib/di-set.c  |  259 --
 gl/lib/di-set.h  |   14 ---
 gl/lib/ino-map.c |  164 -
 gl/lib/ino-map.h |   14 ---
 gl/modules/di-set|   24 -
 gl/modules/di-set-tests  |   10 --
 gl/modules/ino-map   |   24 -
 gl/modules/ino-map-tests |   10 --
 gl/tests/test-di-set.c   |   65 
 gl/tests/test-ino-map.c  |   62 ---
 gnulib   |2 +-
 11 files changed, 1 insertions(+), 647 deletions(-)
 delete mode 100644 gl/lib/di-set.c
 delete mode 100644 gl/lib/di-set.h
 delete mode 100644 gl/lib/ino-map.c
 delete mode 100644 gl/lib/ino-map.h
 delete mode 100644 gl/modules/di-set
 delete mode 100644 gl/modules/di-set-tests
 delete mode 100644 gl/modules/ino-map
 delete mode 100644 gl/modules/ino-map-tests
 delete mode 100644 gl/tests/test-di-set.c
 delete mode 100644 gl/tests/test-ino-map.c

diff --git a/gl/lib/di-set.c b/gl/lib/di-set.c
deleted file mode 100644
index 5e839a3..000
--- a/gl/lib/di-set.c
+++ /dev/null
@@ -1,259 +0,0 @@
-/* Set operations for device-inode pairs stored in a space-efficient manner.
-
-   Copyright 2009-2011 Free Software Foundation, Inc.
-
-   This program is free software: you can redistribute it and/or modify
-   it under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3 of the License, or
-   (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
-
-/* written by Paul Eggert and Jim Meyering */
-
-#include 
-#include "di-set.h"
-
-#include "hash.h"
-#include "ino-map.h"
-
-#include 
-#include 
-
-/* The hash package hashes "void *", but this package wants to hash
-   integers.  Use integers that are as large as possible, but no
-   larger than void *, so that they can be cast to void * and back
-   without losing information.  */
-typedef size_t hashint;
-#define HASHINT_MAX ((hashint) -1)
-
-/* Integers represent inode numbers.  Integers in the range
-   1..(LARGE_INO_MIN-1) represent inode numbers directly.  (The hash
-   package does not work with null pointers, so inode 0 cannot be used
-   as a key.)  To find the representations of other inode numbers, map
-   them through INO_MAP.  */
-#define LARGE_INO_MIN (HASHINT_MAX / 2)
-
-/* Set operations for device-inode pairs stored in a space-efficient
-   manner.  Use a two-level hash table.  The top level hashes by
-   device number, as there are typically a small number of devices.
-   The lower level hashes by mapped inode numbers.  In the typical
-   case where the inode number is positive and small, the inode number
-   maps to itself, masquerading as a void * value; otherwise, its
-   value is the result of hashing the inode value through INO_MAP.  */
-
-/* A pair that maps a device number to a set of inode numbers.  */
-struct di_ent
-{
-  dev_t dev;
-  struct hash_table *ino_set;
-};
-
-/* A two-level hash table that manages and indexes these pairs.  */
-struct di_set
-{
-  /* Map device numbers to sets of inode number representatives.  */
-  struct hash_table *dev_map;
-
-  /* If nonnull, map large inode numbers to their small
- representatives.  If null, there are no large inode numbers in
- this set.  */
-  struct ino_map *ino_map;
-
-  /* Cache of the most recently allocated and otherwise-unused storage
- for probing this table.  */
-  struct di_ent *probe;
-};
-
-/* Hash a device-inode-set entry.  */
-static size_t
-di_ent_hash (void const *x, size_t table_size)
-{
-  struct di_ent const *p = x;
-  dev_t dev = p->dev;
-
-  /* When DEV is wider than size_t, exclusive-OR the words of DEV into H.
- This avoids loss of info, without applying % to the wider type,
- which could be quite slow on some systems.  */
-  size_t h = dev;
-  unsigned int i;
-  un

bug#7993: Acknowledgement (cut segmentation fault with unbounded ranges)

2011-02-07 Thread Jim Meyering

> Please ignore this. It's a duplicate submission of bug# 7992

Ok.  marking it closed.

bug#7992: cut segmentation fault with unbounded ranges

2011-02-07 Thread Jim Meyering

Paul Marinescu wrote:
> In coreutils 8.9 (latest), the following commands trigger an invalid
> memory access.
>
> cut -c1234567890- --output-d=: foo
> cut -f1234567890- --output-d=: foo
> cut -b1234567890- --output-d=: foo
>
> The number 1234567890 is just a random number 'big enough' to make the
> invalid access generate a segmentation fault but the invalid access
> happens for values as low as 8 (valgrind)
>
> The problem is that ranges going to end of line (i.e., 'x-') are not
> taken into account when calculating the size of the printable_field
> vector, but their lower bound is used as an index on line 525:
>
>   if (output_delimiter_specified
>   && !complement
>   && eol_range_start && !is_printable_field (eol_range_start))

Thanks a lot for the report.
Here's a fix:

>From 43be5f4911f252ac298ac19865487f543c12db02 Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 7 Feb 2011 08:29:33 +0100
Subject: [PATCH] cut: don't segfault for large unbounded range

* src/cut.c (set_fields): When computing the maximum range endpoint,
take into consideration the start of any unbounded range, like "999-".
* NEWS (Bug fixes): Mention it.
* tests/misc/cut (big-unbounded-b,c,f): Add tests.
Reported by Paul Marinescu in http://debbugs.gnu.org/7993
The bug was introduced on 2004-12-04 via commit 7380cf79.
---
 NEWS   |6 ++
 src/cut.c  |2 ++
 tests/misc/cut |4 
 3 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index 9c5a5a8..a367d8d 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,12 @@ GNU coreutils NEWS-*- 
outline -*-

 * Noteworthy changes in release ?.? (-??-??) [?]

+** Bug fixes
+
+  cut could segfault when invoked with a user-specified output
+  delimiter and an unbounded range like "-f1234567890-".
+  [bug introduced in coreutils-5.3.0]
+

 * Noteworthy changes in release 8.10 (2011-02-04) [stable]

diff --git a/src/cut.c b/src/cut.c
index 3f8e3e6..e2fe851 100644
--- a/src/cut.c
+++ b/src/cut.c
@@ -496,6 +496,8 @@ set_fields (const char *fieldstr)
   if (rp[i].hi > max_range_endpoint)
 max_range_endpoint = rp[i].hi;
 }
+  if (max_range_endpoint < eol_range_start)
+max_range_endpoint = eol_range_start;

   /* Allocate an array large enough so that it may be indexed by
  the field numbers corresponding to all finite ranges
diff --git a/tests/misc/cut b/tests/misc/cut
index 4353994..c905ba9 100755
--- a/tests/misc/cut
+++ b/tests/misc/cut
@@ -150,6 +150,10 @@ my @Tests =
{ERR=>$no_endpoint}],
   ['inval5', '-f', '1-,-', {IN=>''}, {OUT=>''}, {EXIT=>1}, 
{ERR=>$no_endpoint}],
   ['inval6', '-f', '-1,-', {IN=>''}, {OUT=>''}, {EXIT=>1}, 
{ERR=>$no_endpoint}],
+  # This would evoke a segfault from 5.3.0..6.10
+  ['big-unbounded-b', '--output-d=:', '-b1234567890-', {IN=>''}, {OUT=>''}],
+  ['big-unbounded-c', '--output-d=:', '-c1234567890-', {IN=>''}, {OUT=>''}],
+  ['big-unbounded-f', '--output-d=:', '-f1234567890-', {IN=>''}, {OUT=>''}],
  );

 @Tests = triple_test \@Tests;
--
1.7.4.2.g597a6

[PATCH] di-set: provide a lookup method

2011-02-07 Thread Jim Meyering

I'm about to propose a patch to patch that requires this
new method, so add it here, and I will shortly move the
di-set and ino-map modules to gnulib.

>From b9fc790ddc0db2522b0d20d08fd2b8d40ef0fa4e Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Mon, 7 Feb 2011 15:46:09 +0100
Subject: [PATCH] di-set: provide a lookup method

This is required for patch, and hence is about to move to gnulib.
* gl/lib/di-set.c (di_set_lookup): New function.
* gl/lib/di-set.h: Declare it.
* gl/tests/test-di-set.c (main): Exercise it.

The bug was introduced on 2004-12-04 via commit 7380cf79.
---
 gl/lib/di-set.c|   22 ++
 gl/lib/di-set.h|2 ++
 gl/tests/test-di-set.c |2 ++
 3 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/gl/lib/di-set.c b/gl/lib/di-set.c
index 05d24d6..5e839a3 100644
--- a/gl/lib/di-set.c
+++ b/gl/lib/di-set.c
@@ -235,3 +235,25 @@ di_set_insert (struct di_set *dis, dev_t dev, ino_t ino)
   /* Put I into the inode set.  */
   return hash_insert0 (ino_set, (void *) i, NULL);
 }
+
+/* Look up the DEV,INO pair in the set DIS.
+   If found, return 1; if not found, return 0.
+   Upon any failure return -1.  */
+int
+di_set_lookup (struct di_set *dis, dev_t dev, ino_t ino)
+{
+  hashint i;
+
+  /* Map the device number to a set of inodes.  */
+  struct hash_table *ino_set = map_device (dis, dev);
+  if (! ino_set)
+return -1;
+
+  /* Map the inode number to a small representative I.  */
+  i = map_inode_number (dis, ino);
+  if (i == INO_MAP_INSERT_FAILURE)
+return -1;
+
+  /* Perform the look-up.  */
+  return !!hash_lookup (ino_set, (void const *) i);
+}
diff --git a/gl/lib/di-set.h b/gl/lib/di-set.h
index d30a31e..f05e876 100644
--- a/gl/lib/di-set.h
+++ b/gl/lib/di-set.h
@@ -10,3 +10,5 @@
 struct di_set *di_set_alloc (void);
 int di_set_insert (struct di_set *, dev_t, ino_t) _ATTRIBUTE_NONNULL_ (1);
 void di_set_free (struct di_set *) _ATTRIBUTE_NONNULL_ (1);
+int di_set_lookup (struct di_set *dis, dev_t dev, ino_t ino)
+  _ATTRIBUTE_NONNULL_ (1);;
diff --git a/gl/tests/test-di-set.c b/gl/tests/test-di-set.c
index 7fca4d4..5de8da2 100644
--- a/gl/tests/test-di-set.c
+++ b/gl/tests/test-di-set.c
@@ -42,10 +42,12 @@ main (void)
   struct di_set *dis = di_set_alloc ();
   ASSERT (dis);

+  ASSERT (di_set_lookup (dis, 2, 5) == 0); /* initial lookup fails */
   ASSERT (di_set_insert (dis, 2, 5) == 1); /* first insertion succeeds */
   ASSERT (di_set_insert (dis, 2, 5) == 0); /* duplicate fails */
   ASSERT (di_set_insert (dis, 3, 5) == 1); /* diff dev, duplicate inode is ok 
*/
   ASSERT (di_set_insert (dis, 2, 8) == 1); /* same dev, different inode is ok 
*/
+  ASSERT (di_set_lookup (dis, 2, 5) == 1); /* now, the lookup succeeds */

   /* very large (or negative) inode number */
   ASSERT (di_set_insert (dis, 5, (ino_t) -1) == 1);
--
1.7.4.2.g597a6

Re: coreutils-8.10 released [stable]

2011-02-08 Thread Jim Meyering

Pádraig Brady wrote:
> On 05/02/11 13:59, Jim Meyering wrote:
>> Pádraig Brady wrote:
>>> Yep, just did that, and got a couple of failures for sparse-fiemap
>>> on and ext3 and loopback ext4 file systems.
>>> On a very quick glance, I think cp is OK and that the filefrag
>>> matching is a bit brittle.
>>> Attached are filefrag outputs.
>>
>> I saw similar differences, but I think they were due to the fact that
>> the kernel had not yet forced cp's metadata update to disk when filefrag
>> does its FIEMAP ioctl
>>
>> Uncommenting the "sync" just after the "cp" in the sparse-fiemap test
>> solved the problem (for me it arose only on rawhide's btrfs) but made
>> the test take a lot more time, as mentioned in the comment.
>
> It seems that the sync is needed for ext4 loop back also, on my system.
> However for ext3, the extents between src and dst still don't match up.
> My systems uses a 4K block size and is 2.6.35.10-72.fc14.i686
> For now, I've disabled (indirectly) using ext3 for fiemap-perf and
> sparse-fiemap in the attached.
>
>> Instead of that sync, using filefrag's -s option may
>> have the same result without the unwanted overhead.
>
> That doesn't work actually which baffles me.
> I got the e2fsprogs source rpm to verify that
> FIEMAP_FLAG_SYNC was set, so I guess this is a kernel issue
> (on ext4 loop back at least).
> Though I was able to get a a working sync restricted to the file
> by using dd in the attached patch.
> Even with that, the test only takes about 10s on my old laptop, so
> I didn't bother tagging the test as EXPENSIVE.

Good to hear it.

> Subject: [PATCH] test: improve the fiemap_capable_ check
>
> * tests/cp/fiemap-2: Enable the fiemap check for files, which
> will enable the test on ext3.
> * tests/cp/fiemap-perf: Comment why we're not enabling for ext3.
> * tests/cp/sparse-fiemap: Ditto.
> * tests/fiemap-capable: A new python script to determine
> if a specified path supports fiemap.
> * tests/init.cfg (fiemap_capable_): Use the new python script.
> * tests/Makefile.am (EXTRA_DIST): Include the new python script.

Very nice.
You're welcome to push that.

I've run all tests only on F14/ext4 for now,
but did confirm that your new fiemap-capable script
does the job on Solaris 10 and 11 and on freebsd 8.1.

Do you think it's worth resorting to the FS-name-based
test when python is not available?

Re: coreutils-8.10 released [stable]

2011-02-08 Thread Jim Meyering

Pádraig Brady wrote:
> On 08/02/11 08:42, Jim Meyering wrote:
>> Do you think it's worth resorting to the FS-name-based
>> test when python is not available?
>
> I don't think so, because df can currently allow the test to proceed 
> erroneously:
> http://lists.gnu.org/archive/html/coreutils/2011-02/msg00018.html

That's fine (I did see that).

> Also the overlap of fiemap capable systems with python should be good.

However, when python is not available, we would skip with this
misleading diagnostic:

   fiemap_capable_ . \
 || skip_ "this file system lacks FIEMAP support"

Re: [PATCH] copy: adjust fiemap handling of sparse files

2011-02-09 Thread Jim Meyering

Pádraig Brady wrote:
> I was looking at adding fallocate() to copy.c,
> now that the fiemap code has gone in and
> I noticed that if there was allocated space
> at the end of a file, not accounted for
> in st_size, then any holes would not be detected.

Good point.

> In what other cases does the sparse detection
> heuristic fail BTW?

There are probably a few, but none that I know of.

> Anwyay, we don't need the heuristic with fiemap,
> so I changed accordingly in the attached.
...
> Subject: [PATCH] copy: adjust fiemap handling of sparse files
>
> Don't depend on heuristics to detect sparse files
> if fiemap is available.  Also don't introduce new
> holes unless --sparse=always has been specified.

Good change, in principle.

> * src/copy.c (extent_copy): Pass the user specified
> sparse mode, and handle as described above.
> Also a redundant lseek has been suppressed when
> there is no hole between two extents.

Could that be done in two separate patches?
I haven't looked closely, but when I tested it on x86_64 (F14),
it failed like this:

FAIL: cp/sparse-to-pipe (exit: 1)
=
...
+ truncate -s1M sparse
+ timeout 10 cat pipe
+ cp sparse pipe
cp: failed to extend `pipe': Invalid argument
+ fail=1
+ cmp sparse copy
cmp: EOF on copy
+ fail=1

Sounds like the change provokes an lseek on the output FD,
even though it's a pipe.

[PATCH] tests: avoid gross inefficiency in "make test"

2011-02-09 Thread Jim Meyering

Running "make -j25 check" on a nominal-12-core F14 system would
cause serious difficulty leading to an OOM kill -- and this is brand new.
It worked fine yesterday.  I tracked it down to all of the make processes
working on the "built_programs.list" (in src/Makefile.am) rule

built_programs.list:
@echo $(bin_PROGRAMS) $(bin_SCRIPTS) | tr ' ' '\n' \
  | sed -e 's,$(EXEEXT)$$,,' | $(ASSORT) -u | tr '\n' ' '

Which made me realize we were running that submake over 400 times,
once per test scripts (including skipped ones).  That's well worth
avoiding, even if it means a new temporary file.

I don't know the root cause of the OOM-kill (preceded by interminable
minutes of a seemingly hung and barely responsive system) or why it started
happening today (afaics, none of the programs involved was updated),
but this does fix it...

>From 8af39566de60b106146eea757f2611352f39544f Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Wed, 9 Feb 2011 17:08:58 +0100
Subject: [PATCH] tests: avoid gross inefficiency in "make test"

Do not run a sub-make to set up the environment for each
and every test script.  Instead, run it just once and store
the result in a file.
* tests/check.mk (built_programs): Remove definition.
(.built-programs): New rule to create the temporary file.
(CLEANFILES): Arrange to remove it.
(TESTS_ENVIRONMENT): Simply cat .built-programs, rather than
running the sub-make.
* .gitignore: Ignore it.
---
 .gitignore |1 +
 tests/check.mk |   11 ---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/.gitignore b/.gitignore
index 7fead3d..138e72a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -76,6 +76,7 @@
 /stamp-h1
 /tests/*/*.log
 /tests/t?
+/tests/.built-programs
 /tests/test-suite.log
 ID
 Makefile
diff --git a/tests/check.mk b/tests/check.mk
index f931806..b80dce8 100644
--- a/tests/check.mk
+++ b/tests/check.mk
@@ -40,8 +40,13 @@ vc_exe_in_TESTS: Makefile
 check: vc_exe_in_TESTS
 .PHONY: vc_exe_in_TESTS

-built_programs = \
-  (cd $(top_builddir)/src && MAKEFLAGS= $(MAKE) -s built_programs.list)
+CLEANFILES =
+CLEANFILES += .built-programs
+check-am: .built-programs
+.built-programs:
+   $(AM_V_GEN)cd $(top_builddir)/src   \
+&& MAKEFLAGS= $(MAKE) -s built_programs.list   \
+  > $@-t && mv $@-t $@

 # Note that the first lines are statements.  They ensure that environment
 # variables that can perturb tests are unset or set to expected values.
@@ -76,7 +81,7 @@ TESTS_ENVIRONMENT =   \
   abs_top_builddir='$(abs_top_builddir)'   \
   abs_top_srcdir='$(abs_top_srcdir)'   \
   abs_srcdir='$(abs_srcdir)'   \
-  built_programs="`$(built_programs)`" \
+  built_programs="`cat .built-programs`"   \
   host_os=$(host_os)   \
   host_triplet='$(host_triplet)'   \
   srcdir='$(srcdir)'   \
--
1.7.4.2.g597a6

parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency...

2011-02-09 Thread Jim Meyering

Jim Meyering wrote:
> Running "make -j25 check" on a nominal-12-core F14 system would
> cause serious difficulty leading to an OOM kill -- and this is brand new.
> It worked fine yesterday.  I tracked it down to all of the make processes
> working on the "built_programs.list" (in src/Makefile.am) rule
>
> built_programs.list:
>   @echo $(bin_PROGRAMS) $(bin_SCRIPTS) | tr ' ' '\n' \
> | sed -e 's,$(EXEEXT)$$,,' | $(ASSORT) -u | tr '\n' ' '
>
> Which made me realize we were running that submake over 400 times,
> once per test scripts (including skipped ones).  That's well worth
> avoiding, even if it means a new temporary file.
>
> I don't know the root cause of the OOM-kill (preceded by interminable
> minutes of a seemingly hung and barely responsive system) or why it started
> happening today (afaics, none of the programs involved was updated),
> but this does fix it...

FYI,
I've tracked this down a little further.
The horrid performance (hung system and eventual OOM-kill)
are related to the use of sort above.  This is the definition:

ASSORT = LC_ALL=C sort

If I revert my earlier patch and instead simply
insist that sort not do anything in parallel,

ASSORT = LC_ALL=C sort --parallel=1

then there is no hang, and things finish in relatively good time.

I don't have a good stand-alone reproducer yet
and am out of time for today.

[PATCH] tests: print "python missing:..." diagnostic where more will see it

2011-02-09 Thread Jim Meyering

FYI,

>From c91b2a3c3b3e6ed88312ce104984f8c3778c9dea Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Wed, 9 Feb 2011 08:29:38 +0100
Subject: [PATCH] tests: print "python missing:..." diagnostic where more will 
see it

* tests/init.cfg (fiemap_capable_): Print with warn_, so that the
diagnostic shows up alongside the corresponding SKIP message.
---
 tests/init.cfg |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tests/init.cfg b/tests/init.cfg
index ceb9448..eb3feaa 100644
--- a/tests/init.cfg
+++ b/tests/init.cfg
@@ -301,7 +301,7 @@ require_proc_pid_status_()
 fiemap_capable_()
 {
   if ! python < /dev/null; then
-echo 'fiemap_capable_: python missing: assuming not fiemap capable' 1>&2
+warn_ 'fiemap_capable_: python missing: assuming not fiemap capable'
 return 1
   fi
   python $abs_srcdir/fiemap-capable "$@"
--
1.7.4.2.g597a6

Re: [PATCH] tests: avoid gross inefficiency in "make test"

2011-02-10 Thread Jim Meyering

Jim Meyering wrote:
> Running "make -j25 check" on a nominal-12-core F14 system would
> cause serious difficulty leading to an OOM kill -- and this is brand new.
> It worked fine yesterday.  I tracked it down to all of the make processes
> working on the "built_programs.list" (in src/Makefile.am) rule
...
> -built_programs = \
> -  (cd $(top_builddir)/src && MAKEFLAGS= $(MAKE) -s built_programs.list)
> +CLEANFILES =
> +CLEANFILES += .built-programs
> +check-am: .built-programs
> +.built-programs:
> + $(AM_V_GEN)cd $(top_builddir)/src   \
> +&& MAKEFLAGS= $(MAKE) -s built_programs.list \
> +  > $@-t && mv $@-t $@

I fixed the above rule last night (it puts the .built-programs
file in ../src rather than in "." where it's required), but somehow I
pushed the original.

Here's the fix:

>From 15571e0c8f4be1170184388f9acd8259ba75b22f Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Thu, 10 Feb 2011 10:01:23 +0100
Subject: [PATCH] tests: fix bug in preceding check.mk change

* tests/check.mk (.built-programs): Run cd'd submake in a subshell
so the redirected output ends up in the current directory, not ../src.
---
 tests/check.mk |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/check.mk b/tests/check.mk
index b80dce8..1e3ca85 100644
--- a/tests/check.mk
+++ b/tests/check.mk
@@ -44,8 +44,8 @@ CLEANFILES =
 CLEANFILES += .built-programs
 check-am: .built-programs
 .built-programs:
-   $(AM_V_GEN)cd $(top_builddir)/src   \
-&& MAKEFLAGS= $(MAKE) -s built_programs.list   \
+   $(AM_V_GEN)(cd $(top_builddir)/src  \
+&& MAKEFLAGS= $(MAKE) -s built_programs.list)  \
   > $@-t && mv $@-t $@

 # Note that the first lines are statements.  They ensure that environment
--
1.7.4.2.g597a6

Re: [PATCH] copy: adjust fiemap handling of sparse files

2011-02-10 Thread Jim Meyering

Pádraig Brady wrote:
> On 09/02/11 23:35, Pádraig Brady wrote:
 Subject: [PATCH] copy: adjust fiemap handling of sparse files

 Don't depend on heuristics to detect sparse files
 if fiemap is available.  Also don't introduce new
 holes unless --sparse=always has been specified.
>
 * src/copy.c (extent_copy): Pass the user specified
 sparse mode, and handle as described above.
 Also a redundant lseek has been suppressed when
 there is no hole between two extents.
>>>
>>> Could that be done in two separate patches?
>
> Here is the first patch split out to suppress redundant lseek()s.
> It's expanded now to suppress lseek in the source as well as dest.
>
> $ fallocate -l $((1<<29)) /tmp/t.f
> $ strace -e _llseek cp-before /tmp/t.f /dev/null 2>&1 | wc -l
> 180
> $ strace -e _llseek cp /tmp/t.f /dev/null 2>&1 | wc -l
> 0
>
> Hmm, the above suggests to me that we could use the
> FIEMAP_EXTENT_UNWRITTEN flag, so as to skip the read()
> for these allocated but unwritten (zero) extents.
> We could treat such extents like holes, except we
> would only generate holes in the dest with SPARSE_ALWAYS.
> I'll do patch for that this evening.
>
> Anyway the following small reorg will also simplify
> possible future changes to introduce fallocate().
>
> Note the attached mostly changes indenting,
> with the significant change being just:
>
> $ git diff -w --no-ext-diff HEAD~
>
> diff --git a/src/copy.c b/src/copy.c
> index 9182c16..5b6ffe3 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -334,7 +334,10 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t 
> buf_size,
>  {
>off_t ext_start = scan.ext_info[i].ext_logical;
>uint64_t ext_len = scan.ext_info[i].ext_length;
> +  uint64_t hole_size = ext_start - last_ext_start - last_ext_len;
>
> +  if (hole_size)
> +{
>if (lseek (src_fd, ext_start, SEEK_SET) < 0)
>  {
>error (0, errno, _("cannot lseek %s"), quote (src_name));
> @@ -356,11 +359,6 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t 
> buf_size,
>/* When not inducing holes and when there is a hole between
>   the end of the previous extent and the beginning of the
>   current one, write zeros to the destination file.  */
> -  if (last_ext_start + last_ext_len < ext_start)
> -{
> -  uint64_t hole_size = (ext_start
> -- last_ext_start
> -- last_ext_len);
>if (! write_zeros (dest_fd, hole_size))
>  {
>error (0, errno, _("%s: write failed"), quote 
> (dst_name));

Thanks!  that does the job and passes the tests.

While using --sparse=always is a lot less common,
it looks like there's room for improvement there, too.
I.e., we should be able to eliminate all of these
lseek calls on the output descriptor there, too:

$ strace -e lseek cp --sparse=always /tmp/t.f /tmp/t2 2>&1|wc -l
16384

Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency...

2011-02-10 Thread Jim Meyering

Jim Meyering wrote:
> Jim Meyering wrote:
>> Running "make -j25 check" on a nominal-12-core F14 system would
>> cause serious difficulty leading to an OOM kill -- and this is brand new.
>> It worked fine yesterday.  I tracked it down to all of the make processes
>> working on the "built_programs.list" (in src/Makefile.am) rule
>>
>> built_programs.list:
>>  @echo $(bin_PROGRAMS) $(bin_SCRIPTS) | tr ' ' '\n' \
>>| sed -e 's,$(EXEEXT)$$,,' | $(ASSORT) -u | tr '\n' ' '
>>
>> Which made me realize we were running that submake over 400 times,
>> once per test scripts (including skipped ones).  That's well worth
>> avoiding, even if it means a new temporary file.
>>
>> I don't know the root cause of the OOM-kill (preceded by interminable
>> minutes of a seemingly hung and barely responsive system) or why it started
>> happening today (afaics, none of the programs involved was updated),
>> but this does fix it...
>
> FYI,
> I've tracked this down a little further.
> The horrid performance (hung system and eventual OOM-kill)
> are related to the use of sort above.  This is the definition:
>
> ASSORT = LC_ALL=C sort
>
> If I revert my earlier patch and instead simply
> insist that sort not do anything in parallel,
>
> ASSORT = LC_ALL=C sort --parallel=1
>
> then there is no hang, and things finish in relatively good time.
>
> I don't have a good stand-alone reproducer yet
> and am out of time for today.

Well, right after writing that, I realized the key to the misbehavior:
sort was reading from a *pipe*:

# This finishes right away, reading from input file "k":
seq 99 >k && for i in $(seq 33); do LC_ALL=C timeout 1 sort k > /dev/null & done

# When reading from a pipe, it's a very different story:
# Without the "timeout 7" prefix, the following would render an N-core
# system (5/dev/null & done

Occasionally, the above jobs all complete quickly.

My first question was why were *any* processes being spawned to handle
such a small input file.  The first answer is in the first hunk:

diff --git a/src/sort.c b/src/sort.c
index 13954cb..b9316e7 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -112,9 +112,8 @@ struct rlimit { size_t rlim_cur; };
 /* Heuristic value for the number of lines for which it is worth
creating a subthread, during an internal merge sort, on a machine
that has processors galore.  Currently this number is just a guess.
-   This value must be at least 4.  We don't know of any machine where
-   this number has any practical effect.  */
-enum { SUBTHREAD_LINES_HEURISTIC = 4 };
+   This value must be at least 4.  */
+enum { SUBTHREAD_LINES_HEURISTIC = 32 * 1024 };

 /* The number of threads after which there are
diminishing performance gains.  */

The old definition of SUBTHREAD_LINES_HEURISTIC meant that any group
of 5 or more lines would be split in two and sorted via two (or more)
separate threads.  Thus, with just 40 lines, you could get the maximum
of 8 threads working.  That is obviously not efficient, unless lines are
so pathologically long that the cost of comparing two of them approaches
the cost of creating a new process.

With the above, sort would use a more reasonable number.
Tests on high-end hardware and using very short lines
suggest that a value like 200,000 would still be conservative.

But even with that change, the above for-loop would sometimes hang.
Pursuing it further, I discovered that sort was allocating 130MB
(INPUT_FILE_SIZE_GUESS * 129, with the former being 1MiB) of space
for its input buffer, simply because it was reading from a pipe.

diff --git a/src/sort.c b/src/sort.c
index 13954cb..92c8d4e 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -316,7 +315,7 @@ static size_t merge_buffer_size = MAX 
(MIN_MERGE_BUFFER_SIZE, 256 * 1024);
 static size_t sort_size;

 /* The guessed size for non-regular files.  */
-#define INPUT_FILE_SIZE_GUESS (1024 * 1024)
+#define INPUT_FILE_SIZE_GUESS (64 * 1024)

 /* Array of directory names in which any temporary files are to be created. */
 static char const **temp_dirs;

With that, the while-loop does not hang any more.
Note again that this is on a multi-core system.

I'll investigate more tomorrow.

Re: [PATCH] copy: adjust fiemap handling of sparse files

2011-02-13 Thread Jim Meyering

Pádraig Brady wrote:
> Unfortunately, after checking BTRFS I see that fiemap
> behaves differently to EXT4. IMHO the EXT4 operation
> seems correct, and gives full info about the structure
> of a file, which cp for example can use to efficiently
> and accurately reproduce the structure at the destination.
>
> On EXT4 (on kernel-2.6.35.11-83.fc14.i686) there are no extents
> returned for holes in a file.
>
> However on btrfs it does return an extent for holes?
> So with btrfs there is no way to know an extent
> is allocated but unwritten (zero), so one must
> read and write all the data, rather than just
> fallocating the space in the destination.
>
> One can also see this with the following on btrfs:
>
> $ fallocate -l 1 unwritten
> $ truncate -s 1 sparse
> $ dd count=1000 bs=10 if=/dev/zero of=zero
>
> $ filefrag -vs *
> Filesystem type is: 9123683e
> File size of sparse is 1 (24415 blocks, blocksize 4096)
>  ext logical physical expected length flags
>0   00   24415 unwritten,eof
> sparse: 1 extent found
> File size of unwritten is 1 (24415 blocks, blocksize 4096)
>  ext logical physical expected length flags
>0   068160   12207
>1   122079256080366  12208 eof
> unwritten: 2 extents found
> File size of zeros is 1 (24415 blocks, blocksize 4096)
>  ext logical physical expected length flags
>0   019360   20678
>1   206784376040037   3737 eof
> zeros: 2 extents found
>
> So is this expected?

I heard it's a bug in F14's btrfs.
To test btrfs properly, I had to use rawhide/F15.

> Has this already been changed to match ext4?

Yes, with a new-enough kernel.

> For my own reference, I can probably skip performance
> tests on (older) btrfs by checking `filefrag` output.
> Also in `cp`, if we see an "unwritten extent" we should
> probably create a hole rather than an empty allocation
> by default.  It's better to decrease file allocation
> than increase it.

Makes sense.

Re: coreutils-8.10 released [stable]

2011-02-13 Thread Jim Meyering

Pádraig Brady wrote:

> On 08/02/11 10:14, Pádraig Brady wrote:
>> On 08/02/11 08:42, Jim Meyering wrote:
>>> Do you think it's worth resorting to the FS-name-based
>>> test when python is not available?
>>
>> I don't think so, because df can currently allow the test to proceed
>> erroneously:
>> http://lists.gnu.org/archive/html/coreutils/2011-02/msg00018.html
>> Also the overlap of fiemap capable systems with python should be good.
>
> Here's a further update to the fiemap checking in tests,
> which will enable those tests on BTRFS.
> I'm holding off on this however until we figure out
> exactly what to do about the BTRFS vs EXT4 fiemap differences.
...
> Subject: [PATCH] test: support more file systems in the cp fiemap tests
>
> * tests/cp/fiemap-perf: Check for fiemap support against a file
> rather than a directory to enable tests on BTRFS for example.
> Explicity disable the test on ext3 or file systems where we
> can't determine the type.
> * tests/cp/sparse-fiemap: Likewise.
> * tests/init.cfg: Comment that BTRFS only supports fiemap
> for regular files.

Looks good.
Running tests where they are less likely to malfunction...

Thanks!

[PATCH] stdbuf: avoid even the appearance of a possible use-after-free

2011-02-18 Thread Jim Meyering

Steve Grubb reported a possible use-after-free,
but it looks like it can happen only in case of a
damaged or incomplete installation.

Here's a proposed fix:

>From 2895f44e891472c8e86a87989e0e2d41585b006f Mon Sep 17 00:00:00 2001
From: Jim Meyering 
Date: Fri, 18 Feb 2011 23:29:14 +0100
Subject: [PATCH] stdbuf: avoid even the appearance of a possible use-after-free

There was an execution path by which "libstdbuf" could be used after
being freed, but that would happen only if there were no libstdbuf.so
alongside the stdbuf program and there had been an installation error
leading to absence of the file, PKGLIBDIR/libstdbuf.so.
* src/stdbuf.c (set_LD_PRELOAD): Rearrange loop to make it perfectly
clear that there is no possibility of use-after-free.
Steve Grubb reported this possible use-after-free of "libstdbuf".
---
 src/stdbuf.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/stdbuf.c b/src/stdbuf.c
index dce338f..607859c 100644
--- a/src/stdbuf.c
+++ b/src/stdbuf.c
@@ -209,7 +209,7 @@ set_LD_PRELOAD (void)
   char const *const *path = search_path;
   char *libstdbuf;

-  do
+  while (true)
 {
   struct stat sb;

@@ -224,8 +224,11 @@ set_LD_PRELOAD (void)
   if (stat (libstdbuf, &sb) == 0)   /* file_exists  */
 break;
   free (libstdbuf);
+
+  ++path;
+  if ( ! *path)
+error (EXIT_CANCELED, 0, _("failed to find %s"), quote (LIB_NAME));
 }
-  while (*++path);

   /* FIXME: Do we need to support libstdbuf.dll, c:, '\' separators etc?  */

--
1.7.4.1.16.g759e8

Re: git-version-gen headaches

2011-02-19 Thread Jim Meyering

Mike Frysinger wrote:
> the current git-version-gen script and its usage in packages such as coreutils
> is causing me headaches wrt distros.  it currently assumes one of two states:
>   - the current tree is a valid git repo belonging to the related project
>   - the current tree is not a git repo (all the way back to the root)
>
> the reports i'm getting are that some people like to maintain some of the
> higher directories in git (perhaps even /), and so when you unpack a coreutils
> release and attempt to compile/install it, this script is triggered since git
> will walk back from $PWD to / to find a .git dir.
>
> this results either in incorrect warnings (during install), or incorrect
> forcing of autotools to be regenerated (during maintainer/dist targets).
>
> further, some of the commands used in git-version-gen require write access to
> the .git repo (i.e. git update-index --refresh), which our package management
> tools flag as a violation (after all, building/installing packages in $PWD
> generally should not be updating files outside of $PWD).

Thanks for the report.

I've included a patch below that makes the script skip the -dirty
check (index-updating part) when appropriate.  If that's not enough,
please demonstrate precisely what's going wrong.

diff --git a/build-aux/git-version-gen b/build-aux/git-version-gen
index 68c7d64..686f703 100755
--- a/build-aux/git-version-gen
+++ b/build-aux/git-version-gen
@@ -1,6 +1,6 @@
 #!/bin/sh
 # Print a version string.
-scriptversion=2011-01-04.17; # UTC
+scriptversion=2011-02-19.19; # UTC

 # Copyright (C) 2007-2011 Free Software Foundation, Inc.
 #
@@ -80,6 +80,7 @@ nl='

 # Avoid meddling by environment variable of the same name.
 v=
+v_from_git=

 # First see if there is a tarball-only version file.
 # then try "git describe", then default.
@@ -134,24 +135,30 @@ then
 # Change the first '-' to a '.', so version-comparing tools work properly.
 # Remove the "g" in git describe's output string, to save a byte.
 v=`echo "$v" | sed 's/-/./;s/\(.*\)-g/\1-/'`;
+v_from_git=1
 else
 v=UNKNOWN
 fi

 v=`echo "$v" |sed 's/^v//'`

-# Don't declare a version "dirty" merely because a time stamp has changed.
-git update-index --refresh > /dev/null 2>&1
+# Test whether to append the "-dirty" suffix only if the version
+# string we're using came from git.  I.e., skip the test if it's "UNKNOWN"
+# or if it came from .tarball-version.
+if test -n "$v_from_git"; then
+  # Don't declare a version "dirty" merely because a time stamp has changed.
+  git update-index --refresh > /dev/null 2>&1

-dirty=`exec 2>/dev/null;git diff-index --name-only HEAD` || dirty=
-case "$dirty" in
-'') ;;
-*) # Append the suffix only if there isn't one already.
-case $v in
-  *-dirty) ;;
-  *) v="$v-dirty" ;;
-esac ;;
-esac
+  dirty=`exec 2>/dev/null;git diff-index --name-only HEAD` || dirty=
+  case "$dirty" in
+  '') ;;
+  *) # Append the suffix only if there isn't one already.
+  case $v in
+*-dirty) ;;
+*) v="$v-dirty" ;;
+  esac ;;
+  esac
+fi

 # Omit the trailing newline, so that m4_esyscmd can use the result directly.
 echo "$v" | tr -d "$nl"

< 1 2 3 4 5 6 7 8 9 10 >

201 - 300 of 1165 matches

Mail list logo