bug#7525: bug in sort command

2010-12-01 Thread Eric Blake
On 12/01/2010 06:44 AM, Kielbasiewicz, Peter wrote:
> Hello,
> there seems to be a bug in Ubuntu 10.10's sort command.
> I suspect that it now defaults to the -f option, which I think is wrong.

Thanks for the report.  However, this is not a bug in sort, but a
problem of your current choice of locale.  It is also a FAQ:

http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

> e.g.
> {  echo a
>    echo j
>    echo A
>    echo i
>    echo AA
>    echo B
> } | sort

Running with the recently introduced 'sort --debug' option sheds some
light on your situation:

$ printf 'a\nj\nA\ni\nAA\nB\n' | src/sort --debug
src/sort: using `en_US.UTF-8' sorting rules
a
_
A
_
AA
__
B
_
i
_
j
_

$ printf 'a\nj\nA\ni\nAA\nB\n' | LC_ALL=C src/sort --debug
src/sort: using simple byte comparison
A
_
AA
__
B
_
a
_
i
_
j
_
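
The difference is easy to check on any machine. The sketch below is only an illustration (the variable names are made up for the example): it sorts the same six lines once under whatever locale is active and once with LC_ALL=C. Only the second result is predictable, since the first depends on the locale in effect.

```shell
# The six sample lines from the report.
input='a
j
A
i
AA
B'

# Locale-dependent: the order varies with LANG / LC_ALL / LC_COLLATE.
printf '%s\n' "$input" | sort

# Byte comparison: uppercase ASCII sorts before lowercase.
byte_order=$(printf '%s\n' "$input" | LC_ALL=C sort | paste -s -d ' ' -)
echo "$byte_order"    # A AA B a i j
```

Setting LC_ALL=C for a single command, as here, leaves the rest of the session's locale untouched.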


-- 
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org





bug#7525: bug in sort command

2010-12-01 Thread Paul Eggert
"sort --help" says:

  *** WARNING ***
  The locale specified by the environment affects sort order.
  Set LC_ALL=C to get the traditional sort order that uses
  native byte values.

and this most likely explains your situation.





bug#7523: chmod example in docs

2010-12-01 Thread Eric Blake
[re-adding the list]

On 12/01/2010 09:24 AM, nik...@email.com wrote:
> Hi Eric,
> 
> 
> As much as I would love to contribute code to the open source community,
> unfortunately I have no idea how to code.

Even so, your suggestions in English are a good start for telling us
what you found to be lacking.

> 
> 'chown' has a very easy to understand example style to reflect off.
> 
> 
> Also, another stupid thing in the 'chmod' manual is the following:
> 
> ***
> 
> The format of a symbolic mode is [ugoa...][[+-=][perms...]...], where
> perms is either zero or more letters from the set rwxXst, or a single
> letter from the set ugo.  Multiple symbolic modes can be given,
> separated by commas.
>
> A combination of the letters ugoa controls which users' access to the
> file will be changed: the user who owns it (u), other users in the
> file's group (g), other users not in the file's group (o), or all users
> (a).  If none of these are given, the effect is as if a were given, but
> bits that are set in the umask are not affected.
> 
> ***
> 
> The above states three dots after 'ugoa' ([ugoa...]). From my understanding,
> this parameter has the options of 'u' 'g' 'o' 'a' only, therefore, there
> should not be three dots (...) in [ugoa...] as all the parameter options
> have been specified. This was a little bit confusing at the start.

Actually, it _should_ be 3 dots, because our convention is that 3 dots
imply that you can repeat one of the earlier items more than once.  That is:

chmod go-rw file

or even the (redundant) version:

chmod ggoo-rrww file

are both perfectly acceptable (multiple instances from the set [ugoa],
then [-], then multiple instances from the set of [PERMS]).
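
To see that the two spellings really are equivalent, something like the following can be tried. It is a sketch only: the scratch directory and file names are invented for the example, and it assumes mktemp -d is available.

```shell
# Work in a throwaway directory so no real file is touched.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b"
chmod 666 "$dir/a" "$dir/b"      # start both files at rw-rw-rw-

chmod go-rw "$dir/a"             # one letter per class and permission
chmod ggoo-rrww "$dir/b"         # repeated letters, same request

# Keep only the ten mode characters from the ls -l output.
mode_a=$(ls -l "$dir/a" | cut -c1-10)
mode_b=$(ls -l "$dir/b" | cut -c1-10)
echo "$mode_a $mode_b"           # -rw------- -rw-------
rm -r "$dir"
```

Both files end up readable and writable by the owner only, which is what the [ugoa...] notation promises.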

> 
> GNU manuals are full of this weird kinda logic, or I am not
> understanding something.

Hmm; I just noticed that 'info coreutils "File permissions"' gives a
much better overview of chmod arguments.  I stand corrected on my
earlier claim; the manual already states this under 'info coreutils chmod':

   If used, MODE specifies the new file mode bits.  For details, see
the section on *note File permissions::.

I suggest you read that chapter.

> I've gotta give it to Microsoft, they get their manuals right. With GNU
> it feels like I've bought an awesome product from China, only to find
> the user manual is in broken English. It is essential that the Linux
> community have the same high standards for user manuals as Microsoft
> does, if we are to win the Windows users over.

The man pages assume you already know Unix-like operations.  The info
pages, on the other hand, should cater to new users; if you have
suggestions on how we can improve that, we are all ears.

-- 
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org





bug#7523: chmod example in docs

2010-12-01 Thread Paul Eggert
> I've gotta give it to Microsoft, they get their manuals right.

Sorry, but I had to laugh at that one.

Here's a quote from a Microsoft manual on this very topic:

  chmod   A UNIX command meaning "change module."

which is bogus, of course: even someone with only passing acquaintance
with chmod should know that the "mod" doesn't stand for module.
(Even Wikipedia's page on chmod is better than Microsoft's on this point.)

A note at the start of that Microsoft page says:

  "No warranty is made as to technical accuracy."

which is a remark that you can take to the bank when reading
Microsoft manuals.

I'm not saying GNU manuals are perfect.  Far from it!  But
we should not aim for being merely as good as Microsoft manuals,
as that would be far too low a target.  In many cases they're
not much better than Wikipedia, and all too often they're worse.

My sources:

http://technet.microsoft.com/en-us/library/cc749930.aspx
http://en.wikipedia.org/wiki/Chmod





bug#7529: Bug#605639: deal better with different filesystem timestamp resolutions

2010-12-01 Thread jidanni
X-Debbugs-cc: bug-coreutils@gnu.org, bug-m...@gnu.org
Package: coreutils
Version: 8.5-1

man cp says:
`-u'
`--update'
 Do not copy a non-directory that has an existing destination with
 the same or newer modification time.  If time stamps are being
 preserved, the comparison is to the source time stamp truncated to
 the resolutions of the destination file system and of the system
 calls used to update time stamps; this avoids duplicate work if
 several `cp -pu' commands are executed with the same source and
 destination.

But that does not seem to work well:

$ touch /tmp/f
$ /bin/cp -avu /tmp/f .
`/tmp/f' -> `./f'
$ /bin/cp -avu /tmp/f .
`/tmp/f' -> `./f'
$ /bin/cp -avu /tmp/f .
`/tmp/f' -> `./f'
$ ls -l --full-time f /tmp/f
-rw-r--r-- 1 jidanni jidanni 0 2010-12-02 08:25:47.682527260 +0800 /tmp/f
-rw-r--r-- 1 jidanni jidanni 0 2010-12-02 08:25:47.000000000 +0800 f
$ mount
/dev/sda6 on /home type ext3 (rw)
tmpfs on /tmp type tmpfs (rw)

It might work fine when copying f to /tmp/f, but not the other way around.
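
For illustration, the comparison the manual describes can be sketched with shell arithmetic. This is not what cp does internally; the variable names are invented, and the one-second resolution of the ext3 destination is an assumption. Both files share the same whole second, so only the nanosecond fields are compared.

```shell
# Nanosecond fields from the ls --full-time output above.
src_ns=682527260      # /tmp/f on tmpfs
dst_ns=0              # ./f on ext3
res=1000000000        # assumed destination resolution: 1 s, in nanoseconds

# Truncate the source stamp to the destination's resolution.
truncated=$(( src_ns - src_ns % res ))

# With -u, an equal or newer destination means the copy is skipped.
if [ "$truncated" -le "$dst_ns" ]; then
    verdict="skip copy"
else
    verdict="copy"
fi
echo "$verdict"       # skip copy
```

Under those assumptions the second and third cp -avu runs should have printed nothing.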

By the way, make(1) lacks any of this time comparison resolution
machinery at all! I'll CC them.







bug#7529: Bug#605639: deal better with different filesystem timestamp resolutions

2010-12-01 Thread Paul Eggert
Good eye!  Thanks for the bug report and example.  I installed
the following one-byte patch into gnulib; please give it a try.
It should propagate into coreutils the next time coreutils
updates from gnulib.

A test case for this would require two file systems, one with
finer-grained time stamps than the other, where we can create
files in the latter.  I suspect this goes beyond what coreutils's
test cases can easily do.

From 409c6b774c25afce33f8b67fbf7af3eb3304f6cf Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Wed, 1 Dec 2010 21:25:56 -0800
Subject: [PATCH] utimecmp: fine-grained src to nearby coarse-grained dest

* lib/utimecmp.c (utimecmp): When UTIMECMP_TRUNCATE_SOURCE is set,
and the source is on a file system with higher-resolution time
stamps, than the destination, and _PC_TIMESTAMP_RESOLUTION does
not work, and the time stamps are close together, the algorithm to
determine the exact resolution from the read-back mtime was buggy:
it had a "!=" where it should have had an "==".  This bug has been
in the code ever since it was introduced to gnulib.
Problem reported by Dan Jacobson in
.
---
 ChangeLog      |   14 ++++++++++++++
 lib/utimecmp.c |    2 +-
 2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index d4eb684..67e2977 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2010-12-01  Paul Eggert  
+
+   utimecmp: fine-grained src to nearby coarse-grained dest
+
+   * lib/utimecmp.c (utimecmp): When UTIMECMP_TRUNCATE_SOURCE is set,
+   and the source is on a file system with higher-resolution time
+   stamps, than the destination, and _PC_TIMESTAMP_RESOLUTION does
+   not work, and the time stamps are close together, the algorithm to
+   determine the exact resolution from the read-back mtime was buggy:
+   it had a "!=" where it should have had an "==".  This bug has been
+   in the code ever since it was introduced to gnulib.
+   Problem reported by Dan Jacobson in
+   .
+
 2010-11-30  Bruno Haible  
 
    strerror_r-posix: Fix autoconf test.
diff --git a/lib/utimecmp.c b/lib/utimecmp.c
index 63a0c9a..8c3ca65 100644
--- a/lib/utimecmp.c
+++ b/lib/utimecmp.c
@@ -325,7 +325,7 @@ utimecmp (char const *dst_name,
 
 res = SYSCALL_RESOLUTION;
 
-for (a /= res; a % 10 != 0; a /= 10)
+for (a /= res; a % 10 == 0; a /= 10)
   {
 if (res == BILLION)
   {
-- 
1.7.2
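
The idea behind the corrected loop condition can be illustrated outside of gnulib. The sketch below is not the utimecmp code: guess_resolution is a hypothetical helper that guesses a resolution from a read-back nanosecond count by stripping trailing decimal zeros, which is why the loop has to continue while the last digit is zero (the "==" case) rather than while it is nonzero.

```shell
# Hypothetical helper: largest power of ten, capped at one second
# (1000000000 ns), that divides the given nanosecond count.
guess_resolution() {
    a=$1
    res=1
    # Keep going while the last decimal digit is zero.
    while [ "$res" -lt 1000000000 ] && [ $(( a % 10 )) -eq 0 ]; do
        a=$(( a / 10 ))
        res=$(( res * 10 ))
    done
    echo "$res"
}

guess_resolution 682527260    # 10
guess_resolution 682000000    # 1000000
guess_resolution 0            # 1000000000
```

With the condition inverted, the loop would stop at once on exactly the stamps that end in zeros, so a coarse destination would never be recognized as coarse.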






bug#7489: [PATCH] sort: fix bug on 64-bit hosts with at least 32768 processors

2010-12-01 Thread Paul Eggert
On 11/30/2010 10:16 PM, Paul Eggert wrote:

> Invoke MAX_MERGE(total, level) with level == 15.
> 2 << level yields 65536, and 65536 * 65536 overflows to zero.

I managed to reproduce this bug on a (faked) host with
32768 processors, using a command like this:

  seq 10 | sort --parallel=32768 -S 10G

The result was a floating point exception (actually, a division
by zero) and 'sort' crashed.
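
The arithmetic can be checked by hand. The sketch below only emulates a 32-bit int by masking to the low 32 bits. It assumes the shell itself does 64-bit arithmetic, and the variable names are invented for the example.

```shell
level=15
factor=$(( 2 << level ))                          # 65536
# Emulate 32-bit wraparound: keep only the low 32 bits of the product.
product=$(( (factor * factor) & 4294967295 ))
echo "$factor $product"                           # 65536 0
```

A divisor of zero is what turns into the "floating point exception" reported above.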

However, the bug is timing-dependent and very hard to reproduce:
I tried many more times, and every attempt failed.

Still, the one crash proved to my satisfaction that it is a real bug,
so I pushed the following patch.

From 1561c2b228d93a049e527824e14ad4fe8c256b52 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Wed, 1 Dec 2010 21:50:00 -0800
Subject: [PATCH] sort: fix bug on 64-bit hosts with at least 32768 processors

* src/sort.c (MAX_MERGE): Avoid integer overflow when on a machine
with (say) 32-bit int and 64-bit size_t and when level == 15.
Without this fix, on such a machine with 32768 or more processors,
the level computation could overflow on large input, and this
would result in division by zero.
---
 src/sort.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/sort.c b/src/sort.c
index 1aa1eb4..5c368cd 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -107,7 +107,7 @@ struct rlimit { size_t rlim_cur; };
 /* Maximum number of lines to merge every time a NODE is taken from
the MERGE_QUEUE.  Node is at LEVEL in the binary merge tree,
and is responsible for merging TOTAL lines. */
-#define MAX_MERGE(total, level) ((total) / ((2 << level) * (2 << level)) + 1)
+#define MAX_MERGE(total, level) (((total) >> (2 * ((level) + 1))) + 1)
 
 /* Heuristic value for the number of lines for which it is worth
creating a subthread, during an internal merge sort, on a machine
-- 
1.7.2
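
As a sanity check, the old and new forms of the macro can be compared in shell arithmetic. This is a sketch under the assumption of a 64-bit shell, where the old form does not overflow at these levels. The function names are made up, and total=1000000 is an arbitrary sample value.

```shell
# Old form: divide by (2 << level) squared.
old_max_merge() { echo $(( $1 / ((2 << $2) * (2 << $2)) + 1 )); }
# New form: shift right by 2 * (level + 1), so no product is formed.
new_max_merge() { echo $(( ($1 >> (2 * ($2 + 1))) + 1 )); }

total=1000000
for level in 0 1 5 10; do
    echo "$level $(old_max_merge "$total" "$level") $(new_max_merge "$total" "$level")"
done

new_max_merge "$total" 15     # 1
```

The two columns agree at every level tried, and the shift form stays well-defined at level 15, where the 32-bit product in the old form wrapped to zero.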