[bug #22109] Enhancement request: tr should be able to take multiple consecutive actions

2008-01-25 Thread James Youngman

Follow-up Comment #1, bug #22109 (project coreutils):

Could you provide some more justification than "would be nice", please?

Implementing the change you suggest would significantly complicate the
implementation of tr, which is already quite fearsome; see for example the
recent discussion of the correct processing of tr [:lower:] [:upper:] in
locales where the number of upper-case letters is different to the number of
lower-case letters.   

Such additional complexity doesn't come for free.  Somebody has to implement
it, the coreutils maintainers have to take the time to understand the new
implementation, they have to maintain and fix bugs in it.   Then the Info
documentation and the manual page need to be updated to explain the changed
functionality in such a way that any corner-cases are identified.   For
example, what does this do?  

tr -dc ABC -s

Does that do the same as this?

tr -d ABC -c -s

What about tr -d ABC -c -s 123?  Is 123 SET1 or SET2?  
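
For comparison, here is a minimal sketch of how such multi-step processing can be written today by chaining separate tr invocations, where each instance performs exactly one action and the order is explicit.  The helper name squeeze_abc and the sample input are illustrative assumptions only:

```shell
# Hypothetical helper: keep only A, B, C and newline, then squeeze repeats.
# Each tr performs a single action, so there is no ambiguity about
# which set belongs to which option.
squeeze_abc() {
  tr -dc 'ABC\n' | tr -s 'ABC'
}

printf 'AAxBByCC\n' | squeeze_abc   # prints ABC
```

The pipeline costs one extra process per step, but its meaning does not depend on any option-ordering rules.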

I think the documentation task will be non-trivial.  Non-trivial
documentation will probably result, making it harder for people to understand.
So everybody gets harder-to-understand documentation in the name of a
convenience feature.

The complex documentation and more complex implementation opens an
opportunity for the documentation to describe something slightly different to
the implementation.  When somebody notices the problem, which should get
fixed?

But this feature could indeed be convenient.  It could be sufficiently
convenient for many people to use it a lot.   Who will field their questions
when their script doesn't work - or silently does something different - on
other systems?

In summary, maybe nobody should change any piece of software, ever.   Except
maybe to fix bugs.  Carefully.  :)


___

Reply to this item at:

  http://savannah.gnu.org/bugs/?22109

___
  Message sent via/by Savannah
  http://savannah.gnu.org/



___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: sort: memory exhausted with 50GB file

2008-01-25 Thread Paul Eggert
Leo Butler [EMAIL PROTECTED] writes:

> I don't know if this is relevant, but I have extracted the 2nd through
> 1000th character in the 50GB file, and there appears to be garbage
> (unprintable chars) in the first line. The remainder of the extract looks
> fine. Moreover, I split the file into 500MB chunks, sorted these and then
> merge sorted the pairs. It appears that the 500MB chunks produced by split
> have been stripped of '\n' and are garbage, as are the sorted files.

Hmm, it sounds like your input data has some very long lines.
That would explain at least part of your problem.  'sort' needs
to keep at least two lines in main memory to compare them: if single
input lines are many gigabytes long, then 'sort' must consume many
gigabytes of memory, regardless of what parameter you specify with '-S'.
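
One way to check for such lines is to measure the longest line in the input.  The following is a rough sketch, not something taken from this thread; the helper name longest_line is hypothetical, and on a file whose lines really are gigabytes long awk itself may struggle:

```shell
# Hypothetical helper: print the length in bytes of the longest line
# in the file given as $1.  LC_ALL=C makes length() count bytes.
longest_line() {
  LC_ALL=C awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}

# usage: longest_line in.txt
```

A result close to the file size would suggest the newlines are missing or are not where they are expected.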




Re: sort: memory exhausted with 50GB file

2008-01-25 Thread Paul Eggert
Leo Butler [EMAIL PROTECTED] writes:

> sort rapidly chews up about 40-50% of total physical memory

That's weird.  It shouldn't do that.  It doesn't do that on my machine
(Debian stable x86, coreutils 6.10, compiled with GCC 4.2.2).  Memory
usage goes up to 250 MB (as requested) and stays there.  'sort'
creates temp files of size 201,061,873 bytes, each containing
2,545,087 lines.  Here's the command I used to try to reproduce the
problem:

  awk 'BEGIN {for (i=0;i<467289720;i++) print "-16 -2 -14 -5 1 1 0 0.3080808080808057 0 0.1540404040404028 0.3904338415207971"}' |
  sort -S 250M -k 6,6n -k 7,7n -k 8,8n -k 9,9n -k 10,10n -k 11,11n -T /tmp -T $HOME/junk -o /tmp/foo
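
For a quick check that does not need tens of gigabytes of temporary space, a scaled-down variant of the same generator can be used.  This is only a sketch; the helper name gen_lines and the line count passed to it are assumptions for illustration:

```shell
# Hypothetical helper: emit $1 copies of the sample input line.
gen_lines() {
  awk -v n="$1" 'BEGIN {
    for (i = 0; i < n; i++)
      print "-16 -2 -14 -5 1 1 0 0.3080808080808057 0 0.1540404040404028 0.3904338415207971"
  }'
}

gen_lines 3

# usage with a small buffer, so that sort still spills to temp files:
#   gen_lines 1000000 | sort -S 1M -k 6,6n -k 7,7n -T /tmp -o /tmp/foo
```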

I suppose it could be a locale problem.  What's the output of the
locale command?

Can you run 'sort' under GDB, and see what the stack backtrace looks
like when 'sort' fails?

Are you compiling for x86 or x86-64?




Re: new snapshot [Re: coreutils 6.9.92 fail to configure on *bsd

2008-01-25 Thread Jim Meyering
Elias Pipping [EMAIL PROTECTED] wrote:
> On Wed, Jan 23, 2008 at 01:40:22PM +0100, Jim Meyering wrote:
>> If that's the problem, here's an untested fix:
>
> Unfortunately, that doesn't seem to help.

Thanks for checking.
That suggests there's a more fundamental problem.

Please do this as root:

  cd coreutils-6.10/src
  ./id -a
  ./rm -rf f g
  echo a > f
  ./chown +0:+0 f
  ls -ld . f
  ./cp f g
  ls -l g

and look at the output.
The final ls should show g with group root.
If not, please repeat but with this in place of the
./cp command above:

  strace -o log ./cp f g

and then send the output as well as the contents of log
to the list.




[PATCH] Add jfbterm to the dircolors TERM list

2008-01-25 Thread Mike Frysinger
---
 src/dircolors.hin |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/src/dircolors.hin b/src/dircolors.hin
index 838fa8f..3fb5a2f 100644
--- a/src/dircolors.hin
+++ b/src/dircolors.hin
@@ -30,6 +30,7 @@ TERM dtterm
 TERM eterm-color
 TERM gnome
 TERM gnome-256color
+TERM jfbterm
 TERM konsole
 TERM kterm
 TERM linux
-- 
1.5.3.8





Re: signbit glitch (coreutils 6.10, Solaris 8 sparc, GCC 4.2.2)

2008-01-25 Thread Bruno Haible
Paul Eggert wrote:
> I ran into the following minor glitch when compiling coreutils 6.10 on
> Solaris 8 sparc with GCC 4.2.2:
>
> vasnprintf.c: In function 'vasnprintf':
> vasnprintf.c:2196: warning: implicit declaration of function 'signbit'

Thanks for reporting this.

> Here is a patch to the gnulib signbit module to work around this
> problem:
>
> 2008-01-24  Paul Eggert  [EMAIL PROTECTED]
>
>   * m4/signbit.m4 (gl_SIGNBIT): Check that signbit is a macro.
>   This suppresses a warning when compiling coreutils 6.10 with
>   GCC 4.2.2 on Solaris 8.  In that combination, math.h does not
>   define signbit, but GCC implements it internally (there is no
>   library function) and issues a warning.

If GCC implements it as a built-in, with your patch, we'll reject the
built-in and provide substitutes in the form of functions. But it's
more efficient to use the GCC built-ins then. The warning goes away
if one uses __builtin_signbit instead of signbit.

I'm applying this:

2008-01-25  Paul Eggert  [EMAIL PROTECTED]
Bruno Haible  [EMAIL PROTECTED]

* m4/signbit.m4 (gl_SIGNBIT): Require a macro definition. Test whether
the GCC builtins for signbits are present and set
REPLACE_SIGNBIT_USING_GCC if so.
* lib/math.in.h (signbit): Define using GCC builtins if
REPLACE_SIGNBIT_USING_GCC is set.
* m4/math_h.m4 (gl_MATH_H_DEFAULTS): Initialize
REPLACE_SIGNBIT_USING_GCC.
* modules/math (Makefile.am): Substitute REPLACE_SIGNBIT_USING_GCC.

*** lib/math.in.h.orig  2008-01-26 02:25:45.0 +0100
--- lib/math.in.h   2008-01-26 00:41:20.0 +0100
***************
*** 353,358 ****
--- 353,366 ----
  
  
  #if @GNULIB_SIGNBIT@
+ # if @REPLACE_SIGNBIT_USING_GCC@
+ #  undef signbit
+/* GCC 4.0 and newer provides three built-ins for signbit.  */
+ #  define signbit(x) \
+(sizeof (x) == sizeof (long double) ? __builtin_signbitl (x) : \
+ sizeof (x) == sizeof (double) ? __builtin_signbit (x) : \
+ __builtin_signbitf (x))
+ # endif
  # if @REPLACE_SIGNBIT@
  #  undef signbit
  extern int gl_signbitf (float arg);
*** m4/math_h.m4.orig   2008-01-26 02:25:45.0 +0100
--- m4/math_h.m4        2008-01-26 00:40:25.0 +0100
***************
*** 1,5 ****
! # math_h.m4 serial 8
! dnl Copyright (C) 2007 Free Software Foundation, Inc.
  dnl This file is free software; the Free Software Foundation
  dnl gives unlimited permission to copy and/or distribute it,
  dnl with or without modifications, as long as this notice is preserved.
--- 1,5 ----
! # math_h.m4 serial 9
! dnl Copyright (C) 2007-2008 Free Software Foundation, Inc.
  dnl This file is free software; the Free Software Foundation
  dnl gives unlimited permission to copy and/or distribute it,
  dnl with or without modifications, as long as this notice is preserved.
***************
*** 36,65 ****
GNULIB_TRUNCF=0;   AC_SUBST([GNULIB_TRUNCF])
GNULIB_TRUNCL=0;   AC_SUBST([GNULIB_TRUNCL])
dnl Assume proper GNU behavior unless another module says otherwise.
!   HAVE_DECL_ACOSL=1;AC_SUBST([HAVE_DECL_ACOSL])
!   HAVE_DECL_ASINL=1;AC_SUBST([HAVE_DECL_ASINL])
!   HAVE_DECL_ATANL=1;AC_SUBST([HAVE_DECL_ATANL])
!   HAVE_DECL_COSL=1; AC_SUBST([HAVE_DECL_COSL])
!   HAVE_DECL_EXPL=1; AC_SUBST([HAVE_DECL_EXPL])
!   HAVE_DECL_FREXPL=1;   AC_SUBST([HAVE_DECL_FREXPL])
!   HAVE_DECL_LDEXPL=1;   AC_SUBST([HAVE_DECL_LDEXPL])
!   HAVE_DECL_LOGL=1; AC_SUBST([HAVE_DECL_LOGL])
!   HAVE_DECL_SINL=1; AC_SUBST([HAVE_DECL_SINL])
!   HAVE_DECL_SQRTL=1;AC_SUBST([HAVE_DECL_SQRTL])
!   HAVE_DECL_TANL=1; AC_SUBST([HAVE_DECL_TANL])
!   HAVE_DECL_TRUNC=1;AC_SUBST([HAVE_DECL_TRUNC])
!   HAVE_DECL_TRUNCF=1;   AC_SUBST([HAVE_DECL_TRUNCF])
!   HAVE_DECL_TRUNCL=1;   AC_SUBST([HAVE_DECL_TRUNCL])
!   REPLACE_CEILF=0;  AC_SUBST([REPLACE_CEILF])
!   REPLACE_CEILL=0;  AC_SUBST([REPLACE_CEILL])
!   REPLACE_FLOORF=0; AC_SUBST([REPLACE_FLOORF])
!   REPLACE_FLOORL=0; AC_SUBST([REPLACE_FLOORL])
!   REPLACE_FREXP=0;  AC_SUBST([REPLACE_FREXP])
!   REPLACE_FREXPL=0; AC_SUBST([REPLACE_FREXPL])
!   REPLACE_ISFINITE=0;   AC_SUBST([REPLACE_ISFINITE])
!   REPLACE_LDEXPL=0; AC_SUBST([REPLACE_LDEXPL])
!   REPLACE_ROUND=0;  AC_SUBST([REPLACE_ROUND])
!   REPLACE_ROUNDF=0; AC_SUBST([REPLACE_ROUNDF])
!   REPLACE_ROUNDL=0; AC_SUBST([REPLACE_ROUNDL])
!   REPLACE_SIGNBIT=0;AC_SUBST([REPLACE_SIGNBIT])
  ])
--- 36,66 ----
GNULIB_TRUNCF=0;   AC_SUBST([GNULIB_TRUNCF])
GNULIB_TRUNCL=0;   AC_SUBST([GNULIB_TRUNCL])
dnl Assume proper GNU behavior unless another module says otherwise.
!   HAVE_DECL_ACOSL=1;   AC_SUBST([HAVE_DECL_ACOSL])
!   HAVE_DECL_ASINL=1;   AC_SUBST([HAVE_DECL_ASINL])
!   HAVE_DECL_ATANL=1;   AC_SUBST([HAVE_DECL_ATANL])
!   HAVE_DECL_COSL=1;AC_SUBST([HAVE_DECL_COSL])
!   HAVE_DECL_EXPL=1;AC_SUBST([HAVE_DECL_EXPL])
!   HAVE_DECL_FREXPL=1;  

[bug #22121] chcon help output refers to lchown system call

2008-01-25 Thread Göran Uddeborg

URL:
  http://savannah.gnu.org/bugs/?22121

         Summary: chcon help output refers to lchown system call
         Project: GNU Core Utilities
    Submitted by: goeran
    Submitted on: Friday 2008-01-25 at 21:35
        Category: None
        Severity: 3 - Normal
      Item Group: None
          Status: None
         Privacy: Public
     Assigned to: None
     Open/Closed: Open
 Discussion Lock: Any

___

Details:

In the description of the -h flag to chcon, it says it is available only on
systems with the lchown system call.  That appears to be a bit too much
copy-and-paste from the chown command.  chcon would use the lsetxattr system
call instead, I believe.




___

Reply to this item at:

  http://savannah.gnu.org/bugs/?22121






Re: Options --enable{,-no}-install-program

2008-01-25 Thread James Youngman
On Jan 8, 2008 5:46 PM, Jim Meyering [EMAIL PROTECTED] wrote:

> If you're really motivated, there's another minor problem:
> the include/exclude mechanism operates only on bin_PROGRAMS,
> and not bin_SCRIPTS.  That means --enable-no-install-program=groups
> doesn't work, since groups is a script.

I had a go at this and didn't come up with a result I found pleasing.
So instead I took a sledgehammer to the walnut.  The patch is attached
(sorry, but it's safer that way because of my non-configurable
tab-mangling mail client) and the ChangeLog entry is appended.
Please note that I added a NEWS file entry; no doubt you will need to
fix the version number in there, the file format demands one but I had
no idea what to use.

If you don't like the change, let me know and I'll resubmit just the
segv fix as a separate patch.

Thanks,
James.

2008-01-25  James Youngman  [EMAIL PROTECTED]

Replace groups.sh with groups.c.
* src/groups.c (main): New file, replacing groups.sh.
* src/group-list.c, src/group-list.h: New files, factored out of id.c,
implementing the functionality that id and groups have in common.
* src/id.c (print_full_info): Avoid a segfault when trying to print
an error message if getgroups fails.
(print_group_list): Move to group-list.c.
(print_group): Likewise.
* man/Makefile.am: When building groups.1, obtain the help text
from src/groups.c, not src/groups.sh.
* doc/coreutils.texi (groups: Print group names a user is in):
Explain why groups and groups $(id -un) give different results
in existing login sessions after you change the group database.
(id: Print user identity): Likewise for id.
From 0b4479f064d5b66431376cb4dac506155b2b97fa Mon Sep 17 00:00:00 2001
From: James Youngman [EMAIL PROTECTED]
Date: Fri, 25 Jan 2008 16:05:52 +
Subject: [PATCH] Replace groups.sh with groups.c.

2008-01-25  James Youngman  [EMAIL PROTECTED]

	Replace groups.sh with groups.c.
	* src/groups.c (main): New file, replacing groups.sh.
	* src/group-list.c, src/group-list.h: New files, factored out of id.c,
	implementing the functionality that id and groups have in common.
	* src/id.c (print_full_info): Avoid a segfault when trying to print
	an error message if getgroups fails.
	(print_group_list): Move to group-list.c.
	(print_group): Likewise.
	* man/Makefile.am: When building groups.1, obtain the help text
	from src/groups.c, not src/groups.sh.
	* doc/coreutils.texi (groups: Print group names a user is in):
	Explain why groups and groups $(id -un) give different results
	in existing login sessions after you change the group database.
	(id: Print user identity): Likewise for id.
---
 NEWS               |    9 +++
 doc/coreutils.texi |   17 ++-
 man/Makefile.am    |    2 +-
 src/Makefile.am    |   21 ++--
 src/group-list.c   |  130
 src/group-list.h   |   21 +++
 src/groups.c       |  152
 src/groups.sh      |   84 -
 src/id.c           |  105 +---
 9 files changed, 360 insertions(+), 181 deletions(-)
 create mode 100644 src/group-list.c
 create mode 100644 src/group-list.h
 create mode 100644 src/groups.c
 delete mode 100755 src/groups.sh

diff --git a/NEWS b/NEWS
index e9fcd61..e1d34e0 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,14 @@
 GNU coreutils NEWS-*- outline -*-
 
+* Noteworthy changes in release 6.11 (-??-??) [beta]
+
+** Bug fixes
+
+   configure --enable-no-install-program=groups now works.  
+
+   groups -- foo no longer generates a spurious error about the
+   nonexistent group --.
+
 * Noteworthy changes in release 6.10 (-??-??) [stable]
 
 ** Bug fixes
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 3785eae..0736b29 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -12221,6 +12221,12 @@ Print only the user ID.
 
 @exitstatus
 
+Primary and supplementary groups for a process are normally inherited
+from its parent and are usually unchanged since login.  This means
+that if you change the group database after logging in, @command{id}
+will not reflect your changes within your existing login session.  
+Running @command{id} with a user argument causes the user and group
+database to be consulted afresh, and so will give a different result.
 
 @node logname invocation
 @section @command{logname}: Print current login name
@@ -12270,7 +12276,8 @@ options}.
 groups for each given @var{username}, or the current process if no names
 are given.  If more than one name is given, the name of each user is
 printed before
-the list of that user's groups.  Synopsis:
+the list of that user's groups and the user name is separated from the
+group list by a colon.  Synopsis:
 
 @example
 groups [@var{username}]@dots{}
@@ -12278,6 +12285,14 @@ groups [@var{username}]@dots{}
 
 The 

Re: [bug #22109] Enhancement request: tr should be able to take multiple consecutive actions

2008-01-25 Thread Bauke Jan Douma

Richard Neill wrote on 25-01-08 14:57:

> Only that multiple instances of tr, piped together is rather ugly, and


I, personally, strongly disagree.

bjd




[bug #22109] Enhancement request: tr should be able to take multiple consecutive actions

2008-01-25 Thread Eric Blake

Update of bug #22109 (project coreutils):

    Severity:  3 - Normal => 1 - Wish
      Status:        None => Wont Fix
 Open/Closed:        Open => Closed

___

Follow-up Comment #2:

Closed per response on the mailing list.






[bug #22017] zic can generate files which crash the linux kernel

2008-01-25 Thread Eric Blake

Update of bug #22017 (project coreutils):

      Status:  None => Invalid
 Open/Closed:  Open => Closed


___

Reply to this item at:

  http://savannah.gnu.org/bugs/?22017






Re: reversing the order in which lines appear

2008-01-25 Thread Pádraig Brady
Francky Leyn wrote:
> Hello,
>
> I'm looking for some way to reverse the order of lines. The first line
> read must thus be output last, and the last line read must be output first.
>
> Although it's easy to write a program for it, I'm wondering whether this
> is possible with the UNIX toolbox. Any ideas?
>
> I first thought of something like cat -r, where the r stands for reverse.
> Is this a good suggestion, or a bad one?

I bet you'll laugh :)

`tac` is the command you want.

In general this sort of question is well answered by `apropos reverse`
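
As a small illustration (assuming GNU coreutils is installed, since tac is not required by POSIX), the wrapper below reverses its input line by line.  The name reverse_lines is made up for the example:

```shell
# Hypothetical wrapper around tac: reverse the order of lines read
# from the named files, or from standard input when none are given.
reverse_lines() {
  tac -- "$@"
}

printf 'first\nsecond\nthird\n' | reverse_lines
# prints:
#   third
#   second
#   first
```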

thanks,
Pádraig.




Re: reversing the order in which lines appear

2008-01-25 Thread Eric Blake

According to Francky Leyn on 1/25/2008 8:49 AM:
| I first thought of something like cat -r, where the r stands for reverse.
| Is this a good suggestion, or a bad one?

So good, that it is already implemented, as 'tac'.

--
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]




[bug #21999] rm fails completely on systems that don't have dirfd()

2008-01-25 Thread Jim Meyering

Update of bug #21999 (project coreutils):

      Status:  None => Wont Fix
 Open/Closed:  Open => Closed

___

Follow-up Comment #6:

As discussed via email, Alan has opted to implement dirfd in his experimental
C library, so I'm closing this.  You're welcome to reopen if you need
anything.

___

Reply to this item at:

  http://savannah.gnu.org/bugs/?21999






Re: Options --enable{,-no}-install-program

2008-01-25 Thread Jim Meyering
James Youngman [EMAIL PROTECTED] wrote:

> On Jan 8, 2008 5:46 PM, Jim Meyering [EMAIL PROTECTED] wrote:
>
>> If you're really motivated, there's another minor problem:
>> the include/exclude mechanism operates only on bin_PROGRAMS,
>> and not bin_SCRIPTS.  That means --enable-no-install-program=groups
>> doesn't work, since groups is a script.
>
> I had a go at this and didn't come up with a result I found pleasing.
> So instead I took a sledgehammer to the walnut.  The patch is attached
> (sorry, but it's safer that way because of my non-configurable
> tab-mangling mail client) and the ChangeLog entry is appended.
> Please note that I added a NEWS file entry; no doubt you will need to
> fix the version number in there, the file format demands one but I had
> no idea what to use.
>
> If you don't like the change, let me know and I'll resubmit just the
> segv fix as a separate patch.
>
> Thanks,
> James.
>
> 2008-01-25  James Youngman  [EMAIL PROTECTED]
>
> Replace groups.sh with groups.c.
> * src/groups.c (main): New file, replacing groups.sh.
> * src/group-list.c, src/group-list.h: New files, factored out of id.c,
> implementing the functionality that id and groups have in common.
> * src/id.c (print_full_info): Avoid a segfault when trying to print
> an error message if getgroups fails.
> (print_group_list): Move to group-list.c.
> (print_group): Likewise.
> * man/Makefile.am: When building groups.1, obtain the help text
> from src/groups.c, not src/groups.sh.
> * doc/coreutils.texi (groups: Print group names a user is in):
> Explain why groups and groups $(id -un) give different results
> in existing login sessions after you change the group database.
> (id: Print user identity): Likewise for id.

Good idea!
Thanks for working on this.
I'll take a look in the next few days.




Re: sort: memory exhausted with 50GB file

2008-01-25 Thread Bob Proulx
Leo Butler wrote:
> -16 -2 -14 -5 1 1 0 0.3080808080808057 0 0.1540404040404028 0.3904338415207971

That should be fine.

> I have a dual processor machine, with each processor being an Intel Core 2
> Duo E6850, rated at 3GHz and cache 4096 kB, with 3.8GB total physical
> memory and 4GB swap space and two partitions on the hdd with 200GB and
> 140GB available space.

Sounds like a very nice machine.

> I am using sort v. 5.2.1 and v. 6.1 & v. 6.9. The former is installed as
> part of the RHEL OS and the latter two were compiled from the source at
> http://ftp.gnu.org/gnu/coreutils/ with the gcc v. 3.4.6 compiler.

All good so far.  To nail down two more details, could you provide the
output of these commands?

  uname -a

  ldd --version | head -n1

  file /usr/bin/sort ./sort

That will give us the kernel and libc versions.  That last will report
whether the binary programs are 32-bit or 64-bit.

> When I attempt to sort the file, with a command like
>
> ./sort -S 250M -k 6,6n -k 7,7n -k 8,8n -k 9,9n -k 10,10n -k 11,11n -T /data -T /data2 -o out.sort in.txt
>
> sort rapidly chews up about 40-50% of total physical memory (=1.5-1.9GB) at
> which point the error message 'sort: memory exhausted' appears. This
> appears to be independent of the parameter passed through the -S option.
> ...
> Is this an idiosyncratic problem?

That is very strange.  If by idiosyncratic you mean particular to your
system, then probably, because I have routinely sorted large files
without problem.  But that doesn't mean it isn't a bug.

At 50G the data file is very large compared to your 4G of physical
memory.  This means that sort cannot sort it in memory.  It will open
temporary files and sort a large chunk to one file and then another
and then another as a first pass splitting up the input file into many
sorted chunks.  As a second pass it will merge-sort the sorted chunks
together into the output file.
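
The two passes can be imitated by hand with split and sort -m, which may help to narrow down where things go wrong.  This is only a rough sketch of the idea, not how sort is implemented internally; the function name chunked_sort, its arguments and the use of a plain numeric sort are assumptions for illustration, and it expects a non-empty input file and a system that provides mktemp:

```shell
# Hypothetical helper: sort $1 numerically into $2 using chunks of $3 lines.
chunked_sort() {
  src=$1 dst=$2 lines=$3
  tmpdir=$(mktemp -d) || return 1

  # First pass: cut the input into chunks and sort each chunk in place.
  split -l "$lines" "$src" "$tmpdir/chunk."
  for f in "$tmpdir"/chunk.*; do
    sort -n -o "$f" "$f"
  done

  # Second pass: merge the already-sorted chunks into the output file.
  sort -n -m -o "$dst" "$tmpdir"/chunk.*
  rm -rf "$tmpdir"
}

# usage: chunked_sort in.txt out.sort 1000000
```

If the per-chunk sorts succeed but the merge fails, or the other way round, that says something about which phase is exhausting memory.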

What is the output of this command on your system?

  sysctl vm.overcommit_memory

I am asking because by default the linux kernel overcommits memory and
does not return out of memory conditions.  Instead the process (or
some other one) is killed by the linux out-of-memory killer.  But
enterprise systems will be configured with overcommit disabled for
reliability reasons and that appears to be how your system is
configured because you wouldn't see a message about being out of
memory from sort otherwise.  (I always disable overcommit so as to
avoid the out-of-memory killer.)
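
For reference, the value printed by that sysctl is 0, 1 or 2.  A small sketch for interpreting it follows; the function name describe_overcommit is made up here, and the wording of the descriptions is mine rather than the kernel's:

```shell
# Hypothetical helper: describe a vm.overcommit_memory value given as $1.
describe_overcommit() {
  case "$1" in
    0) echo "heuristic overcommit (kernel default)" ;;
    1) echo "always overcommit" ;;
    2) echo "strict accounting, no overcommit" ;;
    *) echo "unknown mode: $1" ;;
  esac
}

# usage on Linux:
#   describe_overcommit "$(cat /proc/sys/vm/overcommit_memory)"
```

With mode 2, allocation requests beyond the commit limit fail, which is the case in which a program can report running out of memory itself.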

Do you have user process limits active?  What is the output of this
command?

  ulimit -a

What does free say on your system?

  free

> I have read backlogs of the list and people report sort-ing 100GB
> files. Do you have any ideas?

Without doing a lot of debugging I am wondering if your choice of
locale setting is affecting this.  I doubt it because all of the sort
fields are numeric.  But because this is easy enough could you try
sorting using LC_ALL=C and see if that makes a difference?

  LC_ALL=C sort -k 6,6n -k 7,7n -k 8,8n -k 9,9n -k 10,10n -k 11,11n -T /data -T /data2 -o out.sort in.txt

Also could you determine how large the process is at the time that
sort reports running out of memory?  I am wondering if it is at a
magic number size such as 2G or 4G that could provide more insight
into the problem.

Bob

