Re: [PATCH] add new sort option --xargs (-x)
"Bo Borgerson" <[EMAIL PROTECTED]> wrote: > The number of inputs that can be handled by the sort utility is > currently limited by what may be passed in argv. > > Due to the nature of sort, this limit can't be stepped around with > `xargs' as it could be with some other utilities. > > My solution to this locally has been to add an option to the sort > utility, --xargs, which causes sort to treat STDIN as a source of > newline-separated arguments that supplement those on the command-line > (please see attached patch). I suppose you have a real application where this is useful? If so, please describe it -- motivation/justification helps ;-) > Is this an option that might be worth including in a future release? I think so. du and wc each have the --files0-from=F option, added for the same reason. Any such option in sort should have the same name and be implemented in the same way. [haven't forgotten about --nmerge. will get to it eventually ] ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] add new sort option --xargs (-x)
On Thu, Apr 3, 2008 at 12:18 PM, Jim Meyering <[EMAIL PROTECTED]> wrote:
> I suppose you have a real application where this is useful?
> If so, please describe it -- motivation/justification helps ;-)
>

Just a merge with a lot of source files. It's the same motivation as
the nmerge patch. I've actually got another patch as well that I'll
clean up and offer soon that allows a merge of > nmerge files to be
divided among sub-processes whose output is then merged by the parent,
which provides a performance benefit (if you've got the resources for
it). I've got yet another patch that adds an option to open
compressed files through a decompression program, so I don't have to
set up fifos for a merge of gzipped files.

I've been maintaining these patches against sort for a while,
re-patching whenever a new release is published. I figured it would
be worth a shot seeing if I could get any of them incorporated into the
package upstream. :)

I'm also just generally interested in helping out with maintenance. I
think I'm probably less experienced than most of the regular
contributors, but I can help with simple stuff, and when it comes to
coding I think doing is the best way of learning.

> I think so. du and wc each have the --files0-from=F option, added for
> the same reason. Any such option in sort should have the same name and
> be implemented in the same way.

That seems reasonable enough. Looks like readtokens0 does most of the
work for me. :)

How would you feel about also including a --filesn-from=F option to
support pipelines like the one in my example where the input is
newline separated?

> [haven't forgotten about --nmerge. will get to it eventually ]

Thanks. :)

Bo
Re: [PATCH] add new sort option --xargs (-x)
"Bo Borgerson" <[EMAIL PROTECTED]> wrote: >> I think so. du and wc each have the --files0-from=F option, added for >> the same reason. Any such option in sort should have the same name and >> be implemented in the same way. > > That seems reasonable enough. Looks like readtokens0 does most of the > work for me. :) > > How would you feel about also including a --filesn-from=F option to > support pipelines like the one in my example where the input is > newline separated? I'd rather not. Instead, just pipe your list through tr '\n' '\0' first. We had the same discussion back when I added --files0-from=F to du and wc. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] add new sort option --xargs (-x)
"Bo Borgerson" <[EMAIL PROTECTED]> wrote: > On Thu, Apr 3, 2008 at 12:18 PM, Jim Meyering <[EMAIL PROTECTED]> wrote: >> I suppose you have a real application where this is useful? >> If so, please describe it -- motivation/justification helps ;-) > > Just a merge with a lot of source files. It's the same motivation as > the nmerge patch. I've actually got another patch as well that I'll > clean up and offer soon that allows a merge of > nmerge files to be > divided among sub-processes whose output is then merged by the parent, > which provides a performance benefit (if you've got the resources for > it). I've got yet another patch that adds an option to open > compressed files through a decompression program, so I don't have to > set up fifos for a merge of gzipped files. Sounds interesting. I suppose it can work with an arbitrary decompressor? Note this relatively new option: --compress-program=PROG compress temporaries with PROG; decompress them with PROG -d ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] add new sort option --xargs (-x)
On Thu, Apr 3, 2008 at 2:05 PM, Jim Meyering <[EMAIL PROTECTED]> wrote:
> Sounds interesting.
> I suppose it can work with an arbitrary decompressor?
>
> Note this relatively new option:
>
>   --compress-program=PROG  compress temporaries with PROG;
>                            decompress them with PROG -d
>

Yep. My current convention is:

  --magic-open=PROG[,PROG]...

So if you want to merge a gzip'd file with a bzip2'd file you can use
--magic-open=gzip,bzip2 (or just --magic-open, which enables all).

For each regular file it checks magic and if it looks like a type that
can be handled by one of PROG it opens a PROG -d -c -f (the -f is just
in case the magic was a false-positive).

Of course this re-introduces findprog into sort (for find_in_path),
which may not be desirable. A convention more similar to that used
for --compress-program would eliminate this (and the magic-checking),
but limit a given merge to files compressed with a single program.

Bo
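The magic check described above could be approximated outside sort
roughly as follows. This is an illustration, not code from the patch:
magic_open is a hypothetical helper name, only the gzip and bzip2
signatures are checked, and the data is invented.

```shell
# Hypothetical helper: print a file, decompressing it first when its
# leading bytes match a known signature.
magic_open() {
  # First three bytes as hex digits, e.g. "1f8b08" for gzip.
  magic=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
  case $magic in
    1f8b*)   gzip  -d -c -f < "$1" ;;  # gzip signature: 1f 8b
    425a68*) bzip2 -d -c -f < "$1" ;;  # bzip2 signature: "BZh"
    *)       cat "$1" ;;               # anything else is read as-is
  esac
}

# One gzip'd input and one plain input (hypothetical data).
dir=$(mktemp -d)
printf 'a\nc\n' | gzip -c > "$dir/packed"
printf 'b\n' > "$dir/plain"

sorted=$( { magic_open "$dir/packed"; magic_open "$dir/plain"; } | sort )

printf '%s\n' "$sorted"
rm -rf "$dir"
```

As in the description, -f is there so that a false-positive match is
passed through unchanged rather than rejected.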
Re: [PATCH] add new sort option --xargs (-x)
Okay, here's a version that supports argument input in --files0-from=F style. Bo From 3108b79cbbb5d6c2fe3c2f8d5037f166cb0f1ca6 Mon Sep 17 00:00:00 2001 From: Bo Borgerson <[EMAIL PROTECTED]> Date: Thu, 3 Apr 2008 18:42:57 -0400 Subject: [PATCH] Add new sort option --files0-from=F src/sort.c: support new option tests/misc/sort-files0-from: test new option tests/misc/Makefile.am: indicate new test docs/coreutils.texti: explain new option NEWS: advertise new option Signed-off-by: Bo Borgerson <[EMAIL PROTECTED]> --- NEWS|5 ++ doc/coreutils.texi | 16 +++ src/sort.c | 58 +++- tests/misc/Makefile.am |1 + tests/misc/sort-files0-from | 105 +++ 5 files changed, 183 insertions(+), 2 deletions(-) create mode 100755 tests/misc/sort-files0-from diff --git a/NEWS b/NEWS index e208b30..492c4e9 100644 --- a/NEWS +++ b/NEWS @@ -55,6 +55,11 @@ GNU coreutils NEWS-*- outline -*- options --general-numeric-sort/-g, --month-sort/-M, --numeric-sort/-n and --random-sort/-R, resp. + sort accepts a new option, --files0-from=F, that specifies a file + containing a null-separated list of files to sort. This list is used + instead of filenames passed on the command-line to avoid problems with + maximum command-line (argv) length. + ** Improvements id and groups work around an AFS-related bug whereby those programs diff --git a/doc/coreutils.texi b/doc/coreutils.texi index ee7dbb2..5415394 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -3667,6 +3667,22 @@ Terminate with an error if @var{prog} exits with nonzero status. Whitespace and the backslash character should not appear in @var{prog}; they are reserved for future use. [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] including files from @command{du} +Rather than processing files named on the command line, process those +named in file @var{FILE}; each name is terminated by a null byte. 
+This is useful when the list of file names is so long that it may exceed +a command line length limitation. +In such cases, running @command{sort} via @command{xargs} is undesirable +because it splits the list into pieces and gives each piece to a different +instance of @command{sort}, with the resulting output being multiple sets +of sorted data concatenated together. +One way to produce a list of null-byte-terminated file names is with @sc{gnu} [EMAIL PROTECTED], using its @option{-print0} predicate. + +Do not specify any @var{FILE} on the command line when using this option. + @item -k @var{pos1}[,@var{pos2}] @itemx [EMAIL PROTECTED],@var{pos2}] @opindex -k diff --git a/src/sort.c b/src/sort.c index 8b2eec5..8342399 100644 --- a/src/sort.c +++ b/src/sort.c @@ -37,6 +37,7 @@ #include "posixver.h" #include "quote.h" #include "randread.h" +#include "readtokens0.h" #include "stdio--.h" #include "stdlib--.h" #include "strnumcmp.h" @@ -304,8 +305,9 @@ usage (int status) { printf (_("\ Usage: %s [OPTION]... [FILE]...\n\ + or: %s [OPTION]... 
--files0-from=F\n\ "), - program_name); + program_name, program_name); fputs (_("\ Write sorted concatenation of all FILE(s) to standard output.\n\ \n\ @@ -342,6 +344,9 @@ Other options:\n\ -C, --check=quiet, --check=silent like -c, but do not report first bad line\n\ --compress-program=PROG compress temporaries with PROG;\n\ decompress them with PROG -d\n\ + --files0-from=Fread input from the files specified by\n\ + NUL-terminated names in file F\n\ + -L, --max-line-length print the length of the longest line\n\ -k, --key=POS1[,POS2] start a key at POS1, end it at POS2 (origin 1)\n\ -m, --merge merge already sorted files; do not sort\n\ "), stdout); @@ -395,7 +400,8 @@ enum CHECK_OPTION = CHAR_MAX + 1, COMPRESS_PROGRAM_OPTION, RANDOM_SOURCE_OPTION, - SORT_OPTION + SORT_OPTION, + FILES0_FROM_OPTION }; static char const short_options[] = "-bcCdfgik:mMno:rRsS:t:T:uy:z"; @@ -407,6 +413,7 @@ static struct option const long_options[] = {"compress-program", required_argument, NULL, COMPRESS_PROGRAM_OPTION}, {"dictionary-order", no_argument, NULL, 'd'}, {"ignore-case", no_argument, NULL, 'f'}, + {"files0-from", required_argument, NULL, FILES0_FROM_OPTION}, {"general-numeric-sort", no_argument, NULL, 'g'}, {"ignore-nonprinting", no_argument, NULL, 'i'}, {"key", required_argument, NULL, 'k'}, @@ -2752,6 +2759,8 @@ main (int argc, char **argv) bool posixly_correct = (getenv ("POSIXLY_CORRECT") != NULL); bool obsolete_usage = (posix2_version () < 200112); char **files; + char *files_from = NULL; + struct Tokens tok; char const *outfile = NULL; initialize_main (&argc, &argv); @@ -2955,6 +2964,10 @@ main (int argc, char **argv)
Re: [PATCH] add new sort option --xargs (-x)
I had a capitalized error message in this patch. I also didn't use a correct commit message format. Thanks Bo From 9a37b547bcc892d1d5e2542c43d77b13497318db Mon Sep 17 00:00:00 2001 From: Bo Borgerson <[EMAIL PROTECTED]> Date: Thu, 3 Apr 2008 18:42:57 -0400 Subject: [PATCH] Add new sort option --files0-from=F * src/sort.c: support new option * tests/misc/sort-files0-from: test new option * tests/misc/Makefile.am: indicate new test * docs/coreutils.texti: explain new option * NEWS: advertise new option Signed-off-by: Bo Borgerson <[EMAIL PROTECTED]> --- NEWS|5 ++ doc/coreutils.texi | 16 +++ src/sort.c | 57 ++- tests/misc/Makefile.am |1 + tests/misc/sort-files0-from | 105 +++ 5 files changed, 182 insertions(+), 2 deletions(-) create mode 100755 tests/misc/sort-files0-from diff --git a/NEWS b/NEWS index e208b30..492c4e9 100644 --- a/NEWS +++ b/NEWS @@ -55,6 +55,11 @@ GNU coreutils NEWS-*- outline -*- options --general-numeric-sort/-g, --month-sort/-M, --numeric-sort/-n and --random-sort/-R, resp. + sort accepts a new option, --files0-from=F, that specifies a file + containing a null-separated list of files to sort. This list is used + instead of filenames passed on the command-line to avoid problems with + maximum command-line (argv) length. + ** Improvements id and groups work around an AFS-related bug whereby those programs diff --git a/doc/coreutils.texi b/doc/coreutils.texi index ee7dbb2..5415394 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -3667,6 +3667,22 @@ Terminate with an error if @var{prog} exits with nonzero status. Whitespace and the backslash character should not appear in @var{prog}; they are reserved for future use. [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] including files from @command{du} +Rather than processing files named on the command line, process those +named in file @var{FILE}; each name is terminated by a null byte. 
+This is useful when the list of file names is so long that it may exceed +a command line length limitation. +In such cases, running @command{sort} via @command{xargs} is undesirable +because it splits the list into pieces and gives each piece to a different +instance of @command{sort}, with the resulting output being multiple sets +of sorted data concatenated together. +One way to produce a list of null-byte-terminated file names is with @sc{gnu} [EMAIL PROTECTED], using its @option{-print0} predicate. + +Do not specify any @var{FILE} on the command line when using this option. + @item -k @var{pos1}[,@var{pos2}] @itemx [EMAIL PROTECTED],@var{pos2}] @opindex -k diff --git a/src/sort.c b/src/sort.c index 8b2eec5..c14a8d3 100644 --- a/src/sort.c +++ b/src/sort.c @@ -37,6 +37,7 @@ #include "posixver.h" #include "quote.h" #include "randread.h" +#include "readtokens0.h" #include "stdio--.h" #include "stdlib--.h" #include "strnumcmp.h" @@ -304,8 +305,9 @@ usage (int status) { printf (_("\ Usage: %s [OPTION]... [FILE]...\n\ + or: %s [OPTION]... 
--files0-from=F\n\ "), - program_name); + program_name, program_name); fputs (_("\ Write sorted concatenation of all FILE(s) to standard output.\n\ \n\ @@ -342,6 +344,8 @@ Other options:\n\ -C, --check=quiet, --check=silent like -c, but do not report first bad line\n\ --compress-program=PROG compress temporaries with PROG;\n\ decompress them with PROG -d\n\ + --files0-from=F read input from the files specified by\n\ +NUL-terminated names in file F\n\ -k, --key=POS1[,POS2] start a key at POS1, end it at POS2 (origin 1)\n\ -m, --merge merge already sorted files; do not sort\n\ "), stdout); @@ -395,7 +399,8 @@ enum CHECK_OPTION = CHAR_MAX + 1, COMPRESS_PROGRAM_OPTION, RANDOM_SOURCE_OPTION, - SORT_OPTION + SORT_OPTION, + FILES0_FROM_OPTION }; static char const short_options[] = "-bcCdfgik:mMno:rRsS:t:T:uy:z"; @@ -407,6 +412,7 @@ static struct option const long_options[] = {"compress-program", required_argument, NULL, COMPRESS_PROGRAM_OPTION}, {"dictionary-order", no_argument, NULL, 'd'}, {"ignore-case", no_argument, NULL, 'f'}, + {"files0-from", required_argument, NULL, FILES0_FROM_OPTION}, {"general-numeric-sort", no_argument, NULL, 'g'}, {"ignore-nonprinting", no_argument, NULL, 'i'}, {"key", required_argument, NULL, 'k'}, @@ -2752,6 +2758,8 @@ main (int argc, char **argv) bool posixly_correct = (getenv ("POSIXLY_CORRECT") != NULL); bool obsolete_usage = (posix2_version () < 200112); char **files; + char *files_from = NULL; + struct Tokens tok; char const *outfile = NULL; initialize_main (&argc, &argv); @@ -2955,6 +2963,10 @@ main (int argc, char **argv) compress_program =
Re: [PATCH] add new sort option --xargs (-x)
"Bo Borgerson" <[EMAIL PROTECTED]> wrote: > I had a capitalized error message in this patch. > I also didn't use a correct commit message format. Thanks for noticing and correcting. > Subject: [PATCH] Add new sort option --files0-from=F > > * src/sort.c: support new option > * tests/misc/sort-files0-from: test new option > * tests/misc/Makefile.am: indicate new test > * docs/coreutils.texti: explain new option s/texti/texi/ > * NEWS: advertise new option Please use capitals and periods in ChangeLogs. ;-) Follow existing style -- there are plenty of examples. > diff --git a/NEWS b/NEWS > index e208b30..492c4e9 100644 > --- a/NEWS > +++ b/NEWS > @@ -55,6 +55,11 @@ GNU coreutils NEWS-*- > outline -*- >options --general-numeric-sort/-g, --month-sort/-M, --numeric-sort/-n >and --random-sort/-R, resp. > > + sort accepts a new option, --files0-from=F, that specifies a file > + containing a null-separated list of files to sort. This list is used s/null/NUL/ > + instead of filenames passed on the command-line to avoid problems with > + maximum command-line (argv) length. > + > ** Improvements > >id and groups work around an AFS-related bug whereby those programs > diff --git a/doc/coreutils.texi b/doc/coreutils.texi > index ee7dbb2..5415394 100644 > --- a/doc/coreutils.texi > +++ b/doc/coreutils.texi > @@ -3667,6 +3667,22 @@ Terminate with an error if @var{prog} exits with > nonzero status. > Whitespace and the backslash character should not appear in > @var{prog}; they are reserved for future use. > > [EMAIL PROTECTED] [EMAIL PROTECTED] > [EMAIL PROTECTED] [EMAIL PROTECTED] > [EMAIL PROTECTED] including files from @command{du} s/du/sort/ If this text is verbatim or nearly identical to that for wc and/or du, please see if you can use a macro to factor it out. Hmm... I went to check and spotted the same error (use of `du' in wc's section) so went ahead and factored out the duplication. Now, your doc change will be to add this line: @files0fromOption{sort,} ... 
>--compress-program=PROG compress temporaries with PROG;\n\ >decompress them with PROG -d\n\ > + --files0-from=F read input from the files specified by\n\ > +NUL-terminated names in file F\n\ Split the string. Otherwise, your addition pushes its length beyond a portability limit whose exact number I forget but it's around 500. >-k, --key=POS1[,POS2] start a key at POS1, end it at POS2 (origin 1)\n\ >-m, --merge merge already sorted files; do not sort\n\ > "), stdout); > @@ -395,7 +399,8 @@ enum >CHECK_OPTION = CHAR_MAX + 1, >COMPRESS_PROGRAM_OPTION, >RANDOM_SOURCE_OPTION, > - SORT_OPTION > + SORT_OPTION, > + FILES0_FROM_OPTION No big deal, but it's good practice to alphabetize. ... > diff --git a/tests/misc/sort-files0-from b/tests/misc/sort-files0-from > new file mode 100755 > index 000..a96ab1a > --- /dev/null > +++ b/tests/misc/sort-files0-from > @@ -0,0 +1,105 @@ > +#!/bin/sh > +# Test "sort --files0-from=F". > + > +# Copyright (C) 2002, 2003, 2005-2008 Free Software Foundation, Inc. This should have only 1 year number: 2008. If the file is based on some other, please indicate that. That will help me as reviewer, and future maintainers. E.g., I put this comment in the wc test of --files0-from: # This file bears a striking resemblance to tests/du/files0-from. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] add new sort option --xargs (-x)
On Sun, Apr 6, 2008 at 4:30 PM, Jim Meyering <[EMAIL PROTECTED]> wrote:
> s/texti/texi/
> Please use capitals and periods in ChangeLogs. ;-)
> s/null/NUL/
> Split the string. Otherwise, your addition pushes its length beyond
> a portability limit whose exact number I forget but it's around 500.
> No big deal, but it's good practice to alphabetize.
> This should have only 1 year number: 2008.

Thanks. I'll try to catch this sort of thing myself in the future.

> Now, your doc change will be to add this line:
>
> @files0fromOption{sort,}

I added an argument to the macro that specifies output for sub-lists,
since it's 'a total' for wc and du, but 'sorted output' for sort.

> If the file is based on some other, please indicate that.
> That will help me as reviewer, and future maintainers.
>
> E.g., I put this comment in the wc test of --files0-from:
>
> # This file bears a striking resemblance to tests/du/files0-from.

Unfortunately I didn't just copy one of the relevant test files. If
it would be easier for maintenance to have a more direct copy I can
redo it. I added a line at the top indicating that this test script
covers a lot of the same ground as the wc-files0-from tests.

Thanks,

Bo

From 404e23daf6874e4d36e2048de569bcac057b7400 Mon Sep 17 00:00:00 2001
From: Bo Borgerson <[EMAIL PROTECTED]>
Date: Thu, 3 Apr 2008 18:42:57 -0400
Subject: [PATCH] Add new sort option --files0-from=F

* src/sort.c: Support new option.
* tests/misc/sort-files0-from: Test new option.
* tests/misc/Makefile.am: Indicate new test.
* docs/coreutils.texi: Explain new option.
* NEWS: Advertise new option.
Signed-off-by: Bo Borgerson <[EMAIL PROTECTED]> --- NEWS|5 ++ doc/coreutils.texi | 12 +++-- src/sort.c | 65 --- tests/misc/Makefile.am |1 + tests/misc/sort-files0-from | 106 +++ 5 files changed, 178 insertions(+), 11 deletions(-) create mode 100755 tests/misc/sort-files0-from diff --git a/NEWS b/NEWS index e208b30..492c4e9 100644 --- a/NEWS +++ b/NEWS @@ -55,6 +55,11 @@ GNU coreutils NEWS-*- outline -*- options --general-numeric-sort/-g, --month-sort/-M, --numeric-sort/-n and --random-sort/-R, resp. + sort accepts a new option, --files0-from=F, that specifies a file + containing a null-separated list of files to sort. This list is used + instead of filenames passed on the command-line to avoid problems with + maximum command-line (argv) length. + ** Improvements id and groups work around an AFS-related bug whereby those programs diff --git a/doc/coreutils.texi b/doc/coreutils.texi index 5a6f2c3..9ac7bbf 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -3074,7 +3074,7 @@ Print only the newline counts. @opindex --max-line-length Print only the maximum line lengths. [EMAIL PROTECTED] files0fromOption{cmd,withTotalOption} [EMAIL PROTECTED] files0fromOption{cmd,withTotalOption,subListOutput} @itemx [EMAIL PROTECTED] @opindex [EMAIL PROTECTED] @cindex including files from @command{\cmd\} @@ -3084,13 +3084,13 @@ This is useful \withTotalOption\ when the list of file names is so long that it may exceed a command line length limitation. In such cases, running @command{\cmd\} via @command{xargs} is undesirable -because it splits the list into pieces and makes @command{\cmd\} print a -total for each sublist rather than for the entire list. +because it splits the list into pieces and makes @command{\cmd\} print +\subListOutput\ for each sublist rather than for the entire list. One way to produce a list of null-byte-terminated file names is with @sc{gnu} @command{find}, using its @option{-print0} predicate. 
Do not specify any @var{FILE} on the command line when using this option. @end macro [EMAIL PROTECTED],} [EMAIL PROTECTED],,a total} For example, to find the length of the longest line in any @file{.c} or @file{.h} file in the current hierarchy, do this: @@ -3670,6 +3670,8 @@ Terminate with an error if @var{prog} exits with nonzero status. Whitespace and the backslash character should not appear in @var{prog}; they are reserved for future use. [EMAIL PROTECTED],,sorted output} + @item -k @var{pos1}[,@var{pos2}] @itemx [EMAIL PROTECTED],@var{pos2}] @opindex -k @@ -9757,7 +9759,7 @@ Does not affect other symbolic links. This is helpful for finding out the disk usage of directories, such as @file{/usr/tmp}, which are often symbolic links. [EMAIL PROTECTED], with the @option{--total} (@option{-c}) option} [EMAIL PROTECTED], with the @option{--total} (@option{-c}) option,a total} @optHumanReadable diff --git a/src/sort.c b/src/sort.c index 8b2eec5..e67ce80 100644 --- a/src/sort.c +++ b/src/sort.c @@ -37,6 +37,7 @@ #include "posixver.h" #include "quote.h" #include "randread.h" +#include "readtokens0.h" #include "stdio--.h" #include "stdlib--.h" #include "strnumcmp.h" @@ -304,8 +305