[PATCH] sort: Add --threads option, which parallelizes internal sort.

Paul Eggert Wed, 25 Mar 2009 13:47:14 -0700

This patch is by Glen Lenker, Matt Pham, Benjamin Nuernberger, Sky
Lin, TaeSung Roh, and Paul Eggert.  It adds support for parallelism
within an internal sort.  On our simple tests on a 2-core desktop x86,
overall performance improved by roughly a factor of 1.6.
* THANKS: Add Benjamin Nuernberger, Sky Lin, TaeSung Roh.  Sort.
* TODO: Add note about how to make this algorithm better.
* bootstrap.conf: Add nproc, pthread.
* doc/coreutils.texi (sort invocation): Document new --threads option.
* src/Makefile.am (sort_LDADD): Add $(LIB_PTHREAD).
* src/sort.c: Include pthread.h, nproc.h.
(SUBTHREAD_LINES_HEURISTIC, THREADS_OPTION): New constants.
(sortlines_temp): Remove decl.
(usage, long_options, main): New option --threads.
(specify_nthreads): New function.
(mergelines): New signature, to emphasize the fact that the HI area must be
part of the destination.  All callers changed.
(sortlines): Rewrite to support and use parallelism, with a new signature.
All uses changed.  Merge in the functionality of sortlines_temp.
(struct thread_args): New struct.
(sortlines_thread): New function.
(sortlines_temp): Remove.
(sort): New arg NTHREADS.  All uses changed.
(main): Disable threading if we are sorting at random.
* tests/Makefile.am (TESTS): Add misc/sort-benchmark-random.
* tests/misc/sort-benchmark-random: New file.
---
 THANKS                           |   23 +++--
 TODO                             |    8 ++
 bootstrap.conf                   |    9 ++-
 doc/coreutils.texi               |    8 ++
 src/Makefile.am                  |    2 +-
 src/sort.c                       |  220 +++++++++++++++++++++++++++-----------
 tests/Makefile.am                |    1 +
 tests/misc/sort-benchmark-random |   67 ++++++++++++
 8 files changed, 263 insertions(+), 75 deletions(-)
 create mode 100644 tests/misc/sort-benchmark-random


diff --git a/THANKS b/THANKS
index 665a9ef..e37fdb3 100644
--- a/THANKS
+++ b/THANKS
@@ -24,12 +24,11 @@ aldomel                             [email protected]
 Alen Muzinic                        [email protected]
 Alexander Nguyen                    [email protected]
 Alexander V. Lukyanov               [email protected]
-Allen Hewes                         [email protected]
-Axel Dörfler                        [email protected]
 Alexandre Duret-Lutz                [email protected]
 Alexey Solovyov                     [email protected]
 Alexey Vyskubov                     [email protected]
 Alfred M. Szmidt                    [email protected]
+Allen Hewes                         [email protected]
 Andi Kleen                          [email protected]
 Andre Novaes Cunha                  [email protected]
 Andreas Frische                     [email protected]
@@ -61,13 +60,15 @@ Arvind Autar                        [email protected]
 Augey Mikus                         [email protected]
 Aurelien Jarno                      [email protected]
 Austin Donnelly                     [email protected]
+Axel Dörfler                        [email protected]
 Axel Kittenberger                   [email protected]
 Barry Kelly                         http://barrkel.blogspot.com/
 Bauke Jan Douma                     [email protected]
 Ben Elliston                        [email protected]
 Ben Harris                          [email protected]
-Benjamin Cutler                     [email protected]
 Bengt Martensson                    [email protected]
+Benjamin Cutler                     [email protected]
+Benjamin Nuernberger                [email protected]
 Bernard Giroud                      [email protected]
 Bernd Eckenfels                     [email protected]
 Bernd Leibing                       [email protected]
@@ -169,12 +170,12 @@ Elbert Pol                          [email protected]
 Eli Zaretskii                       [email protected]
 Elias Pipping                       [email protected]
 Emile LeBlanc                       [email protected]
-Erik Auerswald                      [email protected]
 Eric Backus                         [email protected]
 Eric Blake                          [email protected]
 Eric G. Miller                      [email protected]
 Eric Pemente                        [email protected]
 Eric S. Raymond                     [email protected]
+Erik Auerswald                      [email protected]
 Erik Bennett                        [email protected]
 Erik Corry                          [email protected]
 Evan Hunt                           [email protected]
@@ -207,7 +208,6 @@ Gerhard Poul                        [email protected]
 Germano Leichsenring                [email protected]
 Glen Lenker                         [email protected]
 Göran Uddeborg                      [email protected]
-Guochun Shi                         [email protected]
 GOTO Masanori                       [email protected]
 Greg Louis                          [email protected]
 Greg McGary                         [email protected]
@@ -218,6 +218,7 @@ Greg Wooledge                       [email protected]
 Gregory Leblanc                     [email protected]
 Guido Leenders                      [email protected]
 Guntram Blohm                       [email protected]
+Guochun Shi                         [email protected]
 H. J. Lu                            [email protected]
 Hans Ginzel                         [email protected]
 Hans Lermen                         [email protected]
@@ -232,8 +233,8 @@ Herbert Xu                          
[email protected]
 Holger Berger                       [email protected]
 Hon-Yin Kok                         [email protected]
 Hugh Daniel                         [email protected]
-Ian Bruce                           [email protected]
 Iain Calder                         [email protected]
+Ian Bruce                           [email protected]
 Ian Jackson                         [email protected]
 Ian Lance Taylor                    [email protected]
 Ian Turner                          [email protected]
@@ -243,8 +244,8 @@ Ingo Saitz                          [email protected]
 Ivo Timmermans                      [email protected]
 James                               [email protected]
 James Antill                        jmanti%[email protected]
-James Lemley                        [email protected]
 James Hunt                          [email protected]
+James Lemley                        [email protected]
 James Ralston                       [email protected]
 James Sneeringer                    [email protected]
 James Tanis                         [email protected]
@@ -318,8 +319,8 @@ Keith Thompson                      [email protected]
 Ken Pizzini                         [email protected]
 Kevin Mudrick                       [email protected]
 Kirk Kelsey                         [email protected]
-Kristin E Thomas                    [email protected]
 Kjetil Torgrim Homme                [email protected]
+Kristin E Thomas                    [email protected]
 Kristoffer Rose                     [email protected]
 Larry McVoy                         [email protected]
 Lars Hecking                        [email protected]
@@ -408,8 +409,8 @@ Michiel Bacchiani                   [email protected]
 Mikael Magnusson                    [email protected]
 Mike Castle                         [email protected]
 Mike Coleman                        [email protected]
-Mike Jetzer                         [email protected]
 Mike Frysinger                      [email protected]
+Mike Jetzer                         [email protected]
 Mikko Tuumanen                      [email protected]
 Mikulas Patocka                     [email protected]
 Miles Bader                         [email protected]
@@ -510,6 +511,7 @@ Savochkin Andrey Vladimirovich      [email protected]
 Scott Lurndal                       [email protected]
 Sébastien Maret                     [email protected]
 Shing-Shong Shei                    [email protected]
+Sky Lin                             [email protected]
 Soeren Sonnenburg                   [email protected]
 Solar Designer                      [email protected]
 Stanislav Ievlev                    [email protected]
@@ -530,9 +532,10 @@ Stuart Shelton                      [email protected]
 Sven Joachim                        [email protected]
 Szakacsits Szabolcs                 [email protected]
 Tadayoshi Funaba                    [email protected]
+TaeSung Roh                         [email protected]
 TAKAI Kousuke                       [email protected]
-Theodore Ts'o                       [email protected]
 The Wanderer                        [email protected]
+Theodore Ts'o                       [email protected]
 Theodoros V. Kalamatianos           [email protected]
 Thomas Bushnell                     [email protected]
 Thomas Goerlich                     [email protected]
diff --git a/TODO b/TODO
index 7288285..ee7e850 100644
--- a/TODO
+++ b/TODO
@@ -102,6 +102,14 @@ sort: Investigate better sorting algorithms; see Knuth 
vol. 3.
   5.3.1, who credits Lester Ford, Jr. and Selmer Johnson, American
   Mathematical Monthly 66 (1959), 387-389.
 
+  The current method of parallelization can be improved.  Parent
+  threads wait for children to finish, which can be a bottleneck.  It
+  might be better, at least when computation nears the root of the
+  merge tree, for parent threads to accept outputs from their children
+  one by one, and thus merge while their children are merging.  This
+  will require some inter-thread communication (e.g., via shared
+  objects that are accessed safely via mutual exclusion).
+
 shred: Update shred as described here to conform to DoD 5220 rules:
 http://lists.gnu.org/archive/html/bug-coreutils/2007-05/msg00075.html
 
diff --git a/bootstrap.conf b/bootstrap.conf
index 0747bb8..7358e10 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -74,12 +74,19 @@ gnulib_modules="
        memcasecmp memcmp2 mempcpy
        memrchr mgetgroups
        mkancesdirs mkdir mkdir-p mkstemp mktime modechange
-       mountlist mpsort obstack pathmax perl physmem
+       mountlist
+       mpsort
+       nproc
+       obstack
+       pathmax
+       perl
+       physmem
        posix-shell
        posixtm
        posixver
        progname
        propername
+       pthread
        putenv
        quote quotearg raise readlink areadlink-with-size
        randint
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 70effa1..a9ad20e 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -4043,6 +4043,14 @@ have a large sort or merge that is I/O-bound, you can 
often improve
 performance by using this option to specify directories on different
 disks and controllers.
 
+...@item --threa...@var{n}
+Use no more than @var{n} threads to improve parallelism and thus
+reduce the wall-clock time needed for the sort.  The default @var{n}
+is the number of processors currently online, rounded down to the
+nearest power of two.  This default may change in the future.  If
+...@var{n} is a power of two, increasing it to a value less than 2...@var{n}
+does not typically help performance.
+
 @item -u
 @itemx --unique
 @opindex -u
diff --git a/src/Makefile.am b/src/Makefile.am
index 2313ed3..6113e5f 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -118,7 +118,7 @@ tac_LDADD = $(LDADD) $(LIB_CLOCK_GETTIME)
 vdir_LDADD = $(LDADD) $(LIB_CLOCK_GETTIME) $(LIB_SELINUX) $(LIB_CAP)
 
 ## If necessary, add -lm to resolve use of pow in lib/strtod.c.
-sort_LDADD = $(LDADD) $(POW_LIB) $(LIB_GETHRXTIME)
+sort_LDADD = $(LDADD) $(POW_LIB) $(LIB_GETHRXTIME) $(LIB_PTHREAD)
 
 # for get_date and gettime
 date_LDADD = $(LDADD) $(LIB_CLOCK_GETTIME)
diff --git a/src/sort.c b/src/sort.c
index ced0f2d..b4f0dfb 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -23,6 +23,7 @@
 #include <config.h>
 
 #include <getopt.h>
+#include <pthread.h>
 #include <sys/types.h>
 #include <sys/wait.h>
 #include <signal.h>
@@ -32,6 +33,7 @@
 #include "filevercmp.h"
 #include "hash.h"
 #include "md5.h"
+#include "nproc.h"
 #include "physmem.h"
 #include "posixver.h"
 #include "quote.h"
@@ -83,6 +85,13 @@ struct rlimit { size_t rlim_cur; };
 # define OPEN_MAX 20
 #endif
 
+/* Heuristic value for the number of lines for which it is worth
+   creating a subthread, during an internal merge sort, on a machine
+   that has processors galore.  Currently this number is just a guess.
+   This value must be at least 4.  We don't know of any machine where
+   this number has any practical effect.  */
+enum { SUBTHREAD_LINES_HEURISTIC = 4 };
+
 #define UCHAR_LIM (UCHAR_MAX + 1)
 
 #ifndef DEFAULT_TMPDIR
@@ -289,8 +298,6 @@ static char const *compress_program;
    number are present, temp files will be used. */
 static unsigned int nmerge = NMERGE_DEFAULT;
 
-static void sortlines_temp (struct line *, size_t, struct line *);
-
 /* Report MESSAGE for FILE, then clean up and exit.
    If FILE is null, it represents standard output.  */
 
@@ -380,6 +387,7 @@ Other options:\n\
   -t, --field-separator=SEP  use SEP instead of non-blank to blank 
transition\n\
   -T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or %s;\n\
                               multiple options specify multiple directories\n\
+      --threads=N           use no more than N threads to improve 
parallelism\n\
   -u, --unique              with -c, check for strict ordering;\n\
                               without -c, output only the first of an equal 
run\n\
 "), DEFAULT_TMPDIR);
@@ -423,7 +431,8 @@ enum
   FILES0_FROM_OPTION,
   NMERGE_OPTION,
   RANDOM_SOURCE_OPTION,
-  SORT_OPTION
+  SORT_OPTION,
+  THREADS_OPTION
 };
 
 static char const short_options[] = "-bcCdfgik:mMno:rRsS:t:T:uVy:z";
@@ -455,6 +464,7 @@ static struct option const long_options[] =
   {"temporary-directory", required_argument, NULL, 'T'},
   {"unique", no_argument, NULL, 'u'},
   {"zero-terminated", no_argument, NULL, 'z'},
+  {"threads", required_argument, NULL, THREADS_OPTION},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
   {NULL, 0, NULL, 0},
@@ -1253,6 +1263,21 @@ specify_sort_size (int oi, char c, char const *s)
   xstrtol_fatal (e, oi, c, long_options, s);
 }
 
+/* Specify the number of threads to spawn during internal sort.  */
+static unsigned long int
+specify_nthreads (int oi, char c, char const *s)
+{
+  unsigned long int nthreads;
+  enum strtol_error e = xstrtoul (s, NULL, 10, &nthreads, "");
+  if (e == LONGINT_OVERFLOW)
+    return ULONG_MAX;
+  if (e != LONGINT_OK)
+    xstrtol_fatal (e, oi, c, long_options, s);
+  if (nthreads == 0)
+    error (SORT_FAILURE, 0, _("number of threads must be nonzero"));
+  return nthreads;
+}
+
 /* Return the default sort size.  */
 static size_t
 default_sort_size (void)
@@ -2435,25 +2460,28 @@ mergefiles (struct sortfile *files, size_t ntemps, 
size_t nfiles,
   return nopened;
 }
 
-/* Merge into T the two sorted arrays of lines LO (with NLO members)
-   and HI (with NHI members).  T, LO, and HI point just past their
-   respective arrays, and the arrays are in reverse order.  NLO and
-   NHI must be positive, and HI - NHI must equal T - (NLO + NHI).  */
+/* Merge into T (of size NLINES) the two sorted arrays of lines
+   LO (with NLINES / 2 members), and
+   T - (NLINES / 2) (with NLINES - NLINES / 2 members).
+   T and LO point just past their respective arrays, and the arrays
+   are in reverse order.  NLINES must be at least 2.  */
 
 static inline void
-mergelines (struct line *t,
-           struct line const *lo, size_t nlo,
-           struct line const *hi, size_t nhi)
+mergelines (struct line *restrict t, size_t nlines,
+           struct line const *restrict lo)
 {
+  size_t nlo = nlines / 2;
+  size_t nhi = nlines - nlo;
+  struct line const *hi = t - nlo;
+
   for (;;)
     if (compare (lo - 1, hi - 1) <= 0)
       {
        *--t = *--lo;
        if (! --nlo)
          {
-           /* HI - NHI equalled T - (NLO + NHI) when this function
-              began.  Therefore HI must equal T now, and there is no
-              need to copy from HI to T.  */
+           /* HI must equal T now, and there is no need to copy from
+              HI to T.  */
            return;
          }
       }
@@ -2471,53 +2499,54 @@ mergelines (struct line *t,
       }
 }
 
-/* Sort the array LINES with NLINES members, using TEMP for temporary space.
-   NLINES must be at least 2.
-   The input and output arrays are in reverse order, and LINES and
-   TEMP point just past the end of their respective arrays.
 
-   Use a recursive divide-and-conquer algorithm, in the style
-   suggested by Knuth volume 3 (2nd edition), exercise 5.2.4-23.  Use
-   the optimization suggested by exercise 5.2.4-10; this requires room
-   for only 1.5*N lines, rather than the usual 2*N lines.  Knuth
-   writes that this memory optimization was originally published by
-   D. A. Bell, Comp J. 1 (1958), 75.  */
+static void sortlines (struct line *restrict, size_t, struct line *restrict,
+                      unsigned long int, bool);
 
-static void
-sortlines (struct line *lines, size_t nlines, struct line *temp)
+/* Thread arguments for sortlines_thread.  */
+struct thread_args
 {
-  if (nlines == 2)
-    {
-      if (0 < compare (&lines[-1], &lines[-2]))
-       {
-         struct line tmp = lines[-1];
-         lines[-1] = lines[-2];
-         lines[-2] = tmp;
-       }
-    }
-  else
-    {
-      size_t nlo = nlines / 2;
-      size_t nhi = nlines - nlo;
-      struct line *lo = lines;
-      struct line *hi = lines - nlo;
-      struct line *sorted_lo = temp;
-
-      sortlines (hi, nhi, temp);
-      if (1 < nlo)
-       sortlines_temp (lo, nlo, sorted_lo);
-      else
-       sorted_lo[-1] = lo[-1];
+  struct line *lines;
+  size_t nlines;
+  struct line *temp;
+  unsigned long int nthreads;
+  bool to_temp;
+};
 
-      mergelines (lines, sorted_lo, nlo, hi, nhi);
-    }
+/* Like sortlines, except with a signature acceptable to pthread_create.  */
+static void *
+sortlines_thread (void *data)
+{
+  struct thread_args const *args = data;
+  sortlines (args->lines, args->nlines, args->temp, args->nthreads,
+            args->to_temp);
+  return NULL;
 }
 
-/* Like sortlines (LINES, NLINES, TEMP), except output into TEMP
-   rather than sorting in place.  */
+/* Sort the array LINES with NLINES members, using TEMP for temporary space,
+   spawning NTHREADS threads.  NLINES must be at least 2.
+   The input and output arrays are in reverse order, and LINES and
+   TEMP point just past the end of their respective arrays.
+
+   If TO_TEMP, place the result in TEMP (trashing LINES in the
+   process); otherwise, place the result back into LINES so that it is
+   an in-place sort (trashing TEMP in the process).
+
+   Use a recursive divide-and-conquer algorithm, in the style
+   suggested by Knuth volume 3 (2nd edition), exercise 5.2.4-23.  If
+   multithreaded, this requires that TEMP contain NLINES entries; if
+   singlethreaded, use the optimization suggested by Knuth exercise
+   5.2.4-10, which requires room for only 1.5*N lines, rather than the
+   usual 2*N lines.  Knuth writes that this memory optimization was
+   originally published by D. A. Bell, Comp J. 1 (1958), 75.
+
+   This function is inline so that its tests for multthreadedness and
+   inplacedness can be optimized away in common cases.  */
 
 static void
-sortlines_temp (struct line *lines, size_t nlines, struct line *temp)
+sortlines (struct line *restrict lines, size_t nlines,
+          struct line *restrict temp,
+          unsigned long int nthreads, bool to_temp)
 {
   if (nlines == 2)
     {
@@ -2525,8 +2554,17 @@ sortlines_temp (struct line *lines, size_t nlines, 
struct line *temp)
         <http://lists.gnu.org/archive/html/bug-coreutils/2005-10/msg00086.html>
         in the IBM xlc 6.0.0.0 compiler in 64-bit mode.  */
       int swap = (0 < compare (&lines[-1], &lines[-2]));
-      temp[-1] = lines[-1 - swap];
-      temp[-2] = lines[-2 + swap];
+      if (to_temp)
+       {
+         temp[-1] = lines[-1 - swap];
+         temp[-2] = lines[-2 + swap];
+       }
+      else if (swap)
+       {
+         temp[-1] = lines[-1];
+         lines[-1] = lines[-2];
+         lines[-2] = temp[-1];
+       }
     }
   else
     {
@@ -2534,13 +2572,43 @@ sortlines_temp (struct line *lines, size_t nlines, 
struct line *temp)
       size_t nhi = nlines - nlo;
       struct line *lo = lines;
       struct line *hi = lines - nlo;
-      struct line *sorted_hi = temp - nlo;
 
-      sortlines_temp (hi, nhi, sorted_hi);
-      if (1 < nlo)
-       sortlines (lo, nlo, temp);
+      unsigned long int child_subthreads = nthreads / 2;
+      unsigned long int my_subthreads = nthreads - child_subthreads;
+      pthread_t thread;
+      struct thread_args args = {hi, nhi, temp - nlo, child_subthreads, 
to_temp};
+
+      if (child_subthreads != 0 && SUBTHREAD_LINES_HEURISTIC <= nlines
+         && pthread_create (&thread, NULL, sortlines_thread, &args) == 0)
+       {
+         /* Guarantee that nlo and nhi are each at least 2.  */
+         verify (4 <= SUBTHREAD_LINES_HEURISTIC);
+
+         sortlines (lo, nlo, temp, my_subthreads, !to_temp);
+         pthread_join (thread, NULL);
+       }
+      else
+       {
+         sortlines (hi, nhi, temp - (to_temp ? nlo : 0), 1, to_temp);
+         if (1 < nlo)
+           sortlines (lo, nlo, temp, 1, !to_temp);
+         else if (!to_temp)
+           temp[-1] = lo[-1];
+       }
 
-      mergelines (temp, lo, nlo, sorted_hi, nhi);
+      struct line *dest;
+      struct line const *sorted_lo;
+      if (to_temp)
+       {
+         dest = temp;
+         sorted_lo = lines;
+       }
+      else
+       {
+         dest = lines;
+         sorted_lo = temp;
+       }
+      mergelines (dest, nlines, sorted_lo);
     }
 }
 
@@ -2744,7 +2812,8 @@ merge (struct sortfile *files, size_t ntemps, size_t 
nfiles,
 /* Sort NFILES FILES onto OUTPUT_FILE. */
 
 static void
-sort (char * const *files, size_t nfiles, char const *output_file)
+sort (char * const *files, size_t nfiles, char const *output_file,
+      unsigned long int nthreads)
 {
   struct buffer buf;
   size_t ntemps = 0;
@@ -2758,8 +2827,11 @@ sort (char * const *files, size_t nfiles, char const 
*output_file)
       char const *file = *files;
       FILE *fp = xfopen (file, "r");
       FILE *tfp;
+
+      /* If singlethreaded, the merge uses the memory optimization
+        suggested in Knuth exercise 5.2.4-10; see sortlines.  */
       size_t bytes_per_line = (2 * sizeof (struct line)
-                              - sizeof (struct line) / 2);
+                              - (1 < nthreads ? 0 : sizeof (struct line) / 2));
 
       if (! buf.alloc)
        initbuf (&buf, bytes_per_line,
@@ -2787,7 +2859,7 @@ sort (char * const *files, size_t nfiles, char const 
*output_file)
          line = buffer_linelim (&buf);
          linebase = line - buf.nlines;
          if (1 < buf.nlines)
-           sortlines (line, buf.nlines, linebase);
+           sortlines (line, buf.nlines, linebase, nthreads, false);
          if (buf.eof && !nfiles && !ntemps && !buf.left)
            {
              xfclose (fp, file);
@@ -3039,6 +3111,7 @@ main (int argc, char **argv)
   bool mergeonly = false;
   char *random_source = NULL;
   bool need_random = false;
+  unsigned long int nthreads = 0;
   size_t nfiles = 0;
   bool posixly_correct = (getenv ("POSIXLY_CORRECT") != NULL);
   bool obsolete_usage = (posix2_version () < 200112);
@@ -3361,6 +3434,10 @@ main (int argc, char **argv)
          add_temp_dir (optarg);
          break;
 
+       case THREADS_OPTION:
+         nthreads = specify_nthreads (oi, c, optarg);
+         break;
+
        case 'u':
          unique = true;
          break;
@@ -3506,6 +3583,9 @@ main (int argc, char **argv)
 
   if (need_random)
     {
+      /* Threading does not lock the randread_source structure, so
+        downgrade to one thread to avoid race conditions. */
+      nthreads = 1;
       randread_source = randread_new (random_source, MD5_DIGEST_SIZE);
       if (! randread_source)
        die (_("open failed"), random_source);
@@ -3560,7 +3640,21 @@ main (int argc, char **argv)
       IF_LINT (free (sortfiles));
     }
   else
-    sort (files, nfiles, outfile);
+    {
+      if (!nthreads)
+       {
+         /* The default NTHREADS is 2 ** floor (log2 (number of processors)).
+            If comparisons do not vary in cost and threads are
+            scheduled evenly, with the current algorithm there is no
+            performance advantage to using a number of threads that
+            is not a power of 2.  */
+         unsigned long int np2 = num_processors () / 2;
+         for (nthreads = 1; nthreads <= np2; nthreads *= 2)
+           continue;
+       }
+
+      sort (files, nfiles, outfile, nthreads);
+    }
 
   if (have_read_stdin && fclose (stdin) == EOF)
     die (_("close failed"), "-");
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 5f150ad..2e2bfec 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -201,6 +201,7 @@ TESTS =                                             \
   misc/shred-remove                            \
   misc/shuf                                    \
   misc/sort                                    \
+  misc/sort-benchmark-random                   \
   misc/sort-compress                           \
   misc/sort-continue                           \
   misc/sort-files0-from                                \
diff --git a/tests/misc/sort-benchmark-random b/tests/misc/sort-benchmark-random
new file mode 100644
index 0000000..2bace4f
--- /dev/null
+++ b/tests/misc/sort-benchmark-random
@@ -0,0 +1,67 @@
+#!/bin/sh
+# Benchmark sort on randomly generated data.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Written by Glen Lenker.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  sort --version
+fi
+
+. $srcdir/test-lib.sh
+
+very_expensive_
+
+# Use the 'C' locale to avoid problems in case Perl's sort isn't stable.
+LC_ALL=C
+export LC_ALL
+
+fail=0
+
+perl -e '
+my $num_lines = 500000;
+my $length = 100;
+
+for (my $i=0; $i < $num_lines; $i++)
+{
+    for (my $j=0; $j < $length; $j++)
+    {
+       printf "%c", 32 + rand(94);
+    }
+    print "\n";
+}' > in || framework_failure
+
+# We need to generate a lot of data for sort to show a noticeable
+# improvement in performance. Sorting it in PERL may take awhile.
+
+perl -e '
+open (FILE, "<in");
+my @list = <FILE>;
+print sort(@list);
+close (FILE);
+' > exp || framework_failure
+
+#start=$(date +%s)
+time sort in > out || fail=1
+#duration=$(expr $(date +%s) - $start)
+
+#echo sorting the random data took $duration seconds
+
+compare out exp || fail=1
+
+Exit $fail
-- 
1.6.2.1



_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils

[PATCH] sort: Add --threads option, which parallelizes internal sort.

Reply via email to