* Jim Meyering wrote on Thu, May 28, 2009 at 09:33:21PM CEST:
> Glen Lenker wrote:
> > On Thu, Mar 26, 2009 at 09:50:08PM +0000, Ralf Wildenhues wrote:
> >>
> >> Example run, on an 8-way, and with cat'ed instances of the dictionary,
> >> on tmpfs, timings best of three:
> >
> > Hey Ralf, did you happen to specify the amount of RAM sort should
> > use?  Not specifying enough RAM for sort would break what would
> > otherwise be a single internal sort into multiple internal sort
> > passes and then an external sort.  As it is, the external sort is
> > still sequential.

No, I did not specify the amount of RAM.  The system I tested on has
plenty of RAM, way more than is needed for the sort.  Specifying
something like 2G of RAM does not make any visible difference.
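For concreteness, the runs being compared have roughly this shape.
Two names below are stand-ins, since neither appears verbatim in this
thread: --threads for whatever thread-count option the patch provides,
and words-256M for the cat'ed dictionary file:

  # Time the patched sort with a fixed buffer size and varying
  # thread counts; /usr/bin/time prints the user/system/elapsed
  # lines quoted elsewhere in this thread.
  for t in 1 2 4 8; do
      /usr/bin/time ./sort -S 2G --threads=$t words-256M -o /dev/null
  done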
> > I ran these tests on a 256MB instance of the dictionary in tmpfs,
> > on an 8-core machine, specifying 2G of RAM.
> >
> > runtime [s]                    threads
> > file size [MB]      1          2          4          8
> > 256             2m41.219   1m27.357    52.53     36.429
> >
> > 160.22user 1.34system 2:41.61elapsed  99%CPU
> > 159.83user 1.45system 1:27.12elapsed 185%CPU
> > 159.84user 1.56system 0:52.26elapsed 308%CPU
> > 160.67user 1.53system 0:36.26elapsed 447%CPU
> >
> > This seems to be what I would expect from a good implementation.

Yes, 56% efficiency going from 1 to 8 threads sounds like a much
better number, and your system overhead looks very sane compared to
what I saw.  Seems like it's a system-specific issue after all.  Which
Linux kernel version and which pthread (glibc) version are you using?

> >> It suggests to me that too much time is spent busy-waiting in
> >> pthread_join, or that sort is computing too much (I haven't looked
> >> at the patch in detail).
> >>
> >> Also, I'd have expected the rate going from 1 to 2 threads to get
> >> at least a bit better with bigger file size, but it remains
> >> remarkably constant, around 1.45 for this setup.  What am I
> >> missing?

Glen, could you look at this, too?  I mean just timings of 1 vs. 2
threads for different file sizes?  Thanks.

> I ran some tests (see below), and got results that look similar to
> Ralf's.  I used an otherwise-idle 16-core Nehalem-based system with
> 8GB of RAM.

Which kernel and glibc?

> However, more importantly, while the 16- and 8-thread tests were
> running, I sampled thread counts using ps and was surprised to see
> that the number of active threads was usually far lower than the
> maximum.

Well yes, I suppose you are seeing the serial overhead from reading in
the data set by the first process alone, and from the later merging
stages where fewer threads are active.
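For anyone repeating Jim's sampling, one rough way to watch the thread
count from the outside; this assumes $pid holds the PID of the running
sort (NLWP is the kernel's thread count for the process):

  # Sample the number of threads of a running process every half
  # second, until the process exits.
  while kill -0 "$pid" 2>/dev/null; do
      ps -o nlwp= -p "$pid"
      sleep 0.5
  done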
> T=16             T is number of threads
> 12.96    <--- This is elapsed wall-clock time in seconds.
> 16.43
> 13.76
> 16.60

That varies a lot.  Are there other jobs running on this system?

Do you, Jim, also see high system time overhead?  That would support
the hypothesis of a less efficient thread implementation (or a kernel
issue) being the cause.

Off-list, Glen wrote:
> I don't know if this is related, because up until now memory
> consumption hasn't been mentioned wrt this patch, but this patch
> bumps the amount of memory consumed from 1.5x to 2x.  Should we
> consider reducing this?

I don't think fighting over that 0.5x is worth any hassle, especially
as that x is just the number of lines times the size of a struct line,
not the size of the data in one line (right?).

Cheers,
Ralf