Re: Using mmap(2) in sort(1) instead of temp files

2024-04-06 Thread Brett Lymn
On Fri, Apr 05, 2024 at 07:36:42AM -0400, Mouse wrote:
> 
> Well, I'm not the one (putatively) doing the work.  But my answers to
> that are:
> 

Me neither.

> (1) Small sorts are not the issue, IMO.  Even a speedup as great as
> halving the time taken is not enough to worry about when it's on a par
> with the cost of starting sort(1) at all.
> 

This was not why I suggested it, it was more the case where someone
tries to sort a 50GiB file on a 16GiB machine causing it to slow to a
crawl as it attempts to deal with the memory pressure.

It may all be moot if the time to perform the sort algorithm far
outweighs the i/o time as has been noted by others.

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"


Re: Using mmap(2) in sort(1) instead of temp files

2024-04-05 Thread Mouse
>> (4) Are there still incoherencies between mmap and read/write
>> access?  At one time there were, [...]
> This bug was fixed nearly a quarter century ago, in November 2000,
> with the merge of the unified buffer cache.

Ah, I recall UBC being brought in.

> I think using any version of NetBSD released in this millennium
> should be good to avoid the bug.

For use cases for which such a thing is appropriate, if such a thing
exists, yes, I daresay it would be.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Using mmap(2) in sort(1) instead of temp files

2024-04-05 Thread Taylor R Campbell
> Date: Fri, 5 Apr 2024 07:36:42 -0400 (EDT)
> From: Mouse 
> 
> (4) Are there still incoherencies between mmap and read/write access?
> At one time there were, and I never got a good handle on what needed to
> be done to avoid them.

This bug was fixed nearly a quarter century ago, in November 2000,
with the merge of the unified buffer cache.  You can read about it
here:

http://www.usenix.org/event/usenix2000/freenix/full_papers/silvers/silvers.pdf

I think using any version of NetBSD released in this millennium should
be good to avoid the bug.


Re: Using mmap(2) in sort(1) instead of temp files

2024-04-05 Thread Mouse
>> [...]
> Why not stat the input file and decide to use in memory iff the file
> is small enough?  This way sort will handle large sorts on small
> memory machines automatically.

Well, I'm not the one (putatively) doing the work.  But my answers to
that are:

(1) Small sorts are not the issue, IMO.  Even a speedup as great as
halving the time taken is not enough to worry about when it's on a par
with the cost of starting sort(1) at all.

(2) Using mmap versus read provides minimal speedup in this sort of
case: a small file which is being read sequentially.

(3) Code complexity: two paths means twice the testing, twice the
opportunities for bugs, (slightly more than) twice the maintenance,
etc.

(4) Are there still incoherencies between mmap and read/write access?
At one time there were, and I never got a good handle on what needed to
be done to avoid them.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Using mmap(2) in sort(1) instead of temp files

2024-04-05 Thread Brett Lymn
On Thu, Apr 04, 2024 at 02:38:02PM +0200, Martin Husemann wrote:
> 
> Since the original comment hints at "instead of temp files" it is pretty
> clear that the second variant is meant. This avoids all file system operations
> and if the machine you run on has enough free memory it might not even 
> actually
> touch swap space.
> 

Why not stat the input file and decide to use in memory iff the file is
small enough?  This way sort will handle large sorts on small memory
machines automatically.

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"


Re: Using mmap(2) in sort(1) instead of temp files

2024-04-04 Thread Joerg Sonnenberger
On Thursday, April 4, 2024 11:28:13 PM CEST Robert Elz wrote:
> Yes, in cases where temp files are actually needed, using mmap() is a
> very minor gain indeed - the buffering cost might be saved, but sorting
> a large file is a cpu costly endeavour (lots of comparisons, lots of times
> even with the best sorting algorithms available) so when temp files are
> needed in the first place (large input files) the saving is liklely to be
> a few ms in an operation which takes minutes of cpu time (or more).
> Not worth the bother.

I quite disagree here. mmap for the temp files with an appropriate madvise
can minimize data copies (by using the VFS cache directly) as well reduce
the cache foot print (by evicting pages once they are used up). Especially
for storage layers like NVME that can use a significant part of the
main memory bandwidth, that's important. I don't think it helps for the
original input though, especially with the associated problems of concurrent
writes or truncations.

Joerg




Re: Using mmap(2) in sort(1) instead of temp files

2024-04-04 Thread Robert Elz
Date:Thu, 4 Apr 2024 10:05:17 -0400 (EDT)
From:Mouse 
Message-ID:  <202404041405.kaa01...@stone.rodents-montreal.org>

  | Actually, if you mmap it PROT_WRITE and MAP_PRIVATE, you could go right
  | ahead.  But that'll cost RAM or swap space when the COW fault happens.

And the overheads from doing that (unless the file has very long records,
every block in the file would likely end up being copied, for a large file)
would drawf whatever small saving avoiding the buffering (in kernel, and
probably in stdio though that could be avoided) would gain.

  | It also works only when the input file fits into VM; to rephrase part
  | of what I wrote yesterday on tech-kern, sorting a file bigger than 4G
  | on a 32-bit port shouldn't break.

Agreed.   Aside from anything else sort(1) needs the ability to merge
already sorted files to handle the -m option, what it is doing using
temp files is simply setting up that scenario, in the case where the
file is big enough that it cannot simply be sorted in core.

  | (well, MAP_ANON).  Yes, but that has issues.  The size of an mmap()ped
  | area is fixed, set at map time, whereas file sizes grow dynamically.  I
  | suspect that trying to use mmap instead of temp files would amount to
  | implementing a rudimentary ramfs.

Yes, in cases where temp files are actually needed, using mmap() is a
very minor gain indeed - the buffering cost might be saved, but sorting
a large file is a cpu costly endeavour (lots of comparisons, lots of times
even with the best sorting algorithms available) so when temp files are
needed in the first place (large input files) the saving is liklely to be
a few ms in an operation which takes minutes of cpu time (or more).
Not worth the bother.

  | Furthermore, if the dataset fits in RAM, I'd say you shouldn't be using
  | the temporary-space paradigm at all; just slurp it in and sort it in
  | core.

Also agreed.   Whether the "slurp" means fgets(), read(), or mmap() doesn't
matter a lot for this, provided additional hidden copies are also being
made (ie: you want the data set to fit in RAM, as well as in the VM space,
as soon as paging/swapping starts, using temp files is likely better).
But to be sure about that, someone would need to benchmark it.

Certainly, simply believing "mmap() must be faster, therefore better"
is folly.There are applications for which it wins, and others where
it doesn't - my gut feeling is that sorting is one where it doesn't,
or at least not enough in the cases that matter to justify the extra
code complexity - saving a little when sorting a small file, which might
be possible, isn't good enough IMO.

I'd simply forget the whole thing, unless someone is motivated enough to
really benchmark various implementations for sorting LARGE files (say
several TB) and post the results.

kre



Re: Using mmap(2) in sort(1) instead of temp files

2024-04-04 Thread Mouse
>> Given the issues about using mmap, can anybody suggest how I should
>> proceed with the implementation, or if I should at all?

> There are two potential ways where mmap(2) could help improve the speed
> of sort:

>  - If you know the input file name, use a read-only mmap() of that file
>and avoid all buffering.  Downside: you can not store \0 at the
>end of a line anymore and need to deal with char*/size_t pairs for
>strings.

Actually, if you mmap it PROT_WRITE and MAP_PRIVATE, you could go right
ahead.  But that'll cost RAM or swap space when the COW fault happens.
It also works only when the input file fits into VM; to rephrase part
of what I wrote yesterday on tech-kern, sorting a file bigger than 4G
on a 32-bit port shouldn't break.

>  - You use "swap space" instead of a temporary file by doing

>   mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_ANNON, -1, 0);

(well, MAP_ANON).  Yes, but that has issues.  The size of an mmap()ped
area is fixed, set at map time, whereas file sizes grow dynamically.  I
suspect that trying to use mmap instead of temp files would amount to
implementing a rudimentary ramfs.

Furthermore, if the dataset fits in RAM, I'd say you shouldn't be using
the temporary-space paradigm at all; just slurp it in and sort it in
core.  And if it fits in VM but not RAM, given the way swap is tuned
for general-purpose use instead of the kind of access patterns sort
exhibits, I suspect temp files might end up being more performant.  And
if the dataset doesn't fit in VM, you'll need temp files regardless.

If this does go in, I really think it needs an option to suppress it.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Using mmap(2) in sort(1) instead of temp files

2024-04-04 Thread Martin Husemann
On Thu, Apr 04, 2024 at 12:02:30PM +, Ice Cream wrote:
> Given the issues about using mmap, can anybody suggest how
> I should proceed with the implementation, or if I should at all?

There are two potential ways where mmap(2) could help improve the speed
of sort:

 - If you know the input file name, use a read-only mmap() of that file
   and avoid all buffering. Downside: you can not store \0 at the end of
   a line anymore and need to deal with char*/size_t pairs for strings.

 - You use "swap space" instead of a temporary file by doing

mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_ANNON, -1, 0);

   and then use the returned pointer for temporary stuff. Obviously you
   can not do that for arbitarary sizes, so maybe you have to keep
   the old code and only do this trick if the file size is small enough
   or you process the file in pieces or whatever.

Since the original comment hints at "instead of temp files" it is pretty
clear that the second variant is meant. This avoids all file system operations
and if the machine you run on has enough free memory it might not even actually
touch swap space.

Martin