Re: Using mmap(2) in sort(1) instead of temp files
On Fri, Apr 05, 2024 at 07:36:42AM -0400, Mouse wrote:
>
> Well, I'm not the one (putatively) doing the work.  But my answers to
> that are:

Me neither.

> (1) Small sorts are not the issue, IMO.  Even a speedup as great as
> halving the time taken is not enough to worry about when it's on a par
> with the cost of starting sort(1) at all.

This was not why I suggested it; it was more the case where someone
tries to sort a 50GiB file on a 16GiB machine, causing it to slow to a
crawl as it attempts to deal with the memory pressure.

It may all be moot if the time to perform the sort algorithm far
outweighs the I/O time, as has been noted by others.

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"
Re: Using mmap(2) in sort(1) instead of temp files
>> (4) Are there still incoherencies between mmap and read/write
>> access?  At one time there were, [...]

> This bug was fixed nearly a quarter century ago, in November 2000,
> with the merge of the unified buffer cache.

Ah, I recall UBC being brought in.

> I think using any version of NetBSD released in this millennium
> should be good to avoid the bug.

For use cases for which such a thing is appropriate, if such a thing
exists, yes, I daresay it would be.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: Using mmap(2) in sort(1) instead of temp files
> Date: Fri, 5 Apr 2024 07:36:42 -0400 (EDT)
> From: Mouse
>
> (4) Are there still incoherencies between mmap and read/write access?
> At one time there were, and I never got a good handle on what needed to
> be done to avoid them.

This bug was fixed nearly a quarter century ago, in November 2000, with
the merge of the unified buffer cache.  You can read about it here:

http://www.usenix.org/event/usenix2000/freenix/full_papers/silvers/silvers.pdf

I think using any version of NetBSD released in this millennium should
be good to avoid the bug.
Re: Using mmap(2) in sort(1) instead of temp files
>> [...]

> Why not stat the input file and decide to use in memory iff the file
> is small enough?  This way sort will handle large sorts on small
> memory machines automatically.

Well, I'm not the one (putatively) doing the work.  But my answers to
that are:

(1) Small sorts are not the issue, IMO.  Even a speedup as great as
halving the time taken is not enough to worry about when it's on a par
with the cost of starting sort(1) at all.

(2) Using mmap versus read provides minimal speedup in this sort of
case: a small file which is being read sequentially.

(3) Code complexity: two paths means twice the testing, twice the
opportunities for bugs, (slightly more than) twice the maintenance, etc.

(4) Are there still incoherencies between mmap and read/write access?
At one time there were, and I never got a good handle on what needed to
be done to avoid them.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: Using mmap(2) in sort(1) instead of temp files
On Thu, Apr 04, 2024 at 02:38:02PM +0200, Martin Husemann wrote:
>
> Since the original comment hints at "instead of temp files" it is pretty
> clear that the second variant is meant.  This avoids all file system
> operations and if the machine you run on has enough free memory it
> might not even actually touch swap space.

Why not stat the input file and decide to use in memory iff the file
is small enough?  This way sort will handle large sorts on small
memory machines automatically.

-- 
Brett Lymn
--
Sent from my NetBSD device.

"We are were wolves",
"You mean werewolves?",
"No we were wolves, now we are something else entirely",
"Oh"
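[The size check Brett suggests could be sketched like this; the function
name and threshold parameter are hypothetical, and a real sort(1) would
derive the limit from available memory rather than take it as a constant:]

```c
#define _DEFAULT_SOURCE
#include <sys/stat.h>
#include <stdbool.h>

/*
 * Return true if the file behind fd is small enough to slurp and sort
 * entirely in memory, false if the temp-file merge path should be used.
 */
bool
sort_in_memory_ok(int fd, off_t threshold)
{
	struct stat st;

	if (fstat(fd, &st) == -1)
		return false;		/* can't tell, so be conservative */
	if (!S_ISREG(st.st_mode))
		return false;		/* pipes etc. have no usable size */
	return st.st_size <= threshold;
}
```

[Note that this only works when sort(1) is given a file name; input on a
pipe has no size to stat, which is why the check falls back to the
temp-file path for non-regular files.]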
Re: Using mmap(2) in sort(1) instead of temp files
On Thursday, April 4, 2024 11:28:13 PM CEST Robert Elz wrote:
> Yes, in cases where temp files are actually needed, using mmap() is a
> very minor gain indeed - the buffering cost might be saved, but sorting
> a large file is a cpu costly endeavour (lots of comparisons, lots of
> times even with the best sorting algorithms available) so when temp
> files are needed in the first place (large input files) the saving is
> likely to be a few ms in an operation which takes minutes of cpu time
> (or more).  Not worth the bother.

I quite disagree here.  mmap for the temp files with an appropriate
madvise can minimize data copies (by using the VFS cache directly) as
well as reduce the cache footprint (by evicting pages once they are
used up).  Especially for storage layers like NVMe that can use a
significant part of the main memory bandwidth, that's important.

I don't think it helps for the original input though, especially with
the associated problems of concurrent writes or truncations.

Joerg
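[The mmap-plus-madvise pattern Joerg describes might look roughly like
this; the function names are hypothetical, and the advice values shown
(MADV_SEQUENTIAL for read-ahead, MADV_DONTNEED for eviction) are the
standard ones, though how aggressively the kernel honours them varies:]

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <stddef.h>

/* Map a temp-file run read-only and declare the access pattern. */
void *
map_run(int fd, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

	if (p == MAP_FAILED)
		return NULL;
	(void)madvise(p, len, MADV_SEQUENTIAL);	/* encourage read-ahead */
	return p;
}

/*
 * After the merge has consumed [p, p + used), drop those pages so
 * they stop competing for cache with pages still to be read.
 * In real code, used should be rounded to a page boundary.
 */
void
run_consumed(void *p, size_t used)
{
	(void)madvise(p, used, MADV_DONTNEED);
}
```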
Re: Using mmap(2) in sort(1) instead of temp files
Date:        Thu, 4 Apr 2024 10:05:17 -0400 (EDT)
From:        Mouse
Message-ID:  <202404041405.kaa01...@stone.rodents-montreal.org>

  | Actually, if you mmap it PROT_WRITE and MAP_PRIVATE, you could go right
  | ahead.  But that'll cost RAM or swap space when the COW fault happens.

And the overheads from doing that (unless the file has very long
records, every block in the file would likely end up being copied, for
a large file) would dwarf whatever small saving avoiding the buffering
(in kernel, and probably in stdio, though that could be avoided) would
gain.

  | It also works only when the input file fits into VM; to rephrase part
  | of what I wrote yesterday on tech-kern, sorting a file bigger than 4G
  | on a 32-bit port shouldn't break.

Agreed.  Aside from anything else, sort(1) needs the ability to merge
already sorted files to handle the -m option; what it is doing using
temp files is simply setting up that scenario, in the case where the
file is big enough that it cannot simply be sorted in core.

  | (well, MAP_ANON).  Yes, but that has issues.  The size of an mmap()ped
  | area is fixed, set at map time, whereas file sizes grow dynamically.  I
  | suspect that trying to use mmap instead of temp files would amount to
  | implementing a rudimentary ramfs.

Yes, in cases where temp files are actually needed, using mmap() is a
very minor gain indeed - the buffering cost might be saved, but sorting
a large file is a cpu costly endeavour (lots of comparisons, lots of
times even with the best sorting algorithms available) so when temp
files are needed in the first place (large input files) the saving is
likely to be a few ms in an operation which takes minutes of cpu time
(or more).  Not worth the bother.

  | Furthermore, if the dataset fits in RAM, I'd say you shouldn't be using
  | the temporary-space paradigm at all; just slurp it in and sort it in
  | core.

Also agreed.  Whether the "slurp" means fgets(), read(), or mmap()
doesn't matter a lot for this, provided no additional hidden copies are
being made (ie: you want the data set to fit in RAM, as well as in the
VM space; as soon as paging/swapping starts, using temp files is likely
better).

But to be sure about that, someone would need to benchmark it.
Certainly, simply believing "mmap() must be faster, therefore better"
is folly.  There are applications for which it wins, and others where
it doesn't - my gut feeling is that sorting is one where it doesn't, or
at least not enough in the cases that matter to justify the extra code
complexity - saving a little when sorting a small file, which might be
possible, isn't good enough IMO.

I'd simply forget the whole thing, unless someone is motivated enough
to really benchmark various implementations for sorting LARGE files
(say several TB) and post the results.

kre
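[A skeleton for the comparison kre asks for might start with two
checksum passes over the same file, one via read(2) and one via mmap(2);
wrap each call in a timer (e.g. clock_gettime) and feed it a large file
to get numbers that mean anything.  The function names here are
hypothetical:]

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Checksum a file with plain read(2) into a fixed buffer. */
uint64_t
sum_read(int fd)
{
	unsigned char buf[65536];
	uint64_t sum = 0;
	ssize_t n;

	(void)lseek(fd, 0, SEEK_SET);
	while ((n = read(fd, buf, sizeof buf)) > 0)
		for (ssize_t i = 0; i < n; i++)
			sum += buf[i];
	return sum;
}

/* Checksum the same file through a read-only mapping. */
uint64_t
sum_mmap(int fd)
{
	struct stat st;
	uint64_t sum = 0;

	if (fstat(fd, &st) == -1 || st.st_size == 0)
		return 0;
	unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
	    MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 0;
	for (off_t i = 0; i < st.st_size; i++)
		sum += p[i];
	munmap(p, (size_t)st.st_size);
	return sum;
}
```

[Note that a fair benchmark also has to control for the page cache
being warm or cold between runs, which is exactly the kind of detail
that makes "mmap must be faster" guesses unreliable.]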
Re: Using mmap(2) in sort(1) instead of temp files
>> Given the issues about using mmap, can anybody suggest how I should
>> proceed with the implementation, or if I should at all?

> There are two potential ways where mmap(2) could help improve the speed
> of sort:

> - If you know the input file name, use a read-only mmap() of that file
>   and avoid all buffering.  Downside: you can not store \0 at the
>   end of a line anymore and need to deal with char*/size_t pairs for
>   strings.

Actually, if you mmap it PROT_WRITE and MAP_PRIVATE, you could go right
ahead.  But that'll cost RAM or swap space when the COW fault happens.
It also works only when the input file fits into VM; to rephrase part
of what I wrote yesterday on tech-kern, sorting a file bigger than 4G
on a 32-bit port shouldn't break.

> - You use "swap space" instead of a temporary file by doing
>   mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_ANNON, -1, 0);

(well, MAP_ANON).  Yes, but that has issues.  The size of an mmap()ped
area is fixed, set at map time, whereas file sizes grow dynamically.  I
suspect that trying to use mmap instead of temp files would amount to
implementing a rudimentary ramfs.

Furthermore, if the dataset fits in RAM, I'd say you shouldn't be using
the temporary-space paradigm at all; just slurp it in and sort it in
core.  And if it fits in VM but not RAM, given the way swap is tuned
for general-purpose use instead of the kind of access patterns sort
exhibits, I suspect temp files might end up being more performant.  And
if the dataset doesn't fit in VM, you'll need temp files regardless.

If this does go in, I really think it needs an option to suppress it.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
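[Mouse's PROT_WRITE/MAP_PRIVATE trick could be sketched as below; the
function name is hypothetical.  The point is that writing '\0' over each
newline dirties the private copy, so the kernel copies every touched
page (the COW cost mentioned above) while the file on disk stays
untouched:]

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <sys/stat.h>
#include <stddef.h>

/*
 * Map the input privately and writable, so line terminators can be
 * overwritten in place.  Each page touched by a store is copied on
 * write, which is where the RAM/swap cost comes from.
 */
char *
map_input_private(int fd, size_t *lenp)
{
	struct stat st;

	if (fstat(fd, &st) == -1 || st.st_size == 0)
		return NULL;
	char *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		return NULL;
	*lenp = (size_t)st.st_size;
	return p;
}
```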
Re: Using mmap(2) in sort(1) instead of temp files
On Thu, Apr 04, 2024 at 12:02:30PM +, Ice Cream wrote:
> Given the issues about using mmap, can anybody suggest how
> I should proceed with the implementation, or if I should at all?

There are two potential ways where mmap(2) could help improve the speed
of sort:

 - If you know the input file name, use a read-only mmap() of that file
   and avoid all buffering.  Downside: you can not store \0 at the
   end of a line anymore and need to deal with char*/size_t pairs for
   strings.

 - You use "swap space" instead of a temporary file by doing

	mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_ANNON, -1, 0);

   and then use the returned pointer for temporary stuff.  Obviously
   you can not do that for arbitrary sizes, so maybe you have to keep
   the old code and only do this trick if the file size is small
   enough, or you process the file in pieces, or whatever.

Since the original comment hints at "instead of temp files" it is pretty
clear that the second variant is meant.  This avoids all file system
operations and if the machine you run on has enough free memory it
might not even actually touch swap space.

Martin
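[Martin's second variant, wrapped in a helper with a hypothetical name
and with the flag spelled MAP_ANON as <sys/mman.h> defines it: an
anonymous, private mapping that acts as swap-backed temporary space.
Unlike a real file it cannot grow, which is the fixed-size limitation
discussed above:]

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <stddef.h>

/*
 * Swap-backed "temp file": anonymous memory, zero-filled on first
 * touch.  The size must be chosen up front; the mapping cannot grow.
 */
void *
temp_space(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

[Until the pages are actually written, such a mapping costs only address
space, which is why it may never touch swap on a machine with enough
free memory.]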