>> On 11/30/23 12:11, Pádraig Brady wrote:
>>> Though that will generally give 128K, which is good when processing all
>>> of a file,
>>> but perhaps overkill when processing just the last part of a file.
>>
>> The 128 KiB number was computed as being better for apps like 'sed' that
>> typically read all or most of the file. 'tail' sometimes behaves that
>> way (e.g., 'tail -c +10') and so 'tail' should use 128 KiB in those
>> cases. The simplest way to do that is for 'tail' to use 128 KiB all the
>> time - that would cost little for uses like plain 'tail' and it could be
>> a significant win for uses like 'tail -c +10'.
>
> Yes I agree we should use io_blksize() in other routines in tail
> where we may dump lots of a file. However in this (most common) case
> the routine is dealing with the end of a regular file,
> so it's probably best to somewhat minimize the amount of data read,
> and more directly check the page_size, which is the issue at hand.
> I've pushed the fix at https://github.com/coreutils/coreutils/commit/73d119f4f
> where the adjustment (which also corresponds to what we do in wc) is:
>
>   if (sb->st_size % page_size == 0)
>     bufsize = MAX (BUFSIZ, page_size);
>
> I'll follow up with another patch to address the performance aspect,
> which uses io_blksize() where appropriate.
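
For anyone reading along, here is a minimal standalone sketch of the
adjustment quoted above. The helper name choose_tail_bufsize() is
hypothetical and is not a function in tail.c. The BUFSIZ default is
also an assumption; the quoted snippet only shows the conditional
assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>      /* BUFSIZ */
#include <sys/types.h>  /* off_t */

/* Hypothetical helper illustrating the quoted adjustment.
   Assumption: the buffer size otherwise defaults to BUFSIZ.  */
static size_t
choose_tail_bufsize (off_t st_size, long page_size)
{
  assert (0 < page_size);

  size_t bufsize = BUFSIZ;

  /* Only when the reported size is an exact multiple of the page size
     is the buffer widened to at least one page, i.e. the equivalent of
     bufsize = MAX (BUFSIZ, page_size).  */
  if (st_size % page_size == 0 && bufsize < (size_t) page_size)
    bufsize = (size_t) page_size;

  return bufsize;
}
```

A caller would typically pass st_size from fstat() and the page size
from sysconf (_SC_PAGESIZE). On systems where BUFSIZ is already at
least the page size, the result is simply BUFSIZ.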
More testing shows that 256KiB does indeed give a 10-20% throughput bump
when reading cached files on modern systems.
Proposed change is attached.
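
To illustrate what "minimum IO block size" means here, a simplified
sketch follows. The function sketch_min_blksize() is hypothetical and
deliberately simpler than io_blksize() in src/ioblksize.h, which may
apply further adjustments. It only shows IO_BUFSIZE acting as a lower
bound on the size reported in st_blksize.

```c
#include <assert.h>

/* Value proposed in the patch; previously 128 * 1024.  */
enum { IO_BUFSIZE = 256 * 1024 };
static_assert (IO_BUFSIZE == 262144, "256KiB expected");

/* Hypothetical simplification: use the file system's preferred block
   size when it exceeds IO_BUFSIZE, otherwise IO_BUFSIZE.  Treating a
   non-positive st_blksize as "unknown" is an assumption made for this
   sketch.  */
static long
sketch_min_blksize (long st_blksize)
{
  if (st_blksize <= 0)
    return IO_BUFSIZE;
  return st_blksize < IO_BUFSIZE ? IO_BUFSIZE : st_blksize;
}
```

So a file system reporting a 4096 byte block size would be read in
262144 byte chunks under this sketch, while one reporting 1MiB would
keep its own value.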
cheers,
Pádraig.
From 06da9b2a060fed0eeeedcceb7c32af7644cf58c4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Wed, 28 Feb 2024 16:41:40 +0000
Subject: [PATCH] cat,cp,mv,install,split: Set the minimum IO block size used
to 256KiB
* NEWS: Mention the change in behavior.
* src/ioblksize.h: Add updated test results and
increase value from 128KiB to 256KiB.
---
NEWS | 4 ++++
src/ioblksize.h | 30 +++++++++++++++++-------------
2 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/NEWS b/NEWS
index 69e282e37..7a5fbfd28 100644
--- a/NEWS
+++ b/NEWS
@@ -80,6 +80,10 @@ GNU coreutils NEWS -*- outline -*-
** Improvements
+ cp,mv,install,cat,split now read and write a minimum of 256KiB at a time.
+ This was previously 128KiB and increasing to 256KiB was seen to increase
+ throughput by 10-20% when reading cached files on modern systems.
+
SELinux operations in file copy operations are now more efficient,
avoiding unneeded MCS/MLS label translation.
diff --git a/src/ioblksize.h b/src/ioblksize.h
index 590b09f58..cabc71b45 100644
--- a/src/ioblksize.h
+++ b/src/ioblksize.h
@@ -41,21 +41,25 @@
system #5: 2.30GHz i7-3615QM with 1600MHz DDR3, arch=x86_64
system #6: 1.30GHz i5-4250U with 1-channel 1600MHz DDR3, arch=x86_64
system #7: 3.55GHz IBM,8231-E2B with 1066MHz DDR3, POWER7 revision 2.1
+ system #8: 2.60GHz i7-5600U with 1600MHz DDR3, arch=x86_64
+ system #9: 3.80GHz IBM,02CY649 with 2666MHz DDR4, POWER9 revision 2.3
+ system 10: 2.95GHz IBM,9043-MRX, POWER10 revision 2.0
+ system 11: 3.23GHz Apple M1 with 2666MHz DDR4, arch=arm64
per-system transfer rate (GB/s)
- blksize #1 #2 #3 #4 #5 #6 #7
+ blksize #1 #2 #3 #4 #5 #6 #7 #8 #9 10 11
------------------------------------------------------------------------
- 1024 .73 1.7 2.6 .64 1.0 2.5 1.3
- 2048 1.3 3.0 4.4 1.2 2.0 4.4 2.5
- 4096 2.4 5.1 6.5 2.3 3.7 7.4 4.8
- 8192 3.5 7.3 8.5 4.0 6.0 10.4 9.2
- 16384 3.9 9.4 10.1 6.3 8.3 13.3 16.8
- 32768 5.2 9.9 11.1 8.1 10.7 13.2 28.0
- 65536 5.3 11.2 12.0 10.6 12.8 16.1 41.4
- 131072 5.5 11.8 12.3 12.1 14.0 16.7 54.8
- 262144 5.7 11.6 12.5 12.3 14.7 16.4 40.0
- 524288 5.7 11.4 12.5 12.1 14.7 15.5 34.5
- 1048576 5.8 11.4 12.6 12.2 14.9 15.7 36.5
+ 1024 .73 1.7 2.6 .64 1.0 2.5 1.3 .9 1.2 2.5 2.0
+ 2048 1.3 3.0 4.4 1.2 2.0 4.4 2.5 1.7 2.3 4.9 3.8
+ 4096 2.4 5.1 6.5 2.3 3.7 7.4 4.8 3.1 4.6 9.6 6.9
+ 8192 3.5 7.3 8.5 4.0 6.0 10.4 9.2 5.6 9.1 18.4 12.3
+ 16384 3.9 9.4 10.1 6.3 8.3 13.3 16.8 8.6 17.3 33.6 19.8
+ 32768 5.2 9.9 11.1 8.1 10.7 13.2 28.0 11.4 32.2 59.2 27.0
+ 65536 5.3 11.2 12.0 10.6 12.8 16.1 41.4 14.9 56.9 95.4 34.1
+ 131072 5.5 11.8 12.3 12.1 14.0 16.7 54.8 17.1 86.5 125.0 38.2
+ 262144 5.7 11.6 12.5 12.3 14.7 16.4 40.0 18.0 113.0 148.0 41.3
+ 524288 5.7 11.4 12.5 12.1 14.7 15.5 34.5 18.0 104.0 153.0 43.1
+ 1048576 5.8 11.4 12.6 12.2 14.9 15.7 36.5 18.2 87.9 114.0 44.8
Note that this is to minimize system call overhead.
@@ -71,7 +75,7 @@
In the future we could use the above method if available
and default to io_blksize() if not.
*/
-enum { IO_BUFSIZE = 128 * 1024 };
+enum { IO_BUFSIZE = 256 * 1024 };
static inline idx_t
io_blksize (struct stat const *st)
{
--
2.43.0