>> On 11/30/23 12:11, Pádraig Brady wrote:
>>> Though that will generally give 128K, which is good when processing all
>>> of a file,
>>> but perhaps overkill when processing just the last part of a file.
>>
>> The 128 KiB number was computed as being better for apps like 'sed' that
>> typically read all or most of the file. 'tail' sometimes behaves that
>> way (e.g., 'tail -c +10') and so 'tail' should use 128 KiB in those
>> cases. The simplest way to do that is for 'tail' to use 128 KiB all the
>> time - that would cost little for uses like plain 'tail' and it could be
>> a significant win for uses like 'tail -c +10'.
>
> Yes I agree we should use io_blksize() in other routines in tail
> where we may dump lots of a file. However in this (most common) case
> the routine is dealing with the end of a regular file,
> so it's probably best to somewhat minimize the amount of data read,
> and more directly check the page_size, which is the issue at hand.
> I've pushed the fix at https://github.com/coreutils/coreutils/commit/73d119f4f
> where the adjustment (which also corresponds to what we do in wc) is:
>
>     if (sb->st_size % page_size == 0)
>       bufsize = MAX (BUFSIZ, page_size);
>
> I'll follow up with another patch to address the performance aspect,
> which uses io_blksize() where appropriate.
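
For readers following the quoted adjustment, here is a minimal sketch of that buffer-size choice in isolation. The helper name pick_tail_bufsize is hypothetical (it is not a coreutils function), and starting from BUFSIZ as the default is an assumption; only the size-modulo-page_size test and the MAX (BUFSIZ, page_size) result come from the quoted snippet.

```c
#include <stdio.h>      /* BUFSIZ, size_t */
#include <sys/types.h>  /* off_t */

/* Hypothetical helper, not part of coreutils: choose a read buffer size
   for the "end of a regular file" case discussed above.  */
static size_t
pick_tail_bufsize (off_t st_size, long page_size)
{
  /* Assumption: BUFSIZ is the default when no adjustment applies.  */
  size_t bufsize = BUFSIZ;

  /* From the quoted fix: when the reported size is a multiple of the
     page size, use at least a full page.  The page_size > 0 guard is an
     addition of this sketch to avoid dividing by zero.  */
  if (page_size > 0 && st_size % page_size == 0)
    bufsize = (size_t) page_size > bufsize ? (size_t) page_size : bufsize;

  return bufsize;
}
```

A caller would typically pass st_size from stat() and the value of sysconf (_SC_PAGESIZE) as page_size.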

More testing shows that 256KiB does indeed give a 10-20% throughput bump on modern systems.
Proposed change is attached.

cheers,
Pádraig.
From 06da9b2a060fed0eeeedcceb7c32af7644cf58c4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Wed, 28 Feb 2024 16:41:40 +0000
Subject: [PATCH] cat,cp,mv,install,split: Set the minimum IO block size used
 to 256KiB

* NEWS: Mention the change in behavior.
* src/ioblksize.h: Add updated test results and
increase value from 128KiB to 256KiB.
---
 NEWS            |  4 ++++
 src/ioblksize.h | 30 +++++++++++++++++-------------
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/NEWS b/NEWS
index 69e282e37..7a5fbfd28 100644
--- a/NEWS
+++ b/NEWS
@@ -80,6 +80,10 @@ GNU coreutils NEWS                                    -*- outline -*-
 
 ** Improvements
 
+  cp,mv,install,cat,split now read and write a minimum of 256KiB at a time.
+  This was previously 128KiB and increasing to 256KiB was seen to increase
+  throughput by 10-20% when reading cached files on modern systems.
+
   SELinux operations in file copy operations are now more efficient,
   avoiding unneeded MCS/MLS label translation.
 
diff --git a/src/ioblksize.h b/src/ioblksize.h
index 590b09f58..cabc71b45 100644
--- a/src/ioblksize.h
+++ b/src/ioblksize.h
@@ -41,21 +41,25 @@
    system #5: 2.30GHz i7-3615QM with 1600MHz DDR3, arch=x86_64
    system #6: 1.30GHz i5-4250U with 1-channel 1600MHz DDR3, arch=x86_64
    system #7: 3.55GHz IBM,8231-E2B with 1066MHz DDR3, POWER7 revision 2.1
+   system #8: 2.60GHz i7-5600U with 1600MHz DDR3, arch=x86_64
+   system #9: 3.80GHz IBM,02CY649 with 2666MHz DDR4, POWER9 revision 2.3
+   system 10: 2.95GHz IBM,9043-MRX, POWER10 revision 2.0
+   system 11: 3.23GHz Apple M1 with 2666MHz DDR4, arch=arm64
 
                 per-system transfer rate (GB/s)
-   blksize   #1    #2    #3    #4    #5    #6    #7
+   blksize   #1    #2    #3    #4    #5    #6    #7    #8    #9     10    11
    ------------------------------------------------------------------------
-      1024  .73   1.7   2.6   .64   1.0   2.5   1.3
-      2048  1.3   3.0   4.4   1.2   2.0   4.4   2.5
-      4096  2.4   5.1   6.5   2.3   3.7   7.4   4.8
-      8192  3.5   7.3   8.5   4.0   6.0  10.4   9.2
-     16384  3.9   9.4  10.1   6.3   8.3  13.3  16.8
-     32768  5.2   9.9  11.1   8.1  10.7  13.2  28.0
-     65536  5.3  11.2  12.0  10.6  12.8  16.1  41.4
-    131072  5.5  11.8  12.3  12.1  14.0  16.7  54.8
-    262144  5.7  11.6  12.5  12.3  14.7  16.4  40.0
-    524288  5.7  11.4  12.5  12.1  14.7  15.5  34.5
-   1048576  5.8  11.4  12.6  12.2  14.9  15.7  36.5
+      1024  .73   1.7   2.6   .64   1.0   2.5   1.3    .9   1.2    2.5   2.0
+      2048  1.3   3.0   4.4   1.2   2.0   4.4   2.5   1.7   2.3    4.9   3.8
+      4096  2.4   5.1   6.5   2.3   3.7   7.4   4.8   3.1   4.6    9.6   6.9
+      8192  3.5   7.3   8.5   4.0   6.0  10.4   9.2   5.6   9.1   18.4  12.3
+     16384  3.9   9.4  10.1   6.3   8.3  13.3  16.8   8.6  17.3   33.6  19.8
+     32768  5.2   9.9  11.1   8.1  10.7  13.2  28.0  11.4  32.2   59.2  27.0
+     65536  5.3  11.2  12.0  10.6  12.8  16.1  41.4  14.9  56.9   95.4  34.1
+    131072  5.5  11.8  12.3  12.1  14.0  16.7  54.8  17.1  86.5  125.0  38.2
+    262144  5.7  11.6  12.5  12.3  14.7  16.4  40.0  18.0 113.0  148.0  41.3
+    524288  5.7  11.4  12.5  12.1  14.7  15.5  34.5  18.0 104.0  153.0  43.1
+   1048576  5.8  11.4  12.6  12.2  14.9  15.7  36.5  18.2  87.9  114.0  44.8
 
 
    Note that this is to minimize system call overhead.
@@ -71,7 +75,7 @@
    In the future we could use the above method if available
    and default to io_blksize() if not.
  */
-enum { IO_BUFSIZE = 128 * 1024 };
+enum { IO_BUFSIZE = 256 * 1024 };
 static inline idx_t
 io_blksize (struct stat const *st)
 {
-- 
2.43.0
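
The patch changes only the IO_BUFSIZE constant; the body of io_blksize() is not shown. As a rough illustration of how a "minimum block size" floor can be combined with a file's preferred block size, here is a minimal sketch. The names SKETCH_IO_BUFSIZE and sketch_io_blksize are hypothetical, and the round-up-to-a-multiple policy is an assumption of this sketch, not something the patch confirms.

```c
#include <stdint.h>

/* Hypothetical stand-in for the constant changed by the patch.  */
enum { SKETCH_IO_BUFSIZE = 256 * 1024 };

/* Hypothetical sketch, not the coreutils implementation: return a block
   size that is at least SKETCH_IO_BUFSIZE and a multiple of the file's
   preferred block size ST_BLKSIZE.  */
static int64_t
sketch_io_blksize (int64_t st_blksize)
{
  /* Treat nonsensical (zero or negative) sizes as the floor itself.  */
  int64_t blocksize = st_blksize <= 0 ? SKETCH_IO_BUFSIZE : st_blksize;

  /* Add enough whole blocks to reach the floor; this adds nothing when
     blocksize already meets or exceeds it.  */
  blocksize += (SKETCH_IO_BUFSIZE - 1) - (SKETCH_IO_BUFSIZE - 1) % blocksize;

  return blocksize;
}
```

With this policy a file reporting a 4096-byte block size is read in 262144-byte chunks, while one reporting 1 MiB keeps its own size.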
