Hi,

I have found a performance issue with the sort command when it reads
pseudo files that report a size of zero. For instance, as demonstrated
below, sorting `/proc/kallsyms` directly takes about nine times longer
than piping it through `cat` first, and it creates a large number of
temporary files. I confirmed this issue on v8.32 as well as on commit
8f3989d in the master branch.

  $ time cat /proc/kallsyms | sort > /dev/null
  real    0m0.954s
  user    0m0.873s
  sys     0m0.096s

  $ time sort /proc/kallsyms > /dev/null
  real    0m8.555s
  user    0m3.367s
  sys     0m5.064s

  $ strace -e trace=openat sort /proc/kallsyms 2>&1 > /dev/null \
    | grep /tmp/sort | head -100
  ...
  openat(AT_FDCWD, "/tmp/sortM6Y6Y1", ...
  openat(AT_FDCWD, "/tmp/sortPrHKMG", ...

  $ strace -e trace=openat -c sort /proc/kallsyms > /dev/null
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
  100.00    6.419777          19    333258         8 openat
  ------ ----------- ----------- --------- --------- ----------------
  100.00    6.419777          19    333258         8 total
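
A quick way to check whether a file falls into this category is to
compare what stat() reports with what read() actually returns. The
following is a minimal sketch, assuming a Linux system with procfs; the
helpers count_readable_bytes and is_zero_sized_pseudo_file are
hypothetical and not part of coreutils.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the number of bytes that can actually be
   read from PATH, or -1 on error.  */
static long long
count_readable_bytes (const char *path)
{
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  char buf[65536];
  long long total = 0;
  ssize_t n;
  while ((n = read (fd, buf, sizeof buf)) > 0)
    total += n;

  close (fd);
  return n < 0 ? -1 : total;
}

/* Hypothetical helper: return 1 if PATH is a regular file whose st_size
   is 0 but which still yields data when read (the case the patch
   targets), otherwise 0.  */
static int
is_zero_sized_pseudo_file (const char *path)
{
  struct stat st;
  if (stat (path, &st) != 0)
    return 0;
  return S_ISREG (st.st_mode) && st.st_size == 0
         && count_readable_bytes (path) > 0;
}
```

On Linux, is_zero_sized_pseudo_file ("/proc/kallsyms") should return 1,
while an ordinary non-empty file gives 0 because its st_size is nonzero.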

It appears that the buffer allocated for zero-sized pseudo files is too
small, likely because its size is derived from the reported file size,
which is zero. I think using `INPUT_FILE_SIZE_GUESS` to calculate the
buffer size when the file size is zero, as done in the attached patch,
would resolve this issue.

Best regards,
Takashi Kusumi
From 9f759cc72014ab66da0e14318d4aa0c72e9311d9 Mon Sep 17 00:00:00 2001
From: Takashi Kusumi <tkus...@zlab.co.jp>
Date: Fri, 5 Apr 2024 12:03:42 +0900
Subject: [PATCH] sort: fix performance issue on zero-sized pseudo files

Previously, an insufficient buffer size was chosen for zero-sized pseudo
files (e.g., /proc/kallsyms). Now, the buffer size is calculated using
INPUT_FILE_SIZE_GUESS when the file size is zero.
---
 src/sort.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/sort.c b/src/sort.c
index 329ed45dc..8d757da55 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -1538,7 +1538,7 @@ sort_buffer_size (FILE *const *fps, size_t nfps,
           != 0)
         sort_die (_("stat failed"), files[i]);
 
-      if (S_ISREG (st.st_mode))
+      if (S_ISREG (st.st_mode) && st.st_size != 0)
         file_size = st.st_size;
       else
         {
-- 
2.41.0
