find, fts: dramatical improvement of speed in find
Hi. I applied the following patch to fts.c from the latest revision of gnulib
(47bf2cf3184027c1eb9c1dfeea5c5b8b2d69710d):

diff --git a/lib/fts.c b/lib/fts.c
index ade8c3349..70b424dea 100644
--- a/lib/fts.c
+++ b/lib/fts.c
@@ -1514,7 +1514,7 @@ mem1: saved_errno = errno;
                         bool skip_stat = (ISSET(FTS_PHYSICAL)
                                           && ISSET(FTS_NOSTAT)
                                           && DT_IS_KNOWN(dp)
-                                          && ! DT_MUST_BE(dp, DT_DIR));
+                                          && ! DT_MUST_BE(dp, DT_DIR)) || (ISSET(FTS_LOGICAL) && ISSET(FTS_NOSTAT) && DT_IS_KNOWN(dp) && ! DT_MUST_BE(dp, DT_DIR) && ! DT_MUST_BE(dp, DT_LNK));
                         p->fts_info = FTS_NSOK;
                         /* Propagate dirent.d_type information back to caller,
                            when possible.  */

(If my mail client damaged this patch, you can see it here:
http://paste.debian.net/hidden/c4eaca5b/ )

Then I copied the patched fts.c into the "find" sources, and "find" started to
work significantly faster.
The idea is this: in FTS_LOGICAL mode we don't need "stat" if we already know
that the entry is neither a directory nor a symlink.
Of course, I don't fully understand the fts.c code, so please review the patch
carefully and make any additional changes that are needed (say, to comments).

The run time dropped from 14.16 s to 9.21 s when searching my home directory
with this command:

time -p sudo /tmp/sidabcn/root/findutils/find/find -O3 -L /home/user \
  '(' -path '/home/user/Downloads' -o -path '*/.git' -o -path '*/Default' \
  -o -path '*/dev/fd' -o -path '/home/user/opt' -o -path '*/node_modules' ')' \
  -prune -false -o -type f > /tmp/st

(Of course, I don't want to share my home dir, so you will not be able to
reproduce my test, but you can still try something like "find -O3 -L /home".)

==
Askar Safin
https://github.com/safinaskar
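For readers who want the idea in isolation, here is a minimal sketch of the decision described in this message, written against the plain readdir d_type constants rather than the fts.c macros. The function name can_skip_stat and its parameters are hypothetical and do not appear in gnulib; the sketch only assumes that the platform provides the DT_* constants in <dirent.h>.

```c
#define _DEFAULT_SOURCE   /* expose the DT_* constants on glibc */
#include <dirent.h>
#include <stdbool.h>

/* Hypothetical helper, not part of gnulib: decide whether a directory
   walker may skip stat() for an entry, given only readdir's d_type.
   "logical" models FTS_LOGICAL (follow symlinks, as in find -L);
   "nostat" models FTS_NOSTAT (the caller does not need stat data).  */
bool
can_skip_stat (bool logical, bool nostat, unsigned char d_type)
{
  if (!nostat || d_type == DT_UNKNOWN)
    return false;   /* stat data was requested, or the type is unknown */
  if (d_type == DT_DIR)
    return false;   /* directories are still stat'ed */
  if (logical && d_type == DT_LNK)
    return false;   /* when following links, the target type is unknown */
  return true;      /* known non-directory, and not a link to follow */
}
```

In logical mode a regular file (DT_REG) can skip the stat call, while a symlink cannot, because the type of its target is only known after following it.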
find, fts: dramatical improvement of speed in find
Hi. It seems you missed my letter. This time I have CC'd the gnulib and find
developers. I have a gnulib/fts patch that improves speed.

> Hi. I applied the following patch to fts.c from latest revision of gnulib (
> 47bf2cf3184027c1eb9c1dfeea5c5b8b2d69710d ):
>
> diff --git a/lib/fts.c b/lib/fts.c ...
>
> (If my mail client damaged this patch, you can see it here:
> http://paste.debian.net/hidden/c4eaca5b/ )
> Then I copied this patched fts.c to sources of "find" and "find" started to
> work significantly faster.
> Idea is this: if we are in FTS_LOGICAL mode, we don't need "stat" if we know
> this is not directory or symlink.
> Of course, I don't fully understand fts.c code, so, please, carefully review
> patch and make any needed additional changes (say, to comments).
>
> I got time decrease from 14.16 s to 9.21 s when searching in my home directory
> using this command:
>
> time -p sudo /tmp/sidabcn/root/findutils/find/find -O3 -L /home/user '(' -path
> '/home/user/Downloads' -o -path '*/.git' -o -path '*/Default' -o -path
> '*/dev/fd' -o -path '/home/user/opt' -o -path '*/node_modules' ')' -prune
> -false -o -type f > /tmp/st
>
> (Of course, I don't want to share my home dir, so you will not be able to
> reproduce my test, but you can still try to type something like "find -O3 -L
> /home")
>
> ==
> Askar Safin
> https://github.com/safinaskar
>

==
Askar Safin
https://github.com/safinaskar
Re: find, fts: dramatical improvement of speed in find
On 2020-04-23 19:24, Askar Safin wrote:
> It seems you missed my letter.

No. The point is that open source contribution is most often voluntary work,
at least in my case, and it is done when there is time. And although many
people are bored due to the Corona lock-down in many countries, for me it's
rather the opposite.

Re. the performance improvement: while the change is very small, improvements
like that have to be checked extremely thoroughly. Performance improvements
are tempting ... and dangerous at the same time.
Who said "premature performance improvement is the root of all evil"?

Have a nice day,
Berny
Re: find, fts: dramatical improvement of speed in find
On 4/25/20 1:29 AM, Bernhard Voelker wrote:
> despite many people are bored due to Corona lock-down in many
> countries, for me it's rather the opposite.

Oh yes, things are pretty busy around here too. It's been predicted that over
half of all working-age Americans will not be drawing wages next month, but
the minority who are still working have more to do than ever. And although
I'm too old to be a "working-age American", I'm still one of the worker bees.

That being said, Askar's idea looks good to me, though the code can be simpler
than what he proposed. I installed the attached patch into Gnulib. Thanks,
Askar.

From a884f9d641f0749504acfd4e39a48c3fb7bd393e Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sat, 25 Apr 2020 11:02:53 -0700
Subject: [PATCH] Tune fts for FTS_LOGICAL+FTS_NOSTAT

From a suggestion by Askar Safin in:
https://lists.gnu.org/r/bug-gnulib/2020-04/msg00074.html
* lib/fts.c (fts_build): If file types are known, optimize
FTS_LOGICAL+FTS_NOSTAT for non-symlinks and non-directories the
same way that we already optimize FTS_PHYSICAL+FTS_NOSTAT for
non-directories.
---
 ChangeLog | 10 ++++++++++
 lib/fts.c |  7 ++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 4bf912fe4..c13c82bac 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2020-04-25  Paul Eggert
+
+        Tune fts for FTS_LOGICAL+FTS_NOSTAT
+        From a suggestion by Askar Safin in:
+        https://lists.gnu.org/r/bug-gnulib/2020-04/msg00074.html
+        * lib/fts.c (fts_build): If file types are known, optimize
+        FTS_LOGICAL+FTS_NOSTAT for non-symlinks and non-directories the
+        same way that we already optimize FTS_PHYSICAL+FTS_NOSTAT for
+        non-directories.
+
 2020-04-19  Bruno Haible

         vasnprintf: Add support for printing wide characters using escapes.
diff --git a/lib/fts.c b/lib/fts.c
index ade8c3349..bf62dfa92 100644
--- a/lib/fts.c
+++ b/lib/fts.c
@@ -1511,10 +1511,11 @@ mem1: saved_errno = errno;
                            inode numbers.  Some day we might optimize that
                            away, too, for directories where d_ino is known to
                            be valid.  */
-                        bool skip_stat = (ISSET(FTS_PHYSICAL)
-                                          && ISSET(FTS_NOSTAT)
+                        bool skip_stat = (ISSET(FTS_NOSTAT)
                                           && DT_IS_KNOWN(dp)
-                                          && ! DT_MUST_BE(dp, DT_DIR));
+                                          && ! DT_MUST_BE(dp, DT_DIR)
+                                          && (ISSET(FTS_PHYSICAL)
+                                              || ! DT_MUST_BE(dp, DT_LNK)));
                         p->fts_info = FTS_NSOK;
                         /* Propagate dirent.d_type information back to caller,
                            when possible.  */
--
2.25.3
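The installed condition is shorter than the one originally proposed, so it may help to see that the two agree. Below is a small sketch that enumerates every input combination and counts disagreements. All function and parameter names are hypothetical stand-ins for the fts.c macros, and the check assumes that exactly one of FTS_PHYSICAL and FTS_LOGICAL is set, which the thread itself does not state.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fts.c macros:
   physical/logical model ISSET(FTS_PHYSICAL)/ISSET(FTS_LOGICAL),
   nostat models ISSET(FTS_NOSTAT), known models DT_IS_KNOWN(dp),
   is_dir/is_lnk model DT_MUST_BE(dp, DT_DIR)/DT_MUST_BE(dp, DT_LNK).  */

/* The condition as written in the originally proposed patch.  */
bool
skip_stat_proposed (bool physical, bool logical, bool nostat,
                    bool known, bool is_dir, bool is_lnk)
{
  return (physical && nostat && known && !is_dir)
         || (logical && nostat && known && !is_dir && !is_lnk);
}

/* The condition as written in the installed patch.  */
bool
skip_stat_installed (bool physical, bool nostat,
                     bool known, bool is_dir, bool is_lnk)
{
  return nostat && known && !is_dir && (physical || !is_lnk);
}

/* Try all 32 combinations of the five independent inputs, assuming
   logical == !physical, and count how often the two forms differ.  */
int
count_disagreements (void)
{
  int disagreements = 0;
  for (int bits = 0; bits < 32; bits++)
    {
      bool physical = (bits & 1) != 0;
      bool logical = !physical;   /* assumption: exactly one mode is set */
      bool nostat = (bits & 2) != 0;
      bool known = (bits & 4) != 0;
      bool is_dir = (bits & 8) != 0;
      bool is_lnk = (bits & 16) != 0;
      if (skip_stat_proposed (physical, logical, nostat, known, is_dir, is_lnk)
          != skip_stat_installed (physical, nostat, known, is_dir, is_lnk))
        disagreements++;
    }
  return disagreements;
}
```

The enumeration also covers combinations that cannot occur in practice (for example is_dir and is_lnk both true); including them is harmless and keeps the loop simple.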
Re: find, fts: dramatical improvement of speed in find
Hi. Thanks a lot for applying the patch. I use "find" very often (always in
"-L" mode), so its performance is important to me, and I want to continue
optimizing it.

I found that gnulib commit 2649851d0409c3fafee7cf396d11c10892ac0e53 (2017)
introduced a speed regression.

On my computer, "find -L /home/user" with find
79e8e03cda028c7d3134d8de63a40367eaa2f952 (2017) and gnulib
f7eb1b99e30517fc50c130cdecec24059a1b7c4f (the commit just before
2649851d0409c3fafe) takes 7.32 s.
The same find version (79e8e03cda028c7d3134d8de63a40367eaa2f952) with gnulib
2649851d0409c3fafee7cf396d11c10892ac0e53 takes 8.29 s.
I don't know the reason, but I noticed that if I apply the patch
http://paste.debian.net/hidden/1ff503a8/ to the regressed version (gnulib
2649851d0409c3fafee7cf396d11c10892ac0e53), the regression disappears, i.e. I
get the normal ~7.32 s again.

I was also able to port this patch to the current find and gnulib versions.
Take current find master 7642d172e10a890975696d28278e5192d81afc5b and current
gnulib master bddb8c50edc730e4ea60181a541f4fe41ba933ff (i.e. with my
optimization from the previous letter). If I apply the patch
http://paste.debian.net/hidden/845d44cf/ (my attempt to port that
anti-regression patch) to this gnulib commit, the run time drops from 3.33 s
to 2.46 s.

Also, I don't understand the comment "If we're not in CWDFD mode, don't bother
with this optimization, since the caller is not serious about performance" in
the current gnulib sources (fts.c). What does this mean? When I run "find -L"
(with find 7642d172e10a890975696d28278e5192d81afc5b and gnulib
bddb8c50edc730e4ea60181a541f4fe41ba933ff, without the patches from *this*
letter), I reach that code path (I verified this by inserting an fprintf
there). So "find -L" actually gets us to that point, and I need performance in
that use case.

==
Askar Safin
https://github.com/safinaskar
Re: find, fts: dramatical improvement of speed in find
Hi. It seems my previous letter was missed. Also, Paul, the mailer daemon of
my mail provider (mail.ru) told me that it cannot deliver letters to you. Do
you reject mail from unknown senders, or is the problem on my side?

> Hi. Thanks a lot for applying patch. I use "find" very often (always in "-L"
> mode), so its performance is important for me. So I want to continue
> optimizing it.
>
> I found that gnulib commit 2649851d0409c3fafee7cf396d11c10892ac0e53 (2017)
> introduced a speed regression.
>
> "find -L /home/user" on my computer with find
> 79e8e03cda028c7d3134d8de63a40367eaa2f952 (2017) and gnulib
> f7eb1b99e30517fc50c130cdecec24059a1b7c4f (previous before 2649851d0409c3fafe)
> takes 7,32 s.
> But same find version (79e8e03cda028c7d3134d8de63a40367eaa2f952) with gnulib
> 2649851d0409c3fafee7cf396d11c10892ac0e53 takes 8,29 s.
> I don't know reason, but I noticed that if I apply to regressed version
> (gnulib 2649851d0409c3fafee7cf396d11c10892ac0e53) patch
> http://paste.debian.net/hidden/1ff503a8/ , then regression disappears, i. e. I
> will get normal ~7,32 s.
>
> Also I was able to port this patch to modern find and gnulib version. Let's
> take current find master 7642d172e10a890975696d28278e5192d81afc5b and current
> gnulib master bddb8c50edc730e4ea60181a541f4fe41ba933ff (i. e. with my
> optimization from previous
> letter). If I apply patch http://paste.debian.net/hidden/845d44cf/ (this is my
> attempt to port that anti-regression patch) to this gnulib commit, then speed
> increases from 3,33 s. to 2,46 s.
>
> Also I don't understand comment "If we're not in CWDFD mode, don't bother with
> this optimization, since the caller is not serious about performance" from
> modern gnulib sources (fts.c). What this means? When I run "find -L" (with
> find
> 7642d172e10a890975696d28278e5192d81afc5b and gnulib
> bddb8c50edc730e4ea60181a541f4fe41ba933ff without patches from *this* letter),
> I got to that code path (I verified this by inserting fprintf there). So "find
> -L" actually gets us to that point. And I need
> performance in that use case.
>
> ==
> Askar Safin
> https://github.com/safinaskar

==
Askar Safin
https://github.com/safinaskar
Re: find, fts: dramatical improvement of speed in find
Askar Safin wrote:
> Hi. It seems my previous letter was missed. Also, Paul, mailer daemon of my
> mail provider mail.ru said me that it cannot deliver letters to you. You don't
> want random letters or it is problem on my side?

Your mail appeared in the archive:
https://lists.gnu.org/archive/html/bug-gnulib/2020-04/msg00090.html
You can therefore assume that it reached the subscribers.

Your expectation, however, that every mail you send will be replied to
quickly, is not met by reality. Bernhard and Paul explained why. You need to
give the people time. If the issue you are reporting is important, feel free
to send a polite 'ping' in a month or so.

Bruno