find, fts: dramatical improvement of speed in find

2020-04-16 Thread Askar Safin
Hi. I applied the following patch to fts.c from latest revision of gnulib ( 
47bf2cf3184027c1eb9c1dfeea5c5b8b2d69710d ):

diff --git a/lib/fts.c b/lib/fts.c
index ade8c3349..70b424dea 100644
--- a/lib/fts.c
+++ b/lib/fts.c
@@ -1514,7 +1514,7 @@ mem1:   saved_errno = errno;
 bool skip_stat = (ISSET(FTS_PHYSICAL)
   && ISSET(FTS_NOSTAT)
   && DT_IS_KNOWN(dp)
-  && ! DT_MUST_BE(dp, DT_DIR));
+  && ! DT_MUST_BE(dp, DT_DIR)) || (ISSET(FTS_LOGICAL) && ISSET(FTS_NOSTAT) && DT_IS_KNOWN(dp) && ! DT_MUST_BE(dp, DT_DIR) && ! DT_MUST_BE(dp, DT_LNK));
 p->fts_info = FTS_NSOK;
 /* Propagate dirent.d_type information back
to caller, when possible.  */

(If my mail client damaged this patch, you can see it here: 
http://paste.debian.net/hidden/c4eaca5b/ )
Then I copied the patched fts.c into the "find" sources, and "find" became 
significantly faster.
The idea is this: in FTS_LOGICAL mode, we don't need to call "stat" on an 
entry if we already know it is neither a directory nor a symlink.
Of course, I don't fully understand the fts.c code, so please review the 
patch carefully and make any additional changes needed (say, to comments).
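
As a rough illustration of the idea above (not the actual fts.c code), the 
check can be sketched with the d_type field that readdir() reports on Linux 
and the BSDs. The helper names needs_stat and count_skippable are made up for 
this sketch, and it assumes FTS_NOSTAT-like behaviour, i.e. the caller only 
wants to know the file type:

```c
#define _DEFAULT_SOURCE   /* exposes d_type and the DT_* constants on glibc */
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if an entry with this d_type still
   needs a stat call.  "logical" mimics FTS_LOGICAL (follow symlinks).  */
bool needs_stat(unsigned char d_type, bool logical)
{
  if (d_type == DT_UNKNOWN)
    return true;   /* the file system did not report a type */
  if (d_type == DT_DIR)
    return true;   /* both conditions in this thread still stat directories */
  if (logical && d_type == DT_LNK)
    return true;   /* the type of the link target is not known yet */
  return false;    /* known non-directory, non-symlink: stat can be skipped */
}

/* Hypothetical helper: count the entries of one directory for which the
   stat call could be skipped.  Returns -1 if the directory cannot be
   opened.  */
long count_skippable(const char *dir_path, bool logical)
{
  assert(dir_path != NULL);

  DIR *dir = opendir(dir_path);
  if (dir == NULL)
    return -1;

  long skippable = 0;
  struct dirent *entry;
  while ((entry = readdir(dir)) != NULL)
    if (!needs_stat(entry->d_type, logical))
      skippable++;

  closedir(dir);
  return skippable;
}
```

In a directory that holds mostly regular files, count_skippable(path, true) 
should come out close to the number of entries, which is where the saving in 
"-L" mode would come from.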

The run time dropped from 14.16 s to 9.21 s when searching my home directory 
with this command:

time -p sudo /tmp/sidabcn/root/findutils/find/find -O3 -L /home/user '(' -path 
'/home/user/Downloads' -o -path '*/.git' -o -path '*/Default' -o -path 
'*/dev/fd' -o -path '/home/user/opt' -o -path '*/node_modules' ')' -prune 
-false -o -type f > /tmp/st

(Of course, I don't want to share my home directory, so you won't be able to 
reproduce my test exactly, but you can still try something like "find -O3 -L 
/home".)

==
Askar Safin
https://github.com/safinaskar


find, fts: dramatical improvement of speed in find

2020-04-23 Thread Askar Safin
Hi. It seems you missed my letter. This time I CC'd the gnulib/find devs. I 
have a gnulib/fts patch that improves speed.


==
Askar Safin
https://github.com/safinaskar


Re: find, fts: dramatical improvement of speed in find

2020-04-25 Thread Bernhard Voelker
On 2020-04-23 19:24, Askar Safin wrote:
> It seems you missed my letter.

No.  The point is that open source contribution is most often voluntary
work - at least in my case - which is done when there's time.
And although many people are bored due to the Corona lock-down in many
countries, for me it's rather the opposite.

Re. the performance improvement: while the change is very small, improvements
like that have to be checked extremely thoroughly.
Performance improvements are tempting ... and dangerous at the same time.
Who said "premature performance improvement is the root of all evil"?

Have a nice day,
Berny



Re: find, fts: dramatical improvement of speed in find

2020-04-25 Thread Paul Eggert
On 4/25/20 1:29 AM, Bernhard Voelker wrote:
> although many people are bored due to the Corona lock-down in many
> countries, for me it's rather the opposite.

Oh yes, things are pretty busy around here too. It's been predicted that over
half of all working-age Americans will not be drawing wages next month, but the
minority who are still working have more to do than ever. And although I'm too
old to be a "working-age American", I'm still one of the worker bees.

That being said, Askar's idea looks good to me, though the code can be simpler
than what he proposed. I installed the attached patch into Gnulib. Thanks, 
Askar.
From a884f9d641f0749504acfd4e39a48c3fb7bd393e Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 25 Apr 2020 11:02:53 -0700
Subject: [PATCH] Tune fts for FTS_LOGICAL+FTS_NOSTAT

From a suggestion by Askar Safin in:
https://lists.gnu.org/r/bug-gnulib/2020-04/msg00074.html
* lib/fts.c (fts_build): If file types are known, optimize
FTS_LOGICAL+FTS_NOSTAT for non-symlinks and non-directories the
same way that we already optimize FTS_PHYSICAL+FTS_NOSTAT for
non-directories.
---
 ChangeLog | 10 ++++++++++
 lib/fts.c |  7 ++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 4bf912fe4..c13c82bac 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2020-04-25  Paul Eggert  
+
+	Tune fts for FTS_LOGICAL+FTS_NOSTAT
+	From a suggestion by Askar Safin in:
+	https://lists.gnu.org/r/bug-gnulib/2020-04/msg00074.html
+	* lib/fts.c (fts_build): If file types are known, optimize
+	FTS_LOGICAL+FTS_NOSTAT for non-symlinks and non-directories the
+	same way that we already optimize FTS_PHYSICAL+FTS_NOSTAT for
+	non-directories.
+
 2020-04-19  Bruno Haible  
 
 	vasnprintf: Add support for printing wide characters using escapes.
diff --git a/lib/fts.c b/lib/fts.c
index ade8c3349..bf62dfa92 100644
--- a/lib/fts.c
+++ b/lib/fts.c
@@ -1511,10 +1511,11 @@ mem1:   saved_errno = errno;
inode numbers.  Some day we might optimize that
away, too, for directories where d_ino is known to
be valid.  */
-bool skip_stat = (ISSET(FTS_PHYSICAL)
-  && ISSET(FTS_NOSTAT)
+bool skip_stat = (ISSET(FTS_NOSTAT)
   && DT_IS_KNOWN(dp)
-  && ! DT_MUST_BE(dp, DT_DIR));
+  && ! DT_MUST_BE(dp, DT_DIR)
+  && (ISSET(FTS_PHYSICAL)
+  || ! DT_MUST_BE(dp, DT_LNK)));
 p->fts_info = FTS_NSOK;
 /* Propagate dirent.d_type information back
to caller, when possible.  */
-- 
2.25.3
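
For readers comparing the two patches: the installed condition is a 
factored form of the one first proposed. A minimal sketch of a brute-force 
check follows. It assumes that every walk has at least one of FTS_PHYSICAL 
and FTS_LOGICAL set; the function names are made up for this sketch, and each 
boolean parameter stands in for the result of the corresponding fts.c macro.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition from the first patch in this thread.  */
bool skip_stat_first_patch(bool physical, bool logical, bool nostat,
                           bool type_known, bool is_dir, bool is_lnk)
{
  return (physical && nostat && type_known && !is_dir)
         || (logical && nostat && type_known && !is_dir && !is_lnk);
}

/* The condition from the installed patch.  */
bool skip_stat_installed(bool physical, bool nostat,
                         bool type_known, bool is_dir, bool is_lnk)
{
  return nostat && type_known && !is_dir && (physical || !is_lnk);
}

/* Try every flag combination in which at least one of physical/logical
   is set, and count how often the two conditions disagree.  */
int count_disagreements(void)
{
  int disagreements = 0;
  for (int bits = 0; bits < 64; bits++)
    {
      bool physical   = (bits >> 0) & 1;
      bool logical    = (bits >> 1) & 1;
      bool nostat     = (bits >> 2) & 1;
      bool type_known = (bits >> 3) & 1;
      bool is_dir     = (bits >> 4) & 1;
      bool is_lnk     = (bits >> 5) & 1;

      if (!physical && !logical)
        continue;   /* assumed not to occur, see the note above */

      if (skip_stat_first_patch(physical, logical, nostat,
                                type_known, is_dir, is_lnk)
          != skip_stat_installed(physical, nostat,
                                 type_known, is_dir, is_lnk))
        disagreements++;
    }
  return disagreements;
}

/* Abort if the two conditions ever disagree under that assumption.  */
void check_equivalence(void)
{
  assert(count_disagreements() == 0);
}
```

The two forms differ only when neither mode flag is set, which is why the 
sketch excludes that case.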



Re: find, fts: dramatical improvement of speed in find

2020-04-26 Thread Askar Safin
Hi. Thanks a lot for applying the patch. I use "find" very often (always in 
"-L" mode), so its performance is important to me, and I want to continue 
optimizing it.

I found that gnulib commit 2649851d0409c3fafee7cf396d11c10892ac0e53 (2017) 
introduced a speed regression.

"find -L /home/user" on my computer, with find 
79e8e03cda028c7d3134d8de63a40367eaa2f952 (2017) and gnulib 
f7eb1b99e30517fc50c130cdecec24059a1b7c4f (the commit just before 
2649851d0409c3fafe), takes 7.32 s.
But the same find version (79e8e03cda028c7d3134d8de63a40367eaa2f952) with 
gnulib 2649851d0409c3fafee7cf396d11c10892ac0e53 takes 8.29 s.
I don't know the reason, but I noticed that if I apply the patch 
http://paste.debian.net/hidden/1ff503a8/ to the regressed version (gnulib 
2649851d0409c3fafee7cf396d11c10892ac0e53), the regression disappears, i.e. I 
get the normal ~7.32 s again.

I was also able to port this patch to the current find and gnulib versions. 
Take current find master 7642d172e10a890975696d28278e5192d81afc5b and current 
gnulib master bddb8c50edc730e4ea60181a541f4fe41ba933ff (i.e. with my 
optimization from the previous letter). If I apply the patch 
http://paste.debian.net/hidden/845d44cf/ (my attempt to port that 
anti-regression patch) to this gnulib commit, the run time drops from 3.33 s 
to 2.46 s.
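
To put the timings from this letter in relative terms, here is a small 
sketch of the arithmetic. The function names are made up for illustration; 
the only inputs are the four run times quoted above. Going from 7.32 s to 
8.29 s is a slowdown of roughly 13%, and going from 3.33 s to 2.46 s is a 
reduction of roughly 26%.

```c
#include <assert.h>

/* Relative change of a run time in percent: positive means slower,
   negative means faster.  */
double percent_change(double before_s, double after_s)
{
  return (after_s - before_s) / before_s * 100.0;
}

/* Check the two figures quoted in the text against the run times
   reported in this letter.  */
void check_reported_timings(void)
{
  double regression = percent_change(7.32, 8.29);   /* 2017 regression */
  double ported_fix = percent_change(3.33, 2.46);   /* ported patch */

  assert(regression > 13.0 && regression < 13.5);
  assert(ported_fix < -26.0 && ported_fix > -26.5);
}
```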

Also, I don't understand the comment "If we're not in CWDFD mode, don't 
bother with this optimization, since the caller is not serious about 
performance" in the current gnulib sources (fts.c). What does this mean? When 
I run "find -L" (with find 7642d172e10a890975696d28278e5192d81afc5b and 
gnulib bddb8c50edc730e4ea60181a541f4fe41ba933ff, without the patches from 
*this* letter), I reach that code path (I verified this by inserting an 
fprintf there). So "find -L" actually gets us to that point, and I need 
performance in that use case.

==
Askar Safin
https://github.com/safinaskar


Re: find, fts: dramatical improvement of speed in find

2020-05-04 Thread Askar Safin
Hi. It seems my previous letter was missed. Also, Paul, the mailer daemon of
my mail provider, mail.ru, told me that it cannot deliver letters to you. Do
you reject mail from unknown senders, or is the problem on my side?

==
Askar Safin
https://github.com/safinaskar


Re: find, fts: dramatical improvement of speed in find

2020-05-04 Thread Bruno Haible
Askar Safin wrote:
> Hi. It seems my previous letter was missed. Also, Paul, the mailer daemon of
> my mail provider, mail.ru, told me that it cannot deliver letters to you. Do
> you reject mail from unknown senders, or is the problem on my side?

Your mail appeared in the archive:
https://lists.gnu.org/archive/html/bug-gnulib/2020-04/msg00090.html

You can therefore assume that it reached the subscribers.

Your expectation, however, that every mail you send will be replied to quickly,
is not met by reality. Bernhard and Paul explained why. You need to give the
people time. If the issue you are reporting is important, feel free to send
a polite 'ping' in a month or so.

Bruno