[Bash-completion-devel] more testing of disk prefetch

Peter Cordes Wed, 03 Dec 2014 10:25:08 -0800

 I redid my patch series to include just the things I'm sure are
bugfixes for git HEAD.  I also redid their commit messages with line
wrapping.  Will send in a quoting cleanup patch later, with more stuff
in one patch, when I'm ready to sign off on it.



 I've played around with my prefetch idea, and now I'm happier with
it, having explored the possibilities a bit.  It'd be neat if some
people could test this on their own systems, esp. ones with any kind
of slow hard drive or slow CPU, or a crufty /etc/bash_completion.d
with a lot of crap in it.

 Sending an email chock-full of everything that's worth writing down,
just so it's in the list archives in case anyone ever wants it.  I
didn't want to stick most of this into a git commit or a bug comment.



 It seems to be hard to get Linux to really drop filesystem caches.

That or my SATA hard drive's internal cache is big enough that it
doesn't need to seek around to get the requested data when all you're
doing is  time . ./bash_completion  between dropping caches.  That
might be more likely, since it's faster than the first time running it
after not doing so for a while, but nowhere near as fast as with hot
cache.


Intel Core2 Duo E6600 (2.4GHz, 2 non-hyperthreaded cores) 5GB RAM
/etc on ext4 (noatime) on a WD10EARS-00M  (fairly old green-power magnetic HD)
/usr/local/src/bash-completion  on xfs on the same HD

One point of interest with these results is that the very FIRST
test of no-prefetch is FAR higher than any of the others.  IDK how to
get my system back to a state where it will take that long again.  If
anyone has any suggestions, that'd be great.  The hd churn from
grepping a linux kernel tree helps some, but it's not enough.


alias churn_hd="grep -r  --include '*.c' xxxxx /usr/local/src/linux/ubuntu-trusty/ >/dev/null"
alias dropcache="sync; echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null; sleep 6"
# the sleep 6 seconds is to give things time to settle down right
# after dropping caches.  Otherwise you get re-reading of
# constantly-accessed data contending with bash_completion, which is
# realistic but less repeatable.


  no prefetch
$ for i in {0..4}; do dropcache; time BASH_COMPLETION_DISABLE_PREFETCH=1 . ./bash_completion; churn_hd; done 2>&1 |
        grep '^real' --line-buffered |
        perl -ne '/m(.*)s/ and $tot+=$1 and ++$c; print; END { print "  avg: ", $tot/$c, "\n"; }'
real    0m1.383s
real    0m0.605s
real    0m0.893s
real    0m0.797s
real    0m0.677s
 avg (of last 4 only): 0.743
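
 A sketch of how the "skip the first run" averaging above could be
automated, rather than done by hand after the perl one-liner.  The
function name avg_real and its SKIP argument are made up for this
example; it only assumes bash's default `time` format ("real", a tab,
then something like 0m1.383s).

```shell
#!/usr/bin/env bash
# avg_real [SKIP]: read `time` output on stdin, average the "real" lines,
# ignoring the first SKIP of them (hypothetical helper, not part of the patch).
avg_real() {
    awk -v skip="${1:-0}" '
        /^real/ {
            if (++seen <= skip) next
            split($2, t, /[ms]/)          # "0m1.383s" -> t[1]="0", t[2]="1.383"
            total += t[1] * 60 + t[2]     # minutes and seconds, in seconds
            n++
        }
        END { if (n) printf "avg: %.4f (n=%d)\n", total / n, n }'
}

# The five no-prefetch timings from above, dropping the first (cold) run.
printf 'real\t0m1.383s\nreal\t0m0.605s\nreal\t0m0.893s\nreal\t0m0.797s\nreal\t0m0.677s\n' |
    avg_real 1
```

 Unlike the perl version, this also counts the minutes field, which
only matters if a run ever takes more than 60 seconds.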

 My first version of prefetch: just cat
  prefetch with:  ( exec cat $glob &>/dev/null </dev/null )& disown $!
$ for i in {0..4}; do dropcache; time . ./bash_completion; churn_hd; done 2>&1 | grep '^real'
real    0m0.761s    0m0.557s    0m0.821s    0m0.558s    0m0.749s  (collapsed for readability)
  avg: 0.6892

with hot cache I get: avg: 0.0906



 Trying to get fancier: fork off cat, and then access all the inodes
of the files we will want, to get a deeper read queue depth.

prefetch with
(
        shopt -s failglob  # cat doesn't run at all with no matches,
                           # even if nullglob is on.
        # I was worried that cat could get stuck reading from stdin,
        # but ( cat & ) redirects stdin from /dev/null because subshells
        # don't have job control.
        exec &>/dev/null  # quash output of cat and any bash failglob errors
        cat $glob &        # contents
        true $glob/   # inodes.  just expanding this glob will stat(2),
                      # no ls -dL needed.
) &
disown $!       # don't pollute job control

real    0m1.431s    0m0.665s    0m0.594s    0m0.713s    0m0.570s
  avg (of last 4 only): 0.6355
real    0m0.701s    0m0.557s    0m0.713s    0m0.594s    0m0.797s
  avg: 0.6724
real    0m0.594s    0m0.977s    0m0.558s    0m0.737s    0m0.593s
  avg: 0.6918
real    0m0.893s    0m0.570s    0m0.749s    0m0.606s    0m0.569s
  avg: 0.6774
avg of averages: 0.669s (still excluding the 1.4sec outlier)


observations from strace:
$ for i in {0..4}; do sync; dropcache; (ls -dLF /; cat /dev/null; ) > /dev/null;
  strace -o prefetch.$i.glob-inode.cat-bg.strace -tt -s 256 -f \
         -e trace='!rt_sigprocmask,rt_sigaction' \
         bash -c "time . ./bash_completion";
  churn_hd;
done 2>&1 | ...
real    0m1.135s    0m0.591s    0m0.590s    0m0.697s    0m0.588s
  avg: 0.6165 (excluding the outlier).  Faster I think because the
clock doesn't start until bash has loaded, so that gets libc and all
that cached.

in the cached case, strace bash -c 'time . ./bash_completion'
real    0m0.299s
user    0m0.103s
sys     0m0.064s

 cat's read() system calls weren't always returning at the same time
as bash's.  While bash was chewing on some of the bigger files, or
esp. ones that made it stat(2) other things to look for files from
some of the scripts sourced (e.g. grub did a lot of stuff), the cat
process got ahead of bash and was able to get files into cache so
bash's read(2) returned right away when it got there.  This is exactly
the behaviour I was trying to get from a prefetch process. IDK if
drop_caches isn't clearing inode caches or something, but all the
stat(2) system calls from bash in the prefetch subshell happen with no
delay.  Or maybe almost all the relevant inodes are near each other on
disk, and got read together in a block?  Anyway, once one is done,
it's just boom, sequence of stat calls while the other processes are
stuck on something.

 I had been doing inode prefetch by running ls -dL on the glob, but
that always slowed things down, by maybe 0.08s, compared to just cat
prefetch.  Regardless of whether I forked cat & and exec ls, or vice
versa.  But I think the main problem was just ls finding all its
libraries and stuff, and the startup overhead.  Expanding $glob/ to
make bash stat everything to see if it's a directory is a pretty neat
idea, IMHO.  And it might help in a case where not all the inodes load
together.  If they did, just exec cat in the background subshell would
get it done as part of opening the files.  I'm leaving it in on the theory
that inodes might be near each other even if they don't all come in in
the same disk read, so it's better to prefetch all the stat info
before reading file contents.
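
 To illustrate the $glob/ trick: a pattern with a trailing slash can
only match directories, so bash has to look at every candidate entry
to decide.  A small sketch in a scratch directory (the function name
count_matches and the file names are invented for the demo):

```shell
#!/usr/bin/env bash
# count_matches DIR: print how many names DIR/* matches, then how many
# DIR/*/ matches (hypothetical demo helper, not part of the patch).
count_matches() {
    local glob="$1/*"
    local all=( $glob )      # every non-hidden entry
    local dirs=( $glob/ )    # trailing slash: only directories survive
    echo "${#all[@]} ${#dirs[@]}"
}

demo=$(mktemp -d)
touch "$demo/a" "$demo/b"
mkdir "$demo/subdir"
count_matches "$demo"        # two plain files plus one directory
rm -rf "$demo"
```

 Running the same expansion under strace should show the per-entry
system calls, though I'd expect the exact calls to vary with bash and
libc versions.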

 On cygwin, the extra CPU time from the stat system calls, even in a
background process, might be a bad thing.  cygwin stat is really slow,
last I heard.  And so is fork.  cygwin users might want to set
BASH_COMPLETION_DISABLE_PREFETCH=1, if this patch makes it in.

 On a single-core CPU, it could be a very small slowdown.  cat is
doing a decent amount of system calls.  Copying RAM around isn't a big
deal, the files are all very small, none big enough for cat to need
more than one size=65536 read.  Esp. since write(2) to /dev/null just
returns without doing anything, no extra copying there.


 There is a fadvise(1), which would be perfect.  (FADVISE_WILLNEED
does exactly what cat does, but without copying the data to
userspace, or writing it.  It blocks until the data is cached, if the
readahead queue fills up.)  However, the current implementation is
written in perl, so it's not useful to start it for a handful of small
files.  It probably takes about as much disk IO to start it as
bash_completion load-time does total.


 I tested having the prefetch thread wait to finish stat(2)ing all the
files before running cat, and it performs essentially the same.  I
think I'll go with this version, since if there is a significant
amount of disk activity from the inodes, that + cat's read requests
could make an annoying hiccup of disk load that might interfere with
something else the user had running.  Prob. better to err on the side
of being less aggressive with read queue depth.  Also, this way forks
only once.

( #  prefetch with
        shopt -s failglob  # don't even run cat if no matches
        exec &>/dev/null
        true $glob/
        exec cat $glob
) &   disown $!
real    0m1.143s    0m0.605s    0m0.677s    0m0.725s    0m0.546s
  avg: 0.63825 (last 4 only)
real    0m0.689s    0m0.558s    0m0.785s    0m0.606s    0m0.833s
  avg: 0.6942
real    0m0.629s    0m0.773s    0m0.581s    0m0.749s    0m0.594s
  avg: 0.6652
average of averages: 0.666s

 This is the version in the patch I'm attaching.
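
 For anyone who wants to try the pattern outside bash_completion, here
is a standalone sketch of the same idea.  The function name
prefetch_dir is invented, and it deliberately differs from the snippet
above: it uses nullglob and an array instead of failglob, so an empty
directory is just a quiet no-op.  Treat it as an illustration, not as
what the patch does.

```shell
#!/usr/bin/env bash
# prefetch_dir DIR: warm the page cache for the files in DIR from a
# background subshell, without blocking the caller (hypothetical helper).
prefetch_dir() {
    local dir=$1
    [[ -d $dir ]] || return 1
    (
        exec &>/dev/null </dev/null   # no output, never read the terminal
        shopt -s nullglob
        files=( "$dir"/* )
        (( ${#files[@]} )) || exit 0  # nothing to read
        : "$dir"/*/                   # trailing-slash glob: bash checks each entry
        exec cat -- "${files[@]}"     # contents; errors on directories are discarded
    ) &
    disown $!                         # keep it out of the job table
}
```

 Usage would be something like  prefetch_dir /etc/bash_completion.d
early in a startup file, followed later by the code that actually
sources the files.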



 I also tested with (ls -dLF /; cat /dev/null; ) >/dev/null  ahead of
the timed part, to get some essential stuff cached again.

  prefetch:
(...;   true $glob/
        exec cat $glob ) & disown $!
$ for i in {0..4}; do sync; dropcache; (ls -dLF /; cat /dev/null; ) >/dev/null;
  time . ./bash_completion; churn_hd; done 2>&1 | ...
real    0m1.033s    0m0.472s    0m0.581s    0m0.545s    0m0.749s
  avg: 0.58675 (last 4)
real    0m0.533s    0m0.689s    0m0.509s    0m0.665s    0m0.533s
  avg: 0.5858
real    0m0.665s    0m0.521s    0m0.713s    0m0.533s    0m0.617s
  avg: 0.6098
real    0m0.485s    0m0.605s    0m0.545s    0m0.628s    0m0.581s
  avg: 0.5688
avg of averages: 0.588s


no-prefetch with that ls and cat outside the timed part:
$ for i in {0..4}; do sync; dropcache; (ls -dLF /; cat /dev/null; ) >/dev/null; 
        time BASH_COMPLETION_DISABLE_PREFETCH=1 . ./bash_completion; churn_hd;
  done 2>&1 | grep '^real' --line-buffered | ...
real    0m1.195s
real    0m0.641s
real    0m0.700s
real    0m0.665s
real    0m0.701s
  avg: 0.67675 (only last 4)
real    0m0.617s
real    0m0.706s
real    0m0.617s
real    0m0.700s
real    0m0.641s
  avg: 0.6562
real    0m0.821s
real    0m0.629s
real    0m0.845s
real    0m0.653s
real    0m0.917s
  avg: 0.773
real    0m0.677s
real    0m0.773s
real    0m0.605s
real    0m1.169s
real    0m0.617s
  avg: 0.668 (excluding outlier)
avg of averages: 0.693s

 So, in this case prefetch is saving 0.11s, out of 0.7, or a 15%
speedup.  Well that's not as good as I thought it was doing, but it's
pretty decent.
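
 The arithmetic behind that figure, as a sketch.  The function name
speedup is invented, and the inputs are the rounded numbers from the
paragraph above (about 0.70s without prefetch, 0.11s less with it).

```shell
#!/usr/bin/env bash
# speedup BASE NEW: print the time saved and the saving as a percentage
# of BASE (hypothetical helper for this example).
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN {
        printf "saved %.3fs (%.1f%%)\n", base - new, (base - new) / base * 100
    }'
}

speedup 0.70 0.59
```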

It's a good thing bash doesn't support inline assembly,
or I'd be at this for weeks... :P

 Seriously though, a lot of people are stuck waiting for bash for a
second or so, and speeding it up a bit is worth putting effort into,
IMO.

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC

From b703f34d537104a98b60c98f92c263e61077e9ed Mon Sep 17 00:00:00 2001
From: Peter Cordes <pe...@cordes.ca>
Date: Wed, 3 Dec 2014 13:59:31 -0400
Subject: [PATCH 1/3] _longopt: fix parsing --help output that has -- in the
 description

This fixes parsing of things like grep --help:
  -r, --recursive like --directories=recurse

using \([^-]\|-[^-]\)* instead of .* at the front of the pattern makes the
greedy match at the front stop at the first --, rather than getting
--directories= from the -r line.

 Also move option completion ahead of the logic that checks previous arg
to see if this arg should be limited to a file or directory.  Too smart
for its own good in such a naive function, crossed up by things like ls
--directory or grep --files-with-matches.

 The sed in the case $prev block doesn't need this, because it puts $prev
into the pattern, and it's already presumably a valid option.  It will get
the right --option wherever it is in the line containing it.

 Also turns out that bash sorts and uniquifies the results itself,
so sort -u isn't needed.

Still not perfect, misses --silent from the help output line:
    -q, --quiet, --silent     suppress all normal output

 Could maybe loop over the --matches in sed, now that we have a
sufficiently non-greedy regex to match things in front of --options, but
then you'd need a full-blown sed program with pattern and hold space...
yuck.  Or maybe use awk?  Or hardcode a pattern that can match up to 3
long options on one line?

 This also breaks on commands with weird --help output, like if they for
some reason have --something BEFORE an option name.  You could start to
work around that, with another group like --[^-A-Za-z0-9] to match a --
that isn't at the start of an option, but that's just gratuitously
unreadable.  Do that for more robustness if anyone ever turns it into a
sed program that loops over --option matches on a single line.
---
 bash_completion | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/bash_completion b/bash_completion
index 55c9e48661028cee71301fd244c12d516303d437..a1dbb48e8bb69be538cb6ae9bb4485b6fa4ec698 100644
--- a/bash_completion
+++ b/bash_completion
@@ -1777,6 +1777,20 @@ _longopt()
     local cur prev words cword split
     _init_completion -s || return
 
+    # Check for options first: some programs have options like
+    # --directory=recursive that don't take directory args
+    # It's more likely the user knows what they're doing,
+    # for this naive --help parsing function.
+    if [[ "$cur" == -* ]]; then
+        COMPREPLY=( $( compgen -W "$( LC_ALL=C $1 --help 2>&1 | \
+	    sed -ne 's/\([^-]\|-[^-]\)*\(--[-A-Za-z0-9]\{1,\}=\{0,1\}\).*/\2/p' )" \
+            -- "$cur" ) )
+	    # initial part of that regex matches only up to before the first --,
+	    # to avoid tripping on " -r, --recursive   like --directory=recursive" in grep --help, for example.
+        [[ $COMPREPLY == *= ]] && compopt -o nospace
+        return 0
+    fi
+
     case "${prev,,}" in
         --help|--usage|--version)
             return 0
@@ -1807,12 +1821,7 @@ _longopt()
 
     $split && return 0
 
-    if [[ "$cur" == -* ]]; then
-        COMPREPLY=( $( compgen -W "$( LC_ALL=C $1 --help 2>&1 | \
-            sed -ne 's/.*\(--[-A-Za-z0-9]\{1,\}=\{0,1\}\).*/\1/p' | sort -u )" \
-            -- "$cur" ) )
-        [[ $COMPREPLY == *= ]] && compopt -o nospace
-    elif [[ "$1" == @(mk|rm)dir ]]; then
+    if [[ "$1" == @(mk|rm)dir ]]; then
         _filedir -d
     else
         _filedir
-- 
2.1.3
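
 A quick way to see the difference between the old and new sed
expressions in patch 1/3, using the two --help lines quoted in its
commit message.  The wrapper names longopts_old, longopts_new and
sample_help are invented for this demo; the \| alternation is a GNU
sed extension, so this assumes GNU sed.

```shell
#!/usr/bin/env bash
# The two sed expressions from the patch, wrapped so they can be compared
# on the same input (wrapper names are hypothetical).
longopts_old() { sed -ne 's/.*\(--[-A-Za-z0-9]\{1,\}=\{0,1\}\).*/\1/p'; }
longopts_new() { sed -ne 's/\([^-]\|-[^-]\)*\(--[-A-Za-z0-9]\{1,\}=\{0,1\}\).*/\2/p'; }

# Two lines in the style of grep --help, copied from the commit message.
sample_help() {
    cat <<'EOF'
  -r, --recursive           like --directories=recurse
  -q, --quiet, --silent     suppress all normal output
EOF
}

echo "old:"; sample_help | longopts_old   # greedy .* lands on the LAST -- per line
echo "new:"; sample_help | longopts_new   # stops at the FIRST -- per line
```

 The new expression picks up --recursive and --quiet; the old one
picks up --directories= and --silent.  Neither finds both options on
the second line, which is the limitation the commit message mentions.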

From bede4c22106bdd601718019861ac8f017139069c Mon Sep 17 00:00:00 2001
From: Peter Cordes <pe...@cordes.ca>
Date: Wed, 3 Dec 2014 14:00:58 -0400
Subject: [PATCH 2/3] upstart support for service completion

initctl list works for unprivileged users.

Wasn't sure what file to check to detect that upstart was present, but
/sbin should always be mounted, and upstart itself provides
/sbin/upstart-dbus-bridge, and it's not a conffile in /etc that someone
could move if they wanted to on their local system.  And it's absolutely
not going to have a name conflict with anything from another package.  :)

 I think it's important to check that the system is using an upstart init,
so you don't run initctl when completing in a root shell on another kind
of system, and maybe do something like generating system log messages.
---
 bash_completion | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/bash_completion b/bash_completion
index a1dbb48e8bb69be538cb6ae9bb4485b6fa4ec698..fd3bc41aaa160fdb3437db419d034c2be045ccc4 100644
--- a/bash_completion
+++ b/bash_completion
@@ -1137,6 +1137,10 @@ _services()
     COMPREPLY+=( $( systemctl list-units --full --all 2>/dev/null | \
         awk '$1 ~ /\.service$/ { sub("\\.service$", "", $1); print $1 }' ) )
 
+    if [[ -x /sbin/upstart-dbus-bridge ]]; then
+        COMPREPLY+=( $( initctl list 2>/dev/null | cut -d' ' -f1 ) )
+    fi
+
     COMPREPLY=( $( compgen -W '${COMPREPLY[@]#${sysvdirs[0]}/}' -- "$cur" ) )
 }
 
-- 
2.1.3

From ba68b737e6ccf0be3d8a5ab729eb3ab5e04fd2c1 Mon Sep 17 00:00:00 2001
From: Peter Cordes <pe...@cordes.ca>
Date: Wed, 3 Dec 2014 14:02:38 -0400
Subject: [PATCH 3/3] speed up loading the compat dir with disk prefetch

Fork off a prefetch thread to make sure the HD isn't sitting idle while
there's still data we're going to need.

tail(1) might spend less CPU copying stuff around in RAM (it would make
fewer system calls writing /dev/null), but POSIX tail only takes one arg.
There's a fadvise(1) which would be perfect if it was standard, and not
written in perl!

 I'm seeing a moderate speedup for this change, about 15% on Linux 3.13
with a magnetic HD on an idle system, after a
echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
And no slowdown with hot caches.  (dual core CPU)

 Other changes:
Had to move the necessary stuff up near the top of the file.
Was able to greatly simplify the loop over BASH_COMPLETION_COMPAT_DIR by
using the glob in the first place, instead of ls and then filtering.

 Took out the check for [[ -r $i ]] before sourcing.  If you have files in
/etc/bash_completion.d that aren't readable, you might not even notice if
bash_completion silently ignores them.  It's not like anything else uses
the directory, so don't be too quiet when there is a problem.

 I've even seen packages put completions in subdirectories (e.g. unison)
Could change from -f to -e to get warnings for that.  A package could
legitimately have a helper function or something in a subdir, though.
---
 bash_completion | 48 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/bash_completion b/bash_completion
index fd3bc41aaa160fdb3437db419d034c2be045ccc4..f2b7db58877c6800fe4e9a46d6d84dcab80ea013 100644
--- a/bash_completion
+++ b/bash_completion
@@ -47,9 +47,41 @@ readonly BASH_COMPLETION_COMPAT_DIR
 #
 _blacklist_glob='@(acroread.sh)'
 
+# Glob for matching various backup files.
+#
+_backup_glob='@(#*#|*@(~|.@(bak|orig|rej|swp|dpkg*|rpm@(orig|new|save))))'
+
 # Turn on extended globbing and programmable completion
 shopt -s extglob progcomp
 
+# source (or prefetch from disk) compat completion directory definitions
+_load_compat_dir()
+{
+    [[ -d $BASH_COMPLETION_COMPAT_DIR ]] || return
+    local i glob="$BASH_COMPLETION_COMPAT_DIR/!($_backup_glob|Makefile*|$_blacklist_glob)"
+
+    if [[ $1 == prefetch ]]; then
+        if [[ ! $BASH_COMPLETION_DISABLE_PREFETCH ]]; then
+            ( # fork a background subshell to let main continue ASAP
+                exec &>/dev/null
+                true $glob/   # inodes.  expanding this glob will stat(2)
+                exec cat $glob	# contents
+            ) &
+            disown $!
+        fi
+    else
+        for i in $glob; do
+            # If there are unreadable files, user probably wants to know,
+            # so don't check -r
+            [[ -f $i ]] && . "$i"
+        done
+    fi
+}
+# called again near the end of this file, and then unset
+_load_compat_dir prefetch
+
+
+
 # A lot of the following one-liners were taken directly from the
 # completion examples provided with the bash 2.04 source distribution
 
@@ -1105,10 +1137,6 @@ _gids()
     fi
 }
 
-# Glob for matching various backup files.
-#
-_backup_glob='@(#*#|*@(~|.@(bak|orig|rej|swp|dpkg*|rpm@(orig|new|save))))'
-
 # Complete on xinetd services
 #
 _xinetd_services()
@@ -1999,16 +2027,8 @@ _xfunc()
     "$@"
 }
 
-# source compat completion directory definitions
-if [[ -d $BASH_COMPLETION_COMPAT_DIR && -r $BASH_COMPLETION_COMPAT_DIR && \
-    -x $BASH_COMPLETION_COMPAT_DIR ]]; then
-    for i in $(LC_ALL=C command ls "$BASH_COMPLETION_COMPAT_DIR"); do
-        i=$BASH_COMPLETION_COMPAT_DIR/$i
-        [[ ${i##*/} != @($_backup_glob|Makefile*|$_blacklist_glob) \
-            && -f $i && -r $i ]] && . "$i"
-    done
-fi
-unset i _blacklist_glob
+_load_compat_dir source
+unset _blacklist_glob _load_compat_dir
 
 # source user completion file
 [[ ${BASH_SOURCE[0]} != ~/.bash_completion && -r ~/.bash_completion ]] \
-- 
2.1.3
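
 To see what the glob in patch 3/3 selects without touching a real
/etc/bash_completion.d, here is a sketch that runs the same pattern
over a scratch directory.  The two glob variables are copied from the
patch; the function name list_compat_files and the file names are
invented for the demo, and it prints names instead of sourcing them.

```shell
#!/usr/bin/env bash
shopt -s extglob    # needed when the pattern is expanded, as in bash_completion

_backup_glob='@(#*#|*@(~|.@(bak|orig|rej|swp|dpkg*|rpm@(orig|new|save))))'
_blacklist_glob='@(acroread.sh)'

# list_compat_files DIR: print the regular files the loop would source
# (hypothetical helper; the patch sources them instead of printing).
list_compat_files() {
    local dir=$1 i
    local glob="$dir/!($_backup_glob|Makefile*|$_blacklist_glob)"
    for i in $glob; do
        [[ -f $i ]] || continue      # skips subdirectories such as helpers/
        printf '%s\n' "${i##*/}"
    done
}

demo=$(mktemp -d)
touch "$demo"/{git,ssh,git~,ssh.bak,Makefile.am,acroread.sh,'#scratch#'}
mkdir "$demo/helpers"
list_compat_files "$demo"            # only git and ssh should remain
rm -rf "$demo"
```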

_______________________________________________
Bash-completion-devel mailing list
Bash-completion-devel@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/bash-completion-devel
