Paul Eggert wrote: > Here's a proposed change to GNU grep so that 'grep -r' > no longer follows symlinks, and by default does not > read devices, which is more convenient in the typical > use case. Symlinks and devices on the command line are > still dereferenced; the change affects only symlinks and > devices encountered recursively. Users who want dereferencing > for all files can use -R, which retains its old meaning. > > I considered adding --dereference, --no-dereference, > --dereference-command-line, etc, but that was a lot of extra complexity > (particularly in the documentation) for little payoff -- > anybody who wants something that complicated can use 'find', > as the whole point of 'grep -r' is that it should be simple. > > As a side effect this patch changes grep to use fts, > which should make it more robust with large directory > hierarchies. This is why the patch subtracts more lines > than it adds. > > This assumes the latest gnulib, so the first patch > syncs to gnulib, and the second one does the real work.
Thanks for working on this. Just a couple of nits, so far. I ran out of time before I got into the meat of it, not that I expect to find anything... > Subject: [PATCH 1/2] build: update gnulib submodule to latest > > --- > gnulib | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/gnulib b/gnulib > index eb21377..f9d2fe2 160000 > --- a/gnulib > +++ b/gnulib > @@ -1 +1 @@ > -Subproject commit eb213779301aa663ab84ac947e8e181e9ad554d0 > +Subproject commit f9d2fe251f3a104df656ab6ffc64821893ab9003 > -- > 1.7.6.5 > > >>From 7a97e788391b13ebf6242ff74d7a19a1944d56c1 Mon Sep 17 00:00:00 2001 > From: Paul Eggert <[email protected]> > Date: Sat, 10 Mar 2012 11:29:32 -0800 > Subject: [PATCH 2/2] grep: -r no follows symlinks > > Change -r to follow only command-line symlinks, and by default to > reads only devices named on the command line. This is a simple s/reads/read/ > way to get a more-useful behavior when searching random > directories; the idea is to use 'find' if you want something fancy. > -R acts as before and gets a new alias --dereference-recursive. > The code now uses fts internally, so it should be more robust and > faster with large hierarchies. > * .gitignore: Remove lib/savedir.c, lib/savedir.h. > * tests/symlink: New file > * Makefile.boot (LIB_OBJS_core): Remove isdir.o, savedir.o. > Perhaps other changes are needed too, but I'm not sure what > this makefile is for. > * NEWS: Document changes. > * doc/grep.texi (File and Directory Selection): Likewise. > * bootstrap.conf (gnulib_modules): Remove dirent, dirname, isdir, open. > Add fstatat, fts, openat-safer. > * lib/Makefile.am (libgreputils_a_SOURCES): Remove savedir.c, savedir.h. > * lib/savedir.c, lib/savedir.h: Remove. > * po/POTFILES.in: Add lib/openat-die.c. > * src/main.c: Include fcntl-safer.h, fts_.h. Don't include > isdir.h, savedir.h. > (struct stats, stats_base): Remove. > (long_options, usage, main): Add --dereference-recursive and > implement -r vs -R. > (filename_prefix_len, fts_options): New static vars. > (basic_fts_options, READ_COMMAND_LINE_DEVICES): New constants. > (devices): Now defaults to READ_COMMAND_LINE_DEVICES. > (reset, grep): Now takes just struct stat rather than file name and > struct stats. All callers changed. > (fillbuf): Now takes struct stat reather than struct stats. > All callers changed. > (grep): Don't worry about recursing too deeply; fts and grepdesc > handle this now. > (grepdirent, grepdesc, grep_command_line_args): New functions. > (grepfile): New args DIRDESC, FOLLOW, COMMAND_LINE. Remove struct stats > arg. All callers changed. Use openat_safer rather than open. > Use desc == STDIN_FILENO to tell whether we're reading "-". > Don't worry about EINTR when closing -- not possible, since we're > not catching signals. > * tests/Makefile.am (TESTS): Add symlink. > * tests/symlink: New file. > --- > .gitignore | 2 - > Makefile.boot | 2 - > NEWS | 11 ++ > bootstrap.conf | 7 +- > doc/grep.texi | 26 +++- > lib/Makefile.am | 2 +- > lib/savedir.c | 163 ---------------------- > lib/savedir.h | 11 -- > po/POTFILES.in | 1 + > src/main.c | 400 +++++++++++++++++++++++++++------------------------- > tests/Makefile.am | 1 + > tests/symlink | 65 +++++++++ > 12 files changed, 312 insertions(+), 379 deletions(-) > delete mode 100644 lib/savedir.c > delete mode 100644 lib/savedir.h > create mode 100755 tests/symlink > > diff --git a/.gitignore b/.gitignore > index 35f5d10..0b195d9 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -64,5 +64,3 @@ TAGS > !/lib/colorize-posix.c > !/lib/colorize-w32.c > !/lib/colorize.h > -!/lib/savedir.c > -!/lib/savedir.h > diff --git a/Makefile.boot b/Makefile.boot > index 043429b..4414110 100644 > --- a/Makefile.boot > +++ b/Makefile.boot > @@ -41,10 +41,8 @@ LIB_OBJS_core = \ > $(libdir)/error.$(OBJEXT) \ > $(libdir)/exclude.$(OBJEXT) \ > $(libdir)/hard-locale.$(OBJEXT) \ > - $(libdir)/isdir.$(OBJEXT) \ > $(libdir)/quotearg.$(OBJEXT) \ > $(libdir)/regex.$(OBJEXT) \ > - $(libdir)/savedir.$(OBJEXT) \ > $(libdir)/strtoumax.$(OBJEXT) \ > $(libdir)/xmalloc.$(OBJEXT) \ > $(libdir)/xstrtol.$(OBJEXT) \ > diff --git a/NEWS b/NEWS > index d0a63d5..3a752b1 100644 > --- a/NEWS > +++ b/NEWS > @@ -2,6 +2,17 @@ GNU grep NEWS -*- outline > -*- > > * Noteworthy changes in release ?.? (????-??-??) [?] > > +** New features > + > + The -R option now has a long-option alias --dereference-recursive. > + > +** Changes in behavior > + > + The -r (--recursive) option now follows only command-line symlinks. > + Also, by default -r now reads only devices named on the command > + line; this can be overridden with --devices. -R acts as before, so > + use -R if you prefer the old behavior of following all symlinks and > + defaulting to reading all devices. > > * Noteworthy changes in release 2.11 (2012-03-02) [stable] > > diff --git a/bootstrap.conf b/bootstrap.conf ... > diff --git a/doc/grep.texi b/doc/grep.texi ... > diff --git a/lib/Makefile.am b/lib/Makefile.am ... > diff --git a/lib/savedir.c b/lib/savedir.c ... > diff --git a/lib/savedir.h b/lib/savedir.h ... > diff --git a/po/POTFILES.in b/po/POTFILES.in ... > diff --git a/src/main.c b/src/main.c ... > @@ -369,6 +360,7 @@ unsigned char eolbyte; > /* For error messages. */ > /* The input file name, or (if standard input) "-" or a --label argument. */ > static char const *filename; > +static int filename_prefix_len; I'd use an unsigned type, since it will never be negative. > static int errseen; > static int write_error_seen; > > @@ -392,14 +384,19 @@ ARGMATCH_VERIFY (directories_args, directories_types); > > static enum directories_type directories = READ_DIRECTORIES; > > +enum { basic_fts_options = FTS_CWDFD | FTS_NOSTAT | FTS_TIGHT_CYCLE_CHECK }; > +static int fts_options = basic_fts_options | FTS_COMFOLLOW | FTS_PHYSICAL; > + > /* How to handle devices. */ > static enum > { > + READ_COMMAND_LINE_DEVICES, > READ_DEVICES, > SKIP_DEVICES > - } devices = READ_DEVICES; > + } devices = READ_COMMAND_LINE_DEVICES; > > -static int grepdir (char const *, struct stats const *); > +static int grepfile (int, char const *, int, int); > +static int grepdesc (int, int); I see legacy types like the latter "int" that should be bool, but that can obviously wait. > #if defined HAVE_DOS_FILE_CONTENTS > static inline int undossify_input (char *, size_t); > #endif > @@ -473,7 +470,7 @@ static off_t after_last_match; /* Pointer after last > matching line that > /* Reset the buffer for a new file, returning zero if we should skip it. > Initialize on the first time through. */ > static int > -reset (int fd, char const *file, struct stats *stats) > +reset (int fd, struct stat const *st) Nice simplification. I'll have to stop here for now. Haven't even tested yet. Thanks again.
