A NOTE has been added to this issue.
======================================================================
https://www.austingroupbugs.net/view.php?id=1941
======================================================================
Reported By:                dwheeler
Assigned To:                ajosey
======================================================================
Project:                    1003.1(2024)/Issue8
Issue ID:                   1941
Category:                   Shell and Utilities
Type:                       Enhancement Request
Severity:                   Objection
Priority:                   normal
Status:                     Under Review
Name:                       David A. Wheeler
Organization:
User Reference:
Section:                    grep
Page Number:                1
Line Number:                1
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2025-08-30 21:51 UTC
Last Modified:              2025-09-12 16:28 UTC
======================================================================
Summary:                    Add widely-implemented options to grep
======================================================================
----------------------------------------------------------------------
 (0007258) stephane (reporter) - 2025-09-12 16:28
 https://www.austingroupbugs.net/view.php?id=1941#c7258
----------------------------------------------------------------------
> All implement a "whole word" match with -w. However, that
> raises complications on defining word boundaries, especially
> since POSIX doesn't define the underlying construct. This may
> be quite doable, but since that discussion is complicated,
> maybe that's for another day.

Actually, POSIX already specifies the \< and \> regexp operators for the ex utility:

https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/utilities/ex.html#tag_20_40_13_58

> \<
>     Match the beginning of a word. (See the definition of word
>     at the beginning of Command Descriptions in ex.)
> \>
>     Match the end of a word.

That's the wrong reference, by the way. It looks like it should be a reference to "Input Editing in ex" (I'll raise a bug about that):

> word
>
> In the POSIX locale, a word consists of a maximal sequence of
> letters, digits, and underscores, delimited at both ends by
> characters other than letters, digits, or underscores, or by
> the beginning or end of a line or the edit buffer.

And the initial implementation of grep -w (AFAIK from BSD in the late 70s, ex also being a BSD utility) worked by adding \<...\> around the regex to match:

https://github.com/dspinellis/unix-history-repo/blob/BSD-2/src/grep.c#L105-L106

However, that's not necessarily the best approach, and it's not what all implementations do these days. For example, with GNU grep (and its clones):

<pre>
$ echo 'a -b- c' | grep '\<-b-\>'
$ echo 'a -b- c' | grep -we -b-
a -b- c
</pre>

That is, grep -w word behaves more like grep -P '(?<!\w)word(?!\w)', regardless of whether "word" itself starts and/or ends with \w. That sounds like a better approach.

<pre>
$ echo 'a--b--c' | grep -we -b-
a--b--c
</pre>

That one may be more debatable.
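The two -w behaviours contrasted above can be modelled with Python regular expressions. This is a minimal sketch and not how BSD or GNU grep is actually implemented. It assumes the POSIX-locale definition of word quoted from ex (letters, digits and underscore, which is exactly what `\w` matches under `re.ASCII`). It models `\<` as `(?<!\w)(?=\w)` and `\>` as `(?<=\w)(?!\w)`, and it handles only fixed-string patterns. The names `bsd_style_w` and `gnu_style_w` are hypothetical.

```python
import re

# Word characters in the POSIX locale per the ex definition: letters, digits,
# underscore. Under re.ASCII, Python's \w is exactly [A-Za-z0-9_].
BEGIN_WORD = r"(?<!\w)(?=\w)"  # models \<: no word char before, one after
END_WORD = r"(?<=\w)(?!\w)"    # models \>: word char before, none after


def bsd_style_w(fixed, line):
    """Historical BSD approach: wrap the (escaped) pattern in \\<...\\>."""
    pattern = BEGIN_WORD + re.escape(fixed) + END_WORD
    return re.search(pattern, line, re.ASCII) is not None


def gnu_style_w(fixed, line):
    """GNU-like approach: only require that no word char sits immediately
    before or after the match, whatever the pattern starts or ends with."""
    pattern = r"(?<!\w)" + re.escape(fixed) + r"(?!\w)"
    return re.search(pattern, line, re.ASCII) is not None


for fixed, line in [("b", "a -b- c"), ("-b-", "a -b- c"), ("-b-", "a--b--c")]:
    print(f"{fixed!r} in {line!r}: bsd={bsd_style_w(fixed, line)} "
          f"gnu={gnu_style_w(fixed, line)}")
```

For "b", both models match. For "-b-", the BSD-style model matches neither line, because `\<` demands a word character exactly where the pattern starts with "-". The GNU-style model matches both lines, mirroring the transcripts above.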
The fact that there's no agreement in practice between grep implementations may mean it's best to leave it out for now.

Another issue with \< and \>, if they were to be specified, is that we'd likely also want to specify the BSD REG_STARTEND flag for regcomp() and have sed and grep -o use it. Otherwise we'd get into issues such as:

<pre>
$ echo aaa | sed 's/\<a/<a/g'
<a<a<a
$ echo aaa | grep -o '\<a'
a
a
a
</pre>

This happens because each "a" ends up at the start of the subject on each successive match.

For the record, and for what it's worth, I otherwise support your proposal.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2025-08-30 21:51 dwheeler       New Issue
2025-08-30 21:51 dwheeler       Status                   New => Under Review
2025-08-30 21:51 dwheeler       Assigned To               => ajosey
2025-08-30 21:56 dwheeler       Note Added: 0007240
2025-08-30 21:59 dwheeler       Note Added: 0007241
2025-08-31 00:07 mirabilos      Note Added: 0007242
2025-08-31 00:10 mirabilos      Note Added: 0007243
2025-08-31 21:52 dwheeler       Note Added: 0007244
2025-08-31 22:01 dwheeler       Note Added: 0007245
2025-09-01 05:57 stephane       Note Added: 0007246
2025-09-01 06:05 stephane       Note Added: 0007247
2025-09-01 15:36 dwheeler       Note Added: 0007249
2025-09-01 17:10 dwheeler       Note Added: 0007250
2025-09-01 17:18 dwheeler       Note Added: 0007251
2025-09-11 15:31 lanodan        Note Added: 0007253
2025-09-11 15:36 lanodan        Note Edited: 0007253
2025-09-11 15:37 lanodan        Note Edited: 0007253
2025-09-11 15:37 lanodan        Note Edited: 0007253
2025-09-11 15:50 geoffclare     Project                  1003.1(2008)/Issue 7 => 1003.1(2024)/Issue8
2025-09-11 18:01 dwheeler       Note Added: 0007256
2025-09-12 16:28 stephane       Note Added: 0007258
======================================================================
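The successive-match problem in stephane's note (sed 's/\<a/<a/g' and grep -o '\<a' applied to "aaa") can be reproduced with a small Python model. This is a minimal sketch under stated assumptions:

- `\<` is modelled as the lookaround `(?<!\w)(?=\w)` with ASCII word characters.
- `matches_by_reslicing` and `matches_with_offset` are hypothetical helper names.

The first helper passes only the unconsumed remainder to the regex engine, so every remainder looks like a fresh subject. The second helper searches the whole subject from an offset, so the lookbehind still sees the character before that offset. This is the kind of context-preserving search the note hopes sed and grep -o would get from REG_STARTEND, which BSD provides as a regexec() flag. The Python code is only an analogy, not how any particular grep or sed works.

```python
import re

# Models \<a with lookarounds; re.ASCII makes \w == [A-Za-z0-9_].
START_A = re.compile(r"(?<!\w)(?=\w)a", re.ASCII)


def matches_by_reslicing(pattern, subject):
    """Search only the unconsumed remainder each time: the engine treats
    every remainder as a new subject, so \\< always 'sees' a beginning."""
    found = []
    while subject:
        m = pattern.search(subject)
        if m is None:
            break
        found.append(m.group())
        # Advance at least one char so an empty match cannot loop forever.
        subject = subject[max(m.end(), m.start() + 1):]
    return found


def matches_with_offset(pattern, subject):
    """Search the full subject from an offset: lookbehind can still inspect
    the characters before the offset, so word boundaries stay correct."""
    found = []
    pos = 0
    while pos <= len(subject):
        m = pattern.search(subject, pos)
        if m is None:
            break
        found.append(m.group())
        pos = max(m.end(), m.start() + 1)
    return found


print(matches_by_reslicing(START_A, "aaa"))  # ['a', 'a', 'a']
print(matches_with_offset(START_A, "aaa"))   # ['a']
print(START_A.sub("<a", "aaa"))              # <aaa
```

The re-slicing loop reproduces the `a a a` and `<a<a<a` symptoms from the transcripts. The offset-based search reports a single match, and re.sub (which also scans the full string) yields "<aaa".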
