On 11/10/2006 6:28 AM, Prof Brian Ripley wrote:
> On Fri, 10 Nov 2006, Duncan Murdoch wrote:
>
>> On 11/9/2006 5:14 AM, Romain Francois wrote:
>>> Hello,
>>>
>>> What about an `invert` argument in grep, to return elements that are
>>> *not* matching a regular expression :
>>>
>>> R> grep("pink", colors(), invert = TRUE, value = TRUE)
>>>
>>> would essentially return the same as :
>>>
>>> R> colors() [ - grep("pink", colors()) ]
>
> Note that grep("pat", x, value = TRUE) is not the same as x[grep("pat", x)],
> as the help page carefully points out.  (I think it would be better
> if it were.)
>
>>> I'm attaching the files that I modified (against today's tarball) for
>>> that purpose.
>
> (BTW, sending whole files makes it difficult to see the changes and even
> harder to merge them; please use diffs.  From a quick look the changes
> were very incomplete, as the internal functions were changed and there
> were no changed C files.)
>
>> I think a more generally useful change would be to be able to return a
>> logical vector with TRUE for a match and FALSE for a non-match, so a
>> simple !grep(...) does the inversion.  (This is motivated by the recent
>> R-help discussion of the fact that x[-selection] doesn't always invert
>> the selection when it's a vector of indices.)
>
> I don't think that is pertinent here, as the indices are always a vector
> of positive integers.

The issue is that the vector might be empty, in which case arithmetically
negating it has no effect: x[-integer(0)] selects nothing rather than
everything.  Negating a vector of integer indices is not a good way to
invert a selection, while logical negation of a logical vector is fine.

>
>> A way to do that without expanding the argument list would be to allow
>>
>> value="logical"
>>
>> as well as value=TRUE and value=FALSE.
>>
>> This would make boolean operations easy, e.g.
>>
>> colors()[grep("dark", colors(), value="logical")
>>       & !grep("blue", colors(), value="logical")]
>>
>> to select the colors that contain "dark" but not "blue".  (In this case
>> the RE to select that subset is rather simple because "dark" always
>> precedes "blue", but if that wasn't true, it would be a lot messier.)
>
> That might be worthwhile, but it is relatively simple to change positive
> integer indices to logical ones and v.v.
>
> My personal take is that having 'value=TRUE' was already a complication
> not worth having, and implementing it at C level was an efficiency tweak
> not worth the maintenance effort (and also means that '[' methods are not
> dispatched).

This makes it sound as though it would be worthwhile to redo the
implementation of value=TRUE as something equivalent to x[grep("pat", x)]
by putting this case into the R code.  This would simplify the C code
and make the interface a little less quirky.  (I'm not sure how much
code this would break because of the loss of coercion to character.)

The value="logical" implementation could also be done in R, not C.  The
advantage of putting it into grep() rather than leaving it for the user
to change later is that grep() has a copy of x in hand, so a user of
grep() will not have to save length(x) to use in the conversion to logical.

Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
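
For anyone following the thread, here is a minimal sketch of the two points made in the reply: negating an empty index vector does not invert the selection, and a logical result can be built in R from the integer indices that grep() already returns. The vector x and the helper grep_logical() are hypothetical names used only for illustration; this is not the proposed value="logical" implementation, just one way it might be approximated at the R level.

```r
# Hypothetical example data; any character vector would do.
x <- c("darkblue", "darkred", "pink", "lightblue")

# 1. Negative indexing does not invert an empty selection.
idx <- grep("green", x)   # integer(0), because nothing matches
x[-idx]                   # character(0), not all of x

# 2. Hypothetical helper (not part of base R): turn grep()'s integer
#    indices into a logical vector of the same length as x.
grep_logical <- function(pattern, x, ...) {
  seq_along(x) %in% grep(pattern, x, ...)
}

# Logical negation inverts correctly even when nothing matches.
x[!grep_logical("green", x)]

# Boolean combination: contains "dark" but not "blue".
x[grep_logical("dark", x) & !grep_logical("blue", x)]
```

Because the helper receives x itself, the caller never has to carry length(x) around separately, which is the advantage claimed above for doing the conversion inside grep().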