Using a non-capturing group, "(?:...)" instead of "(...)", simplifies my example a bit, since the extra Junk column is no longer needed:
> x <- c("Groucho <grou...@marx.com>", "<ch...@marx.com>", "Harpo")
> strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x, proto=data.frame(Name=character(), Address=character(), stringsAsFactors=FALSE))
     Name          Address
1 Groucho grou...@marx.com
2           ch...@marx.com
3   Harpo

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Aug 15, 2019 at 1:04 PM William Dunlap <wdun...@tibco.com> wrote:

> I don't care much for regmatches and haven't tried strextract, but I think
> replacing the character(0) by NA_character_ is almost always inappropriate
> if the match information comes from gregexpr.
>
> I think strcapture() does a pretty good job of what I think you are trying
> to do. Perhaps adding an argument to map no match to NA instead of ""
> would give you just what you wanted.
>
> > x <- c("Groucho <grou...@marx.com>", "<ch...@marx.com>", "Harpo")
> > d <- strcapture("([[:alpha:]]+)?( *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?",
> x, proto=data.frame(Name=character(), Junk=character(),
> Address=character(), stringsAsFactors=FALSE))
> > d[c("Name", "Address")]
>      Name          Address
> 1 Groucho grou...@marx.com
> 2           ch...@marx.com
> 3   Harpo
> > str(.Last.value)
> 'data.frame': 3 obs. of 2 variables:
>  $ Name   : chr "Groucho" "" "Harpo"
>  $ Address: chr "grou...@marx.com" "ch...@marx.com" ""
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Aug 15, 2019 at 11:31 AM Cyclic Group Z_1 <
> cyclicgroup...@yahoo.com> wrote:
>
>> I do think keeping the default behavior is desirable for backwards
>> compatibility; my suggestion is not to change default behavior but to add
>> an optional argument that allows a different behavior. Although this can be
>> implemented in a user-defined function, retaining empty matches facilitates
>> programmatic use, and seems to be something that should be available in
>> base R. It is available, for example, in MATLAB, a comparable array
>> language.
>>
>> Alternatively, perhaps a nomatch (or maybe emptymatch) argument in the
>> spirit of `[.data.table`? That is, an argument nomatch where nomatch = NULL
>> (the default) results in drops for vector outputs and character(0) for list
>> outputs and nomatch = NA results in insertion of NA_character_, and nomatch
>> = '' results in insertion of empty string.
>>
>> I can submit proposed patch code if others think this is a good idea.
>>
>> What are your thoughts on the proposed alteration to (currently
>> nonexported) strextract? I assume (maybe wrongly) that the plan is to
>> eventually export that function.
>>
>> Thank you,
>> CG
>>
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
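
For anyone who wants the "map no match to NA instead of """ behaviour discussed in this thread today, here is a rough user-level sketch. It is not a proposed patch and not part of base R: strcapture_na is a hypothetical name, and the sketch assumes that an empty-string capture always means "this group did not match", which is not true for patterns whose groups can legitimately match the empty string.

```r
# Hypothetical helper (NOT in base R): call utils::strcapture() and then
# recode empty-string captures in character columns as NA_character_.
# Assumption: "" in a character column means the group did not participate
# in the match; genuine empty matches would be recoded as well.
strcapture_na <- function(pattern, x, proto, ...) {
  d <- utils::strcapture(pattern, x, proto, ...)
  is_chr <- vapply(d, is.character, logical(1))
  d[is_chr] <- lapply(d[is_chr], function(col) {
    col[!is.na(col) & col == ""] <- NA_character_
    col
  })
  d
}

# Same input and pattern as in the example earlier in this message.
x <- c("Groucho <grou...@marx.com>", "<ch...@marx.com>", "Harpo")
d <- strcapture_na(
  "([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?",
  x,
  proto = data.frame(Name = character(), Address = character(),
                     stringsAsFactors = FALSE)
)
print(d)
```

Doing the recoding after the fact leaves the default behaviour of strcapture() untouched, in line with the backwards-compatibility concern raised in the quoted message. A real nomatch argument would presumably have to separate "group absent" from "group matched an empty string", which this sketch cannot do.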