Just started thinking about this. The name of regmatches() suggests
that it will only extract the matches but not return anything for the
non-matches. We might need another function that returns a value for
non-matches. Perhaps the value should be the empty string for
non-matches and NA for NA inputs. The rationale is that we
delegate to regexpr() (at least conceptually), and it returns a
"matching region" which would be empty when there is no match. We
could allow strcapture() to accept an atomic vector as a prototype,
which would do what you want for regexec() (NA on no match, empty
string on empty capture). Then we could call the regexpr()-based
function strextract().
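
A rough sketch of what the regexpr()-based function might look like,
assuming the semantics above (empty string for non-matches, NA for NA
input). The name strextract() and its argument order (borrowed from
strcapture()) are only a proposal here, not existing base R:

```r
# Hypothetical strextract(): not part of base R, just an illustration.
# Returns one element per input, so nothing is dropped.
strextract <- function(pattern, x, ...) {
  x <- as.character(x)
  m <- regexpr(pattern, x, ...)
  pos <- as.integer(m)              # -1 on no match, NA for NA input
  len <- attr(m, "match.length")    # -1 on no match, NA for NA input

  out <- rep(NA_character_, length(x))
  ok <- !is.na(pos)

  # Treat "no match" as an empty matching region starting at position 1,
  # so that substring() yields "".
  start <- pmax(pos[ok], 1L)
  width <- pmax(len[ok], 0L)
  out[ok] <- substring(x[ok], start, start + width - 1L)
  out
}

x <- c("a1", "b", NA, "")
strextract("[0-9]+", x)
# "1" "" NA ""

# For comparison, regmatches() drops the non-matches and the NA:
regmatches(x, regexpr("[0-9]+", x))
# "1"
```

The result has the same length as the input, so it can be assigned
directly as a new column.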

What do you think?

Michael

On Thu, Aug 29, 2019 at 3:27 PM Cyclic Group Z_1
<cyclicgroup...@yahoo.com> wrote:
>
> Thank you! I greatly appreciate your consideration, though of course it is up 
> to you. I think many people switch to stringr/stringi simply because 
> functions in those packages have some consistent design choices, for example, 
> they do not drop empty/missing matches, which facilitates array-based 
> programming. For example, in the cases where one needs to make a new column 
> in a data.frame (data.table, tibble, etc.) of regex extractions. Or in any 
> other case where there needs to be an element-wise correspondence between 
> input and output. I think insertion of NA_character_ to prevent dropping 
> indices seems like the natural choice for an array language (which, I think, 
> motivated the creation of stringr/stringi). While those are great packages 
> and this behavior can be easily replicated with simple wrappers, string 
> operations are normally easy to accomplish in base languages, so this seems 
> like something that would be appropriate to have in base. For example, MATLAB 
> and Pandas regex both allow non-dropping empty matches (though of course I acknowledge Pandas is not a 
> base language).
>
> Best,
> CG



-- 
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
