Re: [R] Help with regex replacements
Magic!

    tmp %>%
      as_tibble() %>%
      rename(Text = value) %>%
      mutate(Text = str_replace_all(Text, fixed("."), "")) %>%
      # filter(row_number() < 4) %>%
      mutate(Text2 = gsub("((\\\\|/)[[:alnum:]]+)|(\\([[:alnum:]-]+\\))", "", Text))

Which (as you have already shown!) gave me this:

    # A tibble: 7 × 2
      Text                                             Text2
    1 "Я досяг того, чого хотів"                       "Я досяг того, чого хотів"
    2 "Мені вдалося зробити бажане"                    "Мені вдалося зробити бажане"
    3 "Я досяг (досягла) того, чого хотів (хотіла)"    "Я досяг того, чого хотів "
    4 "Я досяг(-ла) речей, яких хотілося досягти"      "Я досяг речей, яких хотілося досягти"
    5 "Я досяг/ла того, чого хотів/ла"                 "Я досяг того, чого хотів"
    6 "Я досяг\\досягла того, чого прагнув\\прагнула"  "Я досяг того, чого прагнув"
    7 "Я досягнув(ла) того, чого хотів(ла)"            "Я досягнув того, чого хотів"

Perfect, and I will spend some time tomorrow unpacking that regex and trying to drive the learning points into my thick skull!

Deeply indebted, as so often here, though generally only when I'm reading others' questions and the answers!

Chris
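For anyone else unpacking that regex, here is a minimal sketch that builds it from its two alternatives and applies each one separately. It assumes the first alternative is "a backslash or a forward slash followed by letters or digits", which is what the output in the thread implies; the names slash_part, paren_part and pattern are made up for illustration, and matching Cyrillic with [[:alnum:]] assumes R is running in a UTF-8 locale.

```r
# Sample rows from the thread (one string per translation).
x <- c("Я досяг (досягла) того, чого хотів (хотіла)",
       "Я досяг(-ла) речей, яких хотілося досягти",
       "Я досяг/ла того, чого хотів/ла",
       "Я досяг\\досягла того, чого прагнув\\прагнула")

# Alternative 1: a backslash or a forward slash, then letters/digits.
# "\\\\" in an R string is the regex \\, i.e. one literal backslash.
slash_part <- "(\\\\|/)[[:alnum:]]+"

# Alternative 2: "(", then letters/digits/hyphens, then ")".
# The parentheses are escaped so they match literally.
paren_part <- "\\([[:alnum:]-]+\\)"

# Join the two alternatives with "|" (hypothetical helper variable).
pattern <- paste0("(", slash_part, ")|(", paren_part, ")")

gsub(slash_part, "", x)  # removes only the slash and backslash variants
gsub(paren_part, "", x)  # removes only the parenthesised variants
gsub(pattern, "", x)     # removes both kinds in a single pass
```

Running the two halves separately makes it easier to see which alternative is responsible for which deletion before trusting the combined pattern on hundreds of lines.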
Re: [R] Help with regex replacements
OK, so you want parentheses, not "brackets". I think I misinterpreted your specification, which I think is actually incomplete. Based on what I think you meant, how does this work:

    gsub("((\\\\|/)[[:alnum:]]+)|(\\([[:alnum:]-]+\\))", "", tmp$Text)
    [1] "Я досяг того, чого хотів"       "Мені вдалося\nзробити бажане"
    [3] "Я досяг того, чого хотів "      "Я\nдосяг речей, яких хотілося досягти"
    [5] "Я досяг того, чого\nхотів"      "Я досяг того, чого прагнув"
    [7] "Я\nдосягнув того, чого хотів"

If you want it without the \n's, cat the above to get:

    cat(gsub("((\\\\|/)[[:alnum:]]+)|(\\([[:alnum:]-]+\\))", "", tmp$Text))
    Я досяг того, чого хотів Мені вдалося зробити бажане Я досяг того, чого хотів Я досяг речей, яких хотілося досягти Я досяг того, чого хотів Я досяг того, чого прагнув Я досягнув того, чого хотів

Cheers,
Bert
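cat() only changes how the result is printed; the \n characters and the stray trailing space stay in the data. If the cleaned strings are to be reused, a whitespace pass along the following lines may help. This is a sketch rather than anything proposed in the thread, and normalise_ws is a hypothetical helper name.

```r
# Two strings in the state discussed above: one carries a line break from
# e-mail wrapping, the other has leftover spaces after a removal.
cleaned <- c("Мені вдалося\nзробити бажане",
             "Я досяг  того, чого хотів ")

# Collapse every run of whitespace (spaces, tabs, newlines) to one space,
# then strip leading and trailing whitespace.
normalise_ws <- function(x) trimws(gsub("[[:space:]]+", " ", x))

normalise_ws(cleaned)
```

Doing this after the gsub() call keeps the replacement pattern simple, because the pattern no longer has to worry about which side of a deleted word the space was on.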
Re: [R] Help with regex replacements
Does this do it for you (or get you closer):

    gsub("\\[.*\\]|[\\\\] |/ ", "", tmp$Text)
    [1] "Я досяг того, чого хотів"
    [2] "Мені вдалося\nзробити бажане"
    [3] "Я досяг (досягла) того, чого хотів (хотіла)"
    [4] "Я\nдосяг(-ла) речей, яких хотілося досягти"
    [5] "Я досяг/ла того, чого\nхотів/ла"
    [6] "Я досяг\\досягла того, чого прагнув\\прагнула"
    [7] "Я\nдосягнув(ла) того, чого хотів(ла)"

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with regex replacements
Thanks Avi (I am a keen follower of yours, and of the other stalwart helpers here).

On 27/06/2023 18:27, avi.e.gr...@gmail.com wrote:
> Chris,
>
> Consider breaking up your task into multiple passes.

Sorry, I could have explained more of what I had tried. I never know how long to make things here. I had been doing that. My plan was to pick them off one by one, but I think I am banging my head on a fundamental incomprehension on my part.

> And do them in whatever order preserves what you need.

Agree.

> First, are you talking about brackets as in square brackets, or as in your example, parentheses?

Sorry, always get that wrong, parentheses. Mea culpa.

> If you are sure you have no nested brackets, your requirement seems to be that anything matching [ stuff ] be replaced with nothing. Or if using parentheses, something similar.

More than 99% sure there are no nested parentheses. However, there are lines with none, one or sometimes (as in the little reprex) more than one set of parentheses.

> Your issue here is both sets of symbols are special so you must escape them so they are seen as part of the pattern and not the instructions.

So, sorry to be stupid, but I thought I was doing that using "\(.*\)". Could you reply showing me the correct escaping and the correct replacing? I was using str_replace_all() but am happy to use gsub() if that's easier/safer/better.

> The idea would be to pass through the text once and match all instances on a line and then replace with nothing or whatever is needed.

Nothing.

> But there is no guarantee some of your constructs will be on the same line completely so be wary.

Totally agree.

I also see that my emailer (Thunderbird), despite my exhorting it not to, mangled the email. Have tried to fix that. The mess below should have said:

I am sure this is easy for people who are good at regexps but I'm failing with it. The situation is that I have hundreds of lines of Ukrainian translations of some English. They contain things like this:

1 "Я досяг того, чого хотів"
2 "Мені вдалося зробити бажане"
3 "Я досяг (досягла) того, чого хотів (хотіла)"
4 "Я досяг(-ла) речей, яких хотілося досягти"
5 "Я досяг/ла того, чого хотів/ла"
6 "Я досяг\\досягла того, чого прагнув\\прагнула."
7 "Я досягнув(ла) того, чого хотів(ла)"

Using dput():

    tmp <- structure(list(Text = c("Я досяг того, чого хотів",
        "Мені вдалося зробити бажане",
        "Я досяг (досягла) того, чого хотів (хотіла)",
        "Я досяг(-ла) речей, яких хотілося досягти",
        "Я досяг/ла того, чого хотів/ла",
        "Я досяг\\досягла того, чого прагнув\\прагнула",
        "Я досягнув(ла) того, чого хотів(ла)")),
        row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))

Those show four different ways translators have handled gendered words:

1) Ignore them and (I'm guessing) only give the masculine
2) Give the feminine form of the word (or just the feminine suffix) in brackets
3) Give the feminine form/suffix prefixed by a forward slash
4) Give the feminine form/suffix prefixed by backslash (here a double backslash)

I would like just to drop all these feminine gendered options. (Don't worry, they'll get back in later.) So I would like to replace

1) anything between brackets with nothing!
2) anything between a forward slash and the next space with nothing
3) anything between a backslash and the next space with nothing

but preserving the rest of the text.

I have been trying to achieve this using str_replace_all() but I am failing utterly. Here's a silly little example of my failures. This was just trying to get the text I wanted to replace (as I was trying to simplify the issues for my tired wetware):

    > tmp %>%
    +   as_tibble() %>%
    +   rename(Text = value) %>%
    +   mutate(Text = str_replace_all(Text, fixed("."), "")) %>%
    +   filter(row_number() < 4) %>%
    +   mutate(Text2 = str_replace(Text, "\\(.*\\)", "\\1"))
    Error in `mutate()`:
    ℹ In argument: `Text2 = str_replace(Text, "\\(.*\\)", "\\1")`.
    Caused by error in `stri_replace_first_regex()`:
    ! Trying to access the index that is out of bounds. (U_INDEX_OUTOFBOUNDS_ERROR)
    Run `rlang::last_trace()` to see where the error occurred.

I have tried gurgling around the internet but am striking out so throwing myself on the list. Apologies if this is trivial but I'd hate to have to clean these hundreds of lines by hand, though it's starting to look as if I'd achieve that faster by hand than I will by banging my ignorance of R regexp syntax on the problem.

TIA,

Chris
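On the escaping question: the pattern "\\(.*\\)" is already escaped correctly, so the parentheses match literally. Because they are escaped they no longer form a capture group, which is why a replacement of "\\1" has nothing to refer to and the stringi call fails with the out-of-bounds error quoted above. A minimal base R sketch of the difference follows; drop_parens and unwrap_parens are hypothetical names used only for illustration.

```r
x <- "Я досяг (досягла) того, чого хотів (хотіла)"

# Escaped parentheses match a literal "(" and ")". Nothing is captured,
# so the replacement is "" rather than "\\1". Note that ".*" is greedy:
# it runs from the first "(" to the LAST ")" on the line.
sub("\\(.*\\)", "", x)

# "[^)]*" cannot run past the first ")", so each parenthesised chunk is
# removed separately; gsub() replaces every match, not just the first.
drop_parens <- function(x) gsub("\\([^)]*\\)", "", x)
drop_parens(x)

# To keep what is inside, add an UNescaped (capturing) group and refer to
# it with "\\1" in the replacement.
unwrap_parens <- function(x) gsub("\\(([^)]*)\\)", "\\1", x)
unwrap_parens(x)
```

The same patterns should also work with str_replace_all() or str_remove_all() from stringr, provided the replacement only refers to groups that the pattern actually defines.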
Re: [R] Help with regex replacements
Chris,

Consider breaking up your task into multiple passes. And do them in whatever order preserves what you need.

First, are you talking about brackets as in square brackets, or as in your example, parentheses?

If you are sure you have no nested brackets, your requirement seems to be that anything matching [ stuff ] be replaced with nothing. Or if using parentheses, something similar.

Your issue here is both sets of symbols are special so you must escape them so they are seen as part of the pattern and not the instructions.

The idea would be to pass through the text once and match all instances on a line and then replace with nothing or whatever is needed.

But there is no guarantee some of your constructs will be on the same line completely so be wary.
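The multi-pass idea can be sketched in base R roughly as follows, one gsub() call per rule from the original question. This is an illustration under the stated rules ("up to the next space"), not code posted in the thread, and drop_feminine is a hypothetical function name.

```r
# One pass per rule, applied in sequence.
drop_feminine <- function(x) {
  x <- gsub("\\([^)]*\\)", "", x)  # pass 1: anything in parentheses
  x <- gsub("/[^ ]*", "", x)       # pass 2: forward slash up to the next space
  x <- gsub("\\\\[^ ]*", "", x)    # pass 3: backslash up to the next space
  x
}

x <- c("Я досяг (досягла) того, чого хотів (хотіла)",
       "Я досяг(-ла) речей, яких хотілося досягти",
       "Я досяг/ла того, чого хотів/ла",
       "Я досяг\\досягла того, чого прагнув\\прагнула")

drop_feminine(x)
```

Keeping the passes separate makes each rule easy to test and reorder. One caveat of the literal "up to the next space" reading is that punctuation glued to the variant (for example a comma straight after "хотів/ла") would be removed along with it.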
[R] Help with regex replacements
I am sure this is easy for people who are good at regexps but I'm failing with it. The situation is that I have hundreds of lines of Ukrainian translations of some English. They contain things like this:

1 "Я досяг того, чого хотів"
2 "Мені вдалося зробити бажане"
3 "Я досяг (досягла) того, чого хотів (хотіла)"
4 "Я досяг(-ла) речей, яких хотілося досягти"
5 "Я досяг/ла того, чого хотів/ла"
6 "Я досяг\\досягла того, чого прагнув\\прагнула."
7 "Я досягнув(ла) того, чого хотів(ла)"

Using dput():

    tmp <- structure(list(Text = c("Я досяг того, чого хотів",
        "Мені вдалося зробити бажане",
        "Я досяг (досягла) того, чого хотів (хотіла)",
        "Я досяг(-ла) речей, яких хотілося досягти",
        "Я досяг/ла того, чого хотів/ла",
        "Я досяг\\досягла того, чого прагнув\\прагнула",
        "Я досягнув(ла) того, чого хотів(ла)")),
        row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))

Those show four different ways translators have handled gendered words:

1) Ignore them and (I'm guessing) only give the masculine
2) Give the feminine form of the word (or just the feminine suffix) in brackets
3) Give the feminine form/suffix prefixed by a forward slash
4) Give the feminine form/suffix prefixed by backslash (here a double backslash)

I would like just to drop all these feminine gendered options. (Don't worry, they'll get back in later.) So I would like to replace

1) anything between brackets with nothing!
2) anything between a forward slash and the next space with nothing
3) anything between a backslash and the next space with nothing

but preserving the rest of the text.

I have been trying to achieve this using str_replace_all() but I am failing utterly. Here's a silly little example of my failures. This was just trying to get the text I wanted to replace (as I was trying to simplify the issues for my tired wetware):

    > tmp %>%
    +   as_tibble() %>%
    +   rename(Text = value) %>%
    +   mutate(Text = str_replace_all(Text, fixed("."), "")) %>%
    +   filter(row_number() < 4) %>%
    +   mutate(Text2 = str_replace(Text, "\\(.*\\)", "\\1"))
    Error in `mutate()`:
    ℹ In argument: `Text2 = str_replace(Text, "\\(.*\\)", "\\1")`.
    Caused by error in `stri_replace_first_regex()`:
    ! Trying to access the index that is out of bounds. (U_INDEX_OUTOFBOUNDS_ERROR)
    Run `rlang::last_trace()` to see where the error occurred.

I have tried gurgling around the internet but am striking out so throwing myself on the list. Apologies if this is trivial but I'd hate to have to clean these hundreds of lines by hand, though it's starting to look as if I'd achieve that faster by hand than I will by banging my ignorance of R regexp syntax on the problem.

TIA,

Chris

--
Chris Evans (he/him)
Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor, University of Roehampton, London, UK.
Work web site: https://www.psyctc.org/psyctc/
CORE site: http://www.coresystemtrust.org.uk/
Personal site: https://www.psyctc.org/pelerinage2016/