I thought replacing the spaces following instances of +++, ++, + or - with "\n" and then reading the result with scan() should succeed. Like Ivan Krylov, I was fairly sure that you meant the minus sign to be "-" rather than "–", but perhaps you were using MS Word as an editor, which is inconsistent with effective use of R. If so, learn to use a proper programming editor, and in any case learn to post to R-help in plain text.
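A minimal sketch of that idea, for anyone who wants one result vector per input string rather than a single flat vector. It assumes the data are held in a character vector called dat, that an en dash in the data always stands for a minus sign, and that a "+" or "-" followed by whitespace only ever occurs as a delimiter (a word ending in a hyphen would also be split). It replaces scan() with strsplit() on the inserted newline, so it is a variation on the approach, not the same call.

```r
# Example data from the original post, typed with straight quotes.
# The second string keeps the en dash (U+2013) to show why it must be normalised.
dat <- c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
         "leucocyten \u2013 grampositieve coccen +")

# Assumption: an en dash always stands for an ASCII minus sign.
dat <- gsub("\u2013", "-", dat, fixed = TRUE)

# Replace the whitespace character that follows a "+" or "-" with a newline ...
marked <- gsub("([-+])\\s", "\\1\n", dat)

# ... then split on that newline; strsplit() returns one vector per input string.
parts <- strsplit(marked, "\n", fixed = TRUE)
print(parts)
```

Because the delimiter stays attached to the text before the inserted newline, each piece ends with its own +, ++, +++ or - marker.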
-- 
David

scan(text=gsub("([-+]){1}\\s", "\\1\n", dat), what="", sep="\n")

> On Apr 12, 2023, at 2:29 AM, Emily Bakker <emilybak...@outlook.com> wrote:
>
> Hello List,
>
> I have a dataset consisting of strings that I want to split while saving the
> delimiter.
>
> Some example data:
> “leucocyten + gramnegatieve staven +++ grampositieve staven ++”
> “leucocyten – grampositieve coccen +”
>
> I want to split the strings such that I get the following result:
> c(“leucocyten +”, “gramnegatieve staven +++”, “grampositieve staven ++”)
> c(“leucocyten –“, “grampositieve coccen +”)
>
> I have tried strsplit with a regular expression with a positive lookahead,
> but I am not able to achieve the results that I want.
>
> I have tried:
> as.list(strsplit(x, split = “(?=[\\+-]{1,3}\\s)+, perl=TRUE)
>
> Which results in:
> c(“leucocyten “, “+”, “gramnegatieve staven “, “+”, “+”, “+”,
> “grampositieve staven ++”)
> c(“leucocyten “, “–“, “grampositieve coccen +”)
>
>
> Is there a function or regular expression that will make this possible?
>
> Kind regards,
> Emily
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.