Dear all,

Let's say I have the following:
> x <- c("Eve: Going to try something new today...",
+        "Adam: Hey @Eve, how are you finding R? #rstats",
+        "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain & @Able disagree though :(",
+        "Adam: @Eve I'm sure they'll sort it out :)",
+        "blahblah")
> x
[1] "Eve: Going to try something new today..."
[2] "Adam: Hey @Eve, how are you finding R? #rstats"
[3] "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain & @Able disagree though :("
[4] "Adam: @Eve I'm sure they'll sort it out :)"
[5] "blahblah"

I would like to end up with a data frame that looks like this (pulling out the usernames and #tags):

> data.frame(Msg = x, Source = c("Eve", "Adam", "Eve", "Adam", NA),
+            Mentions = c(NA, "Eve", "Adam, Cain, Able", "Eve", NA),
+            HashTags = c(NA, "rstats", "Excel", NA, NA))

The best I can do so far is:

# Take the text before the first ":" as the message source
source <- lapply(x, function(x) {
  tmp <- strsplit(x, ":", fixed = TRUE)
  # No ":" means no "Name:" prefix, so pad with NA to make the source NA
  if (length(tmp[[1]]) < 2) {
    tmp <- c(NA, tmp)
  }
  return(tmp[[1]][1])
})
source <- unlist(source)
source
[1] "Eve" "Adam" "Eve" "Adam" NA

I can't work out how to extract the usernames starting with '@' or the #tags. I can identify them with gsub() and replace them, but I don't know how to extract only those terms, i.e. roughly the opposite of the following:

> gsub("@([A-Za-z0-9_]+)", "@[...]", x)
[1] "Eve: Going to try something new today..."
[2] "Adam: Hey @[...], how are you finding R? #rstats"
[3] "Eve: @[...], It's awesome, so much better at statistics that #Excel ever was! @[...] & @[...] disagree though :("
[4] "Adam: @[...] I'm sure they'll sort it out :)"
[5] "blahblah"

and

> gsub("#([A-Za-z0-9_]+)", "#[...]", x)
[1] "Eve: Going to try something new today..."
[2] "Adam: Hey @Eve, how are you finding R? #[...]"
[3] "Eve: @Adam, It's awesome, so much better at statistics that #[...] ever was! @Cain & @Able disagree though :("
[4] "Adam: @Eve I'm sure they'll sort it out :)"
[5] "blahblah"

I hope that makes sense, and thank you kindly in advance for your time.

Tony Breyal

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
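One possible way to do the extraction, sketched under assumptions rather than taken from the original thread: base R's regmatches() combined with gregexpr() returns the matched substrings themselves, which is essentially the "opposite of gsub()" asked about. The sketch reuses the character class [A-Za-z0-9_] from the gsub() calls in the question, so it assumes usernames and tags contain only letters, digits and underscores. The helper extract_tags() and the variable names are hypothetical; Source is capitalised so it does not mask base::source().

```r
# Messages from the question (message text kept verbatim)
x <- c("Eve: Going to try something new today...",
       "Adam: Hey @Eve, how are you finding R? #rstats",
       "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain & @Able disagree though :(",
       "Adam: @Eve I'm sure they'll sort it out :)",
       "blahblah")

# Hypothetical helper: find all matches of `pattern` in each message,
# drop the leading sigil (@ or #), and collapse them into one
# comma-separated string. Messages with no match get NA.
extract_tags <- function(text, pattern) {
  matches <- regmatches(text, gregexpr(pattern, text))
  vapply(matches, function(m) {
    if (length(m) == 0) NA_character_
    else paste(substring(m, 2), collapse = ", ")
  }, character(1))
}

# Source: a leading "Name:" prefix if present, otherwise NA
has_source <- grepl("^[A-Za-z0-9_]+:", x)
Source <- ifelse(has_source, sub("^([A-Za-z0-9_]+):.*", "\\1", x), NA_character_)

df <- data.frame(Msg = x,
                 Source = Source,
                 Mentions = extract_tags(x, "@[A-Za-z0-9_]+"),
                 HashTags = extract_tags(x, "#[A-Za-z0-9_]+"),
                 stringsAsFactors = FALSE)
print(df[, c("Source", "Mentions", "HashTags")])
```

For a message with several matches, such as the third one, paste(..., collapse = ", ") joins the names into a single string like "Adam, Cain, Able", matching the layout of the desired data frame; stringsAsFactors = FALSE keeps the columns as character vectors on older versions of R.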