How about using read.table(..., sep = " ")? That would split the text into single words (as a data frame; scan(..., what = "character") gives you a character vector directly). Then
grepl("\\[[9-z]+\\]", x) will return a logical vector that you can use to pick out the bracketed words:

    > x <- c('test', '[bracket]', 'hi]', '[blah', 'foo', '[bar]')
    > grepl('\\[[9-z]+\\]', x)
    [1] FALSE  TRUE FALSE FALSE FALSE  TRUE
    > x[grepl('\\[[9-z]+\\]', x)]
    [1] "[bracket]" "[bar]"

You might need a more complex regex to catch them all, in case of ([citation]) instances, for example.

Justin

On Tue, Jan 24, 2012 at 6:52 AM, mdvaan <mathijsdev...@gmail.com> wrote:
> Hi,
>
> I have a series of MS Word files and each file contains plain text. From
> these texts I would like to extract only those elements (read: words) that
> are between square brackets. Example of a text:
>
> Most fundamentally, it has led to an effort to clarify the organizational
> form concept. According to them [see also Smith, Jones and Carroll 2002],
> categories emerge as audience members recognize dissimilarities among
> groups of consumers and label them as members of a common set [Nicol 2000].
>
> Now I would like to get the following selection:
>
> see also Smith, Jones and Carroll 2002
> Nicol 2000
>
> Any ideas on how to do this? What would be the best way to import the text
> in R? The entire text as an element in a dataframe? Thank you very much!
>
> Best,
>
> Mathijs
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Select-elements-from-text-tp4323947p4323947.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
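Splitting on spaces breaks multi-word passages such as [see also Smith, Jones and Carroll 2002] into separate pieces, so neither piece matches a pattern that needs both brackets. A different technique, sketched below, keeps each file as one string and pulls out every bracketed span with gregexpr() and regmatches(). This assumes the Word documents have first been saved as plain-text (.txt) files; read_text_file() and extract_bracketed() are hypothetical helper names, not functions from any package.

```r
# Sketch, assuming the Word files were exported as plain-text (.txt) files.
# read_text_file() and extract_bracketed() are hypothetical helper names.

read_text_file <- function(path) {
  # Read all lines and join them into one string, so a bracketed
  # passage broken across a line ending stays in one piece.
  paste(readLines(path, warn = FALSE), collapse = " ")
}

extract_bracketed <- function(text) {
  # Match "[" followed by one or more characters that are not "]",
  # then "]". A "]" placed first in a bracket expression is literal.
  hits <- unlist(regmatches(text, gregexpr("\\[[^]]+\\]", text)))
  # Strip the enclosing brackets from each match.
  gsub("^\\[|\\]$", "", hits)
}

# The example text from the question, as a single string.
txt <- paste(
  "Most fundamentally, it has led to an effort to clarify the organizational",
  "form concept. According to them [see also Smith, Jones and Carroll 2002],",
  "categories emerge as audience members recognize dissimilarities among",
  "groups of consumers and label them as members of a common set [Nicol 2000]."
)

extract_bracketed(txt)
# returns "see also Smith, Jones and Carroll 2002" "Nicol 2000"
```

For a whole folder, something like lapply(list.files(pattern = "\\.txt$"), function(f) extract_bracketed(read_text_file(f))) would give one character vector per file. Nested brackets such as [a [b] c] are not handled by this pattern.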