How about using read.table(..., sep=" "), or scan(..., what="character") to get a character vector directly?

That would give you a vector of single words. Then

grepl("\\[[[:alnum:]]+\\]", x)

will return a logical vector that is TRUE for each word enclosed in square brackets:

> x <- c('test', '[bracket]', 'hi]', '[blah', 'foo', '[bar]')
> grepl('\\[[[:alnum:]]+\\]', x)
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE
> x[grepl('\\[[[:alnum:]]+\\]', x)]
[1] "[bracket]" "[bar]"
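
Putting the two steps together might look roughly like the sketch below. The sample string and the variable names (txt, words, hits) are made up for illustration; in practice the text would come from a file. The pattern uses the POSIX class [[:alnum:]], which covers all letters and digits.

```r
# Hypothetical sample text; a real run would read from a file instead
txt <- "first [alpha] second [beta] third [gamma"

# Split on whitespace into a character vector of single words
words <- scan(text = txt, what = character(), quiet = TRUE)

# Keep only the words that are fully enclosed in square brackets
hits <- words[grepl("\\[[[:alnum:]]+\\]", words)]
print(hits)
```

Note that this only finds bracketed items that are a single word, because the text is split on whitespace first.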

You might need a more complex regex to catch them all, for example to
handle instances such as ([citation]).
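
If the bracketed text spans several words, as in the example from the original question, one possible approach is to skip the word splitting and pull the matches straight out of the full string with gregexpr() and regmatches(). This is only a sketch: the variable names (txt, m, cites) are mine, and it assumes brackets are never nested.

```r
# Example text taken from the original question
txt <- paste(
  "According to them [see also Smith, Jones and Carroll 2002],",
  "categories emerge as audience members recognize dissimilarities among",
  "groups of consumers and label them as members of a common set [Nicol 2000]."
)

# Match "[", then one or more characters that are not "]", then "]"
m <- regmatches(txt, gregexpr("\\[[^]]+\\]", txt))[[1]]

# Strip the enclosing square brackets from each match
cites <- gsub("^\\[|\\]$", "", m)
print(cites)
```

Because the pattern only looks at the square brackets, a case like ([citation]) should also be picked up, with the surrounding parentheses left out of the match.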

Justin

On Tue, Jan 24, 2012 at 6:52 AM, mdvaan <mathijsdev...@gmail.com> wrote:

> Hi,
>
> I have a series of MS word files and each file contains plain text. From
> these texts I would like to extract only those elements (read: words) that
> are between square brackets. Example of a text:
>
> Most fundamentally, it has led to an effort to clarify the organizational
> form concept. According to them [see also Smith, Jones and Carroll 2002],
> categories emerge as audience members recognize dissimilarities among groups
> of consumers and label them as members of a common set [Nicol 2000].
>
> Now I would like to get the following selection:
>
> see also Smith, Jones and Carroll 2002
> Nicol 2000
>
> Any ideas on how to do this? What would be the best way to import the text
> in R? The entire text as an element in a dataframe? Thank you very much!
>
> Best,
>
> Mathijs
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Select-elements-from-text-tp4323947p4323947.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
