Hi R-Help!
I am a newbie in R and in computer science in general. I have done the basic 
reading: the introduction to R and the tm package documentation. I am using R 
(R Foundation) on a Windows 7 system. 
I have been given a project which requires me to search annual reports of 76 
companies for multiple key phrases such as "finance program" or "improving 
working capital". The goal is to see how many times each key phrase appears in 
each annual report. 
The following script is what I have accomplished thus far:
#load tm package
library(tm)
#set working directory of text files of annual reports
setwd('C:/Users/a446578/Desktop/Annual Reports Text Files')
dest<-("C:/Users/a446578/Desktop/Annual Reports Text Files")
#create corpus of 76 annual reports text files
a<-Corpus(DirSource("C:/Users/a446578/Desktop/Annual Reports Text Files"),
          readerControl = list(language="lat"))
#cleaning corpus
a<-tm_map(a, removeNumbers)
a<-tm_map(a, removePunctuation)
a<-tm_map(a, content_transformer(tolower))
a<-tm_map(a, removeWords, stopwords("english"))
#create the document term matrix
dtm<-DocumentTermMatrix(a)
#searching for key phrases
tm_term_score(dtm, c("finance program", "improving working capital",
                     "reduce days", "increase trade receivables"))
Everything runs smoothly apart from the last step (#searching for key phrases). 
I understand that the tm_term_score function is only used for single key words 
and not phrases. How can I achieve the same result the tm_term_score function 
gives me, but with phrases instead of words?
I have posted an almost identical question on another forum but was not able to 
understand the response. I trust you guys at R-help can give me a good solution 
that someone as weak at R as I am can follow. 
Thanks a lot guys!
Warwivck
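
A minimal sketch of one possible way to count phrases, offered as an 
illustration rather than as the tm package's own method. It uses only base R 
and searches the raw text of each report for each phrase literally, so it does 
not depend on how the document term matrix was tokenized. The names reports, 
phrases, count_phrase and counts are hypothetical, and the two short strings 
stand in for the real annual reports.

```r
# Hypothetical stand-ins for the report texts (not real data). With real files,
# each element could be built with paste(readLines(path), collapse = " ").
reports <- c(
  report_a = "Our finance program is improving working capital. The finance program grew.",
  report_b = "We aim to reduce days sales outstanding."
)

phrases <- c("finance program", "improving working capital",
             "reduce days", "increase trade receivables")

# Count literal, non-overlapping, case-insensitive occurrences of one phrase
count_phrase <- function(text, phrase) {
  hits <- gregexpr(phrase, tolower(text), fixed = TRUE)[[1]]
  if (hits[1] == -1L) 0L else length(hits)
}

# Build a matrix with one row per report and one column per phrase
counts <- sapply(phrases, function(p) {
  vapply(reports, count_phrase, integer(1), phrase = p, USE.NAMES = FALSE)
})
rownames(counts) <- names(reports)

print(counts)
```

Because the match is literal, a phrase split by punctuation or by extra 
whitespace in the report would not be counted, so the text may need some 
normalizing first.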

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
