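Here's the code and results. The corpus is the plain-text version of a single book, and removeSparseTerms() leaves the document-term matrix at 1 x 5265 whatever sparse value I pass.
(R version 3.2)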
> docs <- tm_map(docs, stemDocument)
> dtm <- DocumentTermMatrix(docs)
> freq <- colSums(as.matrix(dtm))
> ord <- order(freq)
> freq[tail(ord)]
   one experi   will    can  lucid  dream
   287    312    363    452   1018   2413
> freq[head(ord)]
  abbey abdomin    abdu abraham  absent    abus
      1       1       1       1       1       1
> dim(dtm)
[1]   1 5265
> dtms <- removeSparseTerms(dtm, 0.1)
> dim(dtms)
[1]   1 5265
> dtms <- removeSparseTerms(dtm, 0.001)
> dim(dtms)
[1]   1 5265
> dtms <- removeSparseTerms(dtm, 0.9)
> dim(dtms)
[1]   1 5265
> 
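A minimal base-R sketch of why the dimensions never change, assuming tm's documented rule that removeSparseTerms(x, sparse) drops terms that are absent from at least a `sparse` fraction of the documents. The helper keep_nonsparse() and the toy matrices are hypothetical illustrations, not tm's own code. With only one document, every term in the matrix occurs in 100% of the documents, so each term's sparsity is 0 and nothing is ever below the cut.

```r
# Hypothetical helper mirroring the sparsity rule assumed for
# tm::removeSparseTerms (illustration only, not tm's implementation).
# m: a documents x terms matrix of counts.
keep_nonsparse <- function(m, sparse) {
  # Fraction of documents in which each term does NOT occur
  term_sparsity <- colMeans(m == 0)
  # Keep only terms whose sparsity is strictly below the threshold
  m[, term_sparsity < sparse, drop = FALSE]
}

# One "book" as a single document: every column has a nonzero count,
# so every term has sparsity 0 and survives any threshold.
one_doc <- matrix(c(287, 312, 1, 1), nrow = 1,
                  dimnames = list("book", c("one", "experi", "abbey", "abus")))
print(dim(keep_nonsparse(one_doc, 0.001)))  # [1] 1 4

# Toy example (hypothetical counts): the book split into three chapters.
# Rows are documents, columns are terms.
chapters <- matrix(c(5, 1, 0,
                     3, 2, 0,
                     2, 0, 1), nrow = 3, byrow = TRUE,
                   dimnames = list(paste0("ch", 1:3),
                                   c("lucid", "dream", "abbey")))
# Sparsities: lucid 0, dream 1/3, abbey 2/3
print(colnames(keep_nonsparse(chapters, 0.5)))  # [1] "lucid" "dream"
```

Under this assumption, removeSparseTerms only has something to prune once the text is split into several documents (for example chapters); for a single document, filtering the frequency vector directly, e.g. freq[freq >= 5], is one alternative.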

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.