Hello,
I want to use SnowballStemmer on a collection of plain text documents.
However, when I apply it to my corpus with the tm_map function, it stems only
the last word of each document. (The same problem occurs with wordStem, and
stemDocument does not work at all.) An example:
> path <- "c:/path/to/directory" # collection of plain text documents
> corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain,
+ language = "en_US", load = TRUE))
> inspect(corp)
A corpus with 2 text documents
The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
create_date creator
Available variables in the data frame are:
MetaID
$`1.txt`
running runs runners
$`2.txt`
happyness happies
> corp2 <- tm_map(corp, SnowballStemmer)
> inspect(corp2)
A corpus with 2 text documents
The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
create_date creator
Available variables in the data frame are:
MetaID
$`1.txt`
[1] running runs runn
$`2.txt`
[1] happyness happi
How can I get the stemmer to stem every word in each document, not just the last one?
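The output above is consistent with the stemmer receiving each document as one long string and therefore only trimming the suffix at the very end. A minimal sketch of a workaround is to split each document into words, stem the words individually and paste them back together. It assumes the SnowballC package (its wordStem() function), which is not necessarily the stemmer the tm version in this post relied on; the helper name stem_words is hypothetical.

```r
# Assumption: the SnowballC package is installed; it provides wordStem().
library(SnowballC)

# Hypothetical helper: stem each whitespace-separated word on its own,
# then rejoin the stems with single spaces.
stem_words <- function(text, language = "english") {
  tokens <- unlist(strsplit(text, "[[:space:]]+"))
  tokens <- tokens[tokens != ""]          # drop empty tokens from leading spaces
  paste(wordStem(tokens, language = language), collapse = " ")
}

# The two example documents from above, as plain character strings
docs <- c("running runs runners", "happyness happies")
stemmed <- vapply(docs, stem_words, character(1), USE.NAMES = FALSE)
print(stemmed)
```

With a recent tm release, tm_map(corp, stemDocument) is expected to do this word-by-word splitting itself, and tm_map(corp, content_transformer(stem_words)) would apply the helper to a corpus; check the documentation of the installed tm version before relying on either.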
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.