[R] package tm: reading XML files

Ad Feelders Tue, 29 May 2012 03:03:23 -0700

Dear fellow R users,

I'm using the package tm for text mining and have a problem reading in a corpus from XML files. When I copy the example of the small Reuters subset "crude" from "Introduction to the tm package", everything goes well, and I get a corpus with the required metadata. However, when I read in the entire Reuters-21578 corpus in XML format (or a self-created subset of it), the metadata is lost and the files are interpreted as plain text. I use the following command, where the indicated directory contains all Reuters-21578 documents as separate XML files:

> reuters21578 <- Corpus(DirSource("C:/Data/Reuters/preprocessed"),readerContol=list(reader=readReut21578XML))
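For comparison, here is a minimal sketch of the vignette pattern that works, run against the "crude" sample that ships with tm. It assumes a tm release whose `Corpus()` accepts `readerControl = list(reader = readReut21578XML)`, as the tm 0.5 series current with R 2.15 did; later tm releases changed the recommended call, so treat it as version-specific. Note the argument name `readerControl`: if the command above was run exactly as typed (`readerContol`), R would not match it to that argument, so the directory source's default plain-text reader would most likely be used instead.

```r
# Minimal sketch, assuming the tm 0.5-era API used in the
# "Introduction to the tm package" vignette of that time.
library(tm)

# Directory of Reuters-21578 XML files bundled with tm (the "crude" subset)
crude_dir <- system.file("texts", "crude", package = "tm")

# The argument must be spelled exactly "readerControl"
reuters <- Corpus(DirSource(crude_dir),
                  readerControl = list(reader = readReut21578XML))

# If the XML reader was used, documents carry Reuters metadata
meta(reuters[[1]])
```

To apply the same call to the full collection, replace `crude_dir` with the directory holding the Reuters-21578 XML files.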


I'm running R 2.15.0 under Windows XP.

Has anybody else encountered this problem and found a cause or solution?

Best regards,

-Ad Feelders

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
