Some hints:
 - list.files() will return the list of files in a directory
 - readLines() will allow you to load text files as vectors of lines
 - strsplit() will allow you to break lines into words
 - c(x, y) concatenates vectors x and y; x <- c(x, y) appends vector y to x
 - unique() will allow you to get rid of repeats

And the Map/Reduce family of functions will allow you to write what you want in about 15 lines of concise R code with no loops.
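A minimal sketch of how those hints might fit together, assuming the documents are plain-text files in one directory matching a `\\.txt$` pattern. The helper names `unique_words_in_dir()` and `read_whole_file()` are hypothetical, and splitting on runs of whitespace is a simplifying assumption (punctuation is not stripped). The second helper covers the point in the quoted question about reading a whole file into a single character string.

```r
# Hypothetical helper: collect the unique words from every text file in a
# directory using list.files(), readLines(), strsplit(), c() and unique(),
# with Map/Reduce in place of explicit loops.
unique_words_in_dir <- function(dir, pattern = "\\.txt$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  # One character vector of lines per file
  lines_per_file <- Map(readLines, files)
  # Split each line on runs of whitespace; flatten to one word vector per file
  words_per_file <- Map(function(lines) unlist(strsplit(lines, "[[:space:]]+")),
                        lines_per_file)
  # Concatenate the per-file vectors with c() (character(0) if no files)
  all_words <- Reduce(c, words_per_file, character(0))
  # Drop repeats and any empty strings left by leading whitespace or blank lines
  all_words <- unique(all_words)
  all_words[nzchar(all_words)]
}

# Hypothetical helper: read a whole text file as ONE character string
# (a length-1 vector) by joining its lines with newlines.
read_whole_file <- function(path) {
  paste(readLines(path), collapse = "\n")
}
```

With tm installed, the string returned by read_whole_file() could then be wrapped as a one-document corpus with something like Corpus(VectorSource(txt)). tm also defines c() methods for combining corpora, but check the documentation of your tm version before relying on that to append documents.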
Hope it helps,
Cheers,
jcb!

On Tue, Jul 19, 2011 at 11:11 AM, Alexander James Rickett <ack.van...@gmail.com> wrote:
> Hello everyone,
>
> I'm doing some JGR (a GUI front end for R) development, specifically adding
> functionality from tm. In order to let users select some text files
> from a file dialog and turn them into a corpus, I need to be able to
> generate a corpus using a *SINGLE* text file as a single document, and to
> append a new document to an existing corpus. I know if I could read files
> into single character vectors I'd be in business, but I can't find how to do
> this either. This seems like a no-brainer, so I'm at my wits' end.
>
> Here's pseudo code of what I'd like to be able to do:
>
> ##########################################
>> corp1doc <- Corpus(singleTextDocSource("path/to/doc")) # read in 1 text doc as a 1-document corpus
>> corp1doc
> A corpus with 1 text document
>
>> corp1doc[[2]] <- AnotherSingleTextDoc("path/to/doc") # append a second document to the same corpus
>> corp1doc
> A corpus with 2 text documents
> ##########################################
>
> I can almost do this with DirSource, by setting pattern='filename', but this
> also requires me to separate out the path to the enclosing directory, which
> shouldn't be necessary.
>
> Thanks for taking a look!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.