Some hints:
list.files() will return the list of files in a directory
readLines() will allow you to load text files as vectors of lines
strsplit() will allow you to break lines into words
c(x, y) concatenates vectors x and y; x <- c(x, y) appends vector y to x
unique() will allow you to get rid of repeats
And the Map/Reduce family of functions will allow you to write what
you want in about 15 lines of concise R code with no loops.
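For illustration, here is a minimal sketch of how those hints fit together. It assumes plain-text files in one directory and words separated by whitespace; the demo directory and file names are made up:

```r
# Hypothetical demo directory with two small text files
txt_dir <- file.path(tempdir(), "hints_demo")
dir.create(txt_dir, showWarnings = FALSE)
writeLines(c("the cat sat", "on the mat"), file.path(txt_dir, "a.txt"))
writeLines("the dog ran", file.path(txt_dir, "b.txt"))

# Full paths of the .txt files in the directory
files <- list.files(txt_dir, pattern = "\\.txt$", full.names = TRUE)

# One character vector of lines per file, with no explicit loop
lines_per_file <- Map(readLines, files)

# Split every line on whitespace and flatten to one word vector per file
words_per_file <- Map(function(lines) unlist(strsplit(lines, "[[:space:]]+")),
                      lines_per_file)

# Concatenate all word vectors with c() and drop the repeats
vocab <- unique(Reduce(c, words_per_file))
print(vocab)
```

Map() keeps one list element per file, and Reduce(c, ...) folds those elements into a single vector. That is the "no loops" style the hint refers to.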

Hope it helps,
Cheers,
jcb!

On Tue, Jul 19, 2011 at 11:11 AM, Alexander James Rickett
<ack.van...@gmail.com> wrote:
> Hello everyone,
>
> I'm doing some JGR (a GUI front end for R) development, specifically adding
> functionality from tm.  In order to let users select some text files from a
> file dialog and turn them into a corpus, I need to be able to generate a
> corpus using a *SINGLE* text file as a single document, and to append a new
> document to an existing corpus.  I know that if I could read each file into a
> single character vector I'd be in business, but I can't find how to do that
> either.  This seems like a no-brainer, so I'm at my wits' end.
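Reading a whole file into one character string takes one readLines() call plus paste(collapse = ...). Below is a minimal sketch. The helper name read_whole_file is made up, and joining the lines with "\n" is an assumed choice of separator:

```r
# Hypothetical helper: return the entire contents of a text file as ONE string
read_whole_file <- function(path, encoding = "unknown") {
  # readLines() gives one element per line; warn = FALSE silences the
  # "incomplete final line" warning for files without a trailing newline
  lines <- readLines(path, encoding = encoding, warn = FALSE)
  # Collapse the lines back into a single newline-separated string
  paste(lines, collapse = "\n")
}

# Usage with a throwaway file
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmp)
doc <- read_whole_file(tmp)
length(doc)  # 1
```

The final trailing newline is not preserved. That usually does not matter for text mining.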
>
> Here's pseudo code of what I'd like to be able to do:
>
> ##########################################
>> corp1doc <- Corpus(singleTextDocSource("path/to/doc")) # read in 1 text doc as a 1-document corpus
>> corp1doc
>        A corpus with 1 text document
>
>> corp1doc[[2]] <- AnotherSingleTextDoc("path/to/doc") # append a second document to the same corpus
>> corp1doc
>        A corpus with 2 text documents
> ##########################################
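With tm itself, the pseudocode above can be approximated as shown here. This sketch assumes a reasonably recent tm in which VCorpus(), VectorSource() and c() on two VCorpus objects are available. single_file_corpus is a made-up helper, and the printed corpus summary will not look exactly like the one in the pseudocode:

```r
library(tm)

# Hypothetical helper: one text file -> a 1-document VCorpus
single_file_corpus <- function(path) {
  # Read the file and collapse it to a single string, i.e. a single document
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
  VCorpus(VectorSource(txt))
}

# Two throwaway files standing in for "path/to/doc"
doc1 <- tempfile(fileext = ".txt")
doc2 <- tempfile(fileext = ".txt")
writeLines(c("First document,", "two lines."), doc1)
writeLines("Second document.", doc2)

corp <- single_file_corpus(doc1)
length(corp)  # 1 document

# Append a second document by concatenating two corpora
corp <- c(corp, single_file_corpus(doc2))
length(corp)  # 2 documents
```

Concatenating corpora with c() is used here in place of the hypothetical `corp1doc[[2]] <- ...` assignment. Both documents may end up with the same default id, so you may want to set ids yourself if they matter.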
>
> I can almost do this with DirSource, by setting pattern='filename', but this
> also requires me to separate out the path to the enclosing directory, which
> shouldn't be necessary.
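The split that DirSource() needs can be done with dirname() and basename(). The file name then has to become an anchored pattern with its regex metacharacters escaped. This sketch assumes tm's DirSource(directory, pattern = ...) interface, and path_to_dirsource is a made-up helper name:

```r
library(tm)

# Hypothetical helper: a DirSource matching exactly one file, given its full path
path_to_dirsource <- function(path) {
  dir  <- dirname(path)
  file <- basename(path)
  # Escape regex metacharacters (e.g. the "." in ".txt") so the name matches literally
  escaped <- gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", file, perl = TRUE)
  # Anchor the pattern so "doc.txt" does not also match "doc.txt.bak"
  DirSource(directory = dir, pattern = paste0("^", escaped, "$"))
}

# Throwaway directory holding the target file plus a near-miss
demo_dir <- file.path(tempdir(), "dirsource_demo")
dir.create(demo_dir, showWarnings = FALSE)
target <- file.path(demo_dir, "doc.txt")
writeLines("Only this file should be read.", target)
writeLines("Not this one.", file.path(demo_dir, "doc.txt.bak"))

corp <- VCorpus(path_to_dirsource(target))
length(corp)  # 1
```

Reading the file yourself and wrapping it in VectorSource() avoids the regex step entirely. This variant only helps if you specifically want DirSource's reader behaviour.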
>
> Thanks for taking a look!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
