I think I know why it works faster: VectorSource in the code above takes only the file names as the corpus, not the contents of the files :D duh!
Any suggestions for creating a VectorSource from the contents of the txt files?

On Sat, Aug 17, 2013 at 1:59 PM, Ajinkya Kale <kaleajin...@gmail.com> wrote:

> Funny, it works fine if I use VectorSource:
>
> ovid <- Corpus(VectorSource(list.files(sourceDir)[1:1253]),
>                readerControl = list(language = "lat"))
>
> So I tried executing only DirSource(sourceDir), and that fails with the
> error I mentioned earlier. So it's not a problem with Corpus(), as I
> initially thought.
>
> Also, I noticed that VectorSource works much faster than having a
> DirSource there. Any particular reason?
>
> On Sat, Aug 17, 2013 at 11:16 AM, Ajinkya Kale <kaleajin...@gmail.com> wrote:
>
>> It contains all text files which were converted from doc, docx, ppt etc.
>> using libreoffice. Some of them are non-English text documents.
>>
>> Sorry, I cannot share the corpus, but if someone can shed light on what
>> might cause this error then I can try to eliminate those documents if
>> some specific docs are causing it.
>>
>> On Sat, Aug 17, 2013 at 9:55 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
>>
>>> On Friday, August 16, 2013 at 19:35 -0700, Ajinkya Kale wrote:
>>> > I am trying to use the text mining package ... I keep getting this error:
>>> >
>>> > rm(list=ls())
>>> > library(tm)
>>> > sourceDir <- "Z:\\projectk_viz\\docs_to_index"
>>> > ovid <- Corpus(DirSource(sourceDir), readerControl = list(language = "lat"))
>>> >
>>> > Error in if (vectorized && (length <= 0)) stop("vectorized sources must
>>> > have positive length") : missing value where TRUE/FALSE needed
>>> >
>>> > I am not sure what it means.
>>>
>>> The posting guide asks for a reproducible example. If you cannot make
>>> available to us the contents of sourceDir, at least you should tell us
>>> what kind of files it contains. Have you tried with only some of the
>>> files the directory contains?
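One possible way to build the corpus from file contents rather than file names, as a minimal sketch only. It assumes the files are plain text that readLines() can read; the variable names `files` and `texts` and the two sample files are made up for illustration, and a temporary directory stands in for the real sourceDir. It has not been tried on the corpus discussed here.

```r
# Sketch: build a corpus from file contents rather than file names.
# A temporary directory with two small sample files stands in for sourceDir.
sourceDir <- file.path(tempdir(), "docs_to_index")
dir.create(sourceDir, showWarnings = FALSE)
writeLines(c("arma virumque cano", "Troiae qui primus ab oris"),
           file.path(sourceDir, "a.txt"))
writeLines("in nova fert animus", file.path(sourceDir, "b.txt"))

# full.names = TRUE returns paths that readLines() can open directly.
files <- list.files(sourceDir, pattern = "\\.txt$", full.names = TRUE)

# Read each file and collapse its lines into one string per document.
texts <- vapply(files, function(f) {
  paste(readLines(f, warn = FALSE), collapse = "\n")
}, character(1))

# Each element of 'texts' becomes one document in the corpus.
# The tm step is skipped if the package is not installed.
if (requireNamespace("tm", quietly = TRUE)) {
  ovid <- tm::Corpus(tm::VectorSource(texts),
                     readerControl = list(language = "lat"))
  print(length(ovid))
}
```

With converted or non-English documents, readLines() may also need an explicit encoding argument; which encoding applies to these files is not known here.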
--
Sincerely,
Ajinkya
http://ajinkya.info
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.