On Sunday, 18 August 2013 at 09:19 -0700, Ajinkya Kale wrote:
> I did exactly what you mentioned... tried a subset of these documents
> and found out there were some junk non-txt files which were causing
> this issue. Everything worked fine with DirSource() once I deleted them
> from the dir.
> But I feel these functions should also tell what file they are failing
> at.... I have ended up debugging with subsets of input one too many
> times.
Good. Could you send us (or maybe privately to me) at least an excerpt
of the file that is enough to reproduce the bug? Indeed it would be nice
to get a more explicit error message from tm if possible.

Regards

> On Aug 18, 2013 9:01 AM, "Milan Bouchet-Valat" <nalimi...@club.fr> wrote:
> > On Saturday, 17 August 2013 at 11:16 -0700, Ajinkya Kale wrote:
> > > It contains all text files which were converted from doc, docx, ppt
> > > etc. using libreoffice.
> > > Some of them are non-English text documents.
> > >
> > > Sorry I cannot share the corpus, but if someone can shed light on
> > > what might cause this error then I can try to eliminate those
> > > documents if some specific docs are causing it.
> > I think you should go the other way round: try with only one document
> > and see if it works, and do enough attempts to find out in what cases
> > it works and in what cases it fails. If it always fails, try with
> > examples provided by tm, and then with parts of your documents.
> >
> > I don't think it makes sense to try to use VectorSource() as it would
> > imply reimplementing DirSource().
> >
> > Regards
> >
> > > On Sat, Aug 17, 2013 at 9:55 AM, Milan Bouchet-Valat
> > > <nalimi...@club.fr> wrote:
> > > > On Friday, 16 August 2013 at 19:35 -0700, Ajinkya Kale wrote:
> > > > > I am trying to use the text mining package ... I keep getting
> > > > > this error:
> > > > >
> > > > > rm(list=ls())
> > > > > library(tm)
> > > > > sourceDir <- "Z:\\projectk_viz\\docs_to_index"
> > > > > ovid <- Corpus(DirSource(sourceDir), readerControl = list(language = "lat"))
> > > > >
> > > > > Error in if (vectorized && (length <= 0)) stop("vectorized sources must
> > > > > have positive length") : missing value where TRUE/FALSE needed
> > > > >
> > > > > I am not sure what it means.
> > > >
> > > > The posting guide asks for a reproducible example. If you cannot make
> > > > available to us the contents of sourceDir, at least you should tell us
> > > > what kind of files it contains. Have you tried with only some of the
> > > > files the directory contains?
> > > >
> > > > Regards
> > > >
> > > > > --ajinkya
> > > > >
> > > > > ______________________________________________
> > > > > R-help@r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > > --
> > > Sincerely,
> > > Ajinkya
> > > http://ajinkya.info
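
The advice in this thread (test one document at a time, and report which file fails) can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: find_failing_files() and check_text() are hypothetical helpers, not part of tm, and the UTF-8 check is just one guess at what makes a file "junk"; it is not known to be what triggered the error above.

```r
# Hypothetical helper (not part of tm): apply `reader` to each file on its own
# and collect the error message for every file on which it fails.
find_failing_files <- function(files, reader) {
  failures <- character(0)
  for (f in files) {
    msg <- tryCatch({
      reader(f)
      NA_character_                      # reader succeeded for this file
    }, error = function(e) conditionMessage(e))
    if (!is.na(msg)) failures[f] <- msg  # remember path and error message
  }
  failures                               # named character vector: path -> message
}

# Example reader, an assumption for illustration only: reject a file whose
# content is not valid UTF-8 text (validUTF8() needs R >= 3.3.0).
check_text <- function(path) {
  lines <- readLines(path, warn = FALSE)
  if (!all(validUTF8(lines))) stop("invalid UTF-8 text")
  invisible(lines)
}

# Usage on a directory (the path is whatever was passed to DirSource()):
# files <- list.files(sourceDir, full.names = TRUE)
# find_failing_files(files, check_text)
```

Any function that takes a file path can be passed as `reader`, so the same loop could wrap a call into tm itself (for instance, building a corpus from a temporary directory holding only that file) to see which document makes it stop.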