No, it was just on my desktop (and on a network drive, and in a temp folder on my C: drive).
There have been some new policies put into place at work, though, and perhaps that
includes more / some monitoring software, but I don't know.

Sent from my iPhone

On Dec 7, 2011, at 4:11 PM, peter dalgaard <pda...@gmail.com> wrote:

> On Dec 7, 2011, at 22:37 , R. Michael Weylandt wrote:
>
>> R 2.13.2 on Mac OS X 10.5.8 takes about 1.8s to read the file
>> verbatim: system.time(read.table("test2.txt"))
>
> About 2.3s with 2.14 on a 1.86 GHz MacBook Air running 10.6.8.
>
> Gene, are you by any chance storing the file in a heavily virus-scanned
> system directory?
>
> -pd
>
>> Michael
>>
>> 2011/12/7 Gene Leynes <gley...@gmail.com>:
>>> Peter,
>>>
>>> You're quite right; it's nearly impossible to make progress without a
>>> working example.
>>>
>>> I created an ** extremely simplified ** example for distribution. The real
>>> data has numeric, character, and boolean classes.
>>>
>>> The file still takes 25.08 seconds to read, despite its small size.
>>>
>>> I neglected to mention that I'm using R 2.13.0 on a Windows 7 machine
>>> (not that it should particularly matter with this type of data / functions).
>>>
>>> ## The code:
>>> options(stringsAsFactors=FALSE)
>>> system.time(dat <- read.table('test2.txt', nrows=-1, sep='\t', header=TRUE))
>>> str(dat, 0)
>>>
>>> Thanks again!
>>>
>>> On Wed, Dec 7, 2011 at 1:21 AM, peter dalgaard <pda...@gmail.com> wrote:
>>>
>>>> On Dec 6, 2011, at 22:33 , Gene Leynes wrote:
>>>>
>>>>> Mark,
>>>>>
>>>>> Thanks for your suggestions.
>>>>>
>>>>> That's a good idea about the NULL columns; I didn't think of that.
>>>>> Surprisingly, it didn't have any effect on the time.
>>>>
>>>> Hmm, I think you want "character" and "NULL" there (i.e., quoted). Did you
>>>> fix both?
>>>>
>>>>>> read.table(whatever, as.is=TRUE, colClasses = c(rep(character,4),
>>>>>> rep(NULL,3696)))
>>>>
>>>> As a general matter, if you want people to dig into this, they need some
>>>> paraphrase of the file to play with. Would it be possible to set up a small
>>>> R program that generates a data file which displays the issue? Everything I
>>>> try seems to take about a second to read in.
>>>>
>>>> -pd
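[A minimal sketch of the fix Peter describes: colClasses expects class *names*
as strings, so "character" and "NULL" must be quoted. The file name and the
4 + 3696 column split are illustrative, taken from the example in the thread
rather than real data.]

    ## Unquoted, character is the character() function and NULL is the NULL
    ## object, so rep() either errors or drops the entries silently. Quoted,
    ## "NULL" tells read.table to skip that column entirely.
    dat <- read.table("test2.txt", sep = "\t", header = TRUE,
                      colClasses = c(rep("character", 4), rep("NULL", 3696)))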
>>>>> This problem was just a curiosity; I already did the import using Excel
>>>>> and VBA. I was just going to illustrate the power and simplicity of R,
>>>>> but ironically it's been much slower and harder in R...
>>>>> The VBA was painful and messy, and took me over an hour to write; but at
>>>>> least it worked quickly and reliably.
>>>>> The R code was clean and only took me about 5 minutes to write, but the
>>>>> run time was prohibitively slow!
>>>>>
>>>>> I profiled the code, but that offers little insight to me.
>>>>>
>>>>> Profile results with 10 line file:
>>>>>
>>>>>> summaryRprof("C:/Users/gene.leynes/Desktop/test.out")
>>>>> $by.self
>>>>>              self.time self.pct total.time total.pct
>>>>> scan             12.24    53.50      12.24     53.50
>>>>> read.table       10.58    46.24      22.88    100.00
>>>>> type.convert      0.04     0.17       0.04      0.17
>>>>> make.names        0.02     0.09       0.02      0.09
>>>>>
>>>>> $by.total
>>>>>              total.time total.pct self.time self.pct
>>>>> read.table        22.88    100.00     10.58     46.24
>>>>> scan              12.24     53.50     12.24     53.50
>>>>> type.convert       0.04      0.17      0.04      0.17
>>>>> make.names         0.02      0.09      0.02      0.09
>>>>>
>>>>> $sample.interval
>>>>> [1] 0.02
>>>>>
>>>>> $sampling.time
>>>>> [1] 22.88
>>>>>
>>>>> Profile results with 250 line file:
>>>>>
>>>>>> summaryRprof("C:/Users/gene.leynes/Desktop/test.out")
>>>>> $by.self
>>>>>                self.time self.pct total.time total.pct
>>>>> scan               23.88    68.15      23.88     68.15
>>>>> read.table         10.78    30.76      35.04    100.00
>>>>> type.convert        0.30     0.86       0.32      0.91
>>>>> character           0.02     0.06       0.02      0.06
>>>>> file                0.02     0.06       0.02      0.06
>>>>> lapply              0.02     0.06       0.02      0.06
>>>>> unlist              0.02     0.06       0.02      0.06
>>>>>
>>>>> $by.total
>>>>>                total.time total.pct self.time self.pct
>>>>> read.table          35.04    100.00     10.78     30.76
>>>>> scan                23.88     68.15     23.88     68.15
>>>>> type.convert         0.32      0.91      0.30      0.86
>>>>> sapply               0.04      0.11      0.00      0.00
>>>>> character            0.02      0.06      0.02      0.06
>>>>> file                 0.02      0.06      0.02      0.06
>>>>> lapply               0.02      0.06      0.02      0.06
>>>>> unlist               0.02      0.06      0.02      0.06
>>>>> simplify2array       0.02      0.06      0.00      0.00
>>>>>
>>>>> $sample.interval
>>>>> [1] 0.02
>>>>>
>>>>> $sampling.time
>>>>> [1] 35.04
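[For reference, a sketch of how profiles like these are produced; the paths
are illustrative, and the 0.02 s interval matches $sample.interval above.]

    Rprof("test.out", interval = 0.02)   # start the sampling profiler
    dat <- read.table("test2.txt", sep = "\t", header = TRUE)
    Rprof(NULL)                          # stop profiling
    summaryRprof("test.out")             # the $by.self / $by.total tables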
>>>>> On Tue, Dec 6, 2011 at 2:34 PM, Mark Leeds <marklee...@gmail.com> wrote:
>>>>>
>>>>>> Hi Gene: maybe someone else will reply with some subtleties that I'm not
>>>>>> aware of. One other thing that might help: if you know which columns you
>>>>>> want, you can set the others to NULL through colClasses, and this should
>>>>>> speed things up also. For example, say you knew you only wanted the
>>>>>> first four columns and they were character. Then you could do
>>>>>>
>>>>>> read.table(whatever, as.is=TRUE, colClasses = c(rep(character,4),
>>>>>> rep(NULL,3696)))
>>>>>>
>>>>>> Hopefully someone else will say something that does the trick. The
>>>>>> difference in timings seems odd to me. Good luck.
>>>>>>
>>>>>> On Tue, Dec 6, 2011 at 1:55 PM, Gene Leynes <gley...@gmail.com> wrote:
>>>>>>
>>>>>>> Mark,
>>>>>>>
>>>>>>> Thank you for the reply.
>>>>>>>
>>>>>>> I neglected to mention that I had already set
>>>>>>> options(stringsAsFactors=FALSE)
>>>>>>>
>>>>>>> I agree, skipping the factor determination can help performance.
>>>>>>>
>>>>>>> The main reason that I wanted to use read.table is that it will
>>>>>>> correctly determine the column classes for me. I don't really want to
>>>>>>> specify 3700 column classes! (I'm not sure what they are anyway.)
>>>>>>>
>>>>>>> On Tue, Dec 6, 2011 at 12:40 PM, Mark Leeds <marklee...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Gene: Sometimes using colClasses in read.table can speed things up.
>>>>>>>> If you know what your variables are ahead of time and what you want
>>>>>>>> them to be, this allows you to be specific by specifying character or
>>>>>>>> numeric, etc., and often it makes things faster. Others will have more
>>>>>>>> to say.
>>>>>>>>
>>>>>>>> Also, if most of your variables are characters, R will try to convert
>>>>>>>> them into factors by default. If you use as.is = TRUE it won't do
>>>>>>>> this, and that might speed things up also.
>>>>>>>>
>>>>>>>> Rejoinder: the above tidbits are just from experience. I don't know if
>>>>>>>> they're set in stone or a hard and fast rule.
>>>>>>>>
>>>>>>>> On Tue, Dec 6, 2011 at 1:15 PM, Gene Leynes <gley...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> ** Disclaimer: I'm looking for general suggestions **
>>>>>>>>> I'm sorry, but I can't send out the file I'm using, so there is no
>>>>>>>>> reproducible example.
>>>>>>>>>
>>>>>>>>> I'm using read.table and it's taking over 30 seconds to read a tiny
>>>>>>>>> file. The strange thing is that it takes roughly the same amount of
>>>>>>>>> time if the file is 100 times larger.
>>>>>>>>>
>>>>>>>>> After re-reviewing the Data Import/Export manual I think the best
>>>>>>>>> approach would be to use Python, or perhaps the readLines function,
>>>>>>>>> but I was hoping to understand why the simple read.table approach
>>>>>>>>> wasn't working as expected.
>>>>>>>>>
>>>>>>>>> Some relevant facts:
>>>>>>>>>
>>>>>>>>> 1. There are about 3700 columns. Maybe this is the problem? Still,
>>>>>>>>> the file size is not very large.
>>>>>>>>> 2. The file encoding is ANSI, but I'm not specifying that in the
>>>>>>>>> function. Setting fileEncoding="ANSI" produces an "unsupported
>>>>>>>>> conversion" error.
>>>>>>>>> 3. readLines imports the lines quickly.
>>>>>>>>> 4. scan imports the file quickly also.
>>>>>>>>>
>>>>>>>>> Obviously, scan and readLines would require more coding to identify
>>>>>>>>> columns, etc.
>>>>>>>>>
>>>>>>>>> My code:
>>>>>>>>> system.time(dat <- read.table('C:/test.txt', nrows=-1, sep='\t',
>>>>>>>>> header=TRUE))
>>>>>>>>>
>>>>>>>>> It's taking 33.4 seconds and the file size is only 315 KB!
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Gene
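[The kind of generator script Peter asks for elsewhere in the thread might
look roughly like this; 3700 columns matches the description above, but the
row count, values, and file name are made up for illustration.]

    ## Write a small but very wide tab-separated file, then time the read.
    set.seed(42)
    m <- matrix(rnorm(10 * 3700), nrow = 10)
    colnames(m) <- paste("V", 1:3700, sep = "")
    write.table(m, "test2.txt", sep = "\t", quote = FALSE, row.names = FALSE)
    system.time(dat <- read.table("test2.txt", sep = "\t", header = TRUE))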
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.