[R] gdata selectively not working
I can use gdata to successfully read in the example Excel file, but not any other
Excel file. Why might this be the case? The output suggests the problem has
something to do with opening the workbook, but gives no indication of what the
problem actually is, so I'm at a loss as to how to fix it.

> library(gdata)
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
<snip>

> test <- read.xls("C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx", verbose=TRUE)
Using perl at C:\Perl64\bin\perl.exe
Using perl at C:\Perl64\bin\perl.exe
Converting xls file C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx to csv file C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv ...
Executing ' C:\Perl64\bin\perl.exe C:/Dropbox/R/library/gdata/perl/xls2csv.pl C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv 1 '...
Loading 'C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx'...
Done.
Orignal Filename: C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx
Number of Sheets: 4
Writing sheet number 1 ('Sheet First') to file 'C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv'
Minrow=0 Maxrow=7 Mincol=0 Maxcol=2
 0 Done.
Reading csv file C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv ...
Done.

This tells me that perl can be found and used, and that my local temp directory
can be written to and read from just fine. Now to try to read one of my own files.

> test <- read.xls("C:/Dropbox/Animals/LARPBO/Database.xlsx", verbose=TRUE)
Using perl at C:\Perl64\bin\perl.exe
Using perl at C:\Perl64\bin\perl.exe
Converting xls file C:/Dropbox/Animals/LARPBO/Database.xlsx to csv file C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv ...
Executing ' C:\Perl64\bin\perl.exe C:/Dropbox/R/library/gdata/perl/xls2csv.pl C:/Dropbox/Animals/LARPBO/Database.xlsx C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv 1 '...
Unable to open file 'C:/Dropbox/Animals/LARPBO/Database.xlsx'.
 2 Done.
Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
  Intermediate file 'C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv' missing!
In addition: Warning message:
running command 'C:\Perl64\bin\perl.exe C:/Dropbox/R/library/gdata/perl/xls2csv.pl C:/Dropbox/Animals/LARPBO/Database.xlsx C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv 1' had status 2
Error in file.exists(tfn) : invalid 'file' argument

So it appears to be a problem with the original Excel file, but there is nothing
that tells me what the problem actually is.

Thanks,
-Robin Jeffries

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] gdata selectively not working
Thank you Paul, that was the problem. I have installed R into Dropbox and so am
aware of access issues; I have to pause syncing whenever I install a new
package, and I assumed that would work here as well. Unfortunately, even when
Dropbox syncing is paused (or turned off) I still can't read the file with
gdata. If I move it to another location, say C:/Temp, then it's fine. Annoying,
but I will have to work around it for now.

-Robin

On Tue, Apr 2, 2013 at 4:41 AM, Paul Johnson pauljoh...@gmail.com wrote:
> On Apr 2, 2013 1:28 AM, Robin Jeffries robin.a.jeffr...@gmail.com wrote:
> > I can use gdata to successfully read in the example Excel file, but not
> > any other Excel file. Why might this be the case?
> > [rest of original message and verbose output snipped]
>
> Would you please try this on something that is NOT a network share (Dropbox)?
> I suspect file-access issues cause this. Lots of details under the hood
> there. Just check the most obvious problem first. Next we will need you to
> give a link to the xls file in question. The gdata functions always work for
> me...
> PJ
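The workaround described above (moving the workbook out of the Dropbox folder
before reading it) can be automated. A minimal sketch; the helper name is mine,
not part of gdata, and it assumes gdata (with its perl backend) is installed:

```r
# Copy the workbook to the session temp directory before reading it,
# so that read.xls's perl script opens a plain local file.
# read.xls() here is gdata's function; read_xls_local() is an illustrative helper.
read_xls_local <- function(path, ...) {
  tmp <- file.path(tempdir(), basename(path))
  stopifnot(file.copy(path, tmp, overwrite = TRUE))
  on.exit(unlink(tmp))
  gdata::read.xls(tmp, ...)
}

# Usage (path from the thread):
# test <- read_xls_local("C:/Dropbox/Animals/LARPBO/Database.xlsx", verbose = TRUE)
```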
[R] Subsetting with missing data
Simply put, I want to subset the data frame 'a' to the rows where y == 0.

> a <- as.data.frame(cbind(x=1:10, y=c(1,0,NA,1,0,NA,NA,1,1,0)))
> a
    x  y
1   1  1
2   2  0
3   3 NA
4   4  1
5   5  0
6   6 NA
7   7 NA
8   8  1
9   9  1
10 10  0
> names(a)
[1] "x" "y"
> table(a$y)
0 1
3 4
> table(a$y, useNA="always")
   0    1 <NA>
   3    4    3
> b <- a[a$y==0,]
> b
      x  y
2     2  0
NA   NA NA
5     5  0
NA.1 NA NA
NA.2 NA NA
10   10  0
> is(a$y)
[1] "numeric" "vector"

Instead of only pulling the rows where a$y == 0, I'm getting the rows where y is
0 OR NA. Why? Again I feel like either something was changed when I wasn't
looking, or I'm really forgetting something important.

Thanks,
Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics, UCLA
530-633-STAT(7828)
rjeffr...@ucla.edu
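This is standard R semantics rather than a change: a$y == 0 evaluates to NA for
the NA elements, and indexing a data frame with a logical vector containing NA
returns a row of NAs for each one. Any of the usual NA-safe idioms avoid it; a
minimal sketch on the same toy data:

```r
a <- as.data.frame(cbind(x = 1:10, y = c(1, 0, NA, 1, 0, NA, NA, 1, 1, 0)))

# a$y == 0 is c(FALSE, TRUE, NA, ...); an NA index yields an NA row.
b1 <- a[which(a$y == 0), ]          # which() silently drops the NAs
b2 <- a[!is.na(a$y) & a$y == 0, ]   # explicit NA handling
b3 <- subset(a, y == 0)             # subset() treats NA as FALSE
```

All three return only rows 2, 5, and 10.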
[R] Resources for utilizing multiple processors
Hello,

I know of various methods out there to utilize multiple processors, but am not
sure what the best solution would be. First, some things to note: I'm running
dependent simulations, so directly parallelizing the code is out (multicore,
doSNOW, etc.). I'm on Windows, and don't know C. I don't plan on learning C or
any of the *nix languages.

My main concern deals with multiple analyses on large data sets. By large I
mean that when I'm done running 2 simulations R is using ~3 GB of RAM; the
remaining ~3 GB is chewed up when I try to compute the Gelman-Rubin statistic
to compare the two resulting samples, grinding the process to a halt. I'd like
to have separate cores simultaneously run each analysis. That will save on
time, and I'll have to ponder the BGR calculation problem another way. Can R
temporarily use hard-drive space to write calculations to instead of RAM?

The second concern boils down to whether or not there is a way to split up
dependent simulations. For example, at iteration (t) I feed a(t-2) into FUN1 to
generate a(t), then feed a(t), b(t-1) and c(t-1) into FUN2 to simulate b(t) and
c(t). I'd love to have one core run FUN1 and another run FUN2, and better yet,
a third to run all the pre- and post-processing tidbits!

So if anyone has any suggestions as to a direction I can look into, it would be
appreciated.

Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-633-STAT(7828)
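For the first concern (two independent analyses run at once), a socket cluster
works on Windows without any C or *nix knowledge. A minimal sketch using the
snow-style API shipped in the parallel package; run_analysis is an illustrative
stand-in for the real model-fitting code, not anything from the thread. (For
the disk-instead-of-RAM question, the ff and bigmemory packages provide
disk-backed objects, not shown here.)

```r
library(parallel)

# Stand-in for one complete, self-contained analysis.
run_analysis <- function(seed) {
  set.seed(seed)
  mean(rnorm(1e5))   # placeholder for one chain/analysis
}

cl <- makeCluster(2)                         # two worker processes (Windows-friendly)
res <- parLapply(cl, list(1, 2), run_analysis)
stopCluster(cl)

length(res)   # one result per analysis
```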
[R] Help with one of those apply functions
Hello there,

I'm still struggling with the *apply commands. I have 5 people with varying
numbers (nrep) of a repeated outcome (value) measured on them.

nrep <- 1:5
id <- rep(c("p1", "p2", "p3", "p4", "p5"), nrep)
value <- rnorm(length(id))

I want to create a new vector that contains the sum of the values per person:

subject.value[1] <- value[1]           # 1 measurement
subject.value[2] <- sum(value[2:3])    # the next 2 measurements
...
subject.value[5] <- sum(value[11:15])  # the next 5 measurements

I'd imagine it'll be some sort of *apply(value, nrep, sum), but I can't seem to
land on the right format. Can someone give me a heads up as to what the correct
syntax and function is?

Danke,
Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428
Re: [R] Help with one of those apply functions
Thanks Steve, I needed the alternative. tapply worked for my toy example, but it
didn't for my real example; it might be because the data were in a data frame,
but I'm not sure. Using plyr did work, however.

Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428

On Wed, Feb 2, 2011 at 2:34 PM, Steve Lianoglou
mailinglist.honey...@gmail.com wrote:
> Hi,
>
> On Wed, Feb 2, 2011 at 4:08 PM, Robin Jeffries rjeffr...@ucla.edu wrote:
> > [original question snipped]
>
> In addition to tapply (as Phil pointed out), you can look at the functions
> in plyr. I somehow find them more intuitive, at times, than their sister
> base functions, especially since more often than not you'll have your data
> in a data.frame. For instance:
>
> R> set.seed(123)
> R> nrep <- 1:5
> R> id <- rep(c("p1", "p2", "p3", "p4", "p5"), nrep)
> R> value <- rnorm(length(id))
> R> DF <- data.frame(id=id, value=value)
> R> tapply(value, id, sum)
>         p1         p2         p3         p4         p5
> -0.5604756  1.3285308  1.9148611 -1.9366599  1.5395087
> R> library(plyr)
> R> ddply(DF, .(id), summarize, total=sum(value))
>   id      total
> 1 p1 -0.5604756
> 2 p2  1.3285308
> 3 p3  1.9148611
> 4 p4 -1.9366599
> 5 p5  1.5395087
>
> In this case, though, I'll grant you that tapply is simpler if you already
> know how to use it.
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> Memorial Sloan-Kettering Cancer Center
> Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
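Another base-R option worth noting (my addition, not from the thread): rowsum()
sums a vector within groups and works just as well on a column of a data frame,
which was the sticking point above. A small sketch on the thread's toy data:

```r
set.seed(123)
nrep <- 1:5
id <- rep(c("p1", "p2", "p3", "p4", "p5"), nrep)
value <- rnorm(length(id))
DF <- data.frame(id = id, value = value)

# rowsum() returns a one-column matrix of per-group sums, one row per id
totals <- rowsum(DF$value, DF$id)
```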
Re: [R] Printing 'pretty' vectors in Sweave
Ah! I was always trying collapse with sep and other options, never by itself.
Perfect! And yes, that was my bad example.

Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428

On Tue, Jan 18, 2011 at 10:27 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
> Hi Robin,
>
> Have you looked at the 'collapse' argument to paste? Something like:
>
> myvec <- paste(1:4, collapse = ", ")
>
> might do what you want. Also maybe ?bquote or the like to get rid of quotes,
> possibly (I'm not in a position to try presently). Side note: it is really
> probably best not to use 'c' as a variable name, since it is such a
> fundamental function.
>
> Cheers,
> Josh
>
> On Jan 18, 2011, at 21:46, Robin Jeffries rjeffr...@ucla.edu wrote:
> > [original question snipped]
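On the Sweave side: cat() prints and returns NULL, which is why the assignment
came back empty. Inline output in Sweave goes through \Sexpr{}, which needs an
expression that returns the string. A sketch; fmt_vec is an illustrative helper
name, not part of Sweave:

```r
# cat() returns NULL; build the string with paste() instead.
fmt_vec <- function(v) paste("(", paste(v, collapse = ", "), ")", sep = "")

fmt_vec(1:4)   # "(1, 2, 3, 4)"
```

Then in the .Rnw file: My vector is \Sexpr{fmt_vec(x)}.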
[R] Printing 'pretty' vectors in Sweave
I am trying to print a nice-looking vector in Sweave.

c <- 1:4

I want to see (1, 2, 3, 4) in TeX. If I use

paste(c, ",", sep="")

I get "1," "2," "3," "4,". If I use cat(c, sep=","), I can't seem to assign the
result to an object:

> cat(c, sep=",")
1,2,3,4
> myvec <- cat(c, sep=",")
1,2,3,4
> myvec
NULL

And if I bypass the object assignment and put

My vector is (\Sweave{cat(c, sep=",")}).

it prints out "My vector is ()." Suggestions?

Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428
[R] How is MissInfo calculated? (mitools)
What does missInfo compute, and how is it computed? There is only 1 observation
missing the ethnic3 variable; there is no other missing data. N = 1409.

> summary(MIcombine(mod1))
Multiple imputation results:
      with(rt.imp, glm(G1 ~ stdage + female + as.factor(ethnic3) + u,
          family = binomial()))
      MIcombine.default(mod1)
                        results         se      (lower     upper) missInfo
(Intercept)         -0.40895453 0.14743928 -0.70805544 -0.1098536     53 %
stdage               0.13991360 0.06046537  0.02140364  0.2584236      0 %
female              -0.05587635 0.11083362 -0.27310639  0.1613537      0 %
as.factor(ethnic3)1  0.17297835 0.19556664 -0.21032531  0.5562820      0 %
as.factor(ethnic3)2  0.63507020 0.18017975  0.28192410  0.9882163      0 %
u                   -0.01322976 0.18896230 -0.40291914  0.3764596     64 %

Thanks,
Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428
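No answer appears in this thread. For orientation only: the "fraction of missing
information" reported by multiple-imputation combiners is conventionally built
from Rubin's (1987) between/within variance decomposition. Whether mitools uses
exactly this quantity or the degrees-of-freedom-adjusted variant is not
confirmed here; the sketch below (function name mine) shows only the standard
definition, for one coefficient across m imputations:

```r
# Rubin's combining rules, sketched; NOT claimed to be mitools' exact code.
# est:  the m completed-data point estimates for one coefficient
# vars: the m completed-data variances for that coefficient
fmi_lambda <- function(est, vars) {
  m <- length(est)
  W <- mean(vars)            # within-imputation variance
  B <- var(est)              # between-imputation variance
  Tot <- W + (1 + 1/m) * B   # total variance
  (1 + 1/m) * B / Tot        # proportion of variance due to missingness
}
```

With identical estimates across imputations, B = 0 and the fraction is 0, which
matches the 0 % rows in the summary above.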
[R] GC verbose=false still showing report
I must be reading the help file for gc() wrong. I thought it said that
gc(verbose=FALSE) will run the garbage collection without printing the
Ncells/Vcells summary. However, this is what I get:

> gc(verbose = FALSE)
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  267097 14.3     531268  28.4   531268  28.4
Vcells  429302  3.3   20829406 159.0 55923977 426.7

I'm embedding this in an Sweave/TeX file, so I *really* can't have this printing
out. Suggestions other than manually editing the TeX file?

Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428
Re: [R] GC verbose=false still showing report
invisible(gc()) worked perfectly. Thanks Jeff.

@Josh: I know how to toggle showing/hiding command echoes, but I haven't figured
out how to toggle on/off any printed output.

On Sat, Oct 9, 2010 at 5:10 PM, Robin Jeffries rjeffr...@ucla.edu wrote:
> [original message snipped]
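For the chunk-option route Josh alluded to: Sweave chunks take an echo option
(show/hide the commands) and a results option (show/hide the printed output),
so either form below should suppress the gc() report. A sketch of an .Rnw
fragment (chunk label is illustrative):

```latex
% Hide both the command and its printed output:
<<cleanup, echo=FALSE, results=hide>>=
gc()
@

% Or keep the chunk visible but silence just this call:
<<echo=TRUE>>=
invisible(gc())
@
```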
[R] Lattice xyplots plots with multiple lines per cell
Hello,

I need to plot the means of some outcome for two groups (control vs.
intervention) over (discrete) time on the same plot, for various subsets such
as gender and grade level.

What I have been doing is creating all possible subsets first, using the
aggregate function to create the means over time, then plotting the means over
time (as a simple line plot with both control and intervention on one plot) for
one subset. I then use par() and repeat this plot for each gender x grade level
subset so they all appear on one page.

This appears to me to be very similar to an xyplot, something like
mean(outcome) ~ gender + gradelevel. However, I can't figure out how I could
get both the control and intervention lines in the same panel. Any suggestions?
What I'm doing now works, but it just seems to be the long way around.

-Robin
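In lattice, conditioning variables (after |) define the panels, and the groups
argument draws one line per group within each panel, which is exactly the
control-vs-intervention overlay. A sketch with made-up aggregated data (all
variable names are illustrative):

```r
library(lattice)

# Hypothetical aggregated data: mean outcome by time, arm, gender, and grade
agg <- expand.grid(time   = 1:4,
                   arm    = c("control", "intervention"),
                   gender = c("F", "M"),
                   grade  = c("G1", "G2"))
agg$mean.outcome <- runif(nrow(agg))

# One panel per gender x grade; both arms as separate lines via `groups`
p <- xyplot(mean.outcome ~ time | gender + grade, groups = arm, data = agg,
            type = "l", auto.key = list(lines = TRUE, points = FALSE))
print(p)
```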
[R] simple apply syntax
I know this is a simple question, but I have yet to master the apply
statements; any help would be appreciated. I have a column of probabilities and
sample sizes, and I would like to create a column of binomial random variables
using those corresponding probabilities. E.g.

mat <- as.matrix(cbind(p=runif(10,0,1), n=rep(1:5)))
               p n
 [1,] 0.5093493 1
 [2,] 0.4947375 2
 [3,] 0.6753015 3
 [4,] 0.8595729 4
 [5,] 0.1004739 5
 [6,] 0.6292883 1
 [7,] 0.3752004 2
 [8,] 0.6889157 3
 [9,] 0.2435880 4
[10,] 0.9619128 5

I want to create mat$x as binomial(n, p).

Thanks,
Robin
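No apply is needed here: rbinom() is vectorized over both size and prob, so a
single call draws a different binomial for each row. A sketch (note that $ does
not work on a matrix, so the new column is attached with cbind):

```r
mat <- as.matrix(cbind(p = runif(10, 0, 1), n = rep(1:5, 2)))

# One draw per row: x[i] ~ Binomial(n[i], p[i])
x <- rbinom(nrow(mat), size = mat[, "n"], prob = mat[, "p"])
mat <- cbind(mat, x = x)
```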
[R] Counting indexes
Hallo!

I have a vector of IDs like so:

id <- c(1,2,2,3,3,3,4,5,5)

I would like to create a [start, stop] pair of vectors that index the first and
last observation per ID. For the ID list above, it would look like:

start stop
    1    1
    2    3
    4    6
    7    7
    8    9

I haven't worked with indexes/data manipulation much in R, so any pointers
would be helpful. Many thanks!

~~~
-Robin Jeffries
Dr.P.H. Candidate in Biostatistics
UCLA School of Public Health
rjeffr...@ucla.edu
530-624-0428
Re: [R] Counting indexes
Awesome! Thanks :)

On Tue, May 25, 2010 at 9:40 PM, Erik Iverson er...@ccbr.umn.edu wrote:
> Robin Jeffries wrote:
> > [original question snipped]
>
> > which(!duplicated(id))
> [1] 1 2 4 7 8
>
> > cumsum(rle(id)$lengths)
> [1] 1 3 6 7 9
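The two one-liners from the reply combine directly into the [start, stop]
structure asked for; a small sketch:

```r
id <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)

# First index of each ID, and last index of each run of equal IDs
idx <- data.frame(start = which(!duplicated(id)),
                  stop  = cumsum(rle(id)$lengths))
```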
[R] sparse matrices in lme4
I read somewhere (help list, documentation) that lme4 uses sparse matrix
technology for the random effects. I'd like to confirm with others that I can't
use a sparse matrix as a fixed effect? I'm getting an "invalid type (S4)"
error. Thanks.

~~~
-Robin Jeffries
Dr.P.H. Candidate in Biostatistics
UCLA School of Public Health
rjeffr...@ucla.edu
530-624-0428
[R] Regression with sparse matricies
I would like to run a logistic regression on some factor variables (main
effects and eventually an interaction) that are very sparse. I have a
moderately large dataset, ~100k observations, with 1500 factor levels for one
variable (X1) and 600 for another (X2), creating ~19,000 levels for the
interaction (X1:X2).

I would like to take advantage of the sparseness in these factors to avoid
using glm; actually, glm is not an option given the size of the design matrix.
I have looked through the Matrix package as well as other packages without much
help. Is there some option, some modification of glm, some way that it will
recognize a sparse matrix and avoid large matrix inversions?

-Robin
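One direction (my suggestion, not from the thread): glmnet fits penalized
logistic regression and accepts a sparse design matrix directly, and
Matrix::sparse.model.matrix builds one without ever materializing the dense
design. Because glmnet penalizes the coefficients, it is not a drop-in
replacement for an unpenalized glm fit, though with a very small lambda it can
approximate one. A sketch with simulated stand-in data:

```r
library(Matrix)
library(glmnet)

# Stand-in data: two moderately high-cardinality factors and a binary outcome
set.seed(1)
n <- 1000
d <- data.frame(x1 = factor(sample(letters, n, replace = TRUE)),
                x2 = factor(sample(LETTERS, n, replace = TRUE)),
                y  = rbinom(n, 1, 0.5))

# Sparse design matrix: the dense model.matrix is never formed
X <- sparse.model.matrix(~ x1 + x2, data = d)

# Lightly penalized logistic fit on the sparse design
fit <- glmnet(X, d$y, family = "binomial", lambda = 1e-4)
```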
[R] Indexing with sparse matrices (SparseM)
Hello,

I'm working with a very large, very sparse X matrix. Let

csr.X <- as.matrix.csr(X)

as described by the SparseM package. The documentation says that indexing works
just like it does on dense matrices. To me this says that I should be able to
perform operations on the rows of csr.X in the same way I would on X itself,
e.g.

f <- function(X, beta) {
  n <- nrow(X)
  u <- numeric(n)
  for (i in 1:n) {
    u[i] <- log(1 + exp(t(X[i, ]) %*% beta))
  }
  sum(u)
}

However, csr.X[i,] doesn't exist. Now, I get how as.matrix.csr coerces X into
an object with three arrays: two indexes and a list of the non-zero data. What
I can't quite wrap my brain around is how I would go about using those indices
to perform iterative operations on the rows of X, for example in my toy
function above. I'm hoping that someone with more experience working with
sparse matrices can provide a few suggestions or pointers? I'm not hooked on
this package either; it was just the first one I came across via Rseek.

Many thanks,
-Robin
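An alternative worth noting (my suggestion, not from the thread): the Matrix
package's sparse classes do support X[i, ] row indexing, and the loop above can
be vectorized away entirely, since summing log(1 + exp(x_i' beta)) over rows is
just a sparse matrix-vector product followed by log1p. A small sketch:

```r
library(Matrix)   # ships with R

# Tiny sparse example matrix
X <- Matrix(c(1, 0, 0, 2,
              0, 3, 0, 0), nrow = 2, byrow = TRUE, sparse = TRUE)
beta <- c(0.5, 1, 0, -1)

X[1, ]                          # row indexing works on Matrix's sparse classes

eta <- as.vector(X %*% beta)    # all the X[i, ] %*% beta products at once
val <- sum(log1p(exp(eta)))     # log1p(z) = log(1 + z), numerically safer
```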
[R] Source.R file from cmd line
I want to set up a Windows system task that will run a .R script at
pre-specified times. Can someone please help with the command-line syntax that
I would assign to the task? I know that I can open a command prompt, type R,
and then source the file, but I don't know how to pass multiple line arguments
to the command line in a system task.

Thanks,
~~~
-Robin Jeffries
Dr.P.H. Candidate in Biostatistics
UCLA School of Public Health
rjeffr...@ucla.edu
530-624-0428
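Two standard non-interactive front-ends avoid passing multiple lines at all:
Rscript runs a script file directly, and R CMD BATCH additionally writes a
transcript file. Either single command line can be pasted into the Task
Scheduler action; the install and script paths below are illustrative, not from
the thread:

```
:: Task Scheduler command, option 1: Rscript
"C:\Program Files\R\R-2.10.1\bin\Rscript.exe" "C:\scripts\myscript.R"

:: Option 2: R CMD BATCH, which saves all output to myscript.Rout
"C:\Program Files\R\R-2.10.1\bin\R.exe" CMD BATCH "C:\scripts\myscript.R" "C:\scripts\myscript.Rout"
```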
Re: [R] Obvious reason for not looping twice?
I do get the following error message:

Error in lookup.svc[i, j] <- svc[svc$st == unique(svc$st)[i] & svc$vc == :
  replacement has length zero

I also thought it might be because of how R treats NA, but then I would expect
the loop to stop at the place of error (i=1, j=2) and not continue to fill out
all of column i. I've tried using %in%, but that seems to do a non-positional
check for whether or not entries are *somewhere* in those vectors; I need to
find the location of where the match occurs.

My goal is to turn this:

st vc  y
 A  Z .2
 B  Z .4
 B  Y .3
 C  Y .1
 C  X .8

into a two-way table with entries 'y':

       vc
st       Z   Y   X
    A   .2   0   0
    B   .4  .3   0
    C    0  .1  .8

Right now it's giving me:

       vc
st       Z     Y     X
    A   .2 0.000 0.000
    B    0     0     0
    C    0     0     0

So it seems to finish out the row that it's currently on, but then won't
continue to loop.

-Robin

On Sun, Apr 25, 2010 at 4:44 PM, Peter Alspach
peter.alsp...@plantandfood.co.nz wrote:
> Tena koe Robin
>
> Do you get an error or warning? It may have something to do with how ==
> treats NA:
>
> > x <- 1:4
> > x[x == 1]
> [1] 1
> > x <- c(1:4, NA)
> > x[x == 1]
> [1]  1 NA
> > x[x %in% 1]
> [1] 1
>
> If so, using %in% is one way to avoid the problem. However, I would have
> thought you'd get an error message if this were the case.
>
> HTH
> Peter Alspach
>
> > -----Original Message-----
> > From: Robin Jeffries
> > Subject: [R] Obvious reason for not looping twice?
> >
> > Is there an obvious reason why this won't loop to i=2 and beyond? There
> > are many combinations of st x vc that don't exist in svc. For example,
> > when s=1 there's only an entry at v=1. That's fine; the entry can stay 0.
> >
> > lookup.svc <- array(0,
> >   dim = c(length(unique(svc$st)), length(unique(svc$vc))),
> >   dimnames = list(unique(svc$st), unique(svc$vc)))
> > for (i in 1:length(unique(svc$st))) {
> >   for (j in 1:length(unique(svc$vc))) {
> >     lookup.svc[i, j] <- svc[svc$st == unique(svc$st)[i] &
> >                             svc$vc == unique(svc$vc)[j], 4]
> >   }
> > }
Re: [R] Obvious reason for not looping twice?
Seriously! That easy! I kept thinking that xtabs would just give me frequencies
of how many times each combination occurred, and not the values themselves.
Thanks!

-Robin

On Mon, Apr 26, 2010 at 7:40 AM, Henrique Dallazuanna www...@gmail.com wrote:
> Try this:
>
> xtabs(y ~ st + vc, data = x)
>
> [rest of quoted thread snipped]
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
[R] Obvious reason for not looping twice?
Is there an obvious reason why this won't loop to i=2 and beyond? There are
many combinations of st x vc that don't exist in svc. For example, when s=1
there's only an entry at v=1. That's fine; the entry can stay 0.

lookup.svc <- array(0,
  dim = c(length(unique(svc$st)), length(unique(svc$vc))),
  dimnames = list(unique(svc$st), unique(svc$vc)))
for (i in 1:length(unique(svc$st))) {
  for (j in 1:length(unique(svc$vc))) {
    lookup.svc[i, j] <- svc[svc$st == unique(svc$st)[i] &
                            svc$vc == unique(svc$vc)[j], 4]
  }
}

Thanks,
Robin
~~~
-Robin Jeffries
Dr.P.H. Candidate
UCLA School of Public Health
rjeffr...@ucla.edu
530-624-0428
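The failure mode, for the record: whenever a (st, vc) combination is absent,
the right-hand side of the assignment is a zero-length vector, and assigning
length zero into lookup.svc[i, j] is exactly the "replacement has length zero"
error that aborts the loop. The xtabs() one-liner suggested upthread avoids the
loop entirely; a self-contained demo on the toy data from this thread:

```r
svc <- data.frame(st = c("A", "B", "B", "C", "C"),
                  vc = c("Z", "Z", "Y", "Y", "X"),
                  y  = c(.2, .4, .3, .1, .8))

# Sums y within each st/vc cell; absent combinations become 0
lookup <- xtabs(y ~ st + vc, data = svc)
```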
[R] Problems completely reading in a large sized data set
I have been through the help file archives a number of times and still cannot figure out what is wrong. I have a tab-delimited text file, 76 MB, so while it's large, it's not -that- large. I'm running Win7 x64 w/ 4 GB RAM and R 2.10.1.

When I open this data in Excel, I have 27 columns and 450932 rows, excluding the first row containing variable names. I am trying to get this into R as a dataset for analysis.

zz <- "Data/media1y.txt"
f <- file(zz, 'r')            # open the file
rl <- readLines(f, 1)         # read the first line
colnames <- strsplit(rl, '\t')
p <- length(colnames[[1]])    # count the number of columns
nobs <- 450932
close(f)

Using:

d1 <- matrix(scan(zz, skip=1, sep='\t', fill=TRUE, what=rep('character', p), nlines=nobs),
             ncol=p, nrow=nobs, byrow=TRUE, dimnames=list(NULL, colnames[[1]]))

produces the error

Read 5761719 items
Warning message:
In matrix(scan(zz, skip = 1, sep = '\t', fill = TRUE, what = rep('character', :
  data length [5761719] is not a sub-multiple or multiple of the number of rows [10]

Now, 5761719/27 = 213397. If I change nobs <- 213397 it reads in the file with no errors and produces a matrix that I can work with from here. But the file obviously is not complete.

At first I thought it might be reading only the first x rows, so I sorted by the first variable alphabetically in Excel before saving it as a txt file and reading it into R. head(d1) shows the correct first 6 rows, but when I ask for tail(d1), the entry for the first variable in the last row is

[213397,] "WSAH"

The 213397th row in Excel starts with "MM1" and the actual last row starts with "YE". The "WSA" in question can be found on Excel row 397548. That confuses the heck out of me. There are no blank lines. Since there are 1000 categories for that first variable, I'm not going to manually match all of the frequencies, but the first 10 were exact, "MM1" was correct, and the last few before "WSA" were also correct. "WSA" itself had 3001 observations in R, whereas Excel has 3093.
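Before forcing the data through scan(), it is worth checking whether every line of the file actually has the expected 27 tab-separated fields; an embedded quote character in a field can silently swallow the rest of the file when quoting is left enabled. A hedged diagnostic sketch (the path is the one from the post; the field count of 27 is an assumption from the Excel view):

```r
# Count tab-separated fields per line, with quote and comment handling
# disabled so nothing is silently merged across lines.
nf <- count.fields("Data/media1y.txt", sep = "\t",
                   quote = "", comment.char = "")
length(nf)       # should be 450933: header plus 450932 data rows
table(nf)        # every line should report 27 fields
which(nf != 27)  # line numbers of any malformed rows
```

If `length(nf)` falls short of the expected line count, or `table(nf)` shows counts other than 27, the file itself (not scan) is the place to look.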
That also makes it seem that R is stopping reading the table at some point. It shouldn't be a memory issue, right?

object.size(d1)
56328480 bytes
memory.size(max=TRUE)
[1] 444.06
memory.size(max=NA)
[1] 3583.88
memory.size(max=FALSE)
[1] 251.09

As a side question, I'm reading it all in as characters for now, because when I tried to define a vector of column types

wht <- list(rep('character', 7), 0, 'logical', 0, 'character')

to use in scan(), it still read everything in as character. I'm also not sure about the quotes; I had to put them in to get list() -- or c() -- to even accept that.

Any ideas with this? Thanks!

--
Robin Jeffries
Dr.P.H. Candidate
Department of Biostatistics
UCLA School of Public Health
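Two common culprits for a read that stops partway through a file on Windows are an embedded quote character (scan and read.table treat everything up to the matching quote as one field) and a stray Ctrl-Z (EOF) byte. A sketch that sidesteps both and also answers the column-type side question, here using read.delim's colClasses rather than scan's what list -- the class vector below is purely illustrative and must be matched to the real 27 columns:

```r
# Hedged sketch: disable quote and comment interpretation so stray
# " or # characters in fields cannot swallow following lines, and
# declare per-column types via colClasses (illustrative values).
cls <- c(rep("character", 7), "numeric", "logical", "numeric",
         rep("character", 17))
d1 <- read.delim("Data/media1y.txt", quote = "", comment.char = "",
                 colClasses = cls, stringsAsFactors = FALSE)
nrow(d1)  # expect 450932 if the whole file was read
```

In colClasses, each entry names the R type of the corresponding column ("character", "numeric", "logical", ...), which avoids the ambiguity of scan's mixed list-of-what notation.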
[R] (Solved) Problems completely reading in a large sized data set
I'm not quite sure why, but reading in the *sorted* data (imported into Excel, sorted, written back out to a text file) worked perfectly fine with read.delim().

Thanks to those who replied!

-Robin