[R] reading a dense file of binary numbers
Hi all, I have a file in the following format that I want to read into a matrix:

010101001110101
10101001010
01001010010
...

It has no headers or row names. I tried read.table(), but it doesn't let me specify an empty string as the column separator (sep='' means "any whitespace" for that function). read.fwf() doesn't seem appropriate either.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
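One way to sketch this in base R (assuming the file is called "binary.txt" -- a hypothetical name -- and that every row has the same number of digits):

```r
## Read each line as one unseparated string of 0/1 digits.
lines <- readLines("binary.txt")

## Split each string into single characters, convert to integer,
## and bind the rows into a matrix.
mat <- do.call(rbind, lapply(strsplit(lines, ""), as.integer))
```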
[R] background normalization in rma() in the affy package
Hi, I was looking into the documentation for the rma() function in the affy package, trying to figure out how exactly the background correction is done. I read all three papers cited in the rma() documentation, but the most detailed explanation I could find is in Irizarry et al., 2003, where they state that they compute B(PM_{ijn}) = E[s_{ijn} | PM_{ijn}], where s_{ijn} is assumed to be exponential and bg_{ijn} is normal. I still don't understand what value is being computed here, nor am I clear on what the correction looks like, i.e. if s_{ijn} is an exponentially distributed random variable, how does bg_{ijn} fit into this? Thanks!
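For reference, the closed form usually given for this convolution model (a sketch, not verified against the affy source: observed PM = s + bg, with s ~ Exp(alpha) and bg ~ N(mu, sigma^2), and phi / Phi the standard normal density and CDF) is:

```latex
% Background-corrected intensity under the exponential-plus-normal model,
% as commonly stated in descriptions of RMA background correction:
E[s \mid PM = p] \;=\; a + b\,
  \frac{\phi(a/b) - \phi\bigl((p-a)/b\bigr)}
       {\Phi(a/b) + \Phi\bigl((p-a)/b\bigr) - 1},
\qquad a = p - \mu - \sigma^{2}\alpha,\quad b = \sigma
```

That is, bg_{ijn} enters through the estimated normal parameters (mu, sigma), and the "value being computed" is the conditional expectation of the true signal given the observed PM, which replaces the raw PM intensity.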
[R] p values in coxph()
Hi, I'm interested in building a Cox PH model for survival modeling, using 2 covariates (x1 and x2). x1 represents a 'baseline' covariate, whereas x2 represents a 'new' covariate, and my goal is to figure out whether x2 adds significant predictive information over x1. Ideally, I could get a p-value for this. Originally, I thought of doing some kind of likelihood ratio test (LRT), where I measure the (partial) likelihood of the model with just x1, then with x1 and x2, so it becomes an LRT with 1 degree of freedom. But when I use the summary() function on a coxph() fit, I get the output shown at the bottom, and I have two questions:

1) What exactly do the p-values in the Pr(>|z|) column represent? I understand that the coefficients have standard errors, etc., but I'm not sure how the p-value there is calculated.

2) At the bottom, where it shows the result of an LRT with 2 df, I don't quite understand what model the ratio is being tested against. If the current model has two variables (x1 and x2), and those are the extra degrees of freedom, then the baseline would have 0 variables, but that's not really a Cox model?

Thanks for any help. Brian

summary(coxph(Surv(myTime,Event)~x1+x2))
Call:
coxph(formula = Surv(myTime, Event) ~ x1 + x2)

  n= 211
      coef exp(coef) se(coef)     z Pr(>|z|)
x1 0.03594   1.03660  0.17738 0.203  0.83942
x2 0.53829   1.71308  0.17775 3.028  0.00246 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

   exp(coef) exp(-coef) lower .95 upper .95
x1     1.037     0.9647    0.7322     1.468
x2     1.713     0.5837    1.2091     2.427

Rsquare= 0.111   (max possible= 0.975 )
Likelihood ratio test= 21.95  on 2 df,   p=1.714e-05
Wald test            = 20.29  on 2 df,   p=3.924e-05
Score (logrank) test = 22.46  on 2 df,   p=1.328e-05
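For context: the Pr(>|z|) column is a Wald test (z = coef / se(coef), two-sided normal p-value), and the 2-df likelihood ratio test reported by summary() compares the fitted model against the null model with no covariates. The 1-df test for the added value of x2 can be sketched as follows (assuming myTime, Event, x1, x2 exist in the workspace):

```r
library(survival)

## Nested models: baseline covariate only, then baseline plus the new one
fit1 <- coxph(Surv(myTime, Event) ~ x1)
fit2 <- coxph(Surv(myTime, Event) ~ x1 + x2)

## Twice the gain in partial log-likelihood is ~ chi-squared on 1 df
## (loglik is a length-2 vector: null model, then fitted model)
lrt  <- 2 * (fit2$loglik[2] - fit1$loglik[2])
pval <- pchisq(lrt, df = 1, lower.tail = FALSE)

## Equivalently, anova() performs the same nested comparison
anova(fit1, fit2)
```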
[R] R sign test for censored data
Does anyone know of a statistical test implemented in R that can do a sign test for a difference of medians, except that it can handle censored data? Thanks!
[R] function to compute p-value for comparing two ROC curves
Hi, I'm looking to compare the areas under two ROC curves for different classifiers on the same data -- is there an R function to do this? Thanks!
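A sketch using the pROC package (assuming `labels` holds the true 0/1 outcomes and `score1`/`score2` are the two classifiers' predictions on the same cases -- all hypothetical names):

```r
library(pROC)

## Build one ROC object per classifier on the shared outcome
roc1 <- roc(labels, score1)
roc2 <- roc(labels, score2)

## DeLong's test for two correlated (paired) ROC curves
roc.test(roc1, roc2, method = "delong", paired = TRUE)
```

Since both classifiers score the same data, the paired (correlated) version of the test is the appropriate one.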
[R] comparing hazard ratios
Hi, I'm looking for a package to compare two hazard ratios (and assess statistical significance) obtained from two different predictive models. I know of the hr.comp2 function from the survcomp package, but was wondering if there are any other packages out there. Thanks!
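If the two hazard ratios come from independent models, a rough do-it-yourself sketch is a z-test on the difference of log hazard ratios (hr1, hr2, se1, se2 are hypothetical stand-ins for each model's exp(coef) and se(coef); this approximation ignores any correlation between the models and is not a substitute for a dedicated function like survcomp::hr.comp2):

```r
## z-test on the difference of log hazard ratios, assuming independence
z <- (log(hr1) - log(hr2)) / sqrt(se1^2 + se2^2)
p <- 2 * pnorm(-abs(z))
```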
Re: [R] BFGS versus L-BFGS-B
Hi John, Thanks so much for the informative reply! I'm currently trying to optimize ~10,000 parameters simultaneously. For some reason, when I compare the memory usage of L-BFGS-B and BFGS with all default input parameters, L-BFGS-B uses only about 1/7 of the memory; I'm a bit surprised it isn't a lot less, but BFGS is definitely converging a lot more slowly.

My other question is that L-BFGS-B is returning 'non-finite' errors with respect to the gradient function I'm supplying. All the parameters I'm optimizing need to be non-negative (so I'm optimizing the log of the parameters), but the gradient at some point divides by each parameter, so when some of the parameters go to 0, the gradient becomes infinite. Do you (or anyone else) have any suggestions for how to prevent this? Is the only way to force the parameters to be larger than some number close to 0 (e.g. 1e-10), or to modify the gradient function to zero the entries for small parameters? Thanks! Brian.

On Fri, Feb 25, 2011 at 10:51 AM, Prof. John C Nash nas...@uottawa.ca wrote:

There are considerable differences between the algorithms. And BFGS is an unfortunate nomenclature, since there are so many variants that are VERY different. It was called 'variable metric' in my book, from which the code was derived, and that code was based on Roger Fletcher's Fortran VM code from his 1970 paper. L-BFGS-B is a later and more complicated algorithm with some pretty nice properties. The code is much larger.

Re: less memory -- this will depend on the number of parameters, but to my knowledge there are no good benchmark studies of memory and performance. Perhaps someone wants to propose one for Google Summer of Code (see http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2011 ).

The optimx package can call Rvmmin, which has box constraints (also Rcgmin, which is intended for very low memory), as well as several other methods with box constraints, including L-BFGS-B. Worth a try if you are seeking a method for multiple production runs. Unfortunately, we seem to have some CRAN check errors on Solaris and some old releases -- platforms I do not have -- so it may be a few days or more until we sort out the issues, which seem to be related to alignment of the underlying packages for which optimx is a wrapper.

Use of transformation can be very effective. But again, I don't think there are good studies on whether use of box constraints or transformations is better, and when. Another project, for which I have made some tentative beginnings. Collaborations welcome.

Best, JN

On 02/25/2011 06:00 AM, r-help-requ...@r-project.org wrote:
Message: 86
Date: Fri, 25 Feb 2011 00:11:59 -0500
From: Brian Tsai btsa...@gmail.com
To: r-help@r-project.org
Subject: [R] BFGS versus L-BFGS-B
[original question quoted below]
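One way to sketch the gradient guard discussed above (`my_grad` is a hypothetical stand-in for the user's gradient function, and 1e-10 is an arbitrary floor, not a recommended constant):

```r
## Guard the gradient against division by (near-)zero parameters
## by flooring each parameter at a small epsilon before evaluation.
safe_grad <- function(par, ...) {
  par <- pmax(par, 1e-10)
  my_grad(par, ...)
}
```

Passing `safe_grad` as the `gr` argument to optim() keeps the gradient finite near the boundary, at the cost of a small bias for parameters that are genuinely at zero.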
[R] BFGS versus L-BFGS-B
Hi all, I'm trying to figure out what the effective differences between BFGS and L-BFGS-B are, besides the obvious ones: L-BFGS-B should use a lot less memory, and the user can provide box constraints. 1) Why would you ever want to use BFGS, if L-BFGS-B does the same thing but uses less memory? 2) If I'm optimizing with respect to a variable x that must be non-negative, a common approach is to do a change of variables x = exp(y) and optimize unconstrained with respect to y. Is optimization using box constraints on x likely to produce as good a result as unconstrained optimization on y? - Brian.
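The two approaches in question 2 can be compared on a toy problem (a sketch: minimize f(x) = (x - 2)^2 subject to x >= 0, which both methods should solve to x near 2):

```r
f <- function(x) (x - 2)^2

## (a) box constraint on x directly
res_box <- optim(par = 1, fn = f, method = "L-BFGS-B", lower = 0)

## (b) unconstrained in y, with x = exp(y)
g <- function(y) f(exp(y))
res_exp <- optim(par = 0, fn = g, method = "BFGS")

res_box$par        # optimum in x
exp(res_exp$par)   # optimum mapped back from y to x
```

Note that the exp() transform can only reach x > 0, so if the true optimum sits exactly on the boundary x = 0, the box-constrained version is the safer choice.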
[R] cv.glmnet errors
Hi, I am trying to do multinomial regression using the glmnet package, but the following gives me an error (for no reason apparent to me):

library(glmnet)
cv.glmnet(x=matrix(c(1,2,3,4,5,6,1,2,3,4,5,6), nrow=6), y=as.factor(c(1,2,1,2,3,3)), family='multinomial', alpha=0.5, nfolds=2)

The error I get is:

Error in if (outlist$msg != "Unknown error") return(outlist) : argument is of length zero

If I change the number of folds to 1, I get a segfault:

*** caught segfault ***
address 0x0, cause 'memory not mapped'

Traceback:
1: .Fortran("lognet", parm = alpha, nobs, nvars, nc, as.double(x), y, offset, jd, vp, ne, nx, nlam, flmin, ulam, thresh, isd, maxit, kopt, lmu = integer(1), a0 = double(nlam * nc), ca = double(nx * nlam * nc), ia = integer(nx), nin = integer(nlam), nulldev = double(1), dev = double(nlam), alm = double(nlam), nlp = integer(1), jerr = integer(1), PACKAGE = "glmnet")
2: lognet(x, is.sparse, ix, jx, y, weights, offset, type, alpha, nobs, nvars, jd, vp, ne, nx, nlam, flmin, ulam, thresh, isd, vnames, maxit, HessianExact, family)
3: glmnet(x[!which, ], y_sub, lambda = lambda, offset = offset_sub, weights = weights[!which], ...)
4: cv.glmnet(x = matrix(c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6), nrow = 6), y = as.factor(c(1, 2, 1, 2, 3, 3)), family = "multinomial", alpha = 0.5, nfolds = 1)

Any ideas? Brian.
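One likely culprit, offered as an assumption rather than a confirmed diagnosis: cv.glmnet's documentation requires nfolds to be at least 3, and with only 6 observations across 3 classes, a fold can easily lose a class entirely. A slightly larger toy data set that should run:

```r
library(glmnet)
set.seed(1)
x <- matrix(rnorm(60), nrow = 30)              # 30 observations, 2 predictors
y <- factor(sample(1:3, 30, replace = TRUE))   # 3 classes
cvfit <- cv.glmnet(x, y, family = "multinomial", alpha = 0.5, nfolds = 3)
```

The cryptic error and segfault on tiny inputs arguably still deserve a bug report to the package maintainers, since invalid fold counts should fail with a clear message.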
[R] logsumexp function in R
Hi, I was wondering if anyone has implemented a numerically stable function to compute log(sum(exp(x)))? Thanks! Brian
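The standard trick is short enough to sketch directly: subtract the maximum before exponentiating, so exp() never overflows, and add it back outside the log.

```r
## Numerically stable log(sum(exp(x))):
## log(sum(exp(x))) = m + log(sum(exp(x - m))) for any m; choosing
## m = max(x) keeps every exp() argument <= 0.
logsumexp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

logsumexp(c(1000, 1000))  # ~ 1000 + log(2); naive log(sum(exp(x))) gives Inf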
[R] ggplot2 errorbarh
Hi, I'm having problems using the 'width' aesthetic attribute for geom_errorbarh. This is the same problem reported earlier here, but I'll try to state it more clearly: http://www.mail-archive.com/r-help@r-project.org/msg62371.html The problem is that the 'width' attribute is supposed to set the height of the whisker endpoints in geom_errorbarh, just as it sets the width of the whisker endpoints in geom_errorbar. The width attribute works fine in geom_errorbar (it sets the whisker width), but it has no effect in geom_errorbarh (the whisker height is the same no matter what value I give). Does anyone else still have this problem? Also, I'm not looking for the 'size' attribute, which seems to scale the line thickness of the entire errorbar. Thanks! Brian.
Re: [R] ggplot2 errorbarh
Hi Ben, Indeed this is what I wanted, thanks. 'height' does make more sense; I guess I was just reading the ggplot2 documentation directly, which still refers to 'width'. Brian

On Sun, Dec 12, 2010 at 5:12 PM, Ben Bolker bbol...@gmail.com wrote:

Brian Tsai btsai00 at gmail.com writes: [question quoted above]

Looking at the answers in the previous thread, I think the commenters are trying to answer your question but may be missing the point. I assume you have horizontal errorbars that look like this

--*---

and you want them to look like this:

|      |
|--*---|
|      |

In this case, it is the 'height' aesthetic rather than the 'width' aesthetic that you would want, to modify the length of the end-caps on the bar. If this is *not* what you want, it would help if you could somehow draw a picture of your desired graph (e.g. manipulate the graph you have in an image editor, or use ASCII art as above) ...
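A minimal sketch of the fix (assuming a ggplot2 version in which geom_errorbarh accepts `height`; the toy data frame is invented for illustration):

```r
library(ggplot2)

df <- data.frame(y    = 1:3,
                 x    = c(2, 4, 6),
                 xmin = c(1, 3, 5),
                 xmax = c(3, 5, 7))

ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_errorbarh(aes(xmin = xmin, xmax = xmax), height = 0.2)  # end-cap height
```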
[R] drawing dot plots with size and color affecting dot characteristics
Hi all, I'm interested in doing a dot plot where *both* the size and color (more specifically, shade of grey) change with the associated value. I've found examples online for ggplot2 where you can scale the size of the dot with a value: http://had.co.nz/ggplot2/graphics/6a053f23cf5bdfe5155ab53d345a5e0b.png or scale the color with the value: http://had.co.nz/ggplot2/graphics/b17bf93530ff6695afb366e65677c17f.png both of which are from here: http://had.co.nz/ggplot2/geom_point.html I've been playing around with ggplot2 but couldn't figure out how to do both at the same time. Ideally I want size to increase with the value, and the shade of grey to get lighter with increasing value. Any help is appreciated, thanks! Brian
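A sketch of one way to do this (the toy data frame is invented for illustration): map the same variable to both the size and colour aesthetics, then use a grey gradient so larger values come out lighter.

```r
library(ggplot2)

df <- expand.grid(x = 1:5, y = 1:5)
df$value <- runif(nrow(df))

ggplot(df, aes(x = x, y = y, size = value, colour = value)) +
  geom_point() +
  scale_colour_gradient(low = "black", high = "grey80")  # lighter = larger
```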
[R] glmnet - choosing the number of features
Hi, I am trying to use the glmnet package to do some simple feature selection. However, I would ideally like to be able to specify the number of features to return. As far as I can tell, the glmnet package only allows specification of a regularization parameter, lambda, which in turn yields a model with some number of non-zero features. Is there a straightforward way of calculating the lambda value that will return a specific number of features? I realize there is a range of lambdas that should give a certain number of non-zero features, but is there an easy way of figuring out what this range is? Thanks! Brian
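One sketch that avoids solving for lambda analytically (x and y are assumed to exist; k is the desired feature count): fit the whole path once, then look up the lambdas whose number of non-zero coefficients, stored in the fit's `df` component, matches the target.

```r
library(glmnet)

fit <- glmnet(x, y)   # fits the full regularization path
k   <- 10             # desired number of non-zero features

## All lambdas on the path that yield exactly k features
lambda_range <- fit$lambda[fit$df == k]

## First (largest) such lambda; NA if no path point gives exactly k
lambda_k <- lambda_range[1]
```

Because the path is computed on a discrete lambda grid, some feature counts may be skipped entirely; checking `fit$df` directly shows which counts are attainable.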
[R] glmpath crossvalidation
Hi all, I'm relatively new to R, and have been trying to fit an L1 regularization path using coxpath from the glmpath package. I'm interested in a cross-validation framework: cross-validate on a training set to select the lambda that achieves the lowest error, then use that value of lambda on the entire training set before applying the model to a test set. This seems to entail using cv.coxpath, inspecting the cv.error attribute, then using the corresponding lambda in coxpath. However, the lambda values in cv.coxpath are defined as fractions (of the largest sensible value of lambda), whereas it doesn't seem like you can specify lambda relative to its largest value in coxpath. Any ideas? Thanks! Brian.
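One possible workaround, sketched under unverified assumptions about the glmpath API (in particular, that a coxpath fit exposes its lambda sequence as a `lambda` component; check str(fit) for the actual name): recover the largest lambda from a full-path fit on the training data, then rescale the best cross-validated fraction.

```r
library(glmpath)

fit <- coxpath(train.data)      # train.data: list with x, time, status (hypothetical name)
best.frac <- 0.3                # fraction chosen via cv.coxpath (hypothetical value)

## Map the fraction back to an absolute lambda on the training set
lambda.use <- best.frac * max(fit$lambda)
```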