[R] questions about the output of gee and its summary
Dear R-helpers,

I am using the package gee to run a marginal model. Here is the output. In my simulated data, both x and z are time-varying, so I include their interaction terms with a time indicator (tind = 0 at time 1, 1 at time 2). The data are simulated, so the true parameter of z at both time 1 and time 2 is 5, which is very close to the model output: for time 1, z = 5.0757760, and for time 2, z is 5.0757760 - 0.6379866 = ~5.

    model <- gee(y ~ x + z + x*tind + z*tind, family = gaussian(link = identity),
                 id = sid, corstr = "exchangeable")
    Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
    running glm to get initial regression estimate
    (Intercept)          x          z       tind     x:tind     z:tind
      2.9342186  1.5002601  5.0757760  2.0846327  0.1869748 -0.6379866

However, when I use the summary command, the coefficients are changed. Am I missing anything here?

    summary(model)
    GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
    gee S-function, version 4.13 modified 98/01/27 (1998)

    Model:
     Link:                      Identity
     Variance to Mean Relation: Gaussian
     Correlation Structure:     Exchangeable

    Call:
    gee(formula = y ~ x + z + x * tind + z * tind, id = sid,
        family = gaussian(link = identity), corstr = "exchangeable")

    Summary of Residuals:
           Min         1Q     Median         3Q        Max
    -5.9273676 -2.0072725 -0.7169515  2.3709969  8.2377283

    Coefficients:
                 Estimate  Naive S.E.     Naive z  Robust S.E.    Robust z
    (Intercept) 4.1450504 0.331866699   12.490106  0.26416       15.661403
    x           1.5155102 0.008479614  178.723972  0.006854627  221.093020
    z           0.6463947 0.48094        5.815617  0.100379444    6.439513
    tind        1.5986872 0.163851622    9.756920  0.175947744    9.086148
    x:tind      0.1434216 0.005288708   27.118450  0.005924767   24.207123
    z:tind      4.2951055 0.168647198   25.467992  0.166776520   25.753658

    Estimated Scale Parameter: 6.800334
    Number of Iterations: 10

Any help would be very much appreciated!
Thank you,
Carrie

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] help!! warning:only the first element has been used
Thanks a lot to you guys! Both work well!
Re: [R] Confidence intervals and polynomial fits
On May 7, 2011, at 16:15, Ben Haller wrote:
> On May 6, 2011, at 4:27 PM, David Winsemius wrote:
>> On May 6, 2011, at 4:16 PM, Ben Haller wrote:
>>> As for correlated coefficients: x, x^2, x^3 etc. would obviously be highly correlated, for values close to zero.
>>
>> Not just for x close to zero:
>>
>>     cor( (10:20)^2, (10:20)^3 )
>>     [1] 0.9961938
>>     cor( (100:200)^2, (100:200)^3 )
>>     [1] 0.9966219
>
> Wow, that's very interesting. Quite unexpected, for me. Food for thought. Thanks!

Notice that because of the high correlations between the x^k, their parameter estimates will be correlated too. In practice, this means that the c.i. for the quartic term contains values for which you can compensate with the other coefficients and still have an acceptable fit to the data. (Nothing strange about that; already in simple linear regression, you allow the intercept to change while varying the slope.)

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501  Email: pd@cbs.dk  Priv: pda...@gmail.com
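[Editor's note] The collinearity discussed above is what orthogonal polynomials are designed to avoid; a minimal base-R sketch (variable names are only for illustration) using poly(), whose columns are constructed to be mutually uncorrelated:

```r
x <- 10:20

# Raw powers are nearly collinear ...
cor(x^2, x^3)             # about 0.996, as quoted above

# ... but an orthogonal polynomial basis is not:
p <- poly(x, degree = 3)  # columns for degree 1, 2, 3
round(cor(p), 10)         # off-diagonal correlations are zero
```

Fitting with poly(x, 4) instead of raw powers gives the same fitted curve but nearly uncorrelated coefficient estimates.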
Re: [R] venn diagramm
Dear Patrick,

Thanks for your reply. I know it's hectic to see such graphs, as they are difficult to interpret, but I got this interesting link from R, http://www.oga-lab.net/RGM2/func.php?rd_id=gplots:venn, about a venn diagram for 5 subsets.

    ## Example using a list of item names belonging to the
    ## specified group.
    ##
    ## construct some fake gene names..
    oneName <- function() paste(sample(LETTERS, 5, replace=TRUE), collapse="")
    geneNames <- replicate(1000, oneName())
    ##
    GroupA <- sample(geneNames, 400, replace=FALSE)
    GroupB <- sample(geneNames, 750, replace=FALSE)
    GroupC <- sample(geneNames, 250, replace=FALSE)
    GroupD <- sample(geneNames, 300, replace=FALSE)
    input <- list(GroupA, GroupB, GroupC, GroupD)
    input
    venn(input)

But I am not sure how to give files, as in this example I think they are given at the terminal; this is hard for me to understand. But I will try it with different sets and I hope I will get something out of it. If you also extract something good, please let me know. I also got an online tool called venny to draw for 4 subsets.

Thanks
khush

On Sat, May 7, 2011 at 10:28 PM, Breheny, Patrick <patrick.breh...@uky.edu> wrote:
> I've never actually used it with 5 subsets, but the 'venn' function in the gplots package claims to be able to make a Venn diagram with up to 5 subsets. --Patrick
>
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of khush [bioinfo.kh...@gmail.com]
> Sent: Saturday, May 07, 2011 9:54 AM
> To: r-help@r-project.org
> Subject: [R] venn diagramm
>
> Dear all, I have a set of five datasets with strings, for which I need to draw the venn diagram. There are tools available to draw venn diagrams online, but limited to three sets. I can also generate a venn for three from the limma package, but do not know how to proceed with five subsets. Help me in drawing the same, any script.
> Thank you, Khush
[R] %in% operator - NOT IN
Hello everyone,

I am attempting to use the %in% operator with the ! to produce a NOT IN type of operation. Why does this not work? Suggestions?

    data2[data1$char1 %in% c(string1, string2), 1] <- min(data1$x1)
    data2[data1$char1 ! %in% c(string1, string2), 1] <- max(data1$x1) + 1000
    Error: unexpected '!' in "data2[data1$char1 !"

Thanks!
Dan
Re: [R] %in% operator - NOT IN
G'day Dan,

On Sun, 8 May 2011 05:06:27 -0400 Dan Abner <dan.abne...@gmail.com> wrote:
> I am attempting to use the %in% operator with the ! to produce a NOT IN type of operation. Why does this not work? Suggestions?
>
>     data2[data1$char1 %in% c(string1, string2), 1] <- min(data1$x1)
>     data2[data1$char1 ! %in% c(string1, string2), 1] <- max(data1$x1) + 1000
>     Error: unexpected '!' in "data2[data1$char1 !"

Try (untested):

    data2[!(data1$char1 %in% c(string1, string2)), 1] <- max(data1$x1) + 1000

HTH.

Cheers,
Berwin
Re: [R] %in% operator - NOT IN
Hi,

Alternatively,

    example(`%in%`)

or

    `%ni%` = Negate(`%in%`)

HTH,
baptiste

On 8 May 2011 21:18, Berwin A Turlach <berwin.turl...@gmail.com> wrote:
> Try (untested):
>
>     data2[!(data1$char1 %in% c(string1, string2)), 1] <- max(data1$x1) + 1000
>
> HTH.
>
> Cheers,
> Berwin
[R] question about val.surv in R
Dear R users:

I tried to use val.surv to give an internal validation of a survival prediction model. I used the sample code:

    # Generate failure times from an exponential distribution
    set.seed(123)   # so can reproduce results
    n <- 1000
    age <- 50 + 12*rnorm(n)
    sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4)))
    cens <- 15*runif(n)
    h <- .02*exp(.04*(age-50) + .8*(sex=='Female'))
    t <- -log(runif(n))/h
    units(t) <- 'Year'
    label(t) <- 'Time to Event'
    ev <- ifelse(t <= cens, 1, 0)
    t <- pmin(t, cens)
    S <- Surv(t, ev)
    # I use cph instead of psm in the example
    f <- cph(S ~ age + sex, y=TRUE)
    w <- val.surv(f)

I got an error:

    Error in survreg.distributions[[fit$dist]] :
      attempt to select less than one element

Could someone explain this for me?

Yao Zhu
Department of Urology
Fudan University Shanghai Cancer Center
Shanghai, China
Re: [R] %in% operator - NOT IN
On 08-May-11 09:18:55, Berwin A Turlach wrote:
> Try (untested):
>
>     data2[!(data1$char1 %in% c(string1, string2)), 1] <- max(data1$x1) + 1000

Berwin's suggestion should work -- it is the general way to negate the result of an %in%. As to "Why does this not work?", the point to note is that %in% is a binary operator. If you enter ?"%in%" you will be taken to the help page for match, where it is pointed out that:

    '%in%' is a more intuitive interface as a binary operator, which
    returns a logical vector indicating if there is a match or not for
    its left operand.

Specifically, therefore, the syntax of %in% requires

    X %in% Y

where X and Y are objects to which the functional definition of %in% applies (see the same help page):

    '%in%' is currently defined as
    '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'

In your expression (effectively X ! %in% Y) the item which immediately precedes %in% is the !, and this is not a valid item!

Based on the above functional definition, you could define your own binary operator %!in% as

    "%!in%" <- function(x, table) match(x, table, nomatch = 0) == 0

or similar -- I have not tested this so cannot guarantee it! However, it is the way to proceed if you want a NOT IN. Then the usage could be:

    data2[data1$char1 %!in% c(string1, string2), 1] <- max(data1$x1) + 1000

Hoping this helps,
Ted.
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 08-May-11  Time: 10:35:05
-- XFMail --
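[Editor's note] A quick check of Ted's untested %!in% definition above, confirming it agrees with negating %in% directly:

```r
"%!in%" <- function(x, table) match(x, table, nomatch = 0) == 0

c("a", "b", "c") %!in% c("b", "d")
# [1]  TRUE FALSE  TRUE

# Agrees with !(x %in% table):
identical(c("a", "b", "c") %!in% c("b", "d"),
          !(c("a", "b", "c") %in% c("b", "d")))
# [1] TRUE
```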
Re: [R] split character vector by multiple keywords simultaneously
Andrew Robinson-6 wrote:
> A hack would be to use gsub() to prepend e.g. XXX to the keywords that you want, perform a strsplit() to break the lines into component strings, and then substr() to extract the pieces that you want from those strings. Cheers Andrew

Thanks, that got me started. I am sure there are much easier ways of doing this, but in case someone comes looking, here's my solution:

    keywordlist <- c("Company name:", "General manager:", "Manager:")

    # Attach XXX to the beginning of each keyword:
    for (i in 1:length(keywordlist)) {
      temp <- gsub(keywordlist[i], paste("XXX", keywordlist[i], sep=""), temp)
    }

    # Split each row into a list:
    temp <- strsplit(temp, "XXX")

    # Eliminate empty elements:
    temp <- lapply(temp, function(x) x[which(x != '')])

    # Since each keyword happens to include a colon at the end, split each
    # list element generated above into exactly two parts, pre-colon for the
    # keyword and post-colon for the value. Since values may contain colons
    # themselves, avoid spurious matches by using n=2 in the str_split_fixed
    # function from the stringr package:
    library(stringr)
    temp <- lapply(temp, function(x) str_split_fixed(x, ':', n=2))

    # Convert each list element into a data frame. The transpose makes sure
    # that the first row of each data frame is the set of keywords. Each
    # data frame has 2 rows - one with the keywords and the second with the
    # values:
    temp <- lapply(temp, function(x) replace(as.data.frame(t(x)), , t(x)))

    # Copy the first row of each data frame to the name of the corresponding
    # column:
    for (i in 1:length(temp)) {
      names(temp[[i]]) <- as.character(temp[[i]][1, ])
    }

    # Now join all the data frames in the list by column names (rbind.fill
    # is from the plyr package). This way it doesn't matter if some keywords
    # are absent in some cases:
    library(plyr)
    final_data <- do.call(rbind.fill, temp)

    # We now have one large data frame with the odd numbered rows containing
    # the keywords and the even numbered rows containing the values. Since
    # we already have the keywords in the names, we can eliminate the odd
    # numbered rows:
    final_data <- final_data[seq(2, dim(final_data)[1], 2), ]

-S.
Re: [R] write.table vs. read.table and the argument fill
On May 7, 2011, at 10:38 AM, Carl Witthoft wrote:
> Just wondering how come read.table lets you specify fill=TRUE for ragged arrays, but so far as I can tell, no equivalent for write.table?

I imagine the answer is something along the lines of: read.table creates a rectangular structure and write.table outputs a rectangular structure, so they are congruent. If I wanted to accomplish writing a ragged list I would consider using lapply, paste, and writeLines.

> Not a big deal, since I'm perfectly comfortable w/ write and scan and the other file I/O goodies. A foolish inconsistency... and all that.

--
David Winsemius, MD
West Hartford, CT
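[Editor's note] The paste/writeLines route suggested above might look like this (a sketch; the list and file name are arbitrary):

```r
rab <- list(1:7, 1:9)   # a ragged list

# Collapse each element to one space-separated line and write the lines out:
lines <- vapply(rab, paste, character(1), collapse = " ")
f <- tempfile()
writeLines(lines, f)

readLines(f)
# [1] "1 2 3 4 5 6 7"     "1 2 3 4 5 6 7 8 9"
```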
Re: [R] %in% operator - NOT IN
At 19:06 08/05/2011, you wrote:
> I am attempting to use the %in% operator with the ! to produce a NOT IN type of operation. Why does this not work? Suggestions?
>
>     data2[data1$char1 %in% c(string1, string2), 1] <- min(data1$x1)
>     data2[data1$char1 ! %in% c(string1, string2), 1] <- max(data1$x1) + 1000
>     Error: unexpected '!' in "data2[data1$char1 !"
>
> Thanks!
> Dan

Hi Dan

See the last of the examples for ?match:

    "%w/o%" <- function(x, y) x[!x %in% y]   #--  x without y
    (1:10) %w/o% c(3, 7, 12)

I think it was Peter Dalgaard who pointed this out some years ago.

HTH

Regards
Duncan Mackay
Department of Agronomy and Soil Science
University of New England
ARMIDALE NSW 2351
Email: home mac...@northnet.com.au
Re: [R] write.table vs. read.table and the argument fill
Well, it could be a list variable.

    foo <- 1:7
    bar <- 1:9
    rab <- list(foo, bar)

I suppose I could do something like

    oof <- rbind(foo, bar)   # foo is recycled, with a warning
    write.table(oof)         # ignore the warnings

and then ignore or delete the redundant items in the output file.

On 5/8/11 1:51 AM, Joshua Wiley wrote:
> Hi Carl,
> What would the equivalent argument for write.table do? Or perhaps to rephrase my question: what type of R object do you have in mind to write that is a ragged array?
> Josh
>
> On Sat, May 7, 2011 at 7:38 AM, Carl Witthoft <c...@witthoft.com> wrote:
>> Just wondering how come read.table lets you specify fill=TRUE for ragged arrays, but so far as I can tell, no equivalent for write.table?
>> Not a big deal, since I'm perfectly comfortable w/ write and scan and the other file I/O goodies. A foolish inconsistency... and all that.
Re: [R] how to calculate the mean of a group in a table
I see you already have three solutions but, just for the heck of it, here's another. I am trying to get familiar with the reshape2 package and your question was a good exercise for me. With your data set named xx:

    library(reshape2)
    yy <- melt(xx, id=c("period", "treatment", "session", "type", "stage"),
               measured=c("wage_accepted"))
    dcast(yy, period + stage ~ variable, mean)

As for your request for tutorials on loops and if-else materials, I cannot make any better suggestions than you already have. However, in R, if you are thinking of using these it usually means you are not thinking properly in R. There are usually faster and better ways of achieving the desired results in R. Sometimes a loop or if-else is appropriate, but usually in many fewer instances than in most other languages.

By the way, "type" is a reserved(?) name in R. Not a problem here, but it's safer not to use reserved words in R.

--- On Sat, 5/7/11, tornanddesperate <matthiasn...@gmx.ch> wrote:
> Hi, it's me again. I don't mean to get on your nerves, but the use of R proves to be a bit more complicated than envisaged. I would like to calculate the mean of a group of values, here wage_accepted. The group is determined by the stage and period, so in the end there should be a column with the mean values of the wages in period 1 stage 1, period 1 stage 2, period 2 stage 1, ... Unfortunately, I haven't much of a clue on how to program this. Could you tell me how the function should roughly look - if-else, loops included or not - in the end?
>       treatment session period stage wage_accepted type
>     1         1       1      1     1            25  low
>     2         1       1      1     1            19  low
>     3         1       1      1     1            15  low
>     4         1       1      1     2            32 high
>     5         1       1      1     2            13  low
>     6         1       1      1     2            14  low
>     7         1       1      2     1            17  low
>     8         1       1      2     1            12  low
>
> I am also very grateful if someone could point out a good internet tutorial resource for R functions such as if-else and loops and how to use them in conjunction with tables. It should contain simple examples if possible.
>
> As ever indebted to you,
> Matthias
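[Editor's note] The same group means can also be obtained in base R with aggregate(); a sketch using the data excerpt above (entered by hand here):

```r
xx <- data.frame(
  treatment     = 1,
  session       = 1,
  period        = c(1, 1, 1, 1, 1, 1, 2, 2),
  stage         = c(1, 1, 1, 2, 2, 2, 1, 1),
  wage_accepted = c(25, 19, 15, 32, 13, 14, 17, 12)
)

# One row per period/stage combination, with the mean accepted wage:
aggregate(wage_accepted ~ period + stage, data = xx, FUN = mean)
```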
Re: [R] question about val.surv in R
Please specify the package(s) you are using. In this case it should be rms. val.surv is mainly for an out-of-sample validation, as it does not penalize for overfitting. calibrate.cph is probably what you should be using. To use val.surv in the fashion you are trying to use it, specify y=TRUE, surv=TRUE to cph, and specify the time point you are validating with the u argument to val.surv (e.g., u=1 for 1-year survival probability validation).

Frank

yz wrote:
> I tried to use val.surv to give an internal validation of a survival prediction model. I used the sample code:
>
>     # Generate failure times from an exponential distribution
>     ...
>     # I use cph instead of psm in the example
>     f <- cph(S ~ age + sex, y=TRUE)
>     w <- val.surv(f)
>
> I got an error:
>
>     Error in survreg.distributions[[fit$dist]] :
>       attempt to select less than one element
>
> Could someone explain this for me?
>
> Yao Zhu
> Department of Urology
> Fudan University Shanghai Cancer Center
> Shanghai, China

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
Re: [R] plotting confidence bands from predict.nls
Much quicker than asking for help on the list is to read the help file (which you have been asked to do in the posting guide you hopefully read). ?predict.nls tells us:

    interval: A character string indicating if prediction intervals or a
        confidence interval on the mean responses are to be calculated.
        At present this argument is ignored.

Best,
Uwe Ligges

On 07.05.2011 06:17, Penny Bilton wrote:
> I am trying to find a confidence band for a fitted non-linear curve. I see that the predict.nls function has an interval argument, but a previous post indicates that this argument has not been implemented. Is this still true? I have tried various ways to extract the interval information from the model object without success. My code is:
>
>     Model.predict <- predict(My.nls.model, se.fit=TRUE,
>                              interval = "confidence", level = 0.95)
>
> where My.nls.model is an nls object. I was able to extract the predictions okay.
> Thank you for your help. Penny.
Re: [R] plotting confidence bands from predict.nls
On Sat, May 7, 2011 at 12:17 AM, Penny Bilton <pennybil...@xnet.co.nz> wrote:
> I am trying to find a confidence band for a fitted non-linear curve. I see that the predict.nls function has an interval argument, but a previous post indicates that this argument has not been implemented. Is this still true? I have tried various ways to extract the interval information from the model object without success.

You can get these intervals using nls2. The as.lm function has an nls method which returns the lm model tangent to an nls model, and you can use predict.lm on that.

    library(nls2)
    fm <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)

    predict(as.lm(fm), interval = "confidence")
            fit       lwr      upr
    1  7.887451  3.701701 12.07320
    2 12.524979  8.219483 16.83047
    3 15.251674 11.813306 18.69004
    4 16.854870 13.668094 20.04164
    5 17.797489 14.026668 21.56831
    6 18.677578 13.393630 23.96153

    predict(as.lm(fm), interval = "prediction")
            fit        lwr      upr
    1  7.887451 -0.3349547 16.10986
    2 12.524979  4.2409738 20.80898
    3 15.251674  7.3833942 23.11995
    4 16.854870  9.0932340 24.61651
    5 17.797489  9.7783530 25.81663
    6 18.677578  9.8453897 27.50977
    Warning message:
    In predict.lm(as.lm(fm), interval = "prediction") :
      Predictions on current data refer to _future_ responses

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] plotting confidence bands from predict.nls
Le 07/05/2011 06:17, Penny Bilton a écrit :
> I am trying to find a confidence band for a fitted non-linear curve. I see that the predict.nls function has an interval argument, but a previous post indicates that this argument has not been implemented. Is this still true? I have tried various ways to extract the interval information from the model object without success.

You can use the bootstrap methodology, using the predicted values assessed through the N replications.
[R] Rearranging variables in table in non-alphabetical (manually specified) order
Dear all,

I'm trying to rearrange variables in a table in a custom order for using it with levelplot. So far I could only find examples showing how to sort alphabetically. Here is a short example:

    a <- c("Anna","Anna","Michael","Klaus","Klaus","Anna","Fritz")
    b <- c("Schnitzel","Pommes","Pommes","Schnitzel","Wurst","Schnitzel","Schnitzel")
    food <- matrix(c(a, b), 7)
    as.data.frame(food) -> tmp
    as.data.frame(table(tmp)) -> X
    levelplot(X$Freq ~ X$V1 * X$V2, xlab="people", ylab="food")

Food is now ordered: Pommes, Schnitzel, Wurst. But I need: Schnitzel, Pommes, Wurst. How can I define the order? I'm happy about every suggestion.

Thanks in advance!
Andre
[R] Syntax for iter.max in rms
Hello,

I would like to increase the number of iterations for running a Buckley-James regression model in the rms package, but something is apparently syntactically wrong. The following code initially is exactly as it appears in the help page (which runs successfully); then a failure-to-converge message (resulting from specifying an 'identity' link argument); then the error message from adding both the iter.max control and 'identity' link arguments; and finally the error message when testing just an iter.max argument. This was run in R 2.13.0 with rms version 3.3-0 on Ubuntu Linux 11.04. Thank you in advance; all insights and criticisms are appreciated.

Mike

    library(rms)

    f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, x=TRUE, y=TRUE)

    f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',
            x=TRUE, y=TRUE)
    No convergence in 50 steps
    Failure in bj.fit

    f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',
            iter.max=200, x=TRUE, y=TRUE)
    Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, link = "identity", :
      unused argument(s) (iter.max = 200)

    f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, iter.max=200,
            x=TRUE, y=TRUE)
    Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, iter.max = 200, :
      unused argument(s) (iter.max = 200)
Re: [R] Confidence intervals and polynomial fits
> From: pda...@gmail.com
> Date: Sun, 8 May 2011 09:33:23 +0200
> To: rh...@sticksoftware.com
> CC: r-help@r-project.org
> Subject: Re: [R] Confidence intervals and polynomial fits
>
> Notice that because of the high correlations between the x^k, their parameter estimates will be correlated too. In practice, this means that the c.i. for the quartic term contains values for which you can compensate with the other coefficients and still have an acceptable fit to data. (Nothing strange about that; already in simple linear regression, you allow the intercept to change while varying the slope.)

I was trying to compose a longer message, but at least for even/odd it isn't hard to find a set of values for which cor is zero, or to find a set of points that make sines of different frequencies have non-zero correlation. This highlights the fact that the computer isn't magic and it needs data to make basis functions different from each other. For background, you probably want to look up Taylor Series and Orthogonal Basis. I would also suggest using R to add noise to your input and see what that does to your predictions, or for that matter take simple known data and add noise, although I think in principle you can do this analytically. You can always project a signal onto some subspace and get an estimate of how good your estimate is, but that is different from asking how well you can reconstruct your signal from a bunch of projections.

If you want to know "what can I infer about the slope of my thing at x=a", that is a specific question about one coefficient, but at this point statisticians can elaborate on various issues with the other things you ignore. Also, I think you said something about correlated at x=0, but you can change your origin, (x-a)^n, and expand this in a finite series in x^m to see what happens here. Also, if you are using hotmail, don't think that a dot product is not html, LOL, since hotmail knows you must mean html when you use "less than" even in text email...
Re: [R] Rearranging variables in table in non-alphabetical (manually specified) order
On 11-05-08 11:10 AM, André Júng wrote: Dear all, I'm trying to rearrange variables in a table into a custom order for use with levelplot. So far I could only find examples showing how to sort alphabetically. Here is a short example:

a <- c("Anna","Anna","Michael","Klaus","Klaus","Anna","Fritz")
b <- c("Schnitzel","Pommes","Pommes","Schnitzel","Wurst","Schnitzel","Schnitzel")
food <- matrix(c(a,b),7)
as.data.frame(food) -> tmp
as.data.frame(table(tmp)) -> X
levelplot(X$Freq ~ X$V1 * X$V2, xlab="people", ylab="food")

Food is now ordered: Pommes, Schnitzel, Wurst. But I need: Schnitzel, Pommes, Wurst. How can I define the order? I'm happy about every suggestion.

The general answer to that question is to make b into a factor with the levels in a specified order, e.g. by

b <- factor(b, levels=c("Schnitzel", "Pommes", "Wurst"))

However, this doesn't survive the other strange things you are doing. If you simplify the rest, it should be fine. For example:

food <- data.frame(a, b)
X <- as.data.frame(table(food))
levelplot(Freq ~ a * b, data = X)

Duncan Murdoch
Re: [R] Rearranging variables in table in non-alphabetical (manually specified) order
On 08.05.2011 17:10, André Júng wrote: Dear all, I'm trying to rearrange variables in a table into a custom order for use with levelplot. So far I could only find examples showing how to sort alphabetically. Here is a short example:

a <- c("Anna","Anna","Michael","Klaus","Klaus","Anna","Fritz")
b <- c("Schnitzel","Pommes","Pommes","Schnitzel","Wurst","Schnitzel","Schnitzel")

Hmmm, why don't you ask your supervisor about your homework? I wonder who your supervisor is. Anyway, some hint: Replace b by a factor with levels in the desired order.

food <- matrix(c(a,b),7)
as.data.frame(food) -> tmp

Please replace the two lines above by something like

food <- data.frame(names=a, food=b)

because 1. your matrix will destroy the factor again and is not required at all, and 2. never ever use `->`, which makes your code almost unreadable.

Uwe Ligges

as.data.frame(table(tmp)) -> X
levelplot(X$Freq ~ X$V1 * X$V2, xlab="people", ylab="food")

Food is now ordered: Pommes, Schnitzel, Wurst. But I need: Schnitzel, Pommes, Wurst. How can I define the order? I'm happy about every suggestion. Thanks in advance! Andre
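Putting the suggestions from this thread together, a self-contained version (column names chosen here only so the axis labels come out right) might look like:

```r
library(lattice)

a <- c("Anna","Anna","Michael","Klaus","Klaus","Anna","Fritz")
b <- factor(c("Schnitzel","Pommes","Pommes","Schnitzel","Wurst",
              "Schnitzel","Schnitzel"),
            levels = c("Schnitzel","Pommes","Wurst"))  # desired order

food <- data.frame(people = a, food = b)
X <- as.data.frame(table(food))     # Freq keeps the factor ordering
levelplot(Freq ~ people * food, data = X)
```

Because table() respects factor levels, the y-axis follows Schnitzel, Pommes, Wurst without any further sorting.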
Re: [R] Syntax for iter.max in rms
The option `iter.max` should be an element of the `control` list. If you read the help file carefully, you would have noticed this. So, try this:

f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',
        control=list(iter.max=200), x=TRUE, y=TRUE)

The identity link is challenging to fit computationally. Try setting `trace=TRUE` to see the convergence of the parameters:

f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',
        control=list(iter.max=200, trace=TRUE), x=TRUE, y=TRUE)

Ravi.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Mike Harwood [harwood...@gmail.com] Sent: Sunday, May 08, 2011 9:27 AM To: r-help@r-project.org Subject: [R] Syntax for iter.max in rms

Hello, I would like to increase the number of iterations for running a Buckley-James regression model in the rms package, but something is apparently syntactically wrong. The following code initially is exactly as it appears in the help page (which runs successfully), then a failure-to-converge message (resulting from specifying an 'identity' link argument), then the error message after adding both the iter.max control and 'identity' link arguments, and finally the error message when testing just an iter.max argument. This was run in R 2.13.0 with rms version 3.3-0 on Ubuntu Linux 11.04. Thank you in advance; all insights and criticisms are appreciated.
Mike

library(rms)
f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, x=TRUE, y=TRUE)
f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity', x=TRUE, y=TRUE)
No convergence in 50 steps
Failure in bj.fit
f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity', iter.max=200, x=TRUE, y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, link = "identity", :
  unused argument(s) (iter.max = 200)
f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, iter.max=200, x=TRUE, y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, iter.max = 200, :
  unused argument(s) (iter.max = 200)
Re: [R] Weighted box or violin plots in Lattice?
On Sat, May 7, 2011 at 1:55 AM, Raphael Mazor rapha...@sccwrp.org wrote: Is it possible to create weighted boxplots or violin plots in lattice? It seems that you can specify weights for panel.histogram() and panel.densityplot(), but not for panel.bwplot or panel.violin(). Not for panel.histogram() either. It's not immediately obvious how you would get weighted boxplots. For weighted violin plots, it should not be too difficult to write your own panel function generalizing panel.violin(); just pass in 'weights' and 'subscripts', and get the per-panel weights inside the panel function as weights[subscripts] (as in panel.densityplot). The only difference is that densityplot() implements non-standard evaluation for 'weights', which bwplot() does not, so you have to specify 'weights' explicitly. -Deepayan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
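An untested sketch of the kind of panel function Deepayan describes: the name panel.wviolin is my own, and it approximates a weighted violin by feeding per-group weights into stats::density() (which accepts a weights argument) and mirroring the result around the category position.

```r
library(lattice)

# Hypothetical weighted analogue of panel.violin(); horizontal layout,
# simplified scaling.  'weights' must be passed explicitly in the
# high-level call, as bwplot() has no non-standard evaluation for it.
panel.wviolin <- function(x, y, weights, subscripts, ...) {
  w <- weights[subscripts]            # per-panel weights
  for (lev in unique(y)) {
    i <- y == lev
    d <- density(x[i], weights = w[i] / sum(w[i]))  # weighted density
    half <- 0.4 * d$y / max(d$y)      # half-width of the violin
    panel.polygon(c(d$x, rev(d$x)),
                  lev + c(half, -rev(half)),
                  col = "grey80")
  }
}

# usage sketch, assuming a data frame 'dat' with columns g, v, w:
# bwplot(g ~ v, data = dat, weights = dat$w, panel = panel.wviolin)
```

The per-panel subsetting via weights[subscripts] is exactly the pattern panel.densityplot() uses internally.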
Re: [R] Cumsum in Lattice Panel Function
On Fri, May 6, 2011 at 9:24 PM, Elliot Joel Bernstein elliot.bernst...@fdopartners.com wrote: I'm trying to create an xyplot with a groups argument where the y-variable is the cumsum of the values stored in the input data frame. I almost have it, but I can't get it to automatically adjust the y-axis scale. How do I get the y-axis to automatically scale as it would have if the cumsum values had been stored in the data frame? Here is the code I have so far:

require(lattice)

dates <- seq(as.Date("2011-01-01"), as.Date("2011-04-30"), "days")
g <- 1:3
dat <- data.frame(date = rep(dates, length(g)),
                  group = rep(g, each = length(dates)),
                  value = rnorm(length(dates)*length(g)) + 0.05)

xyplot(value ~ date, data = dat, group = group, type = 'l', grid = TRUE,
       panel = panel.superpose,
       panel.groups = function(x, y, ...) { panel.xyplot(x, cumsum(y), ...) })

I want the result to look the same as if I had done

dat$cumvalue <- with(dat, unsplit(lapply(split(value, group), cumsum), group))
xyplot(cumvalue ~ date, data = dat, group = group, type = 'l', grid = TRUE)

You need something along the lines of

xyplot(value ~ date, data = dat, group = group, type = 'l', grid = TRUE,
       panel = panel.superpose,
       panel.groups = function(x, y, ...) { panel.xyplot(x, cumsum(y), ...) },
       prepanel = function(x, y, groups, ...) {
           yy <- unlist(tapply(y, groups, cumsum))
           list(ylim = range(yy, finite = TRUE))
       })

-Deepayan
Re: [R] How to alter circle size
On Fri, May 6, 2011 at 10:21 PM, Dat Mai dat.d@gmail.com wrote: Hello all, I'm trying to create a heatmap using 2 matrices I have: z and v. Both matrices represent different correlations for the same independent variables. The problem I have is that I wish to have the values from matrix z represented by color intensity while having the values from matrix v represented by circle size. I currently have the following in front of me and am unsure of what to add or change in order to achieve that goal.

panel.corrgram.2 <- function(x, y, z, subscripts, at = pretty(z), scale = 0.8, ...)
{
    require(grid, quietly = TRUE)
    x <- as.numeric(x)[subscripts]
    y <- as.numeric(y)[subscripts]
    z <- as.numeric(z)[subscripts]
    zcol <- level.colors(z, at = at, ...)
    for (i in seq(along = z)) {
        lims <- range(0, z[i])
        tval <- 2 * base::pi * seq(from = lims[1], to = lims[2], by = 0.01)
        grid.polygon(x = x[i] + .5 * scale * c(0, sin(tval)),
                     y = y[i] + .5 * scale * c(0, cos(tval)),
                     default.units = "native",
                     gp = gpar(fill = zcol[i]))
        grid.circle(x = x[i], y = y[i], r = .5 * scale,
                    default.units = "native")
    }
}

Probably this modification

panel.corrgram.2 <- function(x, y, z, v, subscripts, at = pretty(v), scale = 0.8, ...)
{
    require(grid, quietly = TRUE)
    x <- as.numeric(x)[subscripts]
    y <- as.numeric(y)[subscripts]
    z <- as.numeric(z)[subscripts]
    zcol <- level.colors(v, at = at, ...)

(rest unchanged) followed by

levelplot(signif(z,3), v = v, panel = panel.corrgram.2, ...)

will do it.
-Deepayan

k <- levelplot(signif(z,3), scales = list(x = list(rot = 45)),
               col.regions = col.sch, panel = panel.corrgram.2,
               label = num.circ, xlab = "", ylab = "",
               main = paste(output, "receptor response"))
print(k)

-- Best, Dat Mai
PhD Rotation Student
Albert Einstein College of Medicine
[R] package update
I tried to update my packages using update.packages() I got the following message: The downloaded packages are in ‘/tmp/RtmpyDYdTX/downloaded_packages’ Warning in install.packages(update[instlib == l, Package], l, contriburl = contriburl, : 'lib = /usr/lib/R/library' is not writable Error in install.packages(update[instlib == l, Package], l, contriburl = contriburl, : unable to install package How do I fix this ? -- View this message in context: http://r.789695.n4.nabble.com/package-update-tp3507479p3507479.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
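The warning means the system-wide library (/usr/lib/R/library) is not writable by your user. One common workaround is to maintain a personal library instead; a sketch, where the path "~/R/library" is just an example:

```r
# create a user-writable library and put it first on the search path
dir.create("~/R/library", recursive = TRUE, showWarnings = FALSE)
.libPaths(c("~/R/library", .libPaths()))

# install updated copies of outdated packages into that library
update.packages(ask = FALSE, instlib = "~/R/library")
```

Alternatively, run R with write permission to the system library (e.g. start it via sudo on Linux) before calling update.packages().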
[R] new to loops
I have never made a loop on my own to do anything in R, but I am hoping someone can help me build one for the following issue: I need to fit a univariate logistic regression for each of my variables (about 62 of them), then I need to gather up each of their coefficients (not the intercepts), each of their 95% confidence intervals, and each of their odds ratios, and place them in a matrix to showcase them for my thesis. Currently, I am writing them all out one by one with cbind, which has taken me the better part of a day so far, and I know there has to be a way to make a loop that can do this whole process; I just haven't been able to figure it out yet. Thanks in advance. -- View this message in context: http://r.789695.n4.nabble.com/new-to-loops-tp3507366p3507366.html Sent from the R help mailing list archive at Nabble.com.
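A loop along these lines collects the slope, its 95% confidence interval, and the odds ratio for each predictor; the data frame name dat and outcome name y are placeholders for whatever the thesis data actually uses.

```r
# dat: data frame with a binary outcome 'y' and ~62 predictor columns
preds <- setdiff(names(dat), "y")

res <- t(sapply(preds, function(v) {
  fit <- glm(reformulate(v, response = "y"), data = dat, family = binomial)
  b   <- unname(coef(fit)[2])                        # slope, not intercept
  ci  <- suppressMessages(confint(fit))[2, ]         # profile 95% CI
  c(coef = b, lower = unname(ci[1]), upper = unname(ci[2]), OR = exp(b))
}))

round(res, 3)   # one row per predictor, ready for a table
```

confint() gives profile-likelihood intervals; substitute confint.default() for plain Wald intervals if those are preferred.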
[R] help with mysql and R: partitioning by quintile
Hi, I have a mysql table with fields userid,track,frequency e.g u1,1,10 u1,2,100 u1,3,110 u1,4,200 u1,5,120 u1,6,130 . u2,1,23 . . where frequency is the number of times a music track is played by a userid I need to turn my 'frequency' table into a rating table (it's for a recommender system). So, for each user, I need to categorise the frequency of tracks played by quintile so that each particular track can have 5 ratings (1-5), with the ratings allocated as follows: inter-quintile range 100-80% = rating 5, inter-quintile range 80-60% = rating 4, ..., inter-quintile range 20-0% = rating 1) Hence, I need to create a table with fields userid,track,rating: u1,1,1 u1,2, 3 ... Can anybody help me to do this with R? Regards Gawesh [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
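Once the table has been pulled into R (e.g. with RMySQL's dbGetQuery(); the data frame and column names below are assumptions matching the question), a base-R sketch of the per-user quintile rating could be:

```r
# freqs: data frame with columns userid, track, frequency
rate_one_user <- function(f) {
  q <- quantile(f, probs = c(.2, .4, .6, .8))        # per-user quintile cuts
  findInterval(f, q, rightmost.closed = TRUE) + 1L   # ratings 1..5
}

# apply the rating within each userid group
freqs$rating <- ave(freqs$frequency, freqs$userid, FUN = rate_one_user)
```

The resulting userid/track/rating columns can then be written back to MySQL with dbWriteTable().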
[R] Another quantmod question
I'm having troubles with the names of columns. quantmod deals with stock quotes. I've created an array of the first 5 closing prices from Jan 2007. (Is there a problem that the name is the same as the variable name? There shouldn't be.)

close
             close
2007-01-03 1416.60
2007-01-04 1418.34
2007-01-05 1409.71
2007-01-08 1412.84
2007-01-09 1412.11

When I try to create a more complex array by adding columns, the names get fouled up. Here's a simple example.

cbind(changed.close = close+1, zero = 0, close)
             close zero close.1
2007-01-03 1417.60    0 1416.60
2007-01-04 1419.34    0 1418.34
2007-01-05 1410.71    0 1409.71
2007-01-08 1413.84    0 1412.84
2007-01-09 1413.11    0 1412.11

The first column should be called changed.close, but it's called close. The second column has the right name. The third column should be called close but it's called close.1. Why is that? Am I missing something? If I change the order of the columns and let close have its original name, there is still a problem.

cbind(close, zero = 0, changed.close = close+1)
             close zero close.1
2007-01-03 1416.60    0 1417.60
2007-01-04 1418.34    0 1419.34
2007-01-05 1409.71    0 1410.71
2007-01-08 1412.84    0 1413.84
2007-01-09 1412.11    0 1413.11

Now the names on the first two columns are ok, but the third column is still wrong. Again, why is that? Apparently it's not letting me assign a name to a column that comes from something that already has a name. Is that the way it should be? I don't get that same problem in a simpler example.

IX <- cbind(I = 0, X = 1:3)
IX
     I X
[1,] 0 1
[2,] 0 2
[3,] 0 3
cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
     Y Z W
[1,] 1 0 1
[2,] 1 0 2
[3,] 1 0 3

Is this a peculiarity of xts objects? Thanks.

-- Russ

P.S. Once again I feel frustrated because it's taken me far more time than it deserves to track down and characterize this problem. I can fix it by using the names function. But I shouldn't have to do that.
Re: [R] Another quantmod question
On May 8, 2011, at 3:07 PM, Russ Abbott wrote:

I'm having troubles with the names of columns. quantmod deals with stock quotes. I've created an array of the first 5 closing prices from Jan 2007. (Is there a problem that the name is the same as the variable name? There shouldn't be.)

close
             close
2007-01-03 1416.60
2007-01-04 1418.34
2007-01-05 1409.71
2007-01-08 1412.84
2007-01-09 1412.11

When I try to create a more complex array by adding columns, the names get fouled up. Here's a simple example.

cbind(changed.close = close+1, zero = 0, close)

I suspect that you are actually using xts objects that you are incorrectly calling 'array's. If something is puzzling about the behavior of an R object, the first thing to do is see what you are really dealing with, so ...

str(object)

If you load the xts package and type ?cbind.xts, you get a help page for merge.xts. (In base R I do not know of a way to assign columns the way you propose within a `merge` call.) Here is the code for cbind.xts:

cbind.xts
function (..., all = TRUE, fill = NA, suffixes = NULL)
{
    merge.xts(..., all = all, fill = fill, suffixes = suffixes)
}
<environment: namespace:xts>

             close zero close.1
2007-01-03 1417.60    0 1416.60
2007-01-04 1419.34    0 1418.34
2007-01-05 1410.71    0 1409.71
2007-01-08 1413.84    0 1412.84
2007-01-09 1413.11    0 1412.11

The first column should be called changed.close, but it's called close. The second column has the right name. The third column should be called close but it's called close.1. Why is that? Am I missing something? If I change the order of the columns and let close have its original name, there is still a problem.

cbind(close, zero = 0, changed.close = close+1)
             close zero close.1
2007-01-03 1416.60    0 1417.60
2007-01-04 1418.34    0 1419.34
2007-01-05 1409.71    0 1410.71
2007-01-08 1412.84    0 1413.84
2007-01-09 1412.11    0 1413.11

Now the names on the first two columns are ok, but the third column is still wrong. Again, why is that?
Apparently it's not letting me assign a name to a column that comes from something that already has a name. Is that the way it should be? I don't get that same problem in a simpler example.

IX <- cbind(I = 0, X = 1:3)
IX
     I X
[1,] 0 1
[2,] 0 2
[3,] 0 3
cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
     Y Z W
[1,] 1 0 1
[2,] 1 0 2
[3,] 1 0 3

Is this a peculiarity of xts objects? Thanks.

-- Russ

P.S. Once again I feel frustrated because it's taken me far more time than it deserves to track down and characterize this problem. I can fix it by using the names function. But I shouldn't have to do that.

David Winsemius, MD
West Hartford, CT
Re: [R] Another quantmod question
Hi Russ, On Sun, May 8, 2011 at 12:07 PM, Russ Abbott russ.abb...@gmail.com wrote: I'm having troubles with the names of columns. quantmod deals with stock quotes. I've created an array of the first 5 closing prices from Jan 2007. (Is there a problem that the name is the same as the variable name? There shouldn't be.)

close
             close
2007-01-03 1416.60
2007-01-04 1418.34
2007-01-05 1409.71
2007-01-08 1412.84
2007-01-09 1412.11

It would be appreciated in the future if you provided the object via dput() or some such that is easy to paste in.

When I try to create a more complex array by adding columns, the names get fouled up. Here's a simple example.

cbind(changed.close = close+1, zero = 0, close)
             close zero close.1
2007-01-03 1417.60    0 1416.60
2007-01-04 1419.34    0 1418.34
2007-01-05 1410.71    0 1409.71
2007-01-08 1413.84    0 1412.84
2007-01-09 1413.11    0 1412.11

The first column should be called changed.close, but it's called close. The second column has the right name. The third column should be called close but it's called close.1. Why is that? Am I missing something?

Yes.

mat <- matrix(1:10, dimnames = list(NULL, "A"))
cbind(X = 11:20, Y = mat + 1)
cbind(X = 11:20, Y = mat[, "A"] + 1)

In particular note that:

class(mat)
class(mat[, "A"])

It is silly to expect to be able to pass a single name for an entire matrix, while it makes complete sense that that would work for a vector. The problem with naming the column the same thing as the object containing it is that our limited human minds get ramfeezled (R does just fine, as you can see).

If I change the order of the columns and let close have its original name, there is still a problem.

cbind(close, zero = 0, changed.close = close+1)
             close zero close.1
2007-01-03 1416.60    0 1417.60
2007-01-04 1418.34    0 1419.34
2007-01-05 1409.71    0 1410.71
2007-01-08 1412.84    0 1413.84
2007-01-09 1412.11    0 1413.11

Now the names on the first two columns are ok, but the third column is still wrong. Again, why is that?
Apparently it's not letting me assign a name to a column that comes from something that already has a name. Is that the way it should be? I don't get that same problem in a simpler example.

IX <- cbind(I = 0, X = 1:3)
IX
     I X
[1,] 0 1
[2,] 0 2
[3,] 0 3
cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
     Y Z W
[1,] 1 0 1
[2,] 1 0 2
[3,] 1 0 3

Is this a peculiarity of xts objects?

Nope, but do check the type of object you are working with---there are often different methods for different objects, so I would not assume that all things would work the same.

Cheers, Josh

Thanks.

-- Russ

P.S. Once again I feel frustrated because it's taken me far more time than it deserves to track down and characterize this problem. I can fix it by using the names function. But I shouldn't have to do that.

[[alternative HTML version deleted]]

Please post using plain text.

-- Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
Re: [R] Another quantmod question
I understand Josh's example:

mat <- matrix(1:10, dimnames = list(NULL, "A"))
cbind(X = 11:20, Y = mat + 1)
cbind(X = 11:20, Y = mat[, "A"] + 1)

In the line cbind(X = 11:20, Y = mat + 1), it would be nice if an error or warning message were issued to the effect that the Y = part is ignored, or not applicable, or something to that effect. I still don't understand why cbind.xts doesn't do what I expect. If I try what Josh suggests on my xts example, the result is still confusing.

cbind.xts(close, A = close[, "close"])
             close close.1
2007-01-03 1416.60 1416.60
2007-01-04 1418.34 1418.34
2007-01-05 1409.71 1409.71
2007-01-08 1412.84 1412.84
2007-01-09 1412.11 1412.11

If this helps:

dput(close)
structure(c(1416.6, 1418.34, 1409.71, 1412.84, 1412.11), .indexCLASS = "Date",
    .indexTZ = structure("", .Names = "TZ"), src = "yahoo",
    updated = structure(1304875186.625, class = c("POSIXct", "POSIXt")),
    class = c("xts", "zoo"),
    index = structure(c(1167811200, 1167897600, 1167984000, 1168243200,
        1168329600), tzone = structure("", .Names = "TZ"), tclass = "Date"),
    .Dim = c(5L, 1L), .Dimnames = list(NULL, "close"))

I was also unable to convert Josh's matrix example to xts.

as.xts(mat, order.by = rownames())
Error in dimnames(x) : 'x' is missing

The order.by parameter is required, but I don't understand the error message and haven't been able to figure out how to fix it. If this helps:

dput(mat)
structure(1:10, .Dim = c(10L, 1L), .Dimnames = list(NULL, "A"))

This suggests that mat already has defined dimnames. So why does as.xts give an error message about dimnames? I was unable to figure out the problem by looking at any of the help files.
In particular, the help file for the as.xts methods didn't seem to offer any clarifications--at least none that I could understand.

-- Russ

On Sun, May 8, 2011 at 12:26 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:

class(mat)
class(mat[, "A"])
Re: [R] questions about the output of gee and its summary
On Sun, May 8, 2011 at 6:36 PM, Carrie Li carrieands...@gmail.com wrote: Dear R-helpers, I am using the package gee to run a marginal model. Here is the output. In my simulated data, both x and z are time-varying, so I include their interaction terms with time indicator (i.e. tind=0, if time 1, and 1 if time 2) The data is simulated, so the true parameter of z both at time 1 and time 2 is 5, which is very close from the model output for time 1, z = 5.0757760, and for time 2, z is 5.0757760-0.6379866 = ~5 model=gee(y~x+z+x*tind+z*tind, family=gaussian(link = identity), id=sid, corstr=exchangeable) Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27 running glm to get initial regression estimate (Intercept) x z tind x:tind z:tind 2.9342186 1.5002601 5.0757760 2.0846327 0.1869748 -0.6379866 However, when I use the summary command, the coefficients are changed. Am I missing anything here ?? Yes. The printout above gives the starting values obtained by running glm(), not the results of gee(). If you use summary() on the model object, or just print the model object you will get the results of gee(). -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
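In other words, the six numbers printed while the model is being fitted come from the initial glm() call; the fitted GEE coefficients are the ones to report. A quick check, reusing the (simulated) data and call from the question, with the correlation structure quoted as gee() expects:

```r
library(gee)

model <- gee(y ~ x + z + x*tind + z*tind,
             family = gaussian(link = identity),
             id = sid, corstr = "exchangeable")

coef(model)                    # final GEE estimates
summary(model)$coefficients    # estimates with naive and robust SEs
```

Only the "running glm to get initial regression estimate" line reflects the glm() starting values; coef(model) and summary(model) agree with each other.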
Re: [R] Weighted box or violin plots in Lattice?
On Mon, May 9, 2011 at 6:35 AM, Deepayan Sarkar deepayan.sar...@gmail.com wrote: On Sat, May 7, 2011 at 1:55 AM, Raphael Mazor rapha...@sccwrp.org wrote: Is it possible to create weighted boxplots or violin plots in lattice? It seems that you can specify weights for panel.histogram() and panel.densityplot(), but not for panel.bwplot or panel.violin(). Not for panel.histogram() either. It's not immediately obvious how you would get weighted boxplots. The way the survey world does it is to get weighted quantiles and work from them, as in survey:::svyboxplot(). The only tricky decision is what to do with outliers, since I haven't been able to think of any good way of indicating weights on them. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
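A minimal example of the survey-package approach Thomas mentions, using the api example data that ships with the package:

```r
library(survey)
data(api)

# one-stage cluster sample of school districts, as in the package examples
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# boxplot whose boxes are built from weighted quantiles
svyboxplot(enroll ~ stype, dclus1)
```

This is a base-graphics plot, not lattice, but the weighted-quantile computation it performs is what a weighted panel.bwplot would need to replicate.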
[R] Error in AnnotationDbi package - makeProbePackage
Dear all, We have developed our own Affymetrix chip (Custom Express Array, PM-only with two species). I want to analyse the data with the limma package, but for that I need to build my own CDF package and probe package, and build the filters to analyse one species or the other. I'm using the makeProbePackage function available in AnnotationDbi (for R-2.13.0) but I got the following error message:

Error in rep(NA, max(pm1, mm1, pm2, mm2)) : invalid 'times' argument

I already tried an old version, R-2.9.1, with the package matchprobes (which is no longer updated for R-2.13.0), but the same error occurs. Does anyone know how I could overcome this problem? Thank you very much for your attention. Best Regards, Helga

-- Helga Garcia
PhD student
Applied and Environmental Mycology Lab/Molecular Thermodynamics Lab
ITQB I - UNL
Portugal
[R] Plotting on a scale up to 6 decimal places
Dear All, I am trying to plot some elements of the list below, which contains 10,000 rows; for instance, lfdr_true on the x-axis vs hi, and I also include estimates as points on the plot. However, as you can notice, the real difference comes in the later decimal places and not in the first 2 to 3 decimal places. Can someone suggest how to expand the scale to 6 decimal places?

head(all_simulated_samples[[1]])
  gene lower    hi  estimate covered lfdr_true
1    1     0 0.962 0.9913568    TRUE 0.9564683
2    2     0 0.999 0.9965426    TRUE 0.9184360
3    3     0 0.997 0.9941567    TRUE 0.9783095
4    4     0 0.990 0.9949874    TRUE 0.9342528
5    5     0 0.971 0.9943081    TRUE 0.9053046
6    6     0 0.921 0.9854756    TRUE 0.9927538

Following is the format of the code I am using after extracting some values from the list:

x1 <- min(lfdr.true)
x2 <- max(lfdr.true)
plot(lfdr.true, original.estimate, pch=19, col = "darkred", xlim = c(x1,x2))
points(upper.limit, pch=24, col = "darkgreen")

However, it doesn't help in differentiating the points I am plotting, as they are so close up to the first two decimal places. I would really appreciate any suggestions or comments. Thank you. Best Regards, Tasneem

-- Dr. Tasneem Zaihra, PhD (Statistics)
Assistant Professor
UNB-SJ, Saint John, NB, CA
Tel: 506-648-5624 (office)
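One way to make sub-third-decimal differences visible is to plot the differences themselves instead of stretching the raw axis; another is to zoom the y-range and format the tick labels with more digits. A sketch using the variable names from the question:

```r
# 1. plot differences, which live on a scale where variation is visible
d <- original.estimate - lfdr.true
plot(lfdr.true, d, pch = 19, col = "darkred",
     ylab = "estimate - lfdr_true")
abline(h = 0, lty = 2)   # reference line at zero difference

# 2. or zoom the y-axis and print 6-digit tick labels
rng <- range(original.estimate, upper.limit)
plot(lfdr.true, original.estimate, pch = 19, ylim = rng, yaxt = "n")
axis(2, at = pretty(rng), labels = format(pretty(rng), digits = 6))
```

The first variant is usually more informative, since it shows the sign and size of each discrepancy directly.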
[R] ARMA
Hello, could somebody tell me what the difference is between these 3 function calls:

arma(x, order=c(1,0))
arima(x, order=c(1,0,0))
ar(x, order=1)

I expected the same residuals from these three models, but unexpectedly, for the first two, R required an initial value of something (what?)... Thanks in advance!
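For what it's worth, the three fitters use different estimation methods, which is why the residuals differ; the initial values the first two ask about are starting points for their iterative optimizers. A sketch comparing them on simulated data (assuming the arma() in the question is tseries::arma(); note ar() takes order.max, not order):

```r
library(tseries)   # for arma()
set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 500)

fit1 <- arma(x, order = c(1, 0))           # conditional least squares
fit2 <- arima(x, order = c(1, 0, 0))       # exact ML via Kalman filter
fit3 <- ar(x, order.max = 1, aic = FALSE)  # Yule-Walker by default

coef(fit1); coef(fit2); fit3$ar   # similar, but not identical, estimates
```

With a long series the three estimates converge toward each other, but the residuals will never match exactly because the likelihoods and mean-handling differ between the methods.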
Re: [R] help with mysql and R: partitioning by quintile
try this:

# create some data
x <- data.frame(userid = paste('u', rep(1:20, each = 20), sep = ''),
                track = rep(1:20, 20),
                freq = floor(runif(400, 10, 200)),
                stringsAsFactors = FALSE)

# get the quantiles for each track
tq <- tapply(x$freq, x$track, quantile, prob = c(.2, .4, .6, .8, 1))
# create a matrix with the rownames as the tracks to use in findInterval
tqm <- do.call(rbind, tq)

# now put in the ratings
require(data.table)
x.dt <- data.table(x)
x.new <- x.dt[,
    list(userid = userid,
         freq = freq,
         rating = findInterval(freq,
                               # use track as index into quantile matrix
                               tqm[as.character(track[1L]), ],
                               rightmost.closed = TRUE) + 1L),
    by = track]

head(x.new)
     track userid freq rating
[1,]     1     u1   10      1
[2,]     1     u2   15      1
[3,]     1     u3  126      4
[4,]     1     u4  117      3
[5,]     1     u5   76      2
[6,]     1     u6  103      3

On Sun, May 8, 2011 at 2:48 PM, gj gaw...@gmail.com wrote: Hi, I have a mysql table with fields userid,track,frequency e.g u1,1,10 u1,2,100 u1,3,110 u1,4,200 u1,5,120 u1,6,130 . u2,1,23 . . where frequency is the number of times a music track is played by a userid. I need to turn my 'frequency' table into a rating table (it's for a recommender system). So, for each user, I need to categorise the frequency of tracks played by quintile so that each particular track can have 5 ratings (1-5), with the ratings allocated as follows: inter-quintile range 100-80% = rating 5, inter-quintile range 80-60% = rating 4, ..., inter-quintile range 20-0% = rating 1. Hence, I need to create a table with fields userid,track,rating: u1,1,1 u1,2,3 ... Can anybody help me to do this with R? Regards Gawesh

-- Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Another quantmod question
Hi Russ, Colnames don't get rewritten if they already exist. The reason is performance and how cbind is written at the R level. It isn't perfect per se, but the complexity and variety of dispatch that can take place for cbind in R, as it isn't a generic, is quite challenging to get to behave as one might hope. After years of trying, I'd say it is nearly impossible to do what you want without causing horrible memory issues on non-trivial objects; there are users in production systems using xts on objects with billions of rows. Your simple case, which has a simple workaround, would cost everyone in the other 99.999% of cases a recurring price that isn't tolerable. If this is frustrating to you, you should stop using the class. Jeff

Jeffrey Ryan | Founder | jeffrey.r...@lemnica.com www.lemnica.com

On May 8, 2011, at 2:07 PM, Russ Abbott russ.abb...@gmail.com wrote:

I'm having trouble with the names of columns. quantmod deals with stock quotes. I've created an array of the first 5 closing prices from Jan 2007. (Is there a problem that the name is the same as the variable name? There shouldn't be.)

close
              close
2007-01-03 1416.60
2007-01-04 1418.34
2007-01-05 1409.71
2007-01-08 1412.84
2007-01-09 1412.11

When I try to create a more complex array by adding columns, the names get fouled up. Here's a simple example.

cbind(changed.close = close + 1, zero = 0, close)
              close zero close.1
2007-01-03 1417.600    0 1416.60
2007-01-04 1419.340    0 1418.34
2007-01-05 1410.710    0 1409.71
2007-01-08 1413.840    0 1412.84
2007-01-09 1413.110    0 1412.11

The first column should be called changed.close, but it's called close. The second column has the right name. The third column should be called close but it's called close.1. Why is that? Am I missing something? If I change the order of the columns and let close have its original name, there is still a problem.
cbind(close, zero = 0, changed.close = close + 1)
              close zero close.1
2007-01-03 1416.600    0 1417.60
2007-01-04 1418.340    0 1419.34
2007-01-05 1409.710    0 1410.71
2007-01-08 1412.840    0 1413.84
2007-01-09 1412.110    0 1413.11

Now the names on the first two columns are ok, but the third column is still wrong. Again, why is that? Apparently it's not letting me assign a name to a column that comes from something that already has a name. Is that the way it should be? I don't get the same problem in a simpler example.

IX <- cbind(I = 0, X = (1:3))
IX
     I X
[1,] 0 1
[2,] 0 2
[3,] 0 3
cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
     Y Z W
[1,] 1 0 1
[2,] 1 0 2
[3,] 1 0 3

Is this a peculiarity of xts objects? Thanks.

-- Russ

P.S. Once again I feel frustrated because it's taken me far more time than it deserves to track down and characterize this problem. I can fix it by using the names function. But I shouldn't have to do that.
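For what it's worth, a workaround consistent with the behaviour Jeff describes is to assign the column names after the cbind rather than inside the call. A minimal sketch (the series `close` is rebuilt here by hand from the values in the post; assumes the xts package is installed):

```r
library(xts)

dates <- as.Date(c("2007-01-03", "2007-01-04", "2007-01-05",
                   "2007-01-08", "2007-01-09"))
close <- xts(c(1416.60, 1418.34, 1409.71, 1412.84, 1412.11), order.by = dates)
colnames(close) <- "close"

m <- cbind(close + 1, zero = 0, close)              # existing names win, per the discussion
colnames(m) <- c("changed.close", "zero", "close")  # so set the names you want afterwards
```

Assigning `colnames()` after the fact sidesteps the name-preservation rule entirely, at the cost of one extra line.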
Re: [R] Plotting n a scale up to 6 decimal places
The plot works fine for me with the example data you have given. Maybe it is something in your par settings. Try putting a ylim in your plot the same as you have done for the xlim:

y1 <- min(original.estimate)
y2 <- max(original.estimate)
plot(lfdr.true, original.estimate, pch = 19, col = "darkred",
     xlim = c(x1, x2), ylim = c(y1, y2))
Re: [R] ARMA
boki2b wrote: Hello, could somebody tell me what is the difference between these 3 calls of functions: arma(x, order = c(1,0)), arima(x, order = c(1,0,0)), and ar(x, order = 1)? I expected the same residuals from these three models, but unexpectedly, for the first two R required an initial value of something (what?)... Thanks in advance!

Firstly, all of the functions you mention can be explored in more detail by typing ?xxx, where xxx is ar, arima or arma. Note, you will need the tseries package to run the arma function (require(tseries)). Secondly, there is some debate around interpretation of some of the output from these functions. Check out one viewpoint at http://www.stat.pitt.edu/stoffer/tsa2/Rissues.htm (ISSUE 1: When is the intercept the mean?) Finally, I have tried below to put the functions on a similar footing to allow you to compare their output.

# create data
set.seed(12345)
d <- rnorm(30)
library(tseries)  # needed for the arma function

# 1. arma
# fits using conditional least squares; intercept is the intercept
arma(d, order = c(1, 0))
      ar1  intercept
 -0.10566    0.07046

# 2. arima
# intercept is the mean; the model intercept is derived as mean*(1-AR1)
arima(d, order = c(1, 0, 0), method = "CSS")
     ar1  intercept
 -0.1057     0.0638
So, intercept = 0.0638 * (1 + 0.1057) = 0.0705 ... same as arma above

# 3. ar
# intercept is the intercept
ar(d, aic = FALSE, order.max = 1, method = "ols", intercept = TRUE, demean = FALSE)
      1
-0.1057
Intercept: -0.07054 (0.1731)

There is a nice document on CRAN that you may find useful. http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf HTH Pete

-- View this message in context: http://r.789695.n4.nabble.com/ARMA-tp3507846p3507963.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] help with mysql and R: partitioning by quintile
One way to get the ratings would be to use the ave() function:

rating = ave(x$freq, x$track,
             FUN = function(x) cut(x, quantile(x, (0:5)/5), include.lowest = TRUE))

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu

On Sun, 8 May 2011, gj wrote:

Hi, I have a mysql table with fields userid, track, frequency, e.g. u1,1,10 u1,2,100 u1,3,110 u1,4,200 u1,5,120 u1,6,130 ... u2,1,23 ... where frequency is the number of times a music track is played by a userid. I need to turn my 'frequency' table into a rating table (it's for a recommender system). So, for each user, I need to categorise the frequency of tracks played by quintile so that each particular track can have 5 ratings (1-5), with the ratings allocated as follows: inter-quintile range 100-80% = rating 5, inter-quintile range 80-60% = rating 4, ..., inter-quintile range 20-0% = rating 1. Hence, I need to create a table with fields userid, track, rating: u1,1,1 u1,2,3 ... Can anybody help me to do this with R? Regards Gawesh
Re: [R] Error in AnnotationDbi package - makeProbePackage
On 05/08/2011 02:15 PM, Helga Garcia wrote: Dear all, We have developed our own Affymetrix chip (Custom Express Array, PM-only, with two species). I want to analyse the data with the limma package, but for that I need to build my own CDF package and probe package, and build the filters to analyse one species or the other. I'm using makeProbePackage, available in AnnotationDbi (for R-2.13.0), but I got the following error message: Error in rep(NA, max(pm1, mm1, pm2, mm2)) : invalid 'times' argument. I already tried an old version, R-2.9.1, with the package matchprobes (which is no longer updated for R-2.13.0), but the same error occurs. Does anyone know how I could overcome this problem?

Hi Helga -- Please ask this question on the Bioconductor mailing list, where you will get expert advice. http://bioconductor.org/help/mailing-list/ Be sure that your packages are up-to-date http://bioconductor.org/install/ and that you include the output of sessionInfo() in your post. Martin

Thank you very much for your attention. Best Regards, Helga

-- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
[R] Conditional values of a vector XXXX
Hello everyone, Below is a metadata summary of raw data in a data frame (the summary is itself a data frame). I want to add 2 columns to the data frame below that will contain the min and max for integer and numeric class columns, and NA for factors and character vectors. Can anyone suggest the most succinct way of going about this?

  Variable   Class   Mode
1 id         integer numeric
2 x1         numeric numeric
3 x2         numeric numeric
4 x3         numeric numeric
5 y          numeric numeric
6 gender     factor  numeric
7 char1      factor  numeric

I have the following, but it does not work:

x.contents.3 <- data.frame(id = 1:ncol(x), Min = NA, Max = NA, row.names = "id")
x.contents.3[!(x.contents.2$Class %in% c("factor", "character")), 1] <- sapply(x, min)
x.contents.3[!(x.contents.2$Class %in% c("factor", "character")), 2] <- sapply(x, max)

I receive the following:

x.contents.3[!(x.contents.2$Class %in% c("factor", "character")), 1] <- sapply(x, min)
Error in Summary.factor(c(1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, : min not meaningful for factors
x.contents.3[!(x.contents.2$Class %in% c("factor", "character")), 2] <- sapply(x, max)
Error in Summary.factor(c(1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, : max not meaningful for factors

Thanks! Dan
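A sketch of one way to do this: do the numeric test inside the function applied to each column, so min/max are only ever called on numeric columns and NA is returned otherwise. The data frame `x` below is a hypothetical stand-in for the poster's data:

```r
# toy stand-in for the poster's data frame (hypothetical)
x <- data.frame(id = 1:5, x1 = rnorm(5),
                gender = factor(c("m", "f", "m", "f", "m")))

rng <- function(col, f) if (is.numeric(col)) f(col) else NA  # NA for factors/characters
x.contents <- data.frame(Variable = names(x),
                         Class    = sapply(x, function(col) class(col)[1]),
                         Min      = sapply(x, rng, f = min),
                         Max      = sapply(x, rng, f = max))
x.contents
```

Because the class test happens per column inside `rng`, there is no need to index the summary frame by class afterwards, which is what tripped up the original attempt.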
Re: [R] Another quantmod question
Hi Jeff, The xts class has some very nice features, and you have done a valuable service in developing it. My primary frustration is how difficult it seems to be to find out what went wrong when my code doesn't work. I've been writing quite sophisticated code for a fairly long time; it's not that I'm new to software development. The column name rule is a good example. I'm willing to live with the rule that column names are not changed for efficiency's sake. What's difficult for me is that I never saw that rule anywhere before. Of course, I'm not an R expert. I've been using it for only a couple of months. But still, I would have expected to run into a rule like that. Worse, since the rule is in conflict with the explicit intent of cbind (one can name columns when using cbind; in fact the examples illustrate how to do it), it would really be nice if cbind would issue a warning when one attempts to rename a column in violation of that rule. Instead, cbind is silent, giving no hint about what went wrong. It's those sorts of things that have caused me much frustration. And it's these sorts of things that seem pervasive in R. One never knows what one is dealing with. Did something not work because there is a special-case rule that I haven't heard of? Did it not work because a special convenience was programmed into a function in a way that conflicted with normal use? Since these sorts of things come up so often, I find myself feeling that there is no good way to track down problems, which leads to a sense of helplessness and confusion. That's not what one wants in a programming language. -- Russ

On Sun, May 8, 2011 at 2:42 PM, Jeff Ryan jeff.a.r...@gmail.com wrote: Hi Russ, Colnames don't get rewritten if they already exist. The reason is performance and how cbind is written at the R level.
It isn't perfect per se, but the complexity and variety of dispatch that can take place for cbind in R, as it isn't a generic, is quite challenging to get to behave as one might hope. After years of trying, I'd say it is nearly impossible to do what you want without causing horrible memory issues on non-trivial objects; there are users in production systems using xts on objects with billions of rows. Your simple case, which has a simple workaround, would cost everyone in the other 99.999% of cases a recurring price that isn't tolerable. If this is frustrating to you, you should stop using the class. Jeff

Jeffrey Ryan | Founder | jeffrey.r...@lemnica.com www.lemnica.com

On May 8, 2011, at 2:07 PM, Russ Abbott russ.abb...@gmail.com wrote:

I'm having trouble with the names of columns. quantmod deals with stock quotes. I've created an array of the first 5 closing prices from Jan 2007. (Is there a problem that the name is the same as the variable name? There shouldn't be.)

close
              close
2007-01-03 1416.60
2007-01-04 1418.34
2007-01-05 1409.71
2007-01-08 1412.84
2007-01-09 1412.11

When I try to create a more complex array by adding columns, the names get fouled up. Here's a simple example.

cbind(changed.close = close + 1, zero = 0, close)
              close zero close.1
2007-01-03 1417.600    0 1416.60
2007-01-04 1419.340    0 1418.34
2007-01-05 1410.710    0 1409.71
2007-01-08 1413.840    0 1412.84
2007-01-09 1413.110    0 1412.11

The first column should be called changed.close, but it's called close. The second column has the right name. The third column should be called close but it's called close.1. Why is that? Am I missing something? If I change the order of the columns and let close have its original name, there is still a problem.
cbind(close, zero = 0, changed.close = close + 1)
              close zero close.1
2007-01-03 1416.600    0 1417.60
2007-01-04 1418.340    0 1419.34
2007-01-05 1409.710    0 1410.71
2007-01-08 1412.840    0 1413.84
2007-01-09 1412.110    0 1413.11

Now the names on the first two columns are ok, but the third column is still wrong. Again, why is that? Apparently it's not letting me assign a name to a column that comes from something that already has a name. Is that the way it should be? I don't get the same problem in a simpler example.

IX <- cbind(I = 0, X = (1:3))
IX
     I X
[1,] 0 1
[2,] 0 2
[3,] 0 3
cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
     Y Z W
[1,] 1 0 1
[2,] 1 0 2
[3,] 1 0 3

Is this a peculiarity of xts objects? Thanks.

-- Russ

P.S. Once again I feel frustrated because it's taken me far more time than it deserves to track down and characterize this problem. I can fix it by using the names function. But I shouldn't have to do that.
Re: [R] new to loops
Not knowing what format your data is in or what model you are using...

df  # your data frame, with columns the variables you are running regressions for
datout <- data.frame(coeff = NA, conf_low = NA, conf_high = NA, odd = NA)  # a table to put your results in
for (i in 1:length(names(df)[2:10])) {
  fit <- glm(df[, 1] ~ df[, i + 1], data = df, etc...)  # i + 1 so the outcome column is skipped
  datout[i, ] <- fit[e.g., 1:4]  # determine what values in your model output are what you need
}
datout  # a table with all your output for each variable

On Sunday, May 8, 2011 at 11:58 AM, SevannaD wrote:

I have never made a loop on my own to do anything in R. But I am hoping someone can help me build one for the following issue: I need to make a univariate logistic regression for each of my variables (about 62 of them), then I need to gather up each of their coefficients (not the intercepts), each of their 95% confidence intervals, and each of their odds ratios, and place them in a matrix to showcase them for my thesis. Currently, I am writing them all out one by one with the cbind method, which has taken me the better part of a day so far, and I know there has to be a way to make a loop that can do this whole process; I just haven't been able to figure it out yet. Thanks in advance.
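Filling in the template above with a runnable sketch (the data frame and column names here are hypothetical): each predictor gets its own univariate logistic fit, and the slope, odds ratio, and 95% CI for the odds ratio are collected row by row. Note that confint() is called on the fitted model, and row 2 is taken so the intercept is skipped:

```r
# hypothetical data: first column is the 0/1 outcome, the rest are predictors
set.seed(1)
df <- data.frame(y  = rbinom(100, 1, 0.5),
                 x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))

res <- t(sapply(names(df)[-1], function(v) {
  fit <- glm(reformulate(v, response = "y"), data = df, family = binomial)
  ci  <- suppressMessages(confint(fit))[2, ]   # profile CI for the slope only
  c(coef = coef(fit)[[2]], or = exp(coef(fit)[[2]]),
    or_low = exp(ci[[1]]), or_high = exp(ci[[2]]))
}))
res  # one row per predictor: coefficient, odds ratio, 95% CI for the OR
```

Using reformulate() to build the formula from the column name keeps the variable names in the output, which is handy when there are 62 of them.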
[R] Error in AnnotationDbi package - makeProbePackage
Dear all, We have developed our own Affymetrix chip (Custom Express Array, PM-only, with two species). I want to analyse the data with the limma package, but for that I need to build my own CDF package and probe package, and build the filters to analyse one species or the other. I'm using makeProbePackage, available in AnnotationDbi (for R-2.13.0), but I got the following error message: Error in rep(NA, max(pm1, mm1, pm2, mm2)) : invalid 'times' argument. I already tried an old version, R-2.9.1, with the package matchprobes (which is no longer updated for R-2.13.0), but the same error occurs. Does anyone know how I could overcome this problem? Thank you very much for your attention. Best Regards, Helga -- Helga Garcia PhD student Applied and Environmental Mycology Lab/Molecular Thermodynamics Lab ITQB I - UNL Portugal
[R] Pretty printing numbers
Friends, I am trying to format a number to a string so 2189.745 goes to 2,189.75 and 309283.929 goes to 309,283.93. I have tried to use formatC(X, big.mark = ",", drop0trailing = FALSE, format = "f") but it does not get the number of decimals correct. Specifying digits does not work, as that is significant digits. I could use a switch statement, switching on the size of the number to be formatted, to set the digits parameter, but that is complex and leaves me insecure that there may be other problems I have not uncovered. Is there a simple way of using library functions to insert big.marks and get the number of trailing digits correct? Perhaps some way of using sprintf? cheers Worik
Re: [R] Pretty printing numbers
On Sun, May 8, 2011 at 4:41 PM, Worik R wor...@gmail.com wrote:

Friends, I am trying to format a number to a string so 2189.745 goes to 2,189.75 and 309283.929 goes to 309,283.93. I have tried to use formatC(X, big.mark = ",", drop0trailing = FALSE, format = "f") but it does not get the number of decimals correct. Specifying digits does not work, as that is significant digits. I could use a switch statement, switching on the size of the number to be formatted, to set the digits parameter, but that is complex and leaves me insecure that there may be other problems I have not uncovered. Is there a simple way of using library functions to insert big.marks and get the number of trailing digits correct? Perhaps some way of using sprintf?

Try this, inserting a round() step before the final printing:

x = 309283.929
formatC(round(x, 2), big.mark = ",", drop0trailing = TRUE, format = "f")
[1] "309,283.93"

HTH, Peter

cheers Worik

-- Sent from my Linux computer. Way better than iPad :)
Re: [R] Pretty printing numbers
formatC(round(2189.745, 2), big.mark = ",", format = "f")
[1] "2,189.7400"

Unfortunately this does not work. Worik

On Mon, May 9, 2011 at 11:45 AM, Peter Langfelder peter.langfel...@gmail.com wrote:

On Sun, May 8, 2011 at 4:41 PM, Worik R wor...@gmail.com wrote: Friends, I am trying to format a number to a string so 2189.745 goes to 2,189.75 and 309283.929 goes to 309,283.93. I have tried to use formatC(X, big.mark = ",", drop0trailing = FALSE, format = "f") but it does not get the number of decimals correct. Specifying digits does not work, as that is significant digits. I could use a switch statement, switching on the size of the number to be formatted, to set the digits parameter, but that is complex and leaves me insecure that there may be other problems I have not uncovered. Is there a simple way of using library functions to insert big.marks and get the number of trailing digits correct? Perhaps some way of using sprintf?

Try this, inserting a round() step before the final printing:

x = 309283.929
formatC(round(x, 2), big.mark = ",", drop0trailing = TRUE, format = "f")
[1] "309,283.93"

HTH, Peter

-- Sent from my Linux computer. Way better than iPad :)
Re: [R] Pretty printing numbers
On May 8, 2011, at 8:02 PM, Worik R wrote:

formatC(round(2189.745, 2), big.mark = ",", format = "f")
[1] "2,189.7400"

Unfortunately this does not work

Because you did not follow his example. -- David.

Worik

On Mon, May 9, 2011 at 11:45 AM, Peter Langfelder peter.langfel...@gmail.com wrote:

On Sun, May 8, 2011 at 4:41 PM, Worik R wor...@gmail.com wrote: Friends, I am trying to format a number to a string so 2189.745 goes to 2,189.75 and 309283.929 goes to 309,283.93. I have tried to use formatC(X, big.mark = ",", drop0trailing = FALSE, format = "f") but it does not get the number of decimals correct. Specifying digits does not work, as that is significant digits. I could use a switch statement, switching on the size of the number to be formatted, to set the digits parameter, but that is complex and leaves me insecure that there may be other problems I have not uncovered. Is there a simple way of using library functions to insert big.marks and get the number of trailing digits correct? Perhaps some way of using sprintf?

Try this, inserting a round() step before the final printing:

x = 309283.929
formatC(round(x, 2), big.mark = ",", drop0trailing = TRUE, format = "f")
[1] "309,283.93"

HTH, Peter

-- Sent from my Linux computer. Way better than iPad :)
David Winsemius, MD West Hartford, CT
Re: [R] Pretty printing numbers
On Mon, May 9, 2011 at 12:06 PM, David Winsemius dwinsem...@comcast.net wrote:

On May 8, 2011, at 8:02 PM, Worik R wrote:
formatC(round(2189.745, 2), big.mark = ",", format = "f")
[1] "2,189.7400"
Unfortunately this does not work

Because you did not follow his example.

That is true. I did not notice the drop0trailing argument. That is perfect, thank you. All the pieces were there; I should have been able to make it work without having to ask, so thanks for your help. Worik

-- David.

Worik On Mon, May 9, 2011 at 11:45 AM, Peter Langfelder peter.langfel...@gmail.com wrote: On Sun, May 8, 2011 at 4:41 PM, Worik R wor...@gmail.com wrote: Friends, I am trying to format a number to a string so 2189.745 goes to 2,189.75 and 309283.929 goes to 309,283.93. I have tried to use formatC(X, big.mark = ",", drop0trailing = FALSE, format = "f") but it does not get the number of decimals correct. Specifying digits does not work, as that is significant digits. I could use a switch statement, switching on the size of the number to be formatted, to set the digits parameter, but that is complex and leaves me insecure that there may be other problems I have not uncovered. Is there a simple way of using library functions to insert big.marks and get the number of trailing digits correct? Perhaps some way of using sprintf?

Try this, inserting a round() step before the final printing:

x = 309283.929
formatC(round(x, 2), big.mark = ",", drop0trailing = TRUE, format = "f")
[1] "309,283.93"

HTH, Peter

-- Sent from my Linux computer.
Way better than iPad :)

David Winsemius, MD West Hartford, CT
Re: [R] Pretty printing numbers
formatC(round(2189.745, 2), big.mark = ",", format = "f", drop0trailing = TRUE)

On Sun, May 8, 2011 at 9:02 PM, Worik R wor...@gmail.com wrote:

formatC(round(2189.745, 2), big.mark = ",", format = "f")
[1] "2,189.7400"

Unfortunately this does not work. Worik

On Mon, May 9, 2011 at 11:45 AM, Peter Langfelder peter.langfel...@gmail.com wrote: On Sun, May 8, 2011 at 4:41 PM, Worik R wor...@gmail.com wrote: Friends, I am trying to format a number to a string so 2189.745 goes to 2,189.75 and 309283.929 goes to 309,283.93. I have tried to use formatC(X, big.mark = ",", drop0trailing = FALSE, format = "f") but it does not get the number of decimals correct. Specifying digits does not work, as that is significant digits. I could use a switch statement, switching on the size of the number to be formatted, to set the digits parameter, but that is complex and leaves me insecure that there may be other problems I have not uncovered. Is there a simple way of using library functions to insert big.marks and get the number of trailing digits correct? Perhaps some way of using sprintf?

Try this, inserting a round() step before the final printing:

x = 309283.929
formatC(round(x, 2), big.mark = ",", drop0trailing = TRUE, format = "f")
[1] "309,283.93"

HTH, Peter

-- Sent from my Linux computer. Way better than iPad :)
-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
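An aside on the digits argument raised in the original post: digits means significant digits only for the default format. With format = "f", digits is the number of places after the decimal point, so formatC can do the rounding itself and no separate round() step is needed. A minimal sketch:

```r
fmt <- function(x) formatC(x, format = "f", digits = 2, big.mark = ",")
fmt(309283.929)  # "309,283.93"
fmt(2189.745)    # rounding of the trailing 5 depends on the floating-point value
```

This gives a fixed two decimal places for numbers of any magnitude, which is what the switch-statement workaround was trying to achieve.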
[R] Hosmer-Lemeshow 'goodness of fit'
I'm trying to do a Hosmer-Lemeshow 'goodness of fit' test on my logistic regression model. I found some code here: http://sas-and-r.blogspot.com/2010/09/example-87-hosmer-and-lemeshow-goodness.html The R code above is a little complicated for me, but I'm having trouble with my answer: Hosmer-Lemeshow: p = 0.6163585; le Cessie and Houwelingen test (Design library): p = 0.2843620. The above link indicated they should be approximately equal, which in my case they are not. Any suggestions, or is there a package function people would recommend in R for use with a logistic regression model? Thanks in advance, -Rob Schutt Robert Schutt, MD, MCS Resident - Department of Internal Medicine University of Virginia, Charlottesville, Virginia

# Compute the Hosmer-Lemeshow 'goodness-of-fit' test
cd.full_model <- glm(formula = Collaterals ~ CHF + Age + CABG +
                       relevel(as.factor(num.obst.vessels), "one") +
                       Current.smoker + DM + HTN + ace.inhibitor + MI,
                     family = binomial(link = "logit"))

hosmerlem <- function(y, yhat, g = 10) {
  cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)),
                 include.lowest = TRUE)
  obs    <- xtabs(cbind(1 - y, y) ~ cutyhat)
  expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
  chisq  <- sum((obs - expect)^2 / expect)
  P <- 1 - pchisq(chisq, g - 2)
  return(list(chisq = chisq, p.value = P))
}
hosmerlem(y = Collaterals, yhat = fitted(cd.full_model))

# Compute the le Cessie and Houwelingen test
f <- lrm(Collaterals ~ CHF + Age + CABG +
           relevel(as.factor(num.obst.vessels), "one") +
           Current.smoker + DM + HTN + ace.inhibitor + MI,
         x = TRUE, y = TRUE)
library(Design)
resid(f, 'gof')

Output:
hosmerlem(y = Collaterals, yhat = fitted(cd.full_model))
$chisq
[1] 6.275889
$p.value
[1] 0.6163585

resid(f, 'gof')
Sum of squared errors   Expected value|H0                  SD
          118.5308396         118.3771115           0.1435944
                    Z                   P
            1.0705717           0.2843620
Re: [R] recommendation on B for validate.lrm () ?
Thanks so much for the reply; it was exceptionally helpful! A couple of questions: 1. I was under the impression that k-fold with B=10 would train on 9/10, validate on 1/10, and repeat 10 times, each time with a different 1/10th. Is this how the procedure works in R? 2. Is the reason you recommend repeating k-fold 100 times that the partitioning is random, i.e. not 1st 10th, 2nd 10th, et cetera, so you might obtain slightly different results?
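On both points: yes, 10-fold cross-validation trains on 9/10 and validates on the held-out tenth, cycling through all ten folds, and because the fold assignment is random, two runs give slightly different estimates, which is why repeating and averaging helps. A sketch of the usual partitioning scheme in plain R (this illustrates the mechanics only and is not validate.lrm's internal code):

```r
set.seed(1)
n <- 100
fold <- sample(rep(1:10, length.out = n))  # random fold labels: this is the random part
for (k in 1:10) {
  train <- which(fold != k)  # 9/10 of the data
  test  <- which(fold == k)  # the held-out 1/10
  # fit the model on train, evaluate on test, accumulate the statistic ...
}
```

A different set.seed gives a different fold vector and hence slightly different validation statistics, which motivates averaging over many repeats.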
Re: [R] nls problem with R
I am sorry, Andrew, I don't understand what you mean. Please forgive my poor English.
Re: [R] nls problem with R
Thanks, Mike. Your suggestion is really helpful. I followed your instructions and it works. What's more, could I use this package: http://cran.r-project.org/web/packages/minpack.lm/index.html It uses the Levenberg-Marquardt algorithm. Can this package handle four parameters? Thanks again
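[Editor's note: a sketch answering the four-parameter question. minpack.lm::nlsLM() accepts any number of parameters; the four-parameter logistic below and its simulated data are illustrative only, not the poster's model.]

```r
# Fit a four-parameter logistic with the Levenberg-Marquardt algorithm
library(minpack.lm)
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + (8 - 2) / (1 + exp((5 - x) / 1.2)) + rnorm(50, sd = 0.2)
fit <- nlsLM(y ~ a + (b - a) / (1 + exp((c - x) / d)),
             start = list(a = 1, b = 10, c = 4, d = 1))
coef(fit)  # estimates for all four parameters a, b, c, d
```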
Re: [R] new to loops
So in my first try before I got your message, this is what I did:

orconf <- list()
ccoef <- list()
or <- list()
coef <- list()
out <- list()
for (i in 1:49) {
  out[[i]] <- glm(y ~ var[[i]], family = binomial(link = logit))
  coef[[i]] <- out[[i]]$coef[2]
  or[[i]] <- exp(out[[i]]$coef[2])
  bond <- matrix(out[[i]]$coef[2], exp(out[[i]]$coef[2]),
                 confint(out[[i]]$coef[2]), exp(confint(out[[i]]$coef[2])))
}

But it did not work due to confint(out[[i]]$coef[2]) and the exp one. It said: Error in object$coefficients : $ operator is invalid for atomic vectors. (confint() expects the fitted model itself, e.g. confint(out[[i]]), not a single coefficient.) Would I need to identify conf_low and conf_high as two separate things? Thanks

On Sun, May 8, 2011 at 4:31 PM, Scott Chamberlain-3 [via R] wrote: Not knowing what format your data is in or what model you are using...

df # is your data frame with columns the variables you are running regressions for
datout <- data.frame(coeff = NA, conf_low = NA, conf_high = NA, odd = NA) # a table to put your results in
for (i in 1:length(names(df)[2:10])) {
  fit <- glm(data[,1] ~ data[,i], data = df, etc...)
  datout[i,] <- fit[e.g., 1:4] # determine what values in your model output are what you need
}
datout # a table with all your output for each variable

On Sunday, May 8, 2011 at 11:58 AM, SevannaD wrote: I have never made a loop on my own to do anything in R. But I am hoping someone can help me build one for the following issue: I need to run a univariate logistic regression for each of my variables (about 62 of them), then gather up each of their coefficients (not the intercepts), each of their 95% confidence intervals, and each of their odds ratios, and place them in a matrix to showcase them for my thesis. Currently I am writing them all out one by one with cbind, which has taken me the better part of a day so far, and I know there has to be a way to make a loop that can do this whole process; I just haven't been able to figure it out yet. Thanks in advance.
Re: [R] Hosmer-Lemeshow 'goodness of fit'
Please read the documentation carefully, and replace the Design package with the newer rms package. The older Hosmer-Lemeshow test requires binning and has lower power. It also does not penalize for overfitting. The newer goodness of fit test in rms/Design should not agree with Hosmer-Lemeshow. Frank

viostorm wrote: I'm trying to do a Hosmer-Lemeshow 'goodness of fit' test on my logistic regression model. [...] The above link indicated they should be approximately equal, which in my case they are not. Any suggestions, or is there a package function people would recommend in R for use with a logistic regression model? [original code and output snipped]

- Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] recommendation on B for validate.lrm () ?
Yes, that's how it works, but a single run does not provide sufficient precision unless your sample size is enormous. When you partition into tenths again, the partitions will be different, so yes, there is some randomness. Averaging over 100 repetitions averages out the randomness. Or just use the bootstrap with B=300 (depending on sample size). Frank

viostorm wrote: Thanks so much for the reply; it was exceptionally helpful! [questions about k-fold cross-validation snipped]

- Frank Harrell Department of Biostatistics, Vanderbilt University
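[Editor's note: a sketch of Frank's advice in code, using simulated data and illustrative names; lrm() and validate() are from the rms package, and the model must be fit with x=TRUE, y=TRUE for validate() to work.]

```r
# Validate a logistic model by cross-validation or bootstrap with rms
library(rms)
set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(d$x))
f <- lrm(y ~ x, data = d, x = TRUE, y = TRUE)
validate(f, method = "crossvalidation", B = 10)  # one 10-fold run; repeat and average
validate(f, B = 300)                             # or the bootstrap with B = 300
```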
[R] discriminant analysis
I am a student of ecology from Brazil and I need a tutorial on discriminant analysis. Can someone help me? sylvia
Re: [R] discriminant analysis
There is a good tutorial here: http://www.statsoft.com/textbook/discriminant-function-analysis/ I doubt your question is appropriate for this list ... good luck! David Cross d.cr...@tcu.edu www.davidcross.us On May 8, 2011, at 7:28 PM, Sylvia Rocha wrote: I am a student of ecology from Brazil and I need a tutorial on discriminant analysis, can someone help me? sylvia
Re: [R] Another quantmod question
Russ, On May 8, 2011 6:29 PM, Russ Abbott russ.abb...@gmail.com wrote: Hi Jeff, The xts class has some very nice features, and you have done a valuable service in developing it. My primary frustration is how difficult it seems to be to find out what went wrong when my code doesn't work. I've been writing quite sophisticated code for a fairly long time. It's not that I'm new to software development. The column name rule is a good example. I'm willing to live with the rule that column names are not changed for efficiency's sake. What's difficult for me is that I never saw that rule anywhere before. Of course, I'm not an R expert. I've been using it for only a couple of months. But still, I would have expected to run into a rule like that. Worse, since the rule is in conflict with the explicit intent of cbind--one can name columns when using cbind; in fact the examples illustrate how to do it--it would really be nice if cbind would issue a warning when one attempts to rename a column in violation of that rule. Instead, cbind is silent, giving no hint about what went wrong. Naming columns is not the explicit intent of cbind. The explicit intent is to combine objects by columns. Please don't overstate the case. While the examples for the generic show naming columns, neither ?cbind.zoo nor ?cbind.xts has such examples. That's a hint. It's those sorts of things that have caused me much frustration. And it's these sorts of things that seem pervasive in R. One never knows what one is dealing with. Did something not work because there is a special-case rule that I haven't heard of? Did it not work because a special convenience was programmed into a function in a way that conflicts with normal use? Since these sorts of things come up so often, I find myself feeling that there is no good way to track down problems, which leads to a sense of helplessness and confusion. That's not what one wants in a programming language.
If that's not what one wants, one can always write their own programming language. Seriously, it seems like you want to rant more than understand what's going on. You have the R and xts help pages and the source code. The Note section of help(cbind) tells you that the method dispatch is different. It even tells you what R source file to look at to see how dispatching is done. Compare the relevant source files from base::cbind and xts::cbind.xts, and look at the R Language Definition manual to see how method dispatch is normally done. But you've been writing quite sophisticated code for a fairly long time, so I'm not telling you anything you don't know... you just don't think you should have to do the legwork. -- Russ -- Joshua Ulrich | FOSS Trading: www.fosstrading.com On Sun, May 8, 2011 at 2:42 PM, Jeff Ryan jeff.a.r...@gmail.com wrote: Hi Russ, Colnames don't get rewritten if they already exist. The reason is performance and how cbind is written at the R level. It isn't perfect per se, but the complexity and variety of dispatch that can take place for cbind in R, since it isn't a generic, is quite challenging to get to behave as one may hope. After years of trying I'd say it is nearly impossible to do what you want without causing horrible memory issues on non-trivial objects; there are production systems **using** xts on objects with billions of rows. Your simple case, which has a simple workaround, would cost everyone else in the other 99.999% of cases a recurring price that isn't tolerable. If this is frustrating to you, you should stop using the class. Jeff Jeffrey Ryan | Founder | jeffrey.r...@lemnica.com www.lemnica.com On May 8, 2011, at 2:07 PM, Russ Abbott russ.abb...@gmail.com wrote: I'm having trouble with the names of columns. quantmod deals with stock quotes. I've created an array of the first 5 closing prices from Jan 2007. (Is there a problem that the name is the same as the variable name? There shouldn't be.)
close close 2007-01-03 1416.60 2007-01-04 1418.34 2007-01-05 1409.71 2007-01-08 1412.84 2007-01-09 1412.11 When I try to create a more complex array by adding columns, the names get fouled up. Here's a simple example. cbind(changed.close = close+1, zero = 0, close) close zero close.1 2007-01-03 1417.60 0 1416.60 2007-01-04 1419.34 0 1418.34 2007-01-05 1410.71 0 1409.71 2007-01-08 1413.84 0 1412.84 2007-01-09 1413.11 0 1412.11 The first column should be called changed.close, but it's called close. The second column has the right name. The third column should be called close but it's called close.1. Why is that? Am I missing something? If I change the order of the columns and let close have its original name, there is still a problem. cbind(close, zero = 0, changed.close =
[R] reading a .mdf file in R
Hi, I have a very large .mdf database (36 GB) that I want to import into R. I'm using a computer with 64 GB of RAM, running Windows Server 2008 R2 with SQL Server. Could anyone guide me through the process? I have absolutely no idea what to do. Thanks, Mauricio Romero
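[Editor's note: a minimal sketch of one common route. An .mdf file is a SQL Server data file, so rather than reading it directly, attach it to the local SQL Server instance and query it over ODBC with the RODBC package; the server, database, and table names below are placeholders.]

```r
# Query an attached SQL Server database from R via ODBC
library(RODBC)
ch <- odbcDriverConnect(
  "driver={SQL Server};server=localhost;database=mydb;trusted_connection=true")
# pull only the columns/rows you need rather than all 36 GB at once
dat <- sqlQuery(ch, "SELECT col1, col2 FROM mytable WHERE year = 2010")
close(ch)
```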
Re: [R] Another quantmod question
On Sun, May 8, 2011 at 1:17 PM, Russ Abbott russ.abb...@gmail.com wrote: I understand Josh's example:

mat <- matrix(1:10, dimnames = list(NULL, "A"))
cbind(X = 11:20, Y = mat + 1)
cbind(X = 11:20, Y = mat[, "A"] + 1)

In the line cbind(X = 11:20, Y = mat + 1), it would be nice if an error or warning message were issued to the effect that the Y = part is ignored or not applicable or something to that effect. I still don't understand why cbind.xts doesn't do what I expect. If I try what Josh suggests on my xts example, the result is still confusing.

cbind.xts(close, A = close[, "close"])
             close close.1
2007-01-03 1416.60 1416.60
2007-01-04 1418.34 1418.34
2007-01-05 1409.71 1409.71
2007-01-08 1412.84 1412.84
2007-01-09 1412.11 1412.11

If this helps: dput(close)
structure(c(1416.6, 1418.34, 1409.71, 1412.84, 1412.11), .indexCLASS = "Date", .indexTZ = structure("", .Names = "TZ"), src = "yahoo", updated = structure(1304875186.625, class = c("POSIXct", "POSIXt")), class = c("xts", "zoo"), index = structure(c(1167811200, 1167897600, 1167984000, 1168243200, 1168329600), tzone = structure("", .Names = "TZ"), tclass = "Date"), .Dim = c(5L, 1L), .Dimnames = list(NULL, "close"))

It does. Thank you. It sounds like you want to keep the xts class, so this may not be useful to you, but if you are willing to drop it at the point you are cbind()ing things together:

cbind(changed.close = as.matrix(close + 1)[, "close"], zero = 0, close)

should do the trick. Cheers, Josh

I was also unable to convert Josh's matrix example to xts.

as.xts(mat, order.by = rownames())
Error in dimnames(x) : 'x' is missing

The order.by parameter is required, but I don't understand the error message and haven't been able to figure out how to fix it. If this helps: dput(mat)
structure(1:10, .Dim = c(10L, 1L), .Dimnames = list(NULL, "A"))
This suggests that mat already has defined dimnames. So why does as.xts give an error message about dimnames? I was unable to figure out the problem by looking at any of the help files.
In particular, the help file for as.xts.methods didn't seem to offer any clarifications--at least none that I could understand. -- Russ On Sun, May 8, 2011 at 12:26 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:

class(mat)
class(mat[, "A"])

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
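[Editor's note: a sketch of the practical workaround for the column-name discussion above: since cbind.xts ignores argument names on already-named objects, let cbind() keep its own names and assign colnames afterwards. The data here is a two-row stand-in for the poster's series.]

```r
# Rename columns after cbind() instead of inside the call
library(xts)
close <- xts(c(1416.60, 1418.34), as.Date(c("2007-01-03", "2007-01-04")))
colnames(close) <- "close"
z <- cbind(close + 1, 0, close)
colnames(z) <- c("changed.close", "zero", "close")
z
```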
Re: [R] Another quantmod question
Hi Russ, We're of course getting into some incredibly fine-level detail on how all of this works. I'll try to explain the issues as I recall them over the development of xts and cbind.xts. xts started as an extension of zoo. zoo is an extension of 'ts' (a greatly simplified comparison of course, but stay with me). Achim and Gabor have put tremendous effort into the design of zoo - with a primary focus on keeping it consistent with base R behavior. That is, try not to introduce unnecessary changes to the interface an R user is accustomed to. The logic being that this makes for a more consistent interface as well as an easier learning curve and hence a greater/faster adoption rate. 'xts' extends this, though with a bit more flexibility in terms of consistency. Why? Simply put - some things about R annoyed me coming from a time-series background. Number one was the fact that lag() is backwards. Backwards from expectation, nearly all literature, and all standard definitions. So xts breaks with lag(, n=1) behavior. This is obviously confusing to some - but was the gamble I was willing to take - consistency (with R) be damned! ;-) So, now back to cbind. cbind and merge in zoo-land (and xts by extension) are the same. This isn't the case for other classes that use these - but that is 'allowable' and 'expected' under a class dispatch system. The docs for ?cbind state: For cbind (rbind) the column (row) names are taken from the colnames (rownames) of the arguments if these are matrix-like. Otherwise from the names of the arguments or where those are not supplied and deparse.level > 0, by deparsing the expressions given, for deparse.level = 1 only if that gives a sensible name (a symbol, see is.symbol). Based on that, I'd argue that xts does it right. Of course I'll also point out that this is incorrect thinking as well - since this is a description for the generic - and not for xts.
But again, in a highly configurable object/class system, where you start to make a distinction of right and wrong is itself up for debate. At the other end of the argument spectrum is _why not_. That is, why can't cbind.xts handle the names to replace the colnames of objects passed in? Here is where I'll point out that I am really just going by memory. Three major items are involved in cbind. One is that dispatch is quite unlike nearly every other dispatch in R. This is a fact - nothing to do with xts. * cbind isn't a generic (it's an .Internal call) * it uses ... * cbind can be called in numerous ways (I'll list only the common ones - but with R you can do even crazier things): do.call(cbind, ...), do.call(cbind.xts, ...), cbind, cbind.xts, merge, merge.xts, do.call(merge, ...), do.call(merge.xts, ...). The rules of dispatch on cbind are really at a level that R-help has no business discussing. The second part is where things actually get tricky though. They all behave differently with respect to how args are handled - when they are eval'd, etc. I'm sure you have read how R strains itself on 'big data'. This is true and false. Improper use (or just naive use) can cause object copies in places you really don't want. Much of xts at this point is implemented in custom C code. The gain here is that you can make it eas(ier) to avoid copies until you need them by writing in C. Obvious, but needs to be said. To figure out what the columns have - and whether names are attached to the objects in the pairlist (the ... in this context) - you have to be very careful. Touch anything in the wrong place or at the wrong time and you lose a figurative arm and leg to memory copies. So, in 99.9% of cases - where you aren't naming (which would be an extra feature above and beyond c(olumn) binding [the reason for cbind]) - you run a very real risk of getting nailed for copies you don't want. On 10MM obs that is almost manageable. On 100's of millions or billions - it is kill -9 time.
To compound the issue - recall all of those different dispatch methods. Yep - they all behave just a bit differently. How? Honestly - I don't know or care. I simply know you can't easily make the behavior consistent amongst those calls. I have tried. And tried. End of day, and a very long R-help email: xts is different than base R. It is even different than its 'parent' zoo behavior. But in exchange for this difference (and a bit of learning/adjustment) you get a class that is faster than anything else. Period.

x <- .xts(1:1e7, 1:1e7)  # our time series object
m <- coredata(x)         # a matrix
str(x)
An xts object from 1969-12-31 18:00:01 to 1970-04-26 12:46:40 containing:
  Data: int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...
  Indexed by objects of class: [POSIXt,POSIXct] TZ: America/Chicago
  xts Attributes: NULL
str(m)
 int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...
system.time(x[,1])  # get the first column
   user  system elapsed
  0.017   0.000   0.017
system.time(m[,1])  # ditto
   user  system elapsed
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
Thank you all for your input. Unfortunately my problem is not yet resolved. Before I respond to individual comments, a clarification: in Stata, using the same likelihood function as above, I can reproduce EXACTLY (to 3 decimal places or more, which is exact considering I am using different software) the results from model 8 of the paper. I take this as an indication that I am using the same likelihood function as the authors, and that it does indeed work. The reason I am trying to estimate the model in R is that while Stata reproduces model 8 perfectly, it has convergence difficulties for some of the other models. Peter Dalgaard wrote: Better starting values would help. In this case, almost too good values are available:

start <- c(coef(lm(y1 ~ x1 + x2 + x3)), coef(lm(y2 ~ x1 + x2 + x3)))

which appears to be the _exact_ solution. Thanks for the suggestion. Using these starting values produces the exact estimate that Dave Fournier emailed me. If these are the exact solution, then why did the author publish different answers, which are completely reproducible in Stata and TSP? Ravi: Thanks for introducing optimx to me; I am new to R. I completely agree that you can get higher log-likelihood values than those obtained with optim and the starting values suggested by Peter. In fact, in Stata, when I reproduce the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139. Furthermore, if I estimate model 8 without symmetry imposed on the system, I reproduce the Likelihood Ratio reported in the paper to 3 decimal places as well, suggesting that the log-likelihoods I am reporting differ from those in the paper only by a constant. Thanks for your comments. I am still highly interested in knowing why the results of the optimisation in R are so different from those in Stata. I might try making my convergence requirements more stringent.
Kind regards, Alex
Re: [R] Weighted box or violin plots in Lattice?
On Mon, May 9, 2011 at 2:20 AM, Thomas Lumley tlum...@uw.edu wrote: On Mon, May 9, 2011 at 6:35 AM, Deepayan Sarkar deepayan.sar...@gmail.com wrote: On Sat, May 7, 2011 at 1:55 AM, Raphael Mazor rapha...@sccwrp.org wrote: Is it possible to create weighted boxplots or violin plots in lattice? It seems that you can specify weights for panel.histogram() and panel.densityplot(), but not for panel.bwplot() or panel.violin(). Not for panel.histogram() either. It's not immediately obvious how you would get weighted boxplots. The way the survey world does it is to get weighted quantiles and work from them, as in survey:::svyboxplot(). The only tricky decision is what to do with outliers, since I haven't been able to think of any good way of indicating weights on them. So I guess with bwplot() one would have to write a version of boxplot.stats() that uses weighted quantiles, and then write a modified panel function. A crude approximation (if the weights don't vary too much) would be to scale and round the weights to integers, and then repeat each data point; e.g., newx <- rep(x, w), etc. -Deepayan
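[Editor's note: a minimal sketch of Deepayan's rounding trick on simulated data; the variable names are illustrative. Weights are scaled to integers and each observation is repeated before being handed to bwplot().]

```r
# Approximate weighted box plots by repeating observations
library(lattice)
set.seed(1)
x <- rnorm(100)              # values
g <- gl(2, 50)               # grouping factor
w <- runif(100, 0.5, 2)      # positive weights
k <- round(w / min(w))       # crude integer weights
bwplot(rep(g, k) ~ rep(x, k))  # each point now counted k[i] times
```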
Re: [R] Another quantmod question
Jeff, Clearly you (and others) have put a lot of work into xts -- and I'm the beneficiary. So I'll stop complaining. Thanks for the class (both code and explanation). *-- Russ * On Sun, May 8, 2011 at 8:23 PM, Jeff Ryan jeff.a.r...@gmail.com wrote: [Jeff's explanation quoted in full above]
Re: [R] Hosmer-Lemeshow 'goodness of fit'
Again, thanks so much for your help. Is there a "for dummies" version for interpreting the le Cessie and Houwelingen test? I read the 1991 Biometrics paper but honestly got lost in the math. Is it interpreted the same way the Hosmer-Lemeshow test is, i.e., a non-significant result means the model fits well? In other words, what would my p-value of p=0.284362 tell you about my model? I would like to describe the goodness of fit of my model in a paper, but I'm worried the average medical reader would not know how to interpret the result of this test, whereas there are lots of references on interpreting Hosmer-Lemeshow. (My cross-validated c-statistic was 0.69 and R^2 was 0.15.) Thanks again for all your help! (Also with my k-fold cross-validation question!) -Rob Robert Schutt, MD, MCS Resident - Department of Internal Medicine University of Virginia, Charlottesville, Virginia
Re: [R] new to loops
Maybe this is what you're looking for:

# x is your set of explanatory variables (10 of them):
x <- array(rnorm(1000 * 10), dim = c(1000, 10))
# y is your dependent variable:
y <- rbinom(1000, 1, 0.3)
# run the regression on each column of x:
reg <- apply(x, 2, function(z) glm(y ~ z, family = binomial(link = 'logit')))
# the previous output is a list. Now collect whatever you need from each list element:
bond <- lapply(reg, function(z) c(z$coeff[2], exp(z$coeff[2]), confint(z)[2, ], exp(confint(z)[2, ])))
# collect everything together as a matrix:
bond <- do.call(rbind, bond)
bond

-S.

SevannaD wrote: So in my first try before I got your message, this is what I did:

orconf <- list()
ccoef <- list()
or <- list()
coef <- list()
out <- list()
for (i in 1:49) {
  out[[i]] <- glm(y ~ var[[i]], family = binomial(link = 'logit'))
  coef[[i]] <- out[[i]]$coef[2]
  or[[i]] <- exp(out[[i]]$coef[2])
  bond <- matrix(out[[i]]$coef[2], exp(out[[i]]$coef[2]), confint(out[[i]]$coef[2]), exp(confint(out[[i]]$coef[2])))
}

But it did not work, due to confint(out[[i]]$coef[2]) and the exp one. It said: Error in object$coefficients : $ operator is invalid for atomic vectors. (confint() expects the fitted model object, not a coefficient: confint(out[[i]])[2, ].) Would I want to identify conf_low and conf_high as two separate things? Thanks

On Sun, May 8, 2011 at 4:31 PM, Scott Chamberlain-3 [via R] ml-node+3508106-1763482049-235...@n4.nabble.com wrote: Not knowing what format your data is in or what model you are using...

df  # your data frame, with columns the variables you are running regressions for
# a table to put your results in:
datout <- data.frame(coeff = NA, conf_low = NA, conf_high = NA, odd = NA)
for (i in 1:length(names(df)[2:10])) {
  fit <- glm(data[, 1] ~ data[, i], data = df, etc...)
  datout[i, ] <- fit[e.g., 1:4]  # determine what values in your model output are what you need
}
datout  # a table with all your output for each variable

On Sunday, May 8, 2011 at 11:58 AM, SevannaD wrote: I have never made a loop on my own to do anything in R.
But I am hoping someone can help me build one for the following issue: I need to run a univariate logistic regression for each of my variables (about 62 of them), then I need to gather up each of their coefficients (not the intercepts), each of their 95% confidence intervals, and each of their odds ratios, and place them in a matrix to showcase them for my thesis. Currently, I am writing them all out one by one with the cbind method, which has taken me the better part of a day so far, and I know there has to be a way to make a loop that can do this whole process; I just haven't been able to figure it out yet. Thanks in advance. -- View this message in context: http://r.789695.n4.nabble.com/new-to-loops-tp3507366p3507366.html Sent from the R help mailing list archive at Nabble.com.
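Putting the pieces of this thread together, a compact sketch of the full loop being asked for, under the assumption that the predictors are columns of a data frame dat and the outcome is a 0/1 vector y (both names are illustrative):

```r
# dat: data frame of predictor columns; y: 0/1 outcome (illustrative names)
results <- matrix(NA, nrow = ncol(dat), ncol = 4,
                  dimnames = list(names(dat),
                                  c("coef", "ci_low", "ci_high", "OR")))
for (i in seq_len(ncol(dat))) {
  fit <- glm(y ~ dat[[i]], family = binomial)
  ci  <- confint(fit)[2, ]   # profile 95% CI for the slope, not the intercept
  results[i, ] <- c(coef(fit)[2], ci, exp(coef(fit)[2]))
}
results  # one row per variable: coefficient, CI bounds, odds ratio
```

Note that confint() is called on the fitted model itself and then subset, which is exactly the fix for the "$ operator is invalid for atomic vectors" error above.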