Re: [R] Logistic Regression - Interpreting SENS (Sensitivity) and SPEC (Specificity)
Maithili Shiva maithili_shiva at yahoo.com writes: I have a main sample of 42,500 clients and, based on their defaulted / non-defaulted status, I have generated the probability of default. I have a hold-out sample of 5,000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), (3) the number of wrongly classified bads (Gb), and (4) the number of wrongly classified goods (Bg). The simple and wrong answer is to use these data directly to compute sensitivity (the fraction of hits). This measure is useless, but I encounter it often in medical publications. You can get a more reasonable answer by using cross-validation. Check, for example, Frank Harrell's http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf Dieter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] predicting from a local regression and plotting in lattice
Alex Karner aakarner at ucdavis.edu writes: I realize these limitations. However, I know that my actual dataset is reasonably well behaved in the range I want to predict, and I'm not using the predicted values for any further analysis, only for schematic purposes in the plot. I'm still curious whether this type of extension of a loess line is possible, notwithstanding its statistical shortcomings. I don't understand why you want to use a local method for a global job. Why not use a spline, a logistic regression (also well behaved!) or some exponential variant if it is a growth curve? Dieter
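Dieter's suggestion can be sketched in a few lines: unlike loess(), a spline fitted globally with lm() will happily return predictions outside the data range. The data and the df value below are invented for illustration, not from the thread.

```r
# A minimal sketch, assuming made-up data: fit a natural spline globally
# with lm() and predict beyond the observed x range, which loess() refuses.
library(splines)
set.seed(42)
x <- 1:50
y <- log(x) + rnorm(50, sd = 0.1)
fit <- lm(y ~ ns(x, df = 4))
pred <- predict(fit, newdata = data.frame(x = 1:60))  # x = 51:60 lie past the data
length(pred)
```

Natural splines extrapolate linearly past the boundary knots, which is usually better behaved than a polynomial tail, though still extrapolation.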
[R] Blowing up portions of a graph
Hi, I have a really large graph and would like to zoom in on portions of the graph and post them as blocks below the graph. Is there an add-on package to do this? -- Rajesh.J I skate to where the puck is going to be, not where it has been. - Wayne Gretzky
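No add-on package is strictly required in base graphics: one possible approach (a sketch, not from the thread; the data and zoom window are invented) is to re-draw the same data with a restricted xlim in a panel below the full plot.

```r
# Sketch: full plot on top, a zoomed-in block below, using layout().
set.seed(1)
x <- 1:1000
y <- cumsum(rnorm(1000))
layout(matrix(1:2, nrow = 2))
plot(x, y, type = "l", main = "Full graph")
rect(400, min(y), 500, max(y), border = "red")     # mark the blown-up region
plot(x, y, type = "l", xlim = c(400, 500), main = "Zoom: 400 <= x <= 500")
layout(1)
```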
Re: [R] png(): Linux vs Windows
On Sun, 12 Oct 2008, [EMAIL PROTECTED] wrote: On 12-Oct-08 22:09:46, jim holtman wrote: Seems to work fine with my R 2.7.2 on Windows: png(file="myplot.png", bg="transparent", units="cm", width=12, height=15, res=200) plot(1:10) rect(1, 5, 3, 7, col="white") dev.off() Did you check the version they are using? Hi Jim, Thanks! I've now learned that it is R 2.5.1 (which I see is from June 2007). Ted. This difference *is* the version of R, not 'Linux vs Windows'. The NEWS for 2.6.0 says: o jpeg(), png(), bmp() (Windows), dev2bitmap() and bitmap() have a new argument 'units' to specify the units of 'width' and 'height'. (That is something you could have checked without access to R under Windows.) On Sun, Oct 12, 2008 at 6:02 PM, Ted Harding [EMAIL PROTECTED] wrote: Hi Folks, Quick question. I have the following line in an R code file which runs fine on Linux: if(PNG) png(GraphName, width=12, height=15, units="cm", res=200) I learn that, when the same code was run on a Windows machine, there was the following error: Error in png(GraphName, width=12, height=15, units="cm", res=200): unused argument(s) (units = "cm") Sorry to be a bother -- but could a Windows R user put me wise on any differences between png() on Windows and Linux? (And, sorry, I don't know what version of R, nor what version of Windows, this occurred on.) Thanks, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 12-Oct-08 Time: 23:02:41 -- XFMail -- -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), 1 South Parks Road, Oxford OX1 3TG, UK, Fax: +44 1865 272595
Re: [R] Sweave-LaTEX question
On Sun, Oct 12, 2008 at 1:39 AM, Felipe Carrillo [EMAIL PROTECTED] wrote: I am working on a publication and I have heard about LaTeX but I haven't actually tried to learn about it until today. I've found a few There are two more packages that might be of interest: RReportGenerator [1] and relax [2]. Liviu [1] http://alnitak.u-strasbg.fr/~wraff/RReportGenerator/index.php [2] http://cran.r-project.org/web/packages/relax/index.html
Re: [R] numeric derivation
On Sun, 12 Oct 2008, David Winsemius wrote: Two follow-up questions: A) I get an error message when using Harrell's describe() function on one of my variables, telling me that sum() is not meaningful for a difftime object. Why should sum() not be meaningful for a collection of interval lengths? That's not what it actually says. It says it is 'not defined' -- it could be defined but it has not been. Just add a function sum.difftime() with appropriate code (and watch out that different difftime objects can be in different units). describe(pref900) Error in Summary.difftime(c(1075, 3429, 2915, 2002, 967, 1759, 532, 589, : 'sum' not defined for difftime objects summary() is informative and throws no error, but does not report means. Even with na.rm=TRUE, sum fails: sum(pref900$deatht, na.rm=TRUE) Error in Summary.difftime(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, : 'sum' not defined for difftime objects My interest in the sum of difftime objects comes from my interest in calculating the number of person-years of observation in various categories. I have durations created by subtracting times. B) The help pages are not particularly expansive regarding the output of deltat(), but your answer suggests that it should work on non-time objects as well? Am I correct in assuming you meant that diff(x)/deltat(x) should be meaningful for any numeric x? -- David Winsemius R 2.7.1 / Mac OS 10.5.4 / Intel CPUs On Oct 12, 2008, at 10:34 AM, Gabor Grothendieck wrote: ?deltat On Sun, Oct 12, 2008 at 9:45 AM, Oliver Bandel [EMAIL PROTECTED] wrote: Quoting Gabor Grothendieck [EMAIL PROTECTED]: If you simply want successive differences use diff: x <- seq(4)^2 diff(x) tx <- ts(x) diff(tx) [...] Oh, cool, thanks. But what about diff / delta_t? Do I have to calculate it on my own, or is there already a function for computing a difference quotient? That would be nice to have because, for example, going from position vs. time to velocity vs. time and acceleration vs. time (and further derivatives) also yields time series. Being able to use the advantages of the time-series class here would be nice. Ciao, Oliver
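Prof. Ripley's "just add a function sum.difftime()" can be sketched as below. The choice of seconds as the common unit is an assumption of this sketch, not from the thread; the method converts each argument to seconds before summing, which also guards against mixed units.

```r
# Sketch: an S3 sum() method for difftime objects. Each argument is
# converted to seconds first, so difftimes in different units add safely.
sum.difftime <- function(..., na.rm = FALSE) {
  args <- list(...)
  secs <- sum(sapply(args, function(x)
    sum(as.numeric(x, units = "secs"), na.rm = na.rm)))
  as.difftime(secs, units = "secs")
}
d <- as.difftime(c(30, 90, NA), units = "mins")
total <- sum(d, na.rm = TRUE)  # dispatches to sum.difftime()
as.numeric(total, units = "mins")
```

Note that recent versions of R ship a sum() method for difftime in base, so this is mainly of historical interest for R 2.7-era sessions like the one quoted.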
Re: [R] Logistic Regression - Interpreting SENS (Sensitivity) and SPEC (Specificity)
Dieter Menne wrote: Maithili Shiva maithili_shiva at yahoo.com writes: I have a main sample of 42,500 clients and, based on their defaulted / non-defaulted status, I have generated the probability of default. I have a hold-out sample of 5,000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), (3) the number of wrongly classified bads (Gb), and (4) the number of wrongly classified goods (Bg). The simple and wrong answer is to use these data directly to compute sensitivity (fraction of hits). This measure is useless, but I encounter it often in medical publications. You can get a more reasonable answer by using cross-validation. Check, for example, Frank Harrell's http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf But if he has a hold out sample, isn't he already cross-validating?? I wonder if you're answering the right question there. Could he just be looking for Sp=Gg/(Gg+Bg), Se=Bb/(Gb+Bb)? (If I got the notation right.) -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
Re: [R] Logistic Regression - Interpreting SENS (Sensitivity) and SPEC (Specificity)
Peter Dalgaard p.dalgaard at biostat.ku.dk writes: But if he has a hold out sample, isn't he already cross-validating?? I wonder if you're answering the right question there. Could he just be looking for Sp=Gg/(Gg+Bg), Se=Bb/(Gb+Bb)? (If I got the notation right.) You are right. My brain was biased by some ongoing discussion. Dieter
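Peter's formulas are a one-liner in R. The counts below are invented for illustration (not Maithili's data); the Gg/Bb/Gb/Bg names follow the thread's notation.

```r
# Sketch with made-up hold-out counts (not the poster's actual figures):
Gg <- 4200  # goods correctly classified as good
Bg <- 470   # goods wrongly classified as bad
Gb <- 110   # bads wrongly classified as good
Bb <- 220   # bads correctly classified as bad
Sp <- Gg / (Gg + Bg)  # specificity: fraction of goods caught
Se <- Bb / (Gb + Bb)  # sensitivity: fraction of bads caught
round(c(Sp = Sp, Se = Se), 4)
```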
Re: [R] linear expenditure model
I have already used the aidsEst() function in the micEcon package for the estimation of elasticities. But with the LES, I plan to estimate the minimal consumption level... which is not possible with the AIDS model, is it? Arne Henningsen wrote: Hi Marie! On Friday 10 October 2008 12:40:23, Marie Vandresse wrote: I would like to estimate a linear expenditure system with the systemfit package (method: SUR). If I remember correctly, the linear expenditure system (LES) is linear in income but non-linear in the parameters. Hence, you have to estimate a system of non-linear equations. Unfortunately, the nlsystemfit() function in the systemfit package that estimates systems of non-linear equations is still under development and rather often has convergence problems. Since the systemfit() function in the systemfit package that estimates systems of linear equations is very reliable [1], I suggest that you choose a demand system that is linear in parameters (e.g. the Almost Ideal Demand System, AIDS). [1] http://www.jstatsoft.org/v23/i04 Could someone show me how to define the equations? If you use the aidsEst() function in the micEcon package [2], you don't have to specify the equations of the Almost Ideal Demand System yourself. [2] http://www.micEcon.org Best wishes, Arne -- Marie Vandresse Bureau fédéral du Plan Avenue des Arts 47-49 1000 Bruxelles Bureau 212 Tél. +32.(0)2.507.73.62 Tél. +32.(0)2.507.73.73 http://www.plan.be Think before you print !
[R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
Dear Mr Peter Dalgaard and Mr Dieter Menne, I sincerely thank you for helping me out with my problem. The thing is that I have already calculated SENS = Gg / (Gg + Bg) = 89.97% and SPEC = Bb / (Bb + Gb) = 74.38%. Now I have values of SENS and SPEC, which are absolute in nature. My question was how do I interpret these absolute values. How do these values help me find out whether my model is good? With regards Ms Maithili Shiva Subject: [R] Logistic regression - Interpreting (SENS) and (SPEC) To: r-help@r-project.org Date: Friday, October 10, 2008, 5:54 AM Hi, I am working on a credit scoring model using logistic regression. I have a main sample of 42,500 clients and, based on their defaulted / non-defaulted status, I have generated the probability of default. I have a hold-out sample of 5,000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), (3) the number of wrongly classified bads (Gb), and (4) the number of wrongly classified goods (Bg). My problem is how to interpret these results? What I have arrived at are the absolute figures.
[R] Re: using predict() or fitted() from a model with offset; unsolved, reproducible code included
Thanks for your reply Mark, but no, using predict on the new data.frame does not help here. I had first thought that the problem was due to the explanatory variable (age) and the offset one (date) being very similar (highly correlated; I am trying to tease their effects apart, and hoped offset would help with this since I already know the relationship with age). But this appears not to be the case. Simply, the predicted (or fitted) values for the offset model always return predicted values based on the effect of the variable within offset(), completely ignoring the explanatory variable that it is supposed to offset in the first place: for instance, I get the same predicted values for the two very different models below. The summary table and coefficients remain perfectly valid though (and very different). lmAO <- glm(MassChange24h ~ T1 + offset(-2*AGE), family=gaussian, na.action=na.exclude) lmAO <- glm(MassChange24h ~ AGE, family=gaussian, na.action=na.exclude) Has anyone got any experience with predicting from models that include an offset term? Am I not specifying the offset term correctly in the model? Please get back to me if you have the slightest idea of what is going on, or if you know of another way than offset for my purposes. I include below reproducible code with dummy data. The models do not fit well, but they work.
Thank you Samuel Riou ## AGE <- 1:10 MassChange24h <- c(10,8,6,4,2,0,-2,-4,-6,-8) T1 <- c(10,11,12,13,14,15,16,17,18,19) ### variable for which I want the effect, taking into account the known effect of AGE T <- c("A","B","A","B","A","B","A","B","A","B") ## added for testing T <- c(1,2,3,4,5,6,5,4,3,2) ## added for testing #no offset lmA <- glm(MassChange24h ~ T1, na.action=na.exclude, family=gaussian) summary(lmA) fitted(lmA) #linear offset lmAO <- glm(MassChange24h ~ T1 + offset(-2*AGE), family=gaussian, na.action=na.exclude) ### model lmAO <- glm(MassChange24h ~ T1 + offset(AGE), family=gaussian, na.action=na.exclude) lmAO <- glm(MassChange24h ~ AGE, family=gaussian, na.action=na.exclude) ### the fitted values from the offset model are the same as from this one! summary(lmAO) ## table is fine, shows the effect of T1, taking into account the offset fitted(lmAO) ## Problem: getting the same values as for model lmA nd1 <- expand.grid(T1=c(10,11,12,13,14,15,16,17,18,19)) Pred <- predict(lmA, nd1, type="response") nd1 <- expand.grid(T1=c(10,11,12,13,14,15,16,17,18,19)) Pred <- predict(lmAO, nd1, type="response") ## I get the same values as for model lmA, and changing the T variable in the offset model, again I get the same predicted values... very strange # - Original message From: [EMAIL PROTECTED] [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Sunday, 12 October 2008, 20:16:36 Subject: RE: [R] using predict() or fitted() from a model with offset hi: I haven't used fitted much, but when you used predict, did you send in the new dataframe? The code below says that you didn't, but I don't know if that would fix it anyway. On Sun, Oct 12, 2008 at 6:36 AM, [EMAIL PROTECTED] wrote: Dear R-users, I have come across some difficulties using the offset argument in a model. There is not much information in ?offset, ?lm, ?glm, and I have therefore resorted to posting here. Offset appears to work in the models I run because I get the expected coefficients when comparing offset and non-offset models.
BUT the fitted values obtained from fitted() are identical in both models!! Why is this? Is there an argument to add to fitted() so that it takes the offset into account? Note that I have included the offset in the formula, as seen below in the code. I have also tried to use predict, with exactly the same result: the offset is ignored. This applies to both lms and glms. Am I missing something here? Thank you Samuel Riou CODE #no offset lmA <- lm(MassChange24h ~ DATEN1, subset(Chicks1, Year==2007 & AGE < 10), na.action=na.exclude) summary(lmA) #linear offset lmAO <- lm(MassChange24h ~ DATEN1 + offset(-0.37356*AGE), subset(Chicks1, Year==2007 & AGE < 10), na.action=na.exclude) summary(lmAO) print(Chicks$DATEN1[Year==2007 & AGE < 10]) print(t(fitted(lmA))) NEW <- cbind(as.vector(t(fitted(lmA))), Chicks$DATEN1[Year==2007 & AGE < 10]) NEW <- as.data.frame(NEW) m1 <- aggregate(NEW[1], NEW[2], mean, na.rm=TRUE) plot(m1$V1 ~ m1$V2, pch=20, col="black") Pred <- predict(lmA) print(Chicks$DATEN1[Year==2007 & AGE < 10]) print(t(fitted(lmAO))) NEW <- cbind(as.vector(t(fitted(lmAO))), Chicks$DATEN1[Year==2007 & AGE < 10]) NEW <- as.data.frame(NEW) m1 <- aggregate(NEW[1], NEW[2], mean, na.rm=TRUE) points(m1$V1 ~ m1$V2, pch=20, col="red") ### but the fitted values don't seem to take the offset into account Pred <- predict(lmAO)
Re: [R] Logistic Regression - Interpreting SENS (Sensitivity) and SPEC (Specificity)
On Mon, 13 Oct 2008, Peter Dalgaard wrote: Dieter Menne wrote: Maithili Shiva maithili_shiva at yahoo.com writes: I have a main sample of 42,500 clients and, based on their defaulted / non-defaulted status, I have generated the probability of default. I have a hold-out sample of 5,000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), (3) the number of wrongly classified bads (Gb), and (4) the number of wrongly classified goods (Bg). The simple and wrong answer is to use these data directly to compute sensitivity (fraction of hits). This measure is useless, but I encounter it often in medical publications. You can get a more reasonable answer by using cross-validation. Check, for example, Frank Harrell's http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf But if he has a hold out sample, isn't he already cross-validating?? I wonder if you're answering the right question there. Could he just be looking for Sp=Gg/(Gg+Bg), Se=Bb/(Gb+Bb)? (If I got the notation right.) Strictly no, she is 'validating' (no cross- involved). Cross-validation would be a better idea for much smaller sample sizes (we don't know how many regressors are involved, so say hundreds unless there are more than ten regressors).
Re: [R] na.pass
If you want to remove the N, then you can work with the indices: x [1] NA "B" NA "B" "B" NA "N" "A" "B" "B" "A" NA "A" "N" "N" "N" "A" "B" "B" "A" # if you want the indices of the non-N, then (indx <- which(is.na(x) | x != "N")) [1] 1 2 3 4 5 6 8 9 10 11 12 13 17 18 19 20 x[indx] [1] NA "B" NA "B" "B" NA "A" "B" "B" "A" NA "A" "A" "B" "B" "A" On Mon, Oct 13, 2008 at 7:48 AM, Laura Bonnett [EMAIL PROTECTED] wrote: I have a data frame. It has lots of patient information: their age, their gender, etc. I need to keep all this information whilst selecting relevant rows. So, in the example of code I provided, I want to remove all those patients who have entry "N" in the column with.Wcode. The dimension of the data is 378, i.e. 378 patients, and currently I am replacing any NA entries in column with.Wcode with the letter "O", as this is another level of the same column. Does that make more sense? nep <- function(data) { dummy <- rep(0,378) for(i in 1:378){ if(is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O" } for(i in 1:378){ if(data$with.Wcode[i]=="N") dummy[i] <- i } return(data[-dummy,]) } How can I therefore not replace NA with level "O" but instead ignore the NAs and effectively gloss over them? Thank you, Laura On Mon, Oct 13, 2008 at 12:42 PM, jim holtman [EMAIL PROTECTED] wrote: Not sure exactly what you are trying to do since you did not provide commented, minimal, self-contained, reproducible code. Let me take a guess in that you also have to test for NAs: x <- sample(c("N", "A", "B", NA), 20, replace=TRUE) x [1] "A" "A" "B" NA "N" NA NA "B" "B" "N" "N" "N" "B" "A" NA "A" "B" NA "A" NA x != "N" [1] TRUE TRUE TRUE NA FALSE NA NA TRUE TRUE FALSE FALSE FALSE TRUE TRUE NA TRUE TRUE NA [19] TRUE NA x[x != "N"] [1] "A" "A" "B" NA NA NA "B" "B" "B" "A" NA "A" "B" NA "A" NA (!is.na(x)) & (x != "N") [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE FALSE [19] TRUE FALSE x[(!is.na(x)) & (x != "N")] [1] "A" "A" "B" "B" "B" "B" "A" "A" "B" "A" On Mon, Oct 13, 2008 at 7:15 AM, Laura Bonnett [EMAIL PROTECTED] wrote: Hi All, I have a data frame which has columns comprised mainly of NAs.
I know there are functions na.pass and na.omit etc. which can be used in these situations; however, I can't get them to work in this case. I have a function which returns the data according to some rule, i.e. removal of "N" in this code: nep <- function(data) { dummy <- rep(0,378) for(i in 1:378){ if(is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O" } for(i in 1:378){ if(data$with.Wcode[i]=="N") dummy[i] <- i } return(data[-dummy,]) } However, I really don't want to replace the NAs with "O". I'd just like to gloss over them. I can't just delete them because the structure of the data frame needs to be maintained. Can anyone suggest how I can write in a line or two to ignore the NAs instead of replacing them? I've tried this code but it doesn't work! nep <- function(data) { dummy <- rep(0,378) for(i in 1:378){ na.pass(data$with.Wcode[i]) if(data$with.Wcode[i]=="N") dummy[i] <- i } return(data[-dummy,]) } Thank you, Laura
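Jim's which(is.na(x) | x != "N") idea carries straight over to the data-frame case; the whole loop-and-recode function collapses to one logical index. The toy data frame below is invented, not Laura's 378-patient set.

```r
# Sketch: keep rows whose with.Wcode is NA or anything other than "N",
# without recoding the NAs (toy data, not the real patient records).
df <- data.frame(id = 1:6,
                 with.Wcode = c("A", NA, "N", "B", NA, "N"),
                 stringsAsFactors = FALSE)
keep <- is.na(df$with.Wcode) | df$with.Wcode != "N"
df[keep, ]   # rows 1, 2, 4, 5 survive; the NAs are untouched
```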
Re: [R] Timestamps and manipulations
Is this what you want? x.df timestamp id 1 2008-05-27 22:57:00 763830873067 2 2008-05-27 23:00:00 763830873067 3 2008-05-27 23:01:00 763830873067 4 2008-05-27 23:01:00 763830873067 5 2008-06-05 11:34:00 763830873067 6 2008-05-29 23:08:00 765253440317 7 2008-05-29 23:06:00 765253440317 8 2008-05-29 22:52:00 765253440317 9 2008-05-29 22:52:00 765253440317 10 2008-05-29 23:04:00 765253440317 11 2008-06-27 19:34:00 765253440317 12 2008-07-09 15:45:00 765329002557 13 2008-07-06 19:24:00 765329002557 14 2008-07-09 15:46:00 765329002557 15 2008-07-07 13:05:00 765329002557 16 2008-05-16 22:40:00 765329002557 17 2008-06-08 11:24:00 765329002557 18 2008-06-08 12:33:00 765329002557 x.df$time <- ifelse(x.df$timestamp > as.POSIXct("2008-07-01"), 1, 0) x.df timestamp id time 1 2008-05-27 22:57:00 763830873067 0 2 2008-05-27 23:00:00 763830873067 0 3 2008-05-27 23:01:00 763830873067 0 4 2008-05-27 23:01:00 763830873067 0 5 2008-06-05 11:34:00 763830873067 0 6 2008-05-29 23:08:00 765253440317 0 7 2008-05-29 23:06:00 765253440317 0 8 2008-05-29 22:52:00 765253440317 0 9 2008-05-29 22:52:00 765253440317 0 10 2008-05-29 23:04:00 765253440317 0 11 2008-06-27 19:34:00 765253440317 0 12 2008-07-09 15:45:00 765329002557 1 13 2008-07-06 19:24:00 765329002557 1 14 2008-07-09 15:46:00 765329002557 1 15 2008-07-07 13:05:00 765329002557 1 16 2008-05-16 22:40:00 765329002557 0 17 2008-06-08 11:24:00 765329002557 0 18 2008-06-08 12:33:00 765329002557 0 # time difference by id sapply(split(x.df$timestamp, x.df$id), function(.time){ + difftime(max(.time), min(.time), units='secs') + }) 763830873067 765253440317 765329002557 736620 2493720 4640760 On Mon, Oct 13, 2008 at 6:57 AM, Michael Pearmain [EMAIL PROTECTED] wrote: Hi All, I've a couple of questions I've been struggling with using the time features; can anyone help?
sample data Timestamp user_id 27/05/08 22:57 763830873067 27/05/08 23:00 763830873067 27/05/08 23:01 763830873067 27/05/08 23:01 763830873067 05/06/08 11:34 763830873067 29/05/08 23:08 765253440317 29/05/08 23:06 765253440317 29/05/08 22:52 765253440317 29/05/08 22:52 765253440317 29/05/08 23:04 765253440317 27/06/08 19:34 765253440317 09/07/08 15:45 765329002557 06/07/08 19:24 765329002557 09/07/08 15:46 765329002557 07/07/08 13:05 765329002557 16/05/08 22:40 765329002557 08/06/08 11:24 765329002557 08/06/08 12:33 765329002557 My first question is: how can I create a new variable as a filter based on a date? I've tried as.POSIXct/strptime below as well, but to no avail... can anyone give any advice? Mcookie$timestamp <- as.POSIXct(strptime(Mcookie$timestamp, "%m/%d/%Y %H:%M")) Mcookie$time <- ifelse(Mcookie$timestamp > strptime("07-08-2008-00:00", "%m-%d-%Y-%H:%M"), 1, 0) My second question refers to finding the time difference in seconds between the first time a user sees something vs. the last -- an engagement time, essentially. I see there is the difftime function; is there a more elegant way of working this out than my thoughts (pseudo-code below)? sort data by user_id and Timestamp take the head of user_id as new_time_var take the tail of user_id as new_time_var2 use difftime(new_time_var, new_time_var2, units="secs") Mike
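The head/tail pseudo-code in the question can also be done per user with tapply(), without sorting. The four timestamps below are a hand-picked subset of the sample data, with the timezone fixed to UTC to keep the arithmetic exact.

```r
# Sketch: per-user gap in seconds between first and last timestamp
# (subset of the thread's sample data).
ts <- as.POSIXct(c("2008-05-27 22:57", "2008-06-05 11:34",
                   "2008-05-29 22:52", "2008-06-27 19:34"), tz = "UTC")
id <- c("763830873067", "763830873067", "765253440317", "765253440317")
span <- tapply(ts, id, function(t)
  as.numeric(difftime(max(t), min(t), units = "secs")))
span  # 736620 and 2493720 seconds, matching Jim's figures for these users
```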
Re: [R] na.pass
Not sure exactly what you are trying to do since you did not provide commented, minimal, self-contained, reproducible code. Let me take a guess in that you also have to test for NAs: x <- sample(c("N", "A", "B", NA), 20, replace=TRUE) x [1] "A" "A" "B" NA "N" NA NA "B" "B" "N" "N" "N" "B" "A" NA "A" "B" NA "A" NA x != "N" [1] TRUE TRUE TRUE NA FALSE NA NA TRUE TRUE FALSE FALSE FALSE TRUE TRUE NA TRUE TRUE NA [19] TRUE NA x[x != "N"] [1] "A" "A" "B" NA NA NA "B" "B" "B" "A" NA "A" "B" NA "A" NA (!is.na(x)) & (x != "N") [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE FALSE [19] TRUE FALSE x[(!is.na(x)) & (x != "N")] [1] "A" "A" "B" "B" "B" "B" "A" "A" "B" "A" On Mon, Oct 13, 2008 at 7:15 AM, Laura Bonnett [EMAIL PROTECTED] wrote: Hi All, I have a data frame which has columns comprised mainly of NAs. I know there are functions na.pass and na.omit etc. which can be used in these situations; however, I can't get them to work in this case. I have a function which returns the data according to some rule, i.e. removal of "N" in this code: nep <- function(data) { dummy <- rep(0,378) for(i in 1:378){ if(is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O" } for(i in 1:378){ if(data$with.Wcode[i]=="N") dummy[i] <- i } return(data[-dummy,]) } However, I really don't want to replace the NAs with "O". I'd just like to gloss over them. I can't just delete them because the structure of the data frame needs to be maintained. Can anyone suggest how I can write in a line or two to ignore the NAs instead of replacing them? I've tried this code but it doesn't work! nep <- function(data) { dummy <- rep(0,378) for(i in 1:378){ na.pass(data$with.Wcode[i]) if(data$with.Wcode[i]=="N") dummy[i] <- i } return(data[-dummy,]) } Thank you, Laura
[R] Gower distance between an individual and a population
Hi the list, I need to compute the Gower distance between a specific individual and all the other individuals. The daisy() function from the cluster package computes all the pairwise dissimilarities of a population. For a population of N individuals, that is around N^2 distances to compute. I only need the distance between a specific individual and all the others, that is, only N distances. Is there a function that can do this? Christophe
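For purely numeric variables, the one-against-all computation is simple to hand-roll in O(N): Gower's coefficient then reduces to the mean of range-scaled absolute differences. The sketch below is an illustration, not part of the cluster package (gower_to_one is a made-up helper name), and cluster::daisy would still be needed for mixed factor/numeric data.

```r
# Sketch: Gower-style distance from row `ref` to every row of a numeric
# matrix, N comparisons instead of N^2 (hand-rolled; not cluster::daisy).
gower_to_one <- function(X, ref) {
  rng <- apply(X, 2, function(col) diff(range(col)))  # per-variable range
  rng[rng == 0] <- 1                                  # guard constant columns
  d <- abs(sweep(X, 2, X[ref, ], "-"))                # |x_ij - x_ref_j|
  rowMeans(sweep(d, 2, rng, "/"))                     # average scaled diff
}
X <- as.matrix(iris[, 1:4])
d1 <- gower_to_one(X, ref = 1)  # distance of every flower to flower 1
head(round(d1, 3))
```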
[R] SAS Data
Hello everybody, I would like to read a SAS data set, data1.sas7bdat, into R! Is this possible? Thank you a lot in advance ;), Stefo
Re: [R] SAS Data
Stefo Ratino wrote: Hello everybody, I would like to read a SAS data set, data1.sas7bdat, into R! Is this possible?

Only if you have access to SAS. The file format is proprietary and no one has bothered to decipher it.

-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Entr. B, PO Box 2099, 1014 Cph. K, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907
Re: [R] SAS Data
library(foreign)
Rdata <- read.ssd("Z:/MyFolder", "data1", sascmd = "C:/Program Files/SAS/SAS 9.1/sas.exe")
Rdata
    Y D1 D2 D3
1 100  1  0  0
2 101  1  0  0
3 105  1  0  0
4 200  0  1  0
5 201  0  1  0
6 205  0  1  0
7 300  0  0  1
8 301  0  0  1
9 305  0  0  1

where 'data1' is the SAS data file to be read, 'Z:/MyFolder' is the physical path where 'data1.sas7bdat' is located, and 'sascmd' is the path where SAS is installed on your system. Please confirm that your SAS is installed in the same path. Also note that you have to install the 'foreign' package to do this. BR, Shubha

Shubha Karanth | Amba Research | Ph +91 80 3980 8031 | Mob +91 94 4886 4510

-----Original Message----- From: [EMAIL PROTECTED] On Behalf Of Stefo Ratino. Sent: Monday, October 13, 2008 5:20 PM. To: r-help@r-project.org. Subject: [R] SAS Data. Hello everybody, I would like to read a SAS data set, data1.sas7bdat, into R! Is this possible? Thank you a lot in advance ;), Stefo
Re: [R] Creating GUIs for R
On Sun, Oct 12, 2008 at 4:50 PM, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: On 12 October 2008 at 12:53, cls59 wrote: | On a related note... does anyone know good resources for binding a C++ | program to the R library?

RCpp, at http://rcpp.r-forge.r-project.org, formerly known as RCppTemplate, is pretty mature and tested, having been around since 2004 or 2005. Introductory documentation could be better; feedback welcome.

| Basically, I would like to start with just a plain vanilla R session running | inside a Qt widget. Any suggestions?

Isn't RKWard a Qt-based GUI for R? They probably have some reusable console code in there. Deepayan once did just that in a test application. I am not sure if that was ever made public. Cheers, Dirk -- Three out of two people have difficulties with fractions.
[R] Sweave from Kile
Hello, does anybody have experience with Sweave run from Kile? I'm trying to make it run but have problems, and I don't know whether the instructions are wrong or I am doing something wrong (my knowledge of bash and the shell is too low to tell)... I recently discovered Sweave and wanted to run it from my LaTeX editor, Kile. I found and followed these instructions:

"If you want to be able to call Sweave outside of R, you will need to install a shell script (see footnote 4). To install the script, copy it to /usr/local/bin, then open the Konsole program and type sudo chmod +x /usr/local/bin/Sweave.sh to make it executable. Next, you may want to tell Kile where to find the Sweave.sh shell script. Open Kile and click Settings -> Configure Kile. Click the Tools tab on the left-hand side of the preferences window, and select Build. Click the New Tool button at the bottom of the preferences window. Name the new tool Sweave, click Next, and then Finish. In the resulting screen, type Sweave.sh in the top box, and -ld '%source' in the bottom box."

I followed these instructions but have 2 problems:

1: finished with exit status 126
SweaveOnly output:
* cd '/media/Partition_Commune/Mes documents/Ordi/LaTex/Sweave'
* Sweave.sh -ld 'example1Leisch.Rnw'
* /bin/bash: /usr/local/bin/Sweave.sh: Permission non accordée (in English: permission denied)
Do you see where the problem is?

2: If I run Kile with sudo (sudo kile), that problem disappears but a new one comes:
SweaveOnly output:
* cd '/media/Partition_Commune/Mes documents/Ordi/LaTex/Sweave'
* Sweave.sh -ld 'example1Leisch.Rnw'
* Run Sweave and postprocess with LaTeX directly from command line
-ld is not a supported file type! It should be one of: .lyx, .Rnw, .Snw, .nw or .tex

Are the instructions wrong? Or am I doing something wrong? Thank you very much for your help!!
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sweave from Kile
I use Sweave on a Mac from my LaTeX editor and it requires a similar shell script to typeset the document. Mine uses R itself (instead of sh) and looks like:

#!/usr/bin/Rscript
args <- commandArgs(TRUE)
fname <- strsplit(args[1], '\\.')[[1]][1]
Sweave(paste(fname, 'Rnw', sep = '.'))
system(paste('pdflatex', paste(fname, 'tex', sep = '.')))

This script is run as so:

sweave /path/to/source.Rnw

It works pretty well, although I think I used:

chmod 755 sweave

to make sure it would execute. You can also check ls -l to make sure you have ownership of the file; if not, you might need to hit it with chown. -Charlie

Matthieu Stigler-2 wrote: Hello, does anybody have experience with Sweave run from Kile? I'm trying to make it run but have problems, and I don't know whether the instructions are wrong or I am doing something wrong... I recently discovered Sweave and wanted to run it from my LaTeX editor, Kile. I found and followed these instructions: "If you want to be able to call Sweave outside of R, you will need to install a shell script (see footnote 4). To install the script, copy it to /usr/local/bin, then open the Konsole program and type sudo chmod +x /usr/local/bin/Sweave.sh to make it executable. Next, you may want to tell Kile where to find the Sweave.sh shell script. Open Kile and click Settings -> Configure Kile. Click the Tools tab on the left-hand side of the preferences window, and select Build. Click the New Tool button at the bottom of the preferences window. Name the new tool Sweave, click Next, and then Finish. In the resulting screen, type Sweave.sh in the top box, and -ld '%source' in the bottom box."
- Charlie Sharpsteen, Undergraduate Environmental Resources Engineering, Humboldt State University
Re: [R] Gower distance between a individual and a population
If you used daisy, is there a problem with converting the resulting object to a full dissimilarity matrix and extracting the relevant row/column you need for the target site?

Well, the loss of efficiency is huge. I need to compute the distances several times on databases that count 1,000 or even 10,000 subjects. 10,000^2 distances cost a lot in terms of time, whereas 10,000 do not. A solution would be to rewrite daisy and adapt it. But since I do not know Fortran, I prefer to first ask whether someone has already done it... Christophe
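A minimal sketch of the one-vs-all idea discussed in this thread, for the all-numeric case only (daisy in package cluster also handles factors and missing values; the function name and data here are invented for illustration):

```r
# Gower dissimilarity between one target row and every row of a numeric
# data frame: per-variable absolute differences scaled by the variable's
# range, then averaged. This is O(N) instead of daisy's O(N^2).
gower_to_target <- function(df, target) {
  rng    <- sapply(df, function(v) diff(range(v, na.rm = TRUE)))
  diffs  <- abs(sweep(as.matrix(df), 2, as.numeric(target)))  # |x_ij - t_j|
  scaled <- sweep(diffs, 2, rng, "/")                         # scale to [0, 1]
  rowMeans(scaled, na.rm = TRUE)                              # average over variables
}

set.seed(1)
dat <- data.frame(a = rnorm(1000), b = runif(1000))
d <- gower_to_target(dat, dat[3, ])
```

The distance of the target to itself (d[3]) is 0, and every value lies in [0, 1] because each scaled difference cannot exceed the variable's range.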
Re: [R] bivariate non-parametric smoothing
The functions `gam' in packages `mgcv' or `gam' will do this, as will the ssanova* functions in `gss' (there are numerous alternatives).

On Friday 10 October 2008 23:32, Verschuere Benjamin wrote: Hi, I was wondering if there is a function in R which performs bivariate non-parametric smoothing and which allows for the possibility of including weights in the smoothing (for each data point in my grid I have some predefined weights that I would like to include in the smoothing). Thanks, Ben

-- Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK, +44 1225 386603, www.maths.bath.ac.uk/~sw283
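As the reply notes, gam in package mgcv takes a weights argument, so a weighted bivariate smooth can look like the following sketch (the data and weights are made up):

```r
library(mgcv)

set.seed(42)
n  <- 200
x1 <- runif(n); x2 <- runif(n)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.2)
w  <- runif(n, 0.5, 2)   # hypothetical predefined weights, one per point

# bivariate thin-plate smooth of y over (x1, x2) with observation weights
fit  <- gam(y ~ s(x1, x2), weights = w)
pred <- predict(fit, newdata = data.frame(x1 = 0.5, x2 = 0.5))
```

s(x1, x2) fits an isotropic thin-plate spline; te(x1, x2) would be the usual choice when the two covariates are on different scales.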
Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)
Hi Maithili, There are two good papers that illustrate how to compare classifiers using sensitivity and specificity and their extensions (e.g., likelihood ratios, Youden index, KL distance, etc.). See:

1) Biggerstaff, Brad, 2000, Comparing diagnostic tests: a simple graphic using likelihood ratios, Statistics in Medicine, 19:649-663.
2) Lee, Wen-Chung, 1999, Selecting diagnostic tests for ruling out or ruling in disease: the use of the Kullback-Leibler distance, International Journal of Epidemiology, 28:521-525.

Please let me know if you have problems finding the aforementioned papers. Kind Regards, Pedro

-----Original Message----- From: [EMAIL PROTECTED] On Behalf Of Maithili Shiva. Sent: Monday, October 13, 2008 3:28 AM. To: r-help@r-project.org. Subject: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)

Dear Mr Peter Dalgaard and Mr Dieter Menne, I sincerely thank you for helping me out with my problem. The thing is that I have already calculated SENS = Gg / (Gg + Bg) = 89.97% and SPEC = Bb / (Bb + Gb) = 74.38%. Now I have values of SENS and SPEC, which are absolute in nature. My question was how do I interpret these absolute values. How do these values help me to find out whether my model is good? With regards, Ms Maithili Shiva

Subject: [R] Logistic regresion - Interpreting (SENS) and (SPEC). To: r-help@r-project.org. Date: Friday, October 10, 2008, 5:54 AM. Hi, I am working on a credit scoring model using logistic regression. I had a main sample of 42500 clients and, based on their status as regards defaulted / non-defaulted, I have generated the probability of default. I have a hold-out sample of 5000 clients. I have calculated (1) the number of correctly classified goods Gg, (2) the number of correctly classified bads Bb, and also (3) the number of wrongly classified goods (Gb) and (4) the number of wrongly classified bads (Bg). My problem is how to interpret these results? What I have arrived at are the absolute figures.
Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)
Pedro.Rodriguez at sungard.com writes: There are two good papers that illustrate how to compare classifiers using sensitivity and specificity and their extensions (e.g., likelihood ratios, Youden index, KL distance, etc.). See: 1) Biggerstaff, Brad, 2000, Comparing diagnostic tests: a simple graphic using likelihood ratios, Statistics in Medicine, 19:649-663. 2) Lee, Wen-Chung, 1999, Selecting diagnostic tests for ruling out or ruling in disease: the use of the Kullback-Leibler distance, International Journal of Epidemiology, 28:521-525.

Both papers refer to medical applications, and even the most basic books on medical statistics explain the concepts in the context of incidence and prevalence of a disease. Interpreting sensitivity and specificity is much more a problem of the context than one of R and statistics: note that her application was in econometrics. Dieter
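Using the notation from this thread, the point estimates themselves are a one-liner in R; the interpretation question the poster asks is the harder part. The counts below are invented for illustration; only the formulas come from the thread:

```r
# Counts in the poster's notation: Gg and Bb are the correctly classified
# goods and bads; Gb and Bg are the misclassified ones.
Gg <- 4100; Bg <- 457; Bb <- 322; Gb <- 121

sens <- Gg / (Gg + Bg)   # SENS as defined in the thread
spec <- Bb / (Bb + Gb)   # SPEC as defined in the thread
round(c(sens = sens, spec = spec), 4)
```

Both quantities are proportions in (0, 1); as the replies stress, they depend on the (arbitrary) probability cutoff used to classify, which is why the thread recommends calibration and cross-validation instead.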
Re: [R] Timestamps and manipulations
Try this:

x$Timestamp <- as.POSIXct(strptime(as.character(x$Timestamp), "%d/%m/%y %H:%M"))
x$time <- as.numeric(x$Timestamp > as.POSIXct(strptime("07-08-2008 00:00", "%d-%m-%Y %H:%M")))
with(x, tapply(Timestamp, user_id, function(x) diff(range(x), units = "secs"), simplify = FALSE))

On Mon, Oct 13, 2008 at 7:57 AM, Michael Pearmain [EMAIL PROTECTED] wrote: Hi All, I've a couple of questions I've been struggling with using the time features; can anyone help? Sample data:

Timestamp         user_id
27/05/08 22:57    763830873067
27/05/08 23:00    763830873067
27/05/08 23:01    763830873067
27/05/08 23:01    763830873067
05/06/08 11:34    763830873067
29/05/08 23:08    765253440317
29/05/08 23:06    765253440317
29/05/08 22:52    765253440317
29/05/08 22:52    765253440317
29/05/08 23:04    765253440317
27/06/08 19:34    765253440317
09/07/08 15:45    765329002557
06/07/08 19:24    765329002557
09/07/08 15:46    765329002557
07/07/08 13:05    765329002557
16/05/08 22:40    765329002557
08/06/08 11:24    765329002557
08/06/08 12:33    765329002557

My first question is how can I create a new variable, a filter based on a date? I've tried as.POSIXct/strptime below as well, but to no avail... can anyone give any advice?

Mcookie$timestamp <- as.POSIXct(strptime(Mcookie$timestamp, "%m/%d/%Y %H:%M"))
Mcookie$time <- ifelse(Mcookie$timestamp > strptime("07-08-2008-00:00", "%m-%d-%Y-%H:%M"), 1, 0)

My second question refers to finding the time difference in seconds between the first time a user sees something vs. the last...
and engagement time essentially; I see there is the difftime function. Is there a more elegant way of working this out than my thoughts (pseudo-code below)?

sort data by user_id and Timestamp
take the head of user_id as new_time_var
take the tail of user_id as new_time_var2
use difftime(new_time_var, new_time_var2, units="secs")

Mike

-- Henrique Dallazuanna, Curitiba-Paraná-Brasil, 25° 25' 40" S, 49° 16' 22" O
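A self-contained version of the approach in this thread, on a small made-up subset of the posted data (note the posted timestamps are day/month/year, so %d/%m/%y is the right format):

```r
# Parse the timestamps, flag events after a cutoff date, and compute each
# user's span between first and last event, in seconds.
x <- data.frame(
  Timestamp = c("27/05/08 22:57", "27/05/08 23:00", "05/06/08 11:34",
                "29/05/08 23:08", "27/06/08 19:34"),
  user_id   = c("763830873067", "763830873067", "763830873067",
                "765253440317", "765253440317")
)
x$Timestamp <- as.POSIXct(strptime(as.character(x$Timestamp), "%d/%m/%y %H:%M"))

cutoff  <- as.POSIXct(strptime("07-08-2008 00:00", "%d-%m-%Y %H:%M"))
x$after <- as.numeric(x$Timestamp > cutoff)   # 0/1 date filter

span <- with(x, tapply(Timestamp, user_id,
             function(t) as.numeric(difftime(max(t), min(t), units = "secs"))))
```

All five example events fall before the 7 August 2008 cutoff, so x$after is all zero here; span holds one non-negative number of seconds per user.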
[R] Error in plot.gam?
[R] Error in gam plot?
[R] Simulations using differential equations
Dear R-users, I am trying to perform some simulations from a model defined by ordinary differential equations. I would be grateful if someone could indicate some functions/packages/examples I could look at. Thank you in advance. Sebastien
[R] na.pass
Hi All, I have a data frame whose columns are comprised mainly of NAs. I know there are functions na.pass and na.omit etc. which can be used in these situations; however, I can't get them to work in this case. I have a function which returns the data according to some rule, i.e. removal of "N" in this code:

nep <- function(data) {
  dummy <- rep(0, 378)
  for (i in 1:378) {
    if (is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O"
  }
  for (i in 1:378) {
    if (data$with.Wcode[i] == "N") dummy[i] <- i
  }
  return(data[-dummy, ])
}

However, I really don't want to replace the NAs with "O". I'd just like to gloss over them. I can't just delete them because the structure of the data frame needs to be maintained. Can anyone suggest how I can write in a line or two to ignore the NAs instead of replacing them? I've tried this code but it doesn't work!

nep <- function(data) {
  dummy <- rep(0, 378)
  for (i in 1:378) {
    na.pass(data$with.Wcode[i])
    if (data$with.Wcode[i] == "N") dummy[i] <- i
  }
  return(data[-dummy, ])
}

Thank you, Laura
Re: [R] Simulations using differential equations
See http://cran.r-project.org/web/packages/odesolve/index.html

2008/10/13 [EMAIL PROTECTED]: Dear R-users, I am trying to perform some simulations from a model defined by ordinary differential equations. I would be grateful if someone could indicate some functions/packages/examples I could look at. Thank you in advance. Sebastien
Re: [R] Simulations using differential equations
and http://cran.at.r-project.org/web/packages/deSolve/

2008/10/13 megha patnaik [EMAIL PROTECTED]: See http://cran.r-project.org/web/packages/odesolve/index.html
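A minimal deSolve sketch to get started with; the model (logistic growth) and its parameters are invented for illustration:

```r
library(deSolve)

# dN/dt = r * N * (1 - N/K): logistic growth toward carrying capacity K
logistic <- function(t, y, parms) {
  with(as.list(c(y, parms)), {
    dN <- r * N * (1 - N / K)
    list(c(dN))
  })
}

out <- ode(y = c(N = 1), times = seq(0, 10, by = 0.1),
           func = logistic, parms = c(r = 1, K = 100))
```

`out` is a matrix with a time column and one column per state variable; by t = 10 the solution is close to the carrying capacity K = 100.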
[R] correlation structure in gls or lme/lmer with several observations per day
Hi, To simplify, suppose I have 2 observations each day for three days. I would like to define the correlation structure of these 6 observations as follows: the correlation of 2 observations on the same day is, say, alpha; the correlation for 2 observations one day apart is rho; and the correlation for 2 observations 2 days apart is rho^2. I.e. I would like to have an AR1 correlation plus a correlation for the same day. I tried with gls and lme from the nlme package, but with no success. One difficulty arises since corAR1 seems to require only one observation per day (see example below). Any idea on how to implement it, either with special correlation structures or through random effects in lme/lmer? Should I try to define a new correlation structure corMultiAR1? If so, where can I find help on how to write such a piece of code (nlme:::corAR1 is not clear to me)? Or is there a way to define a general parametrised covariance matrix in gls? Olivier

obs6 <- matrix(c(1,2,3,4,5,6, 1,1,2,2,3,3), byrow = FALSE, nc = 2)
dimnames(obs6) <- list(NULL, c("y", "time"))
obs6 <- data.frame(obs6)
obs6
  y time
1 1    1
2 2    1
3 3    2
4 4    2
5 5    3
6 6    3
gls(y ~ 1, correlation = corAR1(0.0, ~time), data = obs6)
Error in Initialize.corAR1(X[[1L]], ...) :
  Covariate must have unique values within groups for corAR1 objects

-- Olivier Renaud http://www.unige.ch/~renaud/ Methodology & Data Analysis - Psychology Dept - University of Geneva, UniMail, Office 4142 - 40, Bd du Pont d'Arve - CH-1211 Geneva 4
Re: [R] subsetting dataframe by rownames to be excluded
Prof Brian Ripley ripley at stats.ox.ac.uk writes: Yes: DF[is.na(match(row.names(DF), exclude_me)), ]

Assuming everything is possible in R: would it be possible to make the below work without breaking existing code?

a <- data.frame(x = 1:10)
rownames(a) <- letters[1:10]
exclude <- c("a", "c")
a[is.na(match(row.names(a), exclude)), ]  # not really that easy to remember
a[-c(1,3), ]                              # in analogy
a[-c(exclude), ]                          # invalid argument to unary operator

Dieter
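An %in% spelling of the same subset, which reads close to the a[-c(1,3), ] analogy Dieter asks for:

```r
# Drop rows whose names appear in 'exclude'; drop = FALSE keeps the
# data.frame structure even when only one column remains.
a <- data.frame(x = 1:10)
rownames(a) <- letters[1:10]
exclude <- c("a", "c")

kept <- a[!(rownames(a) %in% exclude), , drop = FALSE]
```

rownames(kept) is then "b" "d" "e" "f" "g" "h" "i" "j"; unlike the match/is.na form, %in% never produces NA, so no extra NA handling is needed.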
[R] Timestamps and manipulations
Hi All, I've a couple of questions I've been struggling with using the time features; can anyone help? Sample data:

Timestamp         user_id
27/05/08 22:57    763830873067
27/05/08 23:00    763830873067
27/05/08 23:01    763830873067
27/05/08 23:01    763830873067
05/06/08 11:34    763830873067
29/05/08 23:08    765253440317
29/05/08 23:06    765253440317
29/05/08 22:52    765253440317
29/05/08 22:52    765253440317
29/05/08 23:04    765253440317
27/06/08 19:34    765253440317
09/07/08 15:45    765329002557
06/07/08 19:24    765329002557
09/07/08 15:46    765329002557
07/07/08 13:05    765329002557
16/05/08 22:40    765329002557
08/06/08 11:24    765329002557
08/06/08 12:33    765329002557

My first question is how can I create a new variable, a filter based on a date? I've tried as.POSIXct/strptime below as well, but to no avail... can anyone give any advice?

Mcookie$timestamp <- as.POSIXct(strptime(Mcookie$timestamp, "%m/%d/%Y %H:%M"))
Mcookie$time <- ifelse(Mcookie$timestamp > strptime("07-08-2008-00:00", "%m-%d-%Y-%H:%M"), 1, 0)

My second question refers to finding the time difference in seconds between the first time a user sees something vs. the last... and engagement time essentially; I see there is the difftime function. Is there a more elegant way of working this out than my thoughts (pseudo-code below)?

sort data by user_id and Timestamp
take the head of user_id as new_time_var
take the tail of user_id as new_time_var2
use difftime(new_time_var, new_time_var2, units="secs")

Mike
[R] ldBands (Hmisc)
All, I'm getting the same error message as that discussed in a previous post (Feb 3, 2006). The reply to that post was to ensure that the ld98 program was in the system path (as also suggested in the help on ldBands). I have done this, but it does not change the result. Any advice much appreciated. David

sessionInfo()
R version 2.7.2 (2008-08-25), i386-pc-mingw32
locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages: stats graphics grDevices utils datasets methods base
other attached packages: Hmisc_3.4-3
loaded via a namespace (and not attached): cluster_1.11.11 grid_2.7.2 lattice_0.17-13 tools_2.7.2

## run example from help on ldBands
b <- ldBands(5, pr = FALSE)
Error in (head + 1):length(w) : NA/NaN argument

## my Path variable is specified as follows on Windows XP, with ld98 sitting in the WinLD directory:
C:\Program Files\MiKTeX 2.6\miktex\bin;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\Program Files\WinLD
Re: [R] na.pass
I have a data frame. It has lots of patient information: their age, their gender, etc. I need to keep all this information whilst selecting relevant rows. So, in the example code I provided, I want to remove all those patients who have entry "N" in the column with.Wcode. The dimension of the data is 378, i.e. 378 patients, and currently I am replacing any NA entries in column with.Wcode with the letter "O", as this is another level of the same column. Does that make more sense?

nep <- function(data) {
  dummy <- rep(0, 378)
  for (i in 1:378) {
    if (is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O"
  }
  for (i in 1:378) {
    if (data$with.Wcode[i] == "N") dummy[i] <- i
  }
  return(data[-dummy, ])
}

How can I therefore not replace NA with level "O" but instead ignore the NAs and effectively gloss over them? Thank you, Laura

On Mon, Oct 13, 2008 at 12:42 PM, jim holtman [EMAIL PROTECTED] wrote: Not sure exactly what you are trying to do since you did not provide commented, minimal, self-contained, reproducible code. Let me take a guess in that you also have to test for NAs:

> x <- sample(c("N", "A", "B", NA), 20, TRUE)
> x
 [1] "A" "A" "B" NA  "N" NA  NA  "B" "B" "N" "N" "N" "B" "A" NA  "A" "B" NA  "A" NA
> x != "N"
 [1]  TRUE  TRUE  TRUE    NA FALSE    NA    NA  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE    NA  TRUE  TRUE    NA
[19]  TRUE    NA
> x[x != "N"]
 [1] "A" "A" "B" NA  NA  NA  "B" "B" "B" "A" NA  "A" "B" NA  "A" NA
> (!is.na(x)) & (x != "N")
 [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE
[19]  TRUE FALSE
> x[(!is.na(x)) & (x != "N")]
 [1] "A" "A" "B" "B" "B" "B" "A" "A" "B" "A"

On Mon, Oct 13, 2008 at 7:15 AM, Laura Bonnett [EMAIL PROTECTED] wrote: Hi All, I have a data frame whose columns are comprised mainly of NAs. I know there are functions na.pass and na.omit etc. which can be used in these situations; however, I can't get them to work in this case. I have a function which returns the data according to some rule, i.e.
removal of "N" in this code:

nep <- function(data) {
  dummy <- rep(0, 378)
  for (i in 1:378) {
    if (is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O"
  }
  for (i in 1:378) {
    if (data$with.Wcode[i] == "N") dummy[i] <- i
  }
  return(data[-dummy, ])
}

However, I really don't want to replace the NAs with "O". I'd just like to gloss over them. I can't just delete them because the structure of the data frame needs to be maintained. Can anyone suggest how I can write in a line or two to ignore the NAs instead of replacing them? I've tried this code but it doesn't work!

nep <- function(data) {
  dummy <- rep(0, 378)
  for (i in 1:378) {
    na.pass(data$with.Wcode[i])
    if (data$with.Wcode[i] == "N") dummy[i] <- i
  }
  return(data[-dummy, ])
}

Thank you, Laura

-- Jim Holtman, Cincinnati, OH, +1 513 646 9390. What is the problem that you are trying to solve?
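The whole nep() loop in this thread can be replaced by one vectorized subset that keeps the NAs without recoding them; a sketch using the poster's column name, with made-up data:

```r
# Keep every row whose with.Wcode is NA or is not "N": no replacement of
# NA by "O" is needed, and the data frame structure is preserved.
nep <- function(data) {
  keep <- is.na(data$with.Wcode) | data$with.Wcode != "N"
  data[keep, , drop = FALSE]
}

d <- data.frame(with.Wcode = c("N", "A", NA, "B", "N"),
                age = c(41, 52, 38, 66, 29),
                stringsAsFactors = FALSE)
out <- nep(d)
```

Rows 2, 3 and 4 survive; row 3's NA is kept as-is rather than recoded, and all the other patient columns come along untouched.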
Re: [R] Blowing up portions of a graph
There is the zoomplot function in the TeachingDemos package that allows you to zoom in/out of the current plot, but it is a bit of a kludge. The better option is probably to just set the xlim and ylim arguments in a new plot command. You can use the locator function as one way to find the coordinates to pass to xlim and ylim. For adding the zoomed areas, you can use the layout function to set up the device with one big area on top and multiple smaller areas below to place the zooms in, or you can use the subplot function from the TeachingDemos package to add the zooms to uninteresting/empty areas of the current plot. Hope this helps,

-- Gregory (Greg) L. Snow Ph.D., Statistical Data Center, Intermountain Healthcare, [EMAIL PROTECTED], 801.408.8111

-----Original Message----- From: [EMAIL PROTECTED] On Behalf Of rajesh j. Sent: Monday, October 13, 2008 12:35 AM. To: r-help@r-project.org. Subject: [R] Blowing up portions of a graph

Hi, I have a really large graph and would like to zoom in on portions of it and post them as blocks below the graph. Is there an add-on package to do this? -- Rajesh.J "I skate to where the puck is going to be, not where it has been." - Wayne Gretzky
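A sketch of the layout() approach described above: one big plot on top and two zoomed panels below, each simply re-plotting the same data with a narrower xlim (the data are made up):

```r
set.seed(1)
x <- seq(0, 10, length.out = 500)
y <- sin(x) + rnorm(500, sd = 0.1)

# panel 1 spans the whole top row; panels 2 and 3 share the bottom row
layout(matrix(c(1, 1, 2, 3), nrow = 2, byrow = TRUE))
plot(x, y, type = "l", main = "full graph")
plot(x, y, type = "l", xlim = c(2, 3), main = "zoom: x in [2, 3]")
plot(x, y, type = "l", xlim = c(7, 8), main = "zoom: x in [7, 8]")
layout(1)   # reset to a single-panel device
```

With locator(2) on the full plot you could pick the two corners of a region interactively and feed them to xlim/ylim instead of hard-coding the ranges.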
Re: [R] Logistic Regression - Interpreting SENS (Sensitivity) and SPEC (Specificity)
Dieter Menne wrote: Maithili Shiva maithili_shiva at yahoo.com writes: I have a main sample of 42500 clients and, based on their status as regards defaulted / non-defaulted, I have generated the probability of default. I have a hold-out sample of 5000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), and also (3) the number of wrongly classified bads (Gb) and (4) the number of wrongly classified goods (Bg). The simple and wrong answer is to use these data directly to compute sensitivity (fraction of hits). This measure is useless, but I encounter it often in medical publications. Exactly. Using classification accuracy, sensitivity and specificity means that you are not using the model's predicted probabilities in a reasonable or powerful way. Credit scoring models need to demonstrate absolute calibration accuracy. Frank You can get a more reasonable answer by using cross-validation. Check, for example, Frank Harrell's http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf Dieter -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
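A hedged sketch of the kind of validation Frank and Dieter have in mind, using Frank Harrell's Design package (now called rms): fit the model with lrm() and get bootstrap optimism-corrected indexes (Dxy, calibration slope, Brier score) via validate(), plus a smooth calibration curve via calibrate(). The data here are simulated and the variable names invented.

```r
## Sketch, assuming the rms (formerly Design) package is installed;
## 'x1', 'x2' and 'default' are made-up stand-ins for real predictors.
library(rms)
set.seed(2)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$default <- rbinom(200, 1, plogis(-1 + d$x1))

fit <- lrm(default ~ x1 + x2, data = d, x = TRUE, y = TRUE)
validate(fit, B = 100)          # optimism-corrected Dxy, slope, Brier, ...
plot(calibrate(fit, B = 100))   # bootstrap calibration curve
```

This uses the predicted probabilities themselves rather than an arbitrary classification cut-off.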
Re: [R] Simulations using differential equations
and see page 8 of: http://www.r-project.org/doc/Rnews/Rnews_2003-3.pdf R as a Simulation Platform in Ecological Modelling Thomas Petzoldt Conclusions The examples described so far, plus the experience with R as data analysis environment for measurement and simulation data, allows to conclude that R is not only a great tool for statistical analysis and data processing but also a general-purpose simulation environment, both in research and especially in teaching. -- David Winsemius Heritage Labs

On Oct 13, 2008, at 11:37 AM, megha patnaik wrote: and http://cran.at.r-project.org/web/packages/deSolve/ 2008/10/13 megha patnaik [EMAIL PROTECTED] See http://cran.r-project.org/web/packages/odesolve/index.html 2008/10/13 [EMAIL PROTECTED] Dear R-users, I am trying to perform some simulations from a model defined by ordinary differential equations. I would be grateful if someone could indicate some functions/packages/examples I could look at. Thank you in advance. Sebastien
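To make the pointers above concrete, here is a minimal sketch with deSolve (the successor of odesolve): logistic growth dN/dt = r * N * (1 - N / K), solved with ode(). The model and parameter values are arbitrary illustrations.

```r
## Minimal ODE simulation sketch, assuming the deSolve package is installed.
library(deSolve)

## Derivative function: must return a list whose first element is dN/dt.
logistic <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dN <- r * N * (1 - N / K)
    list(dN)
  })
}

out <- ode(y = c(N = 1), times = seq(0, 50, by = 0.5),
           func = logistic, parms = c(r = 0.3, K = 100))
tail(out)  # N approaches the carrying capacity K = 100
```

The returned matrix has one column per state variable plus time, so plot(out) gives the trajectory directly.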
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
Maithili Shiva wrote: Dear Mr Peter Dalgaard and Mr Dieter Menne, I sincerely thank you for helping me out with my problem. The thing is that I have already calculated SENS = Gg / (Gg + Bg) = 89.97% and SPEC = Bb / (Bb + Gb) = 74.38%. Now I have values of SENS and SPEC, which are absolute in nature. My question was how do I interpret these absolute values. How do these values help me to find out whether my model is good? With regards, Ms Maithili Shiva I can't understand why you are interested in probabilities that are in backwards time order. Frank

Subject: [R] Logistic regression - Interpreting (SENS) and (SPEC) To: r-help@r-project.org Date: Friday, October 10, 2008, 5:54 AM Hi, I am working on a credit scoring model using logistic regression. I have a main sample of 42500 clients and, based on their status as regards defaulted / non-defaulted, I have generated the probability of default. I have a hold-out sample of 5000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), and also (3) the number of wrongly classified bads (Gb) and (4) the number of wrongly classified goods (Bg). My problem is how to interpret these results? What I have arrived at are the absolute figures.

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
[R] Using a discrete function in nls()
I am trying to fit a discrete function to my dataset using nls().

fit <- nls(T2 ~ form(SOA, t1weight, t2weight, d1weight),
           start = list(t1weight = 1, t2weight = 1, d1weight = 1),
           data = data1, trace = TRUE)

The problem is that my function (form) includes a discrete part, and in that function I used the variable SOA to define the discrete function (see below).

form <- function(SOA, t1weight, t2weight, d1weight) {
  decay_functionT1_1 <- 0
  decay_functionT1_2 <- rep(t1weight, ttime)
  decay_functionT1_3 <- t1weight * exp(-x / q)
  decay_functionT1_3[decay_functionT1_3 < threshold] <- 0
  T1 <- c(decay_functionT1_1, decay_functionT1_2, decay_functionT1_3)
  decay_functionT2_1 <- rep(0, SOA)
  decay_functionT2_2 <- rep(1, ttime)
  decay_functionT2_3 <- decay_t2(x1)
  decay_functionT2_3[decay_functionT2_3 < threshold] <- 0
  T2 <- c(decay_functionT2_1, decay_functionT2_2, decay_functionT2_3)

When I call nls() with my function I get an error message: Error in rep(0, SOA) : invalid 'times' argument That is probably due to the way nls() calls my function with the variable SOA. Can you help me to fix that?
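The diagnosis is that nls() evaluates form() once with SOA as the whole column, so rep(0, SOA) receives a vector as its 'times' argument. A generic fix, sketched below with a made-up stand-in for the real model, is to write the function for a single SOA value and then vectorise the wrapper with Vectorize():

```r
## Hedged sketch: form1() is an invented placeholder for the real
## per-observation model; only the Vectorize() idea is the point here.
form1 <- function(SOA, t1weight, t2weight, d1weight) {
  ## ... per-observation computation returning one fitted value ...
  t1weight * exp(-SOA / 10) + t2weight  # placeholder body
}
form <- Vectorize(form1, "SOA")  # now form() maps over a vector of SOA values

## nls() can then call the vectorised version, e.g.:
## fit <- nls(T2 ~ form(SOA, t1weight, t2weight, d1weight),
##            start = list(t1weight = 1, t2weight = 1, d1weight = 1),
##            data = data1, trace = TRUE)
```

Equivalently, sapply(SOA, ...) inside form() over the SOA values achieves the same thing.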
Re: [R] linear expenditure model
On Monday 13 October 2008 09:14:23, Marie Vandresse wrote: I have already used the aidsEst() function in the micEcon package for the estimation of elasticities. But with the LES, I plan to estimate the minimal consumption level... which is not possible with the AIDS model, is it? You are right; it is not possible to estimate the minimum consumption level with the Almost Ideal Demand System. Unfortunately, I don't see an easy way to estimate the LES in R using system estimation techniques. BTW: I am not convinced that the estimated parameters of the LES, which in theory are the minimum consumption levels, are good estimates of the actual minimum consumption levels in real life. Arne Arne Henningsen wrote: Hi Marie! On Friday 10 October 2008 12:40:23, Marie Vandresse wrote: I would like to estimate a linear expenditure system with the systemfit package (method: SUR). If I remember correctly, the linear expenditure system (LES) is linear in income but non-linear in the parameters. Hence, you have to estimate a system of non-linear equations. Unfortunately, the nlsystemfit() function in the systemfit package that estimates systems of non-linear equations is still under development and rather often has convergence problems. Since the systemfit() function in the systemfit package that estimates systems of linear equations is very reliable [1], I suggest that you choose a demand system that is linear in parameters (e.g. the Almost Ideal Demand System, AIDS) [1] http://www.jstatsoft.org/v23/i04 Could someone show me how to define the equations? If you use the aidsEst() function in the micEcon package [2], you don't have to specify the equations of the Almost Ideal Demand System yourself. 
[2] http://www.micEcon.org Best wishes, Arne -- Arne Henningsen http://www.arne-henningsen.name __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Gower distance between a individual and a population
On Mon, 2008-10-13 at 16:28 +0200, [EMAIL PROTECTED] wrote: If you used daisy, is there a problem with converting the resulting object to a full dissimilarity matrix and extracting the relevant row/column you need for the target site? Well, the loss of efficiency is huge. I need to compute the distance several times on databases that contain 1000 or even 10 000 subjects. 10 000^2 costs a lot in terms of time, whereas 10 000 does not. A solution would be to re-write daisy and adapt it. But since I do not know Fortran, I prefer first to ask if someone has already done it... Sorry, I didn't intend to suggest that what was there was good enough for your purposes. I appreciate the loss of efficiency, and simply wondered if it would work for your purposes, given that the solution in analogue::distance is coded in R. I am progressing, slowly, to convert analogue::distance to C, which should run faster, but other areas of the package have taken priority over that just now. analogue::distance may work for you in your case. I'd be interested in finding out (off-list) how you get on with it if you do use it. All the best, G Christophe This message was sent via IMP, courtesy of Université Paris 10 Nanterre
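For the special case where all variables are numeric (interval-scaled), Gower's coefficient reduces to the mean of the range-normalised absolute differences, which can be computed for one target against n subjects in O(n) instead of building the full O(n^2) daisy() matrix. A sketch under that assumption (mixed variable types would need the extra Gower terms that daisy handles):

```r
## One-individual-vs-population Gower distance, numeric variables only.
gower_to_target <- function(X, target) {
  X <- as.matrix(X)
  rng <- apply(X, 2, function(v) diff(range(v, na.rm = TRUE)))  # column ranges
  d <- abs(sweep(X, 2, as.numeric(unlist(target))))  # |x_ij - target_j|
  d <- sweep(d, 2, rng, "/")                         # normalise by ranges
  rowMeans(d, na.rm = TRUE)                          # average over variables
}

X <- data.frame(a = c(0, 5, 10), b = c(1, 2, 3))
gower_to_target(X, X[1, ])  # 0 for the target itself, 0.5 and 1 for the others
```

Note the ranges here are taken from X alone, so for results comparable to daisy() the target individual should be included in X (as it is above).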
Re: [R] subsetting dataframe by rownames to be excluded
On Mon, 13 Oct 2008, Dieter Menne wrote: Prof Brian Ripley ripley at stats.ox.ac.uk writes: Yes: DF[is.na(match(row.names(DF), exclude_me)), ] Assuming everything is possible in R: would it be possible to make the below work without breaking existing code? It would be possible, but not I think desirable. c(exclude) is fine (works now, does nothing useful except strip attributes). But a unary minus on a character vector will give an error: that's not necessarily the end, as `[` is a SPECIALSXP and so is passed unevaluated arguments. However, its first step is method dispatch, and that evaluates all the arguments, so a substantial internal rewrite would be needed. It would be fairly easy to make subset(a, subset = -exclude) work, and select = -col_name already works. I think though that messing with `[` would be too dangerous, and would also lead to expectations that all its methods should accept this notation (and hence many would need to be re-written, including [.data.frame as used here). And then people would expect this to work on the RHS, so `[<-` would need to be re-written.

a <- data.frame(x = 1:10)
rownames(a) <- letters[1:10]
exclude <- c("a", "c")
a[is.na(match(row.names(a), exclude)), ]  # not really that easy to remember
a[-c(1, 3), ]                             # In analogy
a[-c(exclude), ]                          # invalid argument to unary operator

Dieter

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
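A perhaps more memorable spelling of the same subset, sketched with setdiff() on the row names (it also avoids the is.na(match(...)) bookkeeping):

```r
## Keep every row whose name is NOT in 'exclude'.
a <- data.frame(x = 1:10)
rownames(a) <- letters[1:10]
exclude <- c("a", "c")

a[setdiff(rownames(a), exclude), , drop = FALSE]
## same rows as a[is.na(match(row.names(a), exclude)), , drop = FALSE]
```

setdiff() preserves the original row order, and drop = FALSE keeps the result a data frame even with a single column.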
Re: [R] Avoid overlap of labels in a scatterplot
Felix Andrews wrote: thigmophobe.labels in the plotrix package tries to avoid label crashes, and Thank you, but I chose: There is also pointLabel() in the maptools package. This works fine. Thank you very much. Nicola
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
Jumping into a thread can be like jumping into a den of lions, but here goes... Sensitivity and specificity are not designed to determine the quality of a fit (i.e. whether your model is good), but rather are characteristics of a test. A test that has high sensitivity will properly identify a large proportion of people with a disease (or a characteristic) of interest. A test with high specificity will properly identify a large proportion of people without a disease (or characteristic) of interest. Sensitivity and specificity inform the end user about the quality of a test. Other metrics have been designed to determine the quality of the fit; none that I know of are completely satisfactory. The pseudo R squared is one such measure. For a given diagnostic test (or classification scheme), different cut-off points for identifying subjects who have the disease can be examined to see how they influence sensitivity and 1-specificity using ROC curves. I await the flames that will surely come my way. John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)

Frank E Harrell Jr [EMAIL PROTECTED] 10/13/2008 12:27 PM Maithili Shiva wrote: Dear Mr Peter Dalgaard and Mr Dieter Menne, I sincerely thank you for helping me out with my problem. The thing is that I have already calculated SENS = Gg / (Gg + Bg) = 89.97% and SPEC = Bb / (Bb + Gb) = 74.38%. Now I have values of SENS and SPEC, which are absolute in nature. My question was how do I interpret these absolute values. How do these values help me to find out whether my model is good? With regards Ms Maithili Shiva I can't understand why you are interested in probabilities that are in backwards time order. Frank

Subject: [R] Logistic regression - Interpreting (SENS) and (SPEC) To: r-help@r-project.org Date: Friday, October 10, 2008, 5:54 AM Hi, I am working on a credit scoring model using logistic regression. I have a main sample of 42500 clients and, based on their status as regards defaulted / non-defaulted, I have generated the probability of default. I have a hold-out sample of 5000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), and also (3) the number of wrongly classified bads (Gb) and (4) the number of wrongly classified goods (Bg). My problem is how to interpret these results? What I have arrived at are the absolute figures.

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
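The cut-off trade-off John describes can be made concrete with a few lines of base R: compute SENS and SPEC over a grid of thresholds and trace the ROC curve by hand (scores are simulated here; no extra packages assumed).

```r
## Sensitivity/specificity across cut-offs, and an empirical ROC curve.
set.seed(42)
y <- rbinom(500, 1, 0.3)                            # 1 = "bad" / diseased
score <- y * rnorm(500, 1) + (1 - y) * rnorm(500)   # higher = more suspect

cuts <- seq(min(score), max(score), length.out = 100)
sens <- sapply(cuts, function(k) mean(score[y == 1] >= k))  # true-positive rate
spec <- sapply(cuts, function(k) mean(score[y == 0] < k))   # true-negative rate

plot(1 - spec, sens, type = "l", xlab = "1 - Specificity",
     ylab = "Sensitivity", main = "Empirical ROC curve")
abline(0, 1, lty = 2)  # the no-information diagonal
```

Each point on the curve is one possible classification rule; reporting a single SENS/SPEC pair fixes one arbitrary point on it.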
Re: [R] how to evaluate a cubic Bezier curve (B-spline?) given the four control points
You could look at the xspline function. It approximates b-splines or Bezier curves given control points and shape parameters. It can either plot the spline or return a bunch of points on the curve for comparison to other values. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] 801.408.8111

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] project.org] On Behalf Of Zack Weinberg Sent: Friday, October 10, 2008 4:11 PM To: r-help@r-project.org Subject: [R] how to evaluate a cubic Bezier curve (B-spline?) given the four control points

I'm trying to use R to determine the quality of a cubic Bezier curve approximation of an elliptical arc. I know the four control points and I want to compute (x,y) coordinates of many points on the curve. I can't find anything in either the base distribution or CRAN that does this; all the spline-related packages seem to be about *fitting* piecewise Bezier curves to a data set. Presumably, internally they have the capability I need, but it doesn't seem to be exposed in a straightforward way. Help?
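For completeness, a cubic Bezier is also easy to evaluate directly from its four control points via the Bernstein form B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3, for t in [0,1]. A small sketch, using the standard quarter-circle approximation (k = 0.5523) as the test arc:

```r
## Evaluate a cubic Bezier at a vector of parameter values t.
bezier <- function(P, t) {
  # P: 4 x 2 matrix of control points; one (x, y) row per t is returned
  b <- cbind((1 - t)^3, 3 * (1 - t)^2 * t, 3 * (1 - t) * t^2, t^3)
  b %*% P
}

## Quarter circle of radius 1 centred at (1, 0), from (0, 0) to (1, 1)
P <- rbind(c(0, 0), c(0, 0.5523), c(0.4477, 1), c(1, 1))
xy <- bezier(P, seq(0, 1, length.out = 101))
max(abs(sqrt((xy[, 1] - 1)^2 + xy[, 2]^2) - 1))  # radial error, on the order of 1e-4
```

This is exactly the "many (x,y) coordinates on the curve" computation, so the approximation quality can be checked against the true arc pointwise.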
[R] split data, but ensure each level of the factor is represented
Hello, I'll use part of the iris dataset for an example of what I want to do.

data(iris)
iris <- iris[1:10, 1:4]
iris
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
4           4.6         3.1          1.5         0.2
5           5.0         3.6          1.4         0.2
6           5.4         3.9          1.7         0.4
7           4.6         3.4          1.4         0.3
8           5.0         3.4          1.5         0.2
9           4.4         2.9          1.4         0.2
10          4.9         3.1          1.5         0.1

Now if I want to split this data using the vector

a <- c(3, 3, 3, 2, 3, 1, 2, 3, 2, 3)
a
[1] 3 3 3 2 3 1 2 3 2 3

then the function split works fine:

split(iris, a)
$`1`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
6          5.4         3.9          1.7         0.4

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4          4.6         3.1          1.5         0.2
7          4.6         3.4          1.4         0.3
9          4.4         2.9          1.4         0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
5           5.0         3.6          1.4         0.2
8           5.0         3.4          1.5         0.2
10          4.9         3.1          1.5         0.1

My problem is when the vector lacks one of the values from 1:n. For example, if the vector is

a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
a
[1] 3 3 3 2 3 2 2 3 2 3

then split will return a list without a $`1`. I would like the $`1` to be a vector of 0's with the same length as the number of columns in the dataset. In other words, I want to write a function that returns

mysplit(iris, a)
$`1`
[1] 0 0 0 0 0

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4          4.6         3.1          1.5         0.2
6          5.4         3.9          1.7         0.4
7          4.6         3.4          1.4         0.3
9          4.4         2.9          1.4         0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
5           5.0         3.6          1.4         0.2
8           5.0         3.4          1.5         0.2
10          4.9         3.1          1.5         0.1

Thank you for your time, Jay
Re: [R] split data, but ensure each level of the factor is represented
Try this:

a <- factor(c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3), levels = 1:3)
split(iris, a)
lapply(split(iris, a), dim)

On Mon, Oct 13, 2008 at 2:06 PM, Jay [EMAIL PROTECTED] wrote: [original message quoted in full; snipped]

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
Re: [R] Creating GUIs for R
On 10/13/08, Michael Lawrence [EMAIL PROTECTED] wrote: On Sun, Oct 12, 2008 at 4:50 PM, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: On 12 October 2008 at 12:53, cls59 wrote: | On a related note... does anyone know good resources for binding a C++ | program to the R library? RCpp, at http://rcpp.r-forge.r-project.org, formerly known as RCppTemplate, is pretty mature and well tested, having been around since 2004 or 2005. Introductory documentation could be better; feedback welcome. | Basically, I would like to start with just a plain vanilla R session running | inside a Qt widget. Any suggestions? Isn't RKWard a Qt-based GUI for R? They probably have some reusable console code in there. Yes. It seems somewhat integrated with KDE, so not easily ported. Deepayan once did just that in a test application. I am not sure if that was ever made public. There's a webpage at http://dsarkar.fhcrc.org/R/R-Qt.html See the last section. It's not very active, but should be an adequate proof of concept. This takes the approach of embedding R and creating a GUI using the GUI callbacks described in R-exts; this works in Linux and Mac, but not in Windows, because these callbacks are not supported by R on Windows. -Deepayan
Re: [R] split data, but ensure each level of the factor is represented
Thanks so much. On Oct 13, 1:14 pm, Henrique Dallazuanna [EMAIL PROTECTED] wrote:

Try this:
a <- factor(c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3), levels = 1:3)
split(iris, a)
lapply(split(iris, a), dim)

[rest of quoted message snipped]
Re: [R] split data, but ensure each level of the factor is represented
Try this:

split(iris, factor(a, levels = 1:3))

On Mon, Oct 13, 2008 at 1:06 PM, Jay [EMAIL PROTECTED] wrote: [original message quoted in full; snipped]
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
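Putting the factor() trick from the replies into the exact mysplit() the original post asked for, one sketch is: split on a factor with the full level set, then replace the empty pieces by a zero vector as long as the number of columns.

```r
## mysplit(): like split(), but absent levels become a vector of zeros.
mysplit <- function(data, a, n = max(a)) {
  out <- split(data, factor(a, levels = seq_len(n)))
  empty <- vapply(out, nrow, integer(1)) == 0
  out[empty] <- list(rep(0, ncol(data)))
  out
}

a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
mysplit(iris[1:10, 1:4], a)  # $`1` is c(0, 0, 0, 0); $`2` and $`3` as before
```

The default n = max(a) assumes the levels are 1:n; pass n explicitly if higher levels can also be absent.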
[R] gamm() and predict()
Dear All, I have a query relating to the use of the 'predict' and 'gamm' functions. I am dealing with large (approx. 5000) sets of presence/absence data, which I am trying to model as a function of different environmental covariates. Ideally my models should include individual and colony as random factors. I have been trying to fit binomial models using the gamm function to achieve this. For the sake of simplicity I have adapted some of the example code from ?gamm to illustrate some problems I have been having predicting values using this approach.

### Begin example ###

library(mgcv)

## Generate some example data
set.seed(0)
n <- 400
sig <- 2
x0 <- runif(n, 0, 1)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
f <- 2 * sin(pi * x0)
f <- f + exp(2 * x1) - 3.75887
f <- f + 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10 - 1.396
e <- rnorm(n, 0, sig)
y <- f + e

## Change the response to binary
y <- round(y / max(y))

## Add a factor to the linear predictor, to be modelled as random
fac <- rep(1:4, n / 4)
f <- f + fac * 3
fac <- as.factor(fac)

## Fit an additive model
mod <- gamm(y ~ s(x0) + s(x1) + s(x2) + s(x3), family = binomial, random = list(fac = ~1))

## Generate some new example data
new.dat <- data.frame(x0 = runif(n, 0, 1), x1 = runif(n, 0, 1),
                      x2 = runif(n, 0, 1), x3 = runif(n, 0, 1), fac = fac)

## Predict response values using the original data and the gam part of the model
predict(mod$gam, type = "response")

## Predict response values using the new data and the gam part of the model
predict(mod$gam, type = "response", new.dat)

## Predict response values using the original data and the glmm (lme) part of the model
predict(mod$lme, level = 0, type = "response")

## Predict response values using the new data and the glmm (lme) part of the model
predict(mod$lme, level = 0, type = "response", new.dat)
## This produces the error message
## 'Error in eval(expr, envir, enclos) : object 'fixed' not found'

### End example ###

My questions are as follows:

1. I presume predict(mod$gam) produces population-level predictions. Is this correct?
2. Is it possible to extract standard errors using predict(mod$gam), or is there a more suitable approach to estimating confidence in predictions made with gamms?
3. It seems that predict(mod$lme) results in predictions at the level of the random factors. Furthermore, these appear to be on the scale of the linear predictor regardless of how level is specified (see ?glmmPQL). Is this correct?
4. The code predict(mod$lme, new.dat) produces an error message, seemingly indicating that the fixed effects are missing from my new data frame (see example). Am I doing something wrong here?
5. Is it possible to produce both population-level and random-factor-level predictions using new data with gamm objects?

I have read all the relevant help files, including those associated with glmmPQL, and also Simon Wood's book, and I am still a bit confused, so any help would be gratefully received. I am using R 2.6.1 with Windows XP Pro, mgcv version 1.3-29. Thanks, Ewan Wakefield British Antarctic Survey High Cross Madingley Road Cambridge UK
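Regarding question 2: predict.gam() accepts se.fit = TRUE. A self-contained sketch with a small binomial gam (the same idea applies to the $gam component of a gamm fit): form the interval on the linear-predictor scale and back-transform with plogis().

```r
## Approximate pointwise intervals for a binomial smooth via se.fit = TRUE.
library(mgcv)
set.seed(1)
d <- data.frame(x = runif(300))
d$y <- rbinom(300, 1, plogis(2 * sin(pi * d$x)))

m  <- gam(y ~ s(x), family = binomial, data = d)
nd <- data.frame(x = seq(0, 1, length.out = 50))
pr <- predict(m, nd, type = "link", se.fit = TRUE)

ci <- data.frame(fit   = plogis(pr$fit),                 # predicted probability
                 lower = plogis(pr$fit - 2 * pr$se.fit), # approx. 95% band
                 upper = plogis(pr$fit + 2 * pr$se.fit))
head(ci)
```

Back-transforming the +/- 2 SE band keeps the interval inside (0, 1), which an interval built on the response scale would not guarantee.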
[R] graphs in R
How can graphs in R be done with leveling of points? Please help.
Re: [R] Problem installing tseries under FC7 x86_64
Dear Michal, I had the same problem as you in installing the quadprog package. Did you resolve it? Can you help me? The error is:

* Installing *source* package 'quadprog' ...
** libs
gfortran -fpic -g -O2 -c aind.f -o aind.o
In order to use gfortran please type either:
source /usr/local/free/gfortran.csh
. /usr/local/free/gfortran.sh
make: *** [aind.o] Error 1
ERROR: compilation failed for package 'quadprog'
** Removing '/usr/people/russo/R/i686-pc-linux-gnu-library/2.5/quadprog'
Warning message:
installation of package 'quadprog_1.4-11.tar.gz' had non-zero exit status in: install.packages("quadprog_1.4-11.tar.gz", repos = NULL, type = "/usr/local/free/gfortran.csh")

Thank you in advance! Simone
[R] Running R at a specific time - alternative to Sys.sleep() ?
Dear R-Help, Is it possible to set R up to run a particular script at specific times of the day? Trivial example: if the time is now 8:59:55am and I wish to run a function at 9am, I do the following:

my.function <- function(x) {
  p1 <- proc.time()
  Sys.sleep(x)
  print('Hello R-Help!')
  proc.time() - p1
}
my.function(5)
[1] "Hello R-Help!"
   user  system elapsed
      0       0       5

What I would rather do is just put in the time at which I wish R to execute the function. Hope that made sense, and thanks for any help in advance! Tony Breyal

### Windows Vista
sessionInfo()
R version 2.7.2 (2008-08-25) i386-pc-mingw32
locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
attached base packages: [1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached): [1] RCurl_0.9-4
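[Editor's note] One way to stay inside R is to compute the number of seconds until the target clock time and sleep for exactly that long. run_at below is a hypothetical helper written for this sketch, not an existing function.

```r
## Hypothetical helper: wait until a given clock time, then call a function.
run_at <- function(target, task) {
  secs <- as.numeric(difftime(target, Sys.time(), units = "secs"))
  if (secs > 0) Sys.sleep(secs)   # sleep only if the target is in the future
  task()
}
## e.g. run_at(as.POSIXct("2008-10-14 09:00:00"),
##             function() print("Hello R-Help!"))
```

For unattended, recurring jobs an external scheduler (as suggested in the reply below this message) is more robust, since this approach requires an R session to stay open.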
[R] Subset based on items in a list.
R-help: I have a variable (ID_list) containing about 1800 unique numbers, and a 143066x29 data frame. One of the columns (ID) in my data frame contains a list of ids, many of which appear more than once. I'd like to find the subset of my data frame for which ID matches one of the numbers in ID_list. I'm pretty sure I could write a function to do this--something like:

dataSubset <- function(df, id_list) {
  tmp <- data.frame()
  for (i in id_list) {
    for (j in 1:dim(df)[1]) {
      if (i == df$ID[j]) {
        tmp <- rbind(tmp, df[j, ])  # accumulate matching rows
      }
    }
  }
  tmp
}

but this seems inefficient. As I understand it, the subset function won't really solve my problem, but it seems like there must be something out there that will that I must be forgetting. Does anyone know of a way to solve this problem in an efficient way? Thanks! Kyle H. Ambert Graduate Student, Department of Medical Informatics & Clinical Epidemiology Oregon Health & Science University [EMAIL PROTECTED]
[R] [R-pkgs] New package: StatMatch 0.4
Dear useRs, I am pleased to announce the availability of the new package 'StatMatch' (version 0.4) http://cran.at.r-project.org/web/packages/StatMatch/index.html 'StatMatch' contains some functions to perform Statistical Matching. Statistical Matching methods aim at integrating two samples, referring to the same target population, that share a certain number of common variables but have no overlap of units. Note that some functions in 'StatMatch' can also be used to impute missing values in a data set. Best Regards, Marcello D'Orazio -- Marcello D'Orazio ISTAT (Italian National Statistical Institute) Via Cesare Balbo, 16 (1° piano, stanza 153) 00184 ROMA ITALY Tel.: +39 06 4673 2772 Fax: +39 06 4673 2955 Legal Disclaimer: Any views expressed by the sender of this message are not necessarily those of the Italian National Statistical Institute. ___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages
[R] optim and nlm error to estimate a matrix
Dear R users, I'm trying to estimate a matrix of regression parameters. I need to do it numerically, so I used optim and nlm. I got the initial parameter estimates from least squares, and input them into those functions. But when I run the optim function, it stops in 30 seconds and shows 'convergence = 1'. And if I use the nlm function, then it runs for a while, and finally stops with code = 4. Both of these error codes mean the iteration limit was exceeded. Since the maxit for optim is 500 for Nelder-Mead by default, I increased the maxit to 1000, but it still gives me the same error code. Can anyone tell me how I can fix the problem? I defined the objective function in the following way:

## Objective function to be minimized
obj <- function(beta.v) {
  obj1 <- rep(0, n)
  beta.m <- matrix(beta.v, p, sdf)
  for (i in 1:n) {
    yi <- Y[i, ]; xi <- X[i, ]
    obj1[i] <- rho((1/sigma) * sqrt((yi - xi %*% beta.m) %*% solve(t(H) %*% H) %*% t(yi - xi %*% beta.m)))
  }
  sum(obj1)
}

I tried to find a minimizer with the calls below:

result1 <- optim(c(ini.beta), obj)
result2 <- nlm(obj, c(ini.beta))

where ini.beta holds the initial parameters obtained from least squares estimation. The weirdest thing to me is that result1 gives exactly the same values as the initial values, while result2 gives values a little different from the initial values, and a smaller value of obj, which means nlm moved the initial value a little toward the true minimizer. At first, I thought I had supplied such good initial values that the algorithm didn't need to move much, but even if I just put in a matrix of 1's, it still stops with the same error codes. *** This is my first time posting a question. I apologize if I didn't explain enough. I would be very happy to hear anyone's suggestions. Thanks for your time!! Ji Young
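[Editor's note] The iteration limit for optim is raised through the control list, and nlm has a separate iterlim argument (default 100). A sketch assuming the poster's obj and ini.beta objects are defined as above:

```r
## Sketch: raise the iteration limits and inspect convergence diagnostics,
## assuming obj and ini.beta are defined as in the message above.
fit <- optim(c(ini.beta), obj, method = "Nelder-Mead",
             control = list(maxit = 5000, reltol = 1e-10))
fit$convergence   # 0 = converged; 1 = maxit was reached

fit2 <- nlm(obj, c(ini.beta), iterlim = 1000)
fit2$code         # 1 or 2 indicate a probable solution; 4 = iteration limit
```

If the limits are already generous and the codes persist, the objective may be flat or ill-conditioned near the start, which is worth checking before raising maxit further.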
[R] error in plots of gam (package:gam)
(Sorry for the duplicate posting, first posting did not contain text in the body of the message.) Greetings, I am attempting to plot the model fits of a generalized additive model using the gam package (gam version 1.0, R v. 2.6.2). The gam object is created without any apparent problem, but when I try to plot (plot(gam_object)), I repeatedly receive the following error: Error in dim(data) <- dim : attempt to set an attribute on NULL. Diagnostics are pasted below. Cheers, Chris Taylor

traceback()
14: array(np, 1)
13: predict.gam(object, type = "lpmatrix", ...)
12: model.matrix.gam(object)
11: model.matrix(object)
10: predict.lm(object, newdata, se.fit, scale = residual.scale, type = ifelse(type == "link", "response", type), terms = terms, na.action = na.action)
9: predict.glm(object, type = "terms", terms = terms, se.fit = TRUE)
8: NextMethod("predict")
7: switch(type, response = {
       out <- predict.gam(object, type = "link", se.fit = TRUE, ...)
       famob <- family(object)
       out$se.fit <- drop(out$se.fit * abs(famob$mu.eta(out$fit)))
       out$fit <- fitted(object)
       out
   }, link = {
       out <- NextMethod("predict")
       out$fit <- object$additive.predictors
       TS <- out$residual.scale^2
       TT <- ncol(object$var)
       out$se.fit <- sqrt(out$se.fit^2 + TS * object$var %*% rep(1, TT))
       out
   }, terms = {
       out <- NextMethod("predict")
       TT <- dimnames(s <- object$smooth)[[2]]
       out$fit[, TT] <- out$fit[, TT] + s
       TS <- out$residual.scale^2
       out$se.fit[, TT] <- sqrt(out$se.fit[, TT]^2 + TS * object$var)
       out
   })
6: predict.gam(object, type = "terms", terms = terms, se.fit = TRUE)
5: predict(object, type = "terms", terms = terms, se.fit = TRUE)
4: preplot.gam(x, terms = terms)
3: plot.gam(Cr4Rot1_100m.gam, se = T, residuals = T, main = "Trend analysis on ABC, 100m resolution: Cr4Rot1")
2: plot(Cr4Rot1_100m.gam, se = T, residuals = T, main = "Trend analysis on ABC, 100m resolution: Cr4Rot1")
1: plot(Cr4Rot1_100m.gam, se = T, residuals = T, main = "Trend analysis on ABC, 100m resolution: Cr4Rot1")

sessionInfo()
R version 2.6.2 (2008-02-08) i386-pc-mingw32
locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] gstat_0.9-47 rgdal_0.5-25 sp_0.9-25 gam_1.0 akima_0.5-1 loaded via a namespace (and not attached): [1] grid_2.6.2 lattice_0.17-10 mgcv_1.4-1 tools_2.6.2 -- J. Christopher Taylor, Ph.D. Applied Ecology and Restoration Research National Ocean Service / NOAA National Centers for Coastal Ocean Science Center for Coastal Fisheries and Habitat Research 101 Pivers Island Road, Beaufort, North Carolina 28516-9722 Ph: (252) 838 0833 Fx: (252) 728 8784 Website: http://www.ccfhr.noaa.gov/
Re: [R] Running R at a specific time - alternative to Sys.sleep() ?
Use the task scheduler in Windows and have a batch file executed. On Mon, Oct 13, 2008 at 11:44 AM, Tony Breyal [EMAIL PROTECTED] wrote the message quoted above. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] graphs in R
Do you have an example? I am not sure what you mean. On Mon, Oct 13, 2008 at 9:48 AM, guria [EMAIL PROTECTED] wrote the message quoted above. -- Stephen Sefick Research Scientist Southeastern Natural Sciences Academy Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
Re: [R] Subset based on items in a list.
I don't know if I understand (a small example with R commands would help), but, assuming your data frame is called 'df':

subset(df, ID %in% ID_list)

Question: is ID_list a list or a vector, and are they really numbers or factors? Kyle. wrote the message quoted above.
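[Editor's note] A tiny self-contained demonstration of the %in% approach; the data frame and ID values below are made up stand-ins for the poster's 143066x29 data frame and ID_list.

```r
## Made-up stand-ins for the poster's objects
df <- data.frame(ID  = c(101, 102, 102, 103, 104),
                 val = c(1.2, 3.4, 3.5, 5.6, 7.8))
ID_list <- c(102, 104)

subset(df, ID %in% ID_list)   # rows whose ID appears in ID_list
df[df$ID %in% ID_list, ]      # equivalent plain-indexing form
```

Both forms are vectorized, so they avoid the double loop in the original function entirely.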
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
John Sorkin wrote: Jumping into a thread can be like jumping into a den of lions but here goes . . . Sensitivity and specificity are not designed to determine the quality of a fit (i.e. if your model is good), but rather are characteristics of a test. A test that has high sensitivity will properly identify a large proportion of people with a disease (or a characteristic) of interest. A test with high specificity will properly identify a large proportion of people without a disease (or characteristic) of interest. Sensitivity and specificity inform the end user about the quality of a test. Other metrics have been designed to determine the quality of the fit; none that I know of are completely satisfactory. The pseudo R squared is one such measure. For a given diagnostic test (or classification scheme), different cut-off points for identifying subjects who have disease can be examined to see how they influence sensitivity and 1-specificity using ROC curves. I await the flames that will surely come my way John John this has been much debated but I fail to see how backwards probabilities are that helpful in judging the usefulness of a test. Why not condition on what we know (the test result and other baseline variables) and quit conditioning on what we are trying to find out (disease status)? The data collected in most studies (other than case-control) allow one to use logistic modeling with the correct time order. Furthermore, sensitivity and specificity are not constants but vary with subjects' characteristics. So they are not even useful as simplifying concepts. Frank John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Frank E Harrell Jr [EMAIL PROTECTED] 10/13/2008 12:27 PM Maithili Shiva wrote: Dear Mr Peter Dalgaard and Mr Dieter Menne, I sincerely thank you for helping me out with my problem. The thing is that I have already calculated SENS = Gg / (Gg + Bg) = 89.97% and SPEC = Bb / (Bb + Gb) = 74.38%. Now I have values of SENS and SPEC, which are absolute in nature. My question was how do I interpret these absolute values. How do these values help me to find out whether my model is good? With regards Ms Maithili Shiva I can't understand why you are interested in probabilities that are in backwards time order. Frank Subject: [R] Logistic regression - Interpreting (SENS) and (SPEC) To: r-help@r-project.org Date: Friday, October 10, 2008, 5:54 AM Hi I am working on a credit scoring model using logistic regression. I have a main sample of 42500 clients and, based on their status as regards defaulted / non-defaulted, I have generated the probability of default. I have a hold-out sample of 5000 clients. I have calculated (1) the number of correctly classified goods Gg, (2) the number of correctly classified bads Bb, and also (3) the number of wrongly classified bads (Gb) and (4) the number of wrongly classified goods (Bg). My problem is how to interpret these results? What I have arrived at are the absolute figures. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
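[Editor's note] For reference, the quoted SENS/SPEC formulas translate directly into R. The counts below are invented purely for illustration; they are not the poster's actual hold-out figures.

```r
## Invented hold-out counts, for illustration only
Gg <- 3599   # goods correctly classified as good
Bg <-  401   # goods wrongly classified as bad
Bb <-  744   # bads correctly classified as bad
Gb <-  256   # bads wrongly classified as good

sens <- Gg / (Gg + Bg)   # fraction of goods correctly identified
spec <- Bb / (Bb + Gb)   # fraction of bads correctly identified
c(SENS = sens, SPEC = spec)
```

As the thread discusses, these are conditional on true status; predicted probabilities (or predictive values at a given prevalence) condition on what is actually observed.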
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
Frank, Perhaps I was not clear in my previous Email message. Sensitivity and specificity do tell us about the quality of a test in that, given two tests, the one with higher sensitivity will be better at identifying subjects who have a disease among those subjects with a disease, and the one with greater specificity will be better at identifying subjects who do not have a disease among those who do not have a disease. It is true that positive predictive and negative predictive values are of greater utility to a clinician, but as you know these two measures are functions of sensitivity, specificity and disease prevalence. All other things being equal, given two tests one would select the one with greater sensitivity and specificity, so in a sense they do measure the quality of a clinical test - but not, as I tried to explain, the quality of a statistical model. You are of course correct that sensitivity and specificity are not truly inherent characteristics of a test, as their values may change from population to population, but practically speaking they don't change all that much, certainly not as much as positive and negative predictive values. I guess we will disagree about the utility of sensitivity and specificity as simplifying concepts. Thank you as always for your clear thoughts and stimulating comments. John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)
Re: [R] subsetting dataframe by rownames to be excluded
On Oct 13, 2008, at 5:36 AM, Dieter Menne wrote: Prof Brian Ripley ripley at stats.ox.ac.uk writes: Yes: DF[is.na(match(row.names(DF), exclude_me)), ] Assuming everything is possible in R: would it be possible to make the below work without breaking existing code?

a <- data.frame(x = 1:10)
rownames(a) <- letters[1:10]
exclude <- c("a", "c")
a[is.na(match(row.names(a), exclude)), ]  # not really that easy to remember
a[-c(1, 3), ]                             # In analogy
a[-c(exclude), ]                          # invalid argument to unary operator

Given the negative to your question, I wonder if you would find, as I hope works for me, that it will be easier to remember this (equivalent) form?

a[!row.names(a) %in% exclude, ]
[1] 2 4 5 6 7 8 9 10

... equivalent because, per the help page, %in% is defined by function(x, table) match(x, table, nomatch = 0) > 0 and the nomatch argument converts the NA's properly from a logical perspective. The help page defines a %w/o% function in just such a manner. -- David Winsemius Heritage Labs
[R] ggplot faceting like lattice | variable
I would like to be able to do the xyplot below in ggplot. I read in the archive that Hadley was working on this for the next release, and I cannot find the documentation (Aug. 23rd).

library(lattice)
library(ggplot2)
library(reshape)  # melt.data.frame is in the reshape package
River.Mile <- c(215, 202, 198, 190, 185, 179, 148, 119, 61)
Cu <- rnorm(9)
Fe <- rnorm(9)
Mg <- rnorm(9)
Ti <- rnorm(9)
Ir <- rnorm(9)
r <- data.frame(River.Mile, Cu, Fe, Mg, Ti, Ir)
z <- melt.data.frame(r, id.var = "River.Mile")
# this is what ggplot does
qplot(River.Mile, value, facets = (variable ~ .), data = z)
# this is what I would like to do
xyplot(value ~ River.Mile | variable, data = z)

Thanks -- Stephen Sefick Research Scientist Southeastern Natural Sciences Academy Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
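[Editor's note] A sketch of the ggplot2 analogue of the lattice call, assuming the z data frame built above; facet_wrap arranges one panel per level of the conditioning variable, much like lattice's | operator (facet specification syntax has varied across ggplot2 versions).

```r
library(ggplot2)
## One panel per variable, wrapped into a grid, assuming z as built above
ggplot(z, aes(x = River.Mile, y = value)) +
  geom_point() +
  facet_wrap(~ variable)
```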
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
John Sorkin wrote: Frank, Perhaps I was not clear in my previous Email message. Sensitivity and specificity do tell us about the quality of a test in that, given two tests, the one with higher sensitivity will be better at identifying subjects who have a disease among those subjects with a disease, and the one with greater specificity will be better at identifying subjects who do not have a disease among those who do not have a disease. It is true that positive predictive and negative predictive values are of greater utility to a clinician, but as you know these two measures are functions of sensitivity, specificity and disease prevalence. All other things being equal, given two tests one would select the one with greater sensitivity and specificity, so in a sense they do measure the quality of a clinical test - but not, as I tried to explain, the quality of a statistical model. That is not very relevant John. It is a function of all those things because those quantities are all deficient. I would select the test that can move the pre-test probability a great deal in one or both directions. You are of course correct that sensitivity and specificity are not truly inherent characteristics of a test, as their values may change from population to population, but practically speaking they don't change all that much, certainly not as much as positive and negative predictive values. They change quite a bit, and mathematically must change if the disease is not all-or-nothing. I guess we will disagree about the utility of sensitivity and specificity as simplifying concepts. Thank you as always for your clear thoughts and stimulating comments. And thanks for yours John. Frank John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)
[R] MiKTEX and texi2dvi
Liviu: Thanks for the links, I'll check them out. On a different note, have you used MiKTeX at all? I have downloaded it but I don't know how to make it work. Sweave and Stangle seem to work fine, but when I use texi2dvi it crashes.

library(tools)
Sweave("C:/Program Files/R/R-2.7.2/bin/foo.Rnw")
Writing to file foo.tex
Processing code chunks ...
1 : echo term verbatim (label=two)
2 : echo term verbatim (label=reg)
3 : echo term verbatim (label=fig1plot)
4 : term verbatim eps pdf (label=fig1)
5 : term verbatim eps pdf (label=fig2)
6 : term hide (label=foo)
7 : term hide (label=foo2)
8 : echo term verbatim (label=blurfle)
9 : term tex (label=tab1)
You can now run LaTeX on 'foo.tex'
Stangle("C:/Program Files/R/R-2.7.2/bin/foo.Rnw")
Writing to file foo.R
texi2dvi("foo.tex", pdf = TRUE)
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11:
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra
Error in texi2dvi("foo.tex", pdf = TRUE) : running 'texi2dvi' on 'foo.tex' failed

Any ideas why texi2dvi is crashing? foo.tex exists in the same directory as foo.Rnw but it says that it is missing. Felipe D.
Carrillo Supervisory Fishery Biologist Department of the Interior US Fish Wildlife Service California, USA --- On Sun, 10/12/08, Liviu Andronic [EMAIL PROTECTED] wrote: From: Liviu Andronic [EMAIL PROTECTED] Subject: Re: [R] Sweave-LaTEX question To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Date: Sunday, October 12, 2008, 11:47 PM On Sun, Oct 12, 2008 at 1:39 AM, Felipe Carrillo [EMAIL PROTECTED] wrote: I am working on a publication and I have heard about LaTEX but I haven't actually tried to learn about it until today. I've found a few There are two more packages that might be of interest: RReportGenerator [1] and relax [2]. Liviu [1] http://alnitak.u-strasbg.fr/~wraff/RReportGenerator/index.php [2] http://cran.r-project.org/web/packages/relax/index.html
Re: [R] Overdispersion in the lmer models
Dear Eva; I shouldn't have sent my unhelpful reply to the entire list, since it is now glaringly obvious that I did not carefully read your original question. You are outside my experience, since I have not used lme4, but I wonder if questions about over-dispersion shouldn't be handled by examining grouped residuals? According to the documentation, mer models have a resid method, although the help page it links to appears to be under construction. -- David Winsemius Heritage Laboratories On Oct 13, 2008, at 4:46 AM, Fucikova, Eva wrote: Dear David, Thank you for such a fast answer. Unfortunately, your suggestion does not work for lmer for some reason. I can probably try to run the model without the random effect to find out the overdispersion in the glm. Anyway, thank you very much. Yours sincerely, Eva -Original Message- From: David Winsemius [mailto:[EMAIL PROTECTED] Sent: maandag 13 oktober 2008 3:42 To: Fucikova, Eva Cc: r-help@r-project.org Subject: Re: [R] Overdispersion in the lmer models Have you considered using glm() with family = quasipoisson or family = quasibinomial? I know from experience that the quasipoisson choice reports an index of dispersion. ?family -- David Winsemius On Oct 12, 2008, at 4:55 AM, Fucikova, Eva wrote: Dear All, I am working with linear mixed-effects models using the lme4 package in R. I created a model using the lmer function including some main effects, a three-way interaction and a random effect. Because I work with binomial and Poisson distributions, I want to know whether there is overdispersion in my data or not. Does anybody know how I can retrieve this information from R? Thank you in advance, Eva Fucikova
[R] Subset based on items in a list.
R-help: I have a variable (ID_list) containing about 1800 unique numbers, and a 143066x29 data frame. One of the columns (ID) in my data frame contains a list of ids, many of which appear more than once. I'd like to find the subset of my data frame for which ID matches one of the numbers in ID_list. I'm pretty sure I could write a function to do this--something like: dataSubset <- function(df, id_list){ tmp <- data.frame() for(i in id_list){ for(j in 1:dim(df)[1]){ if(i==df$ID[j]){ tmp <- data.frame(df[j,]) } } } tmp } but this seems inefficient. As I understand it, the subset function won't really solve my problem, but it seems like there must be something out there that will that I must be forgetting. Does anyone know of a way to solve this problem in an efficient way? Thanks! Kyle H. Ambert Graduate Student, Department of Medical Informatics Clinical Epidemiology Oregon Health Science University [EMAIL PROTECTED]
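The double loop above can be replaced by a single vectorized membership test with `%in%`; a minimal sketch with a toy data frame (the column name ID follows the post, the data are made up):

```r
# Toy stand-ins for the real 143066 x 29 data frame and the ~1800-element ID_list
df <- data.frame(ID = c(101, 202, 303, 202, 404), x = 1:5)
ID_list <- c(202, 404)

# Keep every row whose ID appears somewhere in ID_list
hits <- df[df$ID %in% ID_list, ]

# subset() works the same way, despite the poster's doubts
hits2 <- subset(df, ID %in% ID_list)
```

`%in%` scales to hundreds of thousands of rows because it hash-matches against ID_list once, instead of comparing every row against every id in a nested loop.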
Re: [R] Sweave from Kile
Hi Matthieu, Does anybody have experience with Sweave run from Kile? I'm trying to make it run but have problems and don't know if the instructions are wrong or I am doing something wrong (my knowledge of bash and the shell is too limited to tell)... ... It would help if you stated that you use my Sweave.sh, i.e. the one from http://cran.r-project.org/contrib/extra/scripts/Sweave.sh. I will assume you do. I will start with the second problem. 2: If I run Kile with sudo (sudo kile), the problem disappears but a new one comes. SweaveOnly output: * cd '/media/Partition_Commune/Mes documents/Ordi/LaTex/Sweave' * Sweave.sh -ld '\example1Leisch.Rnw' * Run Sweave and postprocess with LaTeX directly from command line -ld is not a supported file type! It should be one of: .lyx, .Rnw, .Snw., .nw or .tex Are the instructions wrong? Or am I doing something wrong? Is there a single - or a double -, i.e. --? If I issue the following $ Sweave.sh --ld test.Rnw Run Sweave and postprocess with LaTeX directly from command line --ld is not a supported file type! It should be one of: .lyx, .Rnw, .Snw., .nw or .tex I get the same error. 1: finished with exit status 126 SweaveOnly output: * cd '/media/Partition_Commune/Mes documents/Ordi/LaTex/Sweave' * Sweave.sh -ld '\example1Leisch.Rnw' * /bin/bash: /usr/local/bin/Sweave.sh: Permission non accordée (in English: permission not granted) It seems that chmod did not behave as you expected. First check the file permissions with ls -l /usr/local/bin/Sweave.sh On my computer I get -rwxr-xr-x 1 root root 30K 2008-04-30 11:17 /usr/local/bin/Sweave.sh* Note that x is there three times, i.e. anyone can run this script: the user, the group and others. Try with sudo chmod a+x /usr/local/bin/Sweave.sh and check the file permissions again. gg
[R] Add notes to sink output
Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? Thanks, Michael
Re: [R] Add notes to sink output
On 14/10/2008, at 9:02 AM, Michael Just wrote: Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? ?cat
Re: [R] Add notes to sink output
On 13-Oct-08 20:02:20, Michael Just wrote: Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? Thanks, Michael Anything on the lines of: sink("test.txt") cat("This text will describe the test\n") cat("\n") summary(x) sink() E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 13-Oct-08 Time: 21:12:54 -- XFMail --
Re: [R] Add notes to sink output
Dear Michael, You can use cat() as follows: sink("test.txt") cat('Here goes your text','\n','\n') # each \n writes a newline summary(x) sink() See ?cat for more information. HTH, Jorge On Mon, Oct 13, 2008 at 4:02 PM, Michael Just [EMAIL PROTECTED] wrote: Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? Thanks, Michael
Re: [R] Add notes to sink output
Thanks for the swift response. cat it is. Cheers, Michael On Mon, Oct 13, 2008 at 3:14 PM, Jorge Ivan Velez [EMAIL PROTECTED] wrote: Dear Michael, You can use cat() as follows: sink("test.txt") cat('Here goes your text','\n','\n') # each \n writes a newline summary(x) sink() See ?cat for more information. HTH, Jorge On Mon, Oct 13, 2008 at 4:02 PM, Michael Just [EMAIL PROTECTED] wrote: Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? Thanks, Michael
[R] Stepwise lrm()
Hello, I have a data set of 1 + 49 variables. One of them is binary, the others are continuous. I would like to be able to fit the model with all 49 variables and then run stepwise model selection. I'd appreciate some code snippets...
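For the mechanics of what was asked, base R's step() applied to a glm() fit performs AIC-based stepwise selection (note that stepwise selection is widely criticized, as later replies in this digest point out); a sketch with simulated data, where the variable names and sizes are made up, not from the post:

```r
set.seed(42)
n <- 200
# Simulated stand-in: binary response y plus continuous predictors x1..x5
# (the real problem has 49 predictors; the recipe is identical)
dat <- data.frame(y = rbinom(n, 1, 0.5),
                  matrix(rnorm(n * 5), n,
                         dimnames = list(NULL, paste0("x", 1:5))))

full <- glm(y ~ ., data = dat, family = binomial)   # all predictors
sel  <- step(full, direction = "both", trace = 0)   # AIC-driven add/drop steps
formula(sel)                                        # the retained model
```

`direction = "both"` allows variables to re-enter after being dropped; `trace = 0` suppresses the per-step printout.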
Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)
- Original Message - From: Frank E Harrell Jr [EMAIL PROTECTED] To: John Sorkin [EMAIL PROTECTED] Cc: r-help@r-project.org; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Monday, October 13, 2008 2:09 PM Subject: Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC) John Sorkin wrote: Frank, Perhaps I was not clear in my previous Email message. Sensitivity and specificity do tell us about the quality of a test in that, given two tests, the one with higher sensitivity will be better at identifying subjects who have a disease in a pool who have a disease, and the more specific test will be better at identifying subjects who do not have a disease in a pool of people who do not have a disease. It is true that positive predictive and negative predictive values are of greater utility to a clinician, but as you know these two measures are functions of sensitivity, specificity and disease prevalence. All other things being equal, given two tests one would select the one with greater sensitivity and specificity, so in a sense they do measure the quality of a clinical test - but not, as I tried to explain, the quality of a statistical model. That is not very relevant John. It is a function of all those things because those quantities are all deficient. I would select the test that can move the pre-test probability a great deal in one or both directions. Of course, this quantity is known as a likelihood ratio and is a function of sensitivity and specificity. For 2 x 2 data one often speaks of the positive likelihood ratio and negative likelihood ratio, but for a multi-row contingency table one can define likelihood ratios for a series of cut-off points. This has become a popular approach in evidence-based medicine when diagnostic tests have continuous rather than binary outputs. 
You are of course correct that sensitivity and specificity are not truly inherent characteristics of a test, as their values may change from population to population, but practically speaking, they don't change all that much, certainly not as much as positive and negative predictive values. They change quite a bit, and mathematically must change if the disease is not all-or-nothing. I guess we will disagree about the utility of sensitivity and specificity as simplifying concepts. Thank you as always for your clear thoughts and stimulating comments. And thanks for yours John. Frank John among those subjects with a disease and the one with greater specificity will be better at identifying John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Frank E Harrell Jr [EMAIL PROTECTED] 10/13/2008 2:35 PM John Sorkin wrote: Jumping into a thread can be like jumping into a den of lions but here goes . . . Sensitivity and specificity are not designed to determine the quality of a fit (i.e. if your model is good), but rather are characteristics of a test. A test that has high sensitivity will properly identify a large portion of people with a disease (or a characteristic) of interest. A test with high specificity will properly identify a large proportion of people without a disease (or characteristic) of interest. Sensitivity and specificity inform the end user about the quality of a test. Other metrics have been designed to determine the quality of the fit, none that I know of are completely satisfactory. The pseudo R squared is one such measure. 
For a given diagnostic test (or classification scheme), different cut-off points for identifying subjects who have disease can be examined to see how they influence sensitivity and 1-specificity using ROC curves. I await the flames that will surely come my way John John this has been much debated but I fail to see how backwards probabilities are that helpful in judging the usefulness of a test. Why not condition on what we know (the test result and other baseline variables) and quit conditioning on what we are trying to find out (disease status)? The data collected in most studies (other than case-control) allow one to use logistic modeling with the correct time order. Furthermore, sensitivity and specificity are not constants but vary with subjects' characteristics. So they are not even useful as simplifying concepts. Frank John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Frank E Harrell Jr [EMAIL PROTECTED] 10/13/2008 12:27 PM Maithili Shiva wrote: Dear Mr Peter Dalgaard and Mr
[R] LM intercept
What is the difference when including or not including the intercept when using lm()? x.noint <- lm(weight ~ group - 1) # omitting intercept x <- lm(weight ~ group) This has nothing to do with forcing the intercept to 0, correct? Thank you kindly, Michael
Re: [R] LM intercept
On 14/10/2008, at 9:42 AM, Michael Just wrote: What is the difference when including or not including the intercept when using lm()? x.noint <- lm(weight ~ group - 1) # omitting intercept x <- lm(weight ~ group) This has nothing to do with forcing the intercept to 0, correct? On the contrary. This is *exactly* what it means. cheers, Rolf Turner
Re: [R] LM intercept
Great, Thanks, Michael On Mon, Oct 13, 2008 at 3:56 PM, Rolf Turner [EMAIL PROTECTED] wrote: On 14/10/2008, at 9:42 AM, Michael Just wrote: What is the difference when including or not including the intercept when using lm()? x.noint <- lm(weight ~ group - 1) # omitting intercept x <- lm(weight ~ group) This has nothing to do with forcing the intercept to 0, correct? On the contrary. This is *exactly* what it means. cheers, Rolf Turner
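As Rolf says, the `- 1` removes the intercept; with a factor predictor such as group this amounts to a reparameterization: the coefficients become one mean per group instead of a baseline plus differences, while the fitted values are unchanged. A quick sketch using the built-in PlantGrowth data, which has exactly the weight ~ group structure discussed above:

```r
data(PlantGrowth)  # columns: weight (numeric), group (factor: ctrl/trt1/trt2)

with_int <- lm(weight ~ group,     data = PlantGrowth)  # intercept = ctrl mean
no_int   <- lm(weight ~ group - 1, data = PlantGrowth)  # one coefficient per group

coef(with_int)  # (Intercept), grouptrt1, grouptrt2: differences from ctrl
coef(no_int)    # groupctrl, grouptrt1, grouptrt2: the raw group means

# Identical fitted values; only the coefficient meanings (and R^2) change
all.equal(unname(fitted(with_int)), unname(fitted(no_int)))
```

With a purely numeric predictor, by contrast, `- 1` really does force the fitted line through the origin.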
[R] Using an image background with graphics
I would like to use a map or aerial photo as a background for plotting solid lines and text, and semi-transparent color contours, in base and lattice graphics. Plot coordinates need to be consistent with the georeferenced background. For example, a color contour plot would have a gray-toned aerial photograph as a background for overprinted semi-transparent color contours of some spatially dependent variable. Can anyone point me in the right direction on how to do this? Thanks, Scott Waichler Pacific Northwest National Laboratory [EMAIL PROTECTED]
[R] Variable shortlisting for the logistic regression
Hi R helpers, One rather statistical question: what would be the best strategy to shortlist thousands of continuous variables automatically using R as preparation for logistic regression modeling? Thanks
Re: [R] Using an image background with graphics
On Monday 13 October 2008, Waichler, Scott R wrote: I would like to use a map or aerial photo as a background for plotting solid lines and text, and semi-transparent color contours, in base and lattice graphics. Plot coordinates need to be consistent with the georeferenced background. For example, a color contour plot would have a gray-toned aerial photograph as a background for overprinted semi-transparent color contours of some spatially dependent variable. Can anyone point me in the right direction on how to do this? Thanks, Scott Waichler Pacific Northwest National Laboratory [EMAIL PROTECTED] See spplot() and associated examples of how to use 'sp' class objects. Here is one worked example with sp objects: http://casoilresource.lawr.ucdavis.edu/drupal/node/442 -- Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341
[R] rotating points on a plot
Anybody know how to rotate shapes generated with the 'points' function? I'm trying to place points around a radial diagram such that the y-axes of individual shapes are oriented with the radii of the circle rather than the y-axis of the larger plot area. Perhaps something analogous to the 'srt' and 'crt' graphical parameters for text that appear under 'par'? Thanks, Rich
Re: [R] Add notes to sink output
An alternative is to use txtStart (and other functions mentioned in the same help page) from the TeachingDemos package. This does the sinking, but can also include the commands as well as allow you to insert comments. The etxt variants allow you to postprocess the whole transcript into a postscript (then pdf) file including selected graphics. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] 801.408.8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] project.org] On Behalf Of Michael Just Sent: Monday, October 13, 2008 2:02 PM To: r-help Subject: [R] Add notes to sink output Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? Thanks, Michael
[R] stl outlier help request
Currently I find that if I call stl() repeatedly I can use the weights array that is part of the stl output to detect outliers. I also find that if I repeatedly call stl() (replacing the outliers after each call) the remainder portion of the stl output gets reduced. I am calling it like: for(.index in 1:4) { st <- stl(mt, s.window=frequency(mt), robust=TRUE) outliers <- which(st$weights < 1e-8) if(length(outliers) > 0) { # Replace the outliers with the season + trend mt[outliers] <- st$time.series[,"seasonal"][outliers] + st$time.series[,"trend"][outliers] } } My question is, is there a better way? One improvement would be to use the square of the remainder as a stopping criterion rather than a hard-coded loop. Not being familiar with the arguments to stl (inner, outer, etc.) and their bearing on the weights, I don't know if there is a better way by simply specifying these arguments. So far increasing these arguments above the default values does not seem to reduce the remainder or the weights array. I realize that I could look at the source but before I do I would like to request some comments from those who have used this function probably more than I. Thank you. Kevin
Re: [R] Stepwise lrm()
You should note that the author of the lrm function (at least the one in the Design package, I don't know of others) is also one of the most vocal opponents of stepwise regression methods. Using stepwise with lrm() is kind of like borrowing someone's "down with violence" sign to hit them over the head with. You might want to look at the lasso2 package or get a copy of Frank's book for much better strategies. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] 801.408.8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] project.org] On Behalf Of useR Sent: Monday, October 13, 2008 2:35 PM To: r-help@r-project.org Subject: [R] Stepwise lrm() Hello, I have a data set of 1 + 49 variables. One of them is binary, the others are continuous. I would like to be able to fit the model with all 49 variables and then run stepwise model selection. I'd appreciate some code snippets...
[R] MiKTEX-texi2dvi
Sorry, I forgot to include a reproducible example in my last e-mail, but here it is. Since the file is too large to be included here, the path to the foo.Rnw example is: www.stat.umn.edu/~charlie/Sweave/foo.Rnw and it is supposed to produce a pdf like this one: http://www.stat.umn.edu/~charlie/Sweave/foo.pdf I have downloaded MiKTeX but I don't know how to make it work. Sweave and Stangle seem to work fine but when I use texi2dvi it crashes. library(tools) Sweave("C:/Program Files/R/R-2.7.2/bin/foo.Rnw") Writing to file foo.tex Processing code chunks ... 1 : echo term verbatim (label=two) 2 : echo term verbatim (label=reg) 3 : echo term verbatim (label=fig1plot) 4 : term verbatim eps pdf (label=fig1) 5 : term verbatim eps pdf (label=fig2) 6 : term hide (label=foo) 7 : term hide (label=foo2) 8 : echo term verbatim (label=blurfle) 9 : term tex (label=tab1) You can now run LaTeX on 'foo.tex' Stangle("C:/Program Files/R/R-2.7.2/bin/foo.Rnw") Writing to file foo.R texi2dvi("foo.tex", pdf=TRUE) C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra Error in texi2dvi("foo.tex", pdf = TRUE) : running 'texi2dvi' on 'foo.tex' failed Any ideas why texi2dvi is crashing? foo.tex exists in the same directory as foo.Rnw but it says it is missing. Thanks Felipe D. 
Carrillo Supervisory Fishery Biologist Department of the Interior US Fish Wildlife Service California, USA
Re: [R] Variable shortlisting for the logistic regression
Hi Marko, this may be helpful: http://www.ingentaconnect.com/content/bpl/rssb/2008/0070/0001/art5;jsessionid=an2la3spa0n5h.alexandra?format=print Happy modeling! Stephan useR schrieb: Hi R helpers, One rather statistical question: what would be the best strategy to shortlist thousands of continuous variables automatically using R as preparation for logistic regression modeling? Thanks
Re: [R] MiKTEX-texi2dvi
One thing to try is to download Sweave.bat from http://batchfiles.googlecode.com and place it in the same directory as the Rnw file (or anywhere on your path) and then from Windows console: Sweave foo.Rnw If MiKTeX is in a standard location Sweave.bat will find it and it will locate R itself from the registry. On Mon, Oct 13, 2008 at 5:42 PM, Felipe Carrillo [EMAIL PROTECTED] wrote: Sorry, I forgot to include a reproducible example on my last e-mail but here it is: Since the file is large to be included here: The path to the foo.Rnw examples is: www.stat.umn.edu/~charlie/Sweave/foo.Rnw and is suppossed to produce a pdf like this one: http://www.stat.umn.edu/~charlie/Sweave/foo.pdf I have downloaded MiKTEX but I don't know how to make it work. Sweave and Stangle seem to work fine but when I use texi2dvi it crashes. library(tools) Sweave(C:/Program Files/R/R-2.7.2/bin/foo.Rnw) Writing to file foo.tex Processing code chunks ... 1 : echo term verbatim (label=two) 2 : echo term verbatim (label=reg) 3 : echo term verbatim (label=fig1plot) 4 : term verbatim eps pdf (label=fig1) 5 : term verbatim eps pdf (label=fig2) 6 : term hide (label=foo) 7 : term hide (label=foo2) 8 : echo term verbatim (label=blurfle) 9 : term tex (label=tab1) You can now run LaTeX on 'foo.tex' Stangle(C:/Program Files/R/R-2.7.2/bin/foo.Rnw) Writing to file foo.R texi2dvi(foo.tex,pdf=TRUE) C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra Error in texi2dvi(foo.tex, 
pdf = TRUE) : running 'texi2dvi' on 'foo.tex' failed Any ideas why texi2dvi is crashing? foo.tex exists in the same directory as foo.Rnw but it says it is missing. Thanks Felipe D. Carrillo Supervisory Fishery Biologist Department of the Interior US Fish Wildlife Service California, USA
Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)
Of course Prof Baer is correct: the positive predictive value (PPV) and the negative predictive value (NPV) serve the function of providing conditional post-test probabilities. PPV: post-test probability of disease given a positive test. NPV: post-test probability of no disease given a negative test. Further, PPV is a function of sensitivity (for a given specificity in a population with a given disease prevalence): the higher the sensitivity, almost always the greater the PPV (it can be unchanged, but I don't believe it can be lower), and NPV is a function of specificity (for a given sensitivity in a population with a given disease prevalence): the higher the specificity, almost always the greater the NPV (it can be unchanged, but I don't believe it can be lower). Thus, using Prof Harrell's suggestion to use the test that moves a pre-test probability a great deal in one or both directions, the test to choose is the one with the largest sensitivity and/or specificity, and thus sensitivity and specificity are, I believe, good summary measures of the quality of a clinical test. Finally, I think Prof Harrell's observation that sensitivity and specificity change quite a bit, and mathematically must change if the disease is not all-or-nothing, while true, is a degenerate case of little practical importance. John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Robert W. Baer, Ph.D. 
[EMAIL PROTECTED] 10/13/2008 4:41 PM - Original Message - From: Frank E Harrell Jr [EMAIL PROTECTED] To: John Sorkin [EMAIL PROTECTED] Cc: r-help@r-project.org; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Monday, October 13, 2008 2:09 PM Subject: Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)

John Sorkin wrote: Frank, Perhaps I was not clear in my previous Email message. Sensitivity and specificity do tell us about the quality of a test in that, given two tests, the one with higher sensitivity will be better at identifying subjects who have a disease in a pool of people who have a disease, and the one with higher specificity will be better at identifying subjects who do not have a disease in a pool of people who do not have a disease. It is true that the positive and negative predictive values are of greater utility to a clinician, but as you know these two measures are functions of sensitivity, specificity and disease prevalence. All other things being equal, given two tests one would select the one with greater sensitivity and specificity, so in a sense they do measure the quality of a clinical test - but not, as I tried to explain, the quality of a statistical model.

That is not very relevant John. It is a function of all those things because those quantities are all deficient. I would select the test that can move the pre-test probability a great deal in one or both directions. Of course, this quantity is known as a likelihood ratio and is a function of sensitivity and specificity. For 2 x 2 data one often speaks of the positive likelihood ratio and the negative likelihood ratio, but for a multi-row contingency table one can define likelihood ratios for a series of cut-off points. This has become a popular approach in evidence-based medicine when diagnostic tests have continuous rather than binary outputs.
You are of course correct that sensitivity and specificity are not truly inherent characteristics of a test, as their values may change from population to population, but practically speaking, they don't change all that much, certainly not as much as positive and negative predictive values.

They change quite a bit, and mathematically must change if the disease is not all-or-nothing. I guess we will disagree about the utility of sensitivity and specificity as simplifying concepts.

Thank you as always for your clear thoughts and stimulating comments. And thanks for yours John. Frank

John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)

Frank E Harrell Jr [EMAIL PROTECTED] 10/13/2008 2:35 PM John Sorkin wrote: Jumping into a thread can be like jumping into a den of lions, but here goes . . . Sensitivity and specificity are not designed to determine the quality of a fit (i.e. whether your model is good), but rather are characteristics of a test. A test that has high sensitivity will be better at identifying disease among those subjects with a disease, and the one with greater specificity will be better at identifying
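[Frank's likelihood-ratio suggestion is easy to sketch in R. The function below is a generic illustration with made-up sensitivity, specificity and pre-test probability values, not numbers from this thread:]

```r
# Convert sensitivity and specificity into a likelihood ratio, then move a
# pre-test probability to a post-test probability via Bayes on the odds scale.
post_test_prob <- function(sens, spec, pretest) {
  lr_pos    <- sens / (1 - spec)         # positive likelihood ratio
  pre_odds  <- pretest / (1 - pretest)   # pre-test odds
  post_odds <- pre_odds * lr_pos         # post-test odds after a positive test
  post_odds / (1 + post_odds)            # back to a probability
}

# Example: sens = 0.9, spec = 0.8 gives LR+ = 4.5, which moves a 10%
# pre-test probability to 1/3 after a positive result.
post_test_prob(sens = 0.9, spec = 0.8, pretest = 0.1)  # 0.3333...
```

[The size of that move, rather than sensitivity or specificity in isolation, is what Frank proposes as the measure of a test's usefulness.]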
Re: [R] LM intercept
Michael Just wrote: Great, Thanks, Michael

On Mon, Oct 13, 2008 at 3:56 PM, Rolf Turner [EMAIL PROTECTED] wrote: On 14/10/2008, at 9:42 AM, Michael Just wrote: What is the difference when including or not including the intercept when using lm()?

x.noint <- lm(weight ~ group - 1)  # omitting intercept
x <- lm(weight ~ group)

This has nothing to do with forcing the intercept to 0, correct? On the contrary. This is *exactly* what it means.

But if group is a factor, this removes the intercept _and_ uses the full set of indicator variables to represent the factor, so you end up with the same model, just parametrized differently.

-- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
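[Peter's point can be checked directly. The sketch below uses the built-in PlantGrowth data set, which happens to have weight and group columns; the original poster's data are not shown in the thread:]

```r
# With a factor predictor, dropping the intercept only reparametrizes the
# model: one coefficient per group mean, instead of a baseline mean plus contrasts.
data(PlantGrowth)
fit       <- lm(weight ~ group,     data = PlantGrowth)  # intercept + contrasts
fit_noint <- lm(weight ~ group - 1, data = PlantGrowth)  # one coefficient per group

# Same fitted values, hence the same model
all.equal(unname(fitted(fit)), unname(fitted(fit_noint)))  # TRUE

# The no-intercept coefficients are exactly the group means
coef(fit_noint)
tapply(PlantGrowth$weight, PlantGrowth$group, mean)
```

[With a *numeric* predictor, by contrast, `- 1` really does force the fitted line through the origin and changes the model.]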
Re: [R] Variable shortlisting for the logistic regression
useR wrote: Hi R helpers, One rather statistical question: what would be the best strategy to automatically shortlist thousands of continuous variables using R, as preparation for logistic regression modelling? Thanks

The easiest approach is to use a random number generator. Frank

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
John Sorkin wrote: Of course Prof Baer is correct: the positive predictive value (PPV) and the negative predictive value (NPV) serve the function of providing conditional post-test probabilities. PPV: post-test probability of disease given a positive test. NPV: post-test probability of no disease given a negative test. Further, PPV is a function of sensitivity (for a given specificity in a population with a given disease prevalence): the higher the sensitivity, the greater the PPV in almost all cases (it can be unchanged, but I don't believe it can be lower). Likewise, NPV is a function of specificity (for a given sensitivity in a population with a given disease prevalence): the higher the specificity, the greater the NPV in almost all cases (it can be unchanged, but I don't believe it can be lower).

The PPV and NPV can be anything between 0 and 1 regardless of sensitivity and specificity. Just apply the test to populations with a prevalence of 0 or 1. The former gives you a PPV of 0 and an NPV of 1, since none of the positives will be true positives and all of the negatives will be true negatives. And vice versa.

-- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
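[Peter's prevalence argument is easy to verify numerically. The sketch below uses arbitrary sensitivity and specificity values of 0.99 chosen for illustration:]

```r
# PPV and NPV from sensitivity, specificity and prevalence, via Bayes' theorem.
ppv <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}
npv <- function(sens, spec, prev) {
  spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
}

# Even an excellent test (sens = spec = 0.99) collapses at extreme prevalence:
ppv(0.99, 0.99, prev = 0)  # 0 : every positive is a false positive
npv(0.99, 0.99, prev = 0)  # 1 : every negative is a true negative
ppv(0.99, 0.99, prev = 1)  # 1
npv(0.99, 0.99, prev = 1)  # 0
```

[So PPV and NPV are properties of the test *in a particular population*, which is exactly why they span the whole 0-1 range regardless of sensitivity and specificity.]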
Re: [R] Variable shortlisting for the logistic regression
On Mon, 13 Oct 2008, Frank E Harrell Jr wrote: useR wrote: Hi R helpers, One rather statistical question: what would be the best strategy to automatically shortlist thousands of continuous variables using R, as preparation for logistic regression modelling? Thanks The easiest approach is to use a random number generator. Frank

Got a laugh from me Frank! Can I nominate it for a fortune? David

_ David Scott Department of Statistics, Tamaki Campus The University of Auckland, PB 92019 Auckland 1142, NEW ZEALAND Phone: +64 9 373 7599 ext 86830 Fax: +64 9 373 7000 Email: [EMAIL PROTECTED] Graduate Officer, Department of Statistics Director of Consulting, Department of Statistics
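[Frank's joke has a serious point: screening thousands of candidate predictors one at a time will "find" variables by chance alone. A minimal simulation with made-up dimensions and pure-noise predictors:]

```r
# 1000 pure-noise predictors screened one at a time against a random binary
# outcome: about 5% pass p < 0.05 despite containing no signal whatsoever.
set.seed(1)
n <- 200; p <- 1000
x <- matrix(rnorm(n * p), n, p)   # noise predictors, unrelated to y
y <- rbinom(n, 1, 0.5)            # random outcome

# p-value of each univariate logistic regression slope
pvals <- apply(x, 2, function(col)
  summary(glm(y ~ col, family = binomial))$coefficients[2, 4])

sum(pvals < 0.05)  # roughly 50 "significant" predictors, all spurious
```

[Any shortlist produced this way is indistinguishable from one produced by a random number generator, which is the joke.]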