[R] Fitting 3 beta distributions
Hi, I want to fit a mixture of 3 beta distributions to my data, which ranges between 0 and 1. What functions can I easily call, specifying that 3 beta distributions should be fitted? I have already looked at normalmixEM and fitdistr, but they don't seem to be applicable (normalmixEM only fits mixtures of normal distributions, and fitdistr fits a single distribution, not a mixture of 3). Is that right? Also, my data has 26 million data points. What can I do to reduce the computation time with the suggested function? Thanks a lot in advance; eagerly waiting for any input. Best, Nitin

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] deSolve - Function daspk on DAE system - Error
I'm getting this error on the attached code and breaking my head, but I can't figure it out. Any help is much appreciated. Thanks, Vince

CODE:

library(deSolve)
Res_DAE <- function(t, y, dy, pars) {
  with(as.list(c(y, dy, pars)), {
    res1 <- -dS - dES - k2*ES
    res2 <- -dP + k2*ES
    eq1  <- Eo - E - ES
    eq2  <- So - S - ES - P
    return(list(c(res1, res2, eq1, eq2)))
  })
}
pars  <- c(Eo = 0.02, So = 0.02, k2 = 250, E = 0.01); pars
yini  <- c(S = 0.01, ES = 0.01, P = 0.0, E = 0.01); yini
times <- seq(0, 0.01, by = 0.0001); times
dyini <- c(dS = 0.0, dES = 0.0, dP = 0.0)
## Tabular output check of matrix output
DAE <- daspk(y = yini, dy = dyini, times = times, res = Res_DAE,
             parms = pars, atol = 1e-10, rtol = 1e-10)

ERROR:

daspk-- warning.. At T(=R1) and stepsize H (=R2) the
nonlinear solver failed to converge repeatedly or with abs(H) = HMIN
preconditioner had repeated failures
0.0D+00 0.5960464477539D-14
Warning messages:
1: In daspk(y = yini, dy = dyini, times = times, res = Res_DAE, parms = pars, :
  repeated convergence test failures on a step - inaccurate Jacobian or preconditioner?
2: In daspk(y = yini, dy = dyini, times = times, res = Res_DAE, parms = pars, :
  Returning early. Results are accurate, as far as they go

-- View this message in context: http://r.789695.n4.nabble.com/deSolve-Function-daspk-on-DAE-system-Error-tp3864298p3864298.html Sent from the R help mailing list archive at Nabble.com.
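Two things worth checking before digging deeper (a hedged suggestion, not a confirmed diagnosis): daspk generally expects dy to have one entry per state, but yini here has four components while dyini has only three; and E is defined in both pars and yini, so the with(as.list(...)) namespace contains a duplicate name. A quick consistency check is to evaluate the residual function by hand at t = 0 — for a consistent start all residuals should be (near) zero. The sketch below assumes the model as posted, with E moved out of pars:

```r
library(deSolve)

Res_DAE <- function(t, y, dy, pars) {
  with(as.list(c(y, dy, pars)), {
    res1 <- -dS - dES - k2*ES
    res2 <- -dP + k2*ES
    eq1  <- Eo - E - ES
    eq2  <- So - S - ES - P
    list(c(res1, res2, eq1, eq2))
  })
}

pars  <- c(Eo = 0.02, So = 0.02, k2 = 250)       # E removed: it is a state, not a parameter
yini  <- c(S = 0.01, ES = 0.01, P = 0, E = 0.01)
dyini <- c(dS = 0, dES = 0, dP = 0, dE = 0)      # one entry per state

# sanity check: residuals at t = 0; large values mean yini/dyini are inconsistent
unlist(Res_DAE(0, yini, dyini, pars))
```

With these values the algebraic constraints (eq1, eq2) are satisfied, but the first two residuals come out as -2.5 and 2.5 (since k2*ES = 2.5 while all initial derivatives are zero), which suggests dyini is not consistent with yini — a common cause of the repeated convergence failures daspk reports at start-up.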
Re: [R] Entering data into a multi-way array?
I am trying to replicate the script, appended below. My data is in OOCalc files. The script (below) synthesizes a dataset (it serves as a tutorial), but I will need to get my data from OOCalc into R for use in that script (which uses arrays). I've worked my way through the script and understand how most of it works (except the first bit, Step 1, which is irrelevant to me anyway).

[begin script]

### Supplementary material with the paper
### "Interpretation of ANOVA models for microarray data using PCA"
### J.R. de Haan et al. Bioinformatics (2006)
### Please cite this paper when you use this code in a publication.
### Written by J.R. de Haan, December 18, 2006

### Step 1: a synthetic dataset of 500 genes is generated with 5 classes
###   1 unresponsive genes (300 genes)
###   2 constant genes (50 genes)
###   3 profile 1 (50 genes)
###   4 profile 2 (50 genes)
###   5 profile 3 (50 genes)

# generate synthetic dataset with similar dimensions:
# 500 genes, 3 replicates, 10 timepoints, 4 treatments
X <- array(0, c(500, 3, 10, 4))
labs.synth <- c(rep(1, 300), rep(2, 50), rep(3, 50), rep(4, 50), rep(5, 50))
gnames <- cbind(labs.synth, labs.synth)
# print(dim(gnames))
gnames[1:300, 2]   <- "A"
gnames[301:350, 2] <- "B"
gnames[351:400, 2] <- "C"
gnames[401:450, 2] <- "D"
gnames[451:500, 2] <- "E"

### generate 300 noise genes with expressions slightly larger than
### the detection limit (class 1)
X[labs.synth==1,1,,] <- rnorm(length(X[labs.synth==1,1,,]), mean=50, sd=40)
X[labs.synth==1,2,,] <- X[labs.synth==1,1,,] + rnorm(length(X[labs.synth==1,1,,]), mean=0, sd=10)
X[labs.synth==1,3,,] <- X[labs.synth==1,1,,] + rnorm(length(X[labs.synth==1,1,,]), mean=0, sd=10)

# generate 50 stable genes at two levels (class 2)
X[301:325,1,,] <- rnorm(length(X[301:325,1,,]), mean=1500, sd=40)
X[301:325,2,,] <- X[301:325,1,,] + rnorm(length(X[301:325,1,,]), mean=0, sd=10)
X[301:325,3,,] <- X[301:325,1,,] + rnorm(length(X[301:325,1,,]), mean=0, sd=10)
X[326:350,1,,] <- rnorm(length(X[326:350,1,,]), mean=3000, sd=40)
X[326:350,2,,] <- X[326:350,1,,] + rnorm(length(X[326:350,1,,]), mean=0, sd=10)
X[326:350,3,,] <- X[326:350,1,,] + rnorm(length(X[326:350,1,,]), mean=0, sd=10)

# generate 50 genes with profile 1 (class 3)
increase.range <- matrix(rep(1:50, 10), ncol=10, byrow=FALSE)
profA3 <- matrix(rep(c(10, 60, 110, 150, 150, 150, 150, 150, 150, 150), 50), ncol=10, byrow=TRUE) * increase.range
X[351:400,1,,1] <- profA3 + rnorm(length(profA3), mean=0, sd=40)
profB3 <- matrix(rep(c(10, 100, 220, 280, 280, 280, 280, 280, 280, 280), 50), ncol=10, byrow=TRUE) * increase.range
X[351:400,1,1:10,2] <- profB3 + rnorm(length(profA3), mean=0, sd=40)
profC3 <- matrix(rep(c(10, 120, 300, 300, 280, 280, 280, 280, 280, 280), 50), ncol=10, byrow=TRUE) * increase.range
X[351:400,1,1:10,3] <- profC3 + rnorm(length(profA3), mean=0, sd=40)
profD3 <- matrix(rep(c(100, 75, 50, 50, 50, 50, 50, 50, 75, 100), 50), ncol=10, byrow=TRUE)
X[351:400,1,1:10,4] <- profD3 + rnorm(length(profA3), mean=0, sd=40)
# again replicates
X[351:400,2,,] <- X[351:400,1,,] + rnorm(length(X[351:400,2,,]), mean=0, sd=10)
X[351:400,3,,] <- X[351:400,1,,] + rnorm(length(X[351:400,3,,]), mean=0, sd=10)

# generate 50 genes with profile 2 (class 4)
increase.range <- matrix(rep(1:50, 10), ncol=10, byrow=FALSE)
profA4 <- matrix(rep(c(10, 60, 110, 150, 125, 100, 75, 50, 50, 50), 50), ncol=10, byrow=TRUE) * increase.range
X[401:450,1,,1] <- profA4 + rnorm(length(profA4), mean=0, sd=40)
profB4 <- matrix(rep(c(10, 100, 220, 280, 200, 150, 100, 50, 50, 50), 50), ncol=10, byrow=TRUE) * increase.range
X[401:450,1,1:10,2] <- profB4 + rnorm(length(profA4), mean=0, sd=40)
profC4 <- matrix(rep(c(10, 150, 300, 220, 150, 100, 50, 50, 50, 50), 50), ncol=10, byrow=TRUE) * increase.range
X[401:450,1,1:10,3] <- profC4 + rnorm(length(profA4), mean=0, sd=40)
profD4 <- matrix(rep(c(150, 100, 50, 50, 75, 75, 75, 100, 100, 100), 50), ncol=10, byrow=TRUE)
X[401:450,1,1:10,4] <- profD4 + rnorm(length(profA4), mean=0, sd=40)
# again replicates
X[401:450,2,,] <- X[401:450,1,,] + rnorm(length(X[401:450,2,,]), mean=0, sd=10)
X[401:450,3,,] <- X[401:450,1,,] + rnorm(length(X[401:450,3,,]), mean=0, sd=10)

# generate 50 genes with profile 3 (class 5)
increase.range <- matrix(rep(1:25, 20), ncol=10, byrow=FALSE)
profA4 <- matrix(rep((200 - c(10, 60, 110, 150, 125, 100, 75, 50, 50, 50)), 50), ncol=10, byrow=TRUE) * increase.range
X[451:500,1,,1] <- profA4 + rnorm(length(profA4), mean=0, sd=40)
profB4 <- matrix(rep((200 - c(10, 100, 180, 200, 200, 150, 100, 50, 50, 50)), 50), ncol=10, byrow=TRUE) * increase.range
X[451:500,1,1:10,2] <- profB4 + rnorm(length(profA4), mean=0, sd=40)
profC4 <- matrix(rep((200 - c(10, 150, 200, 180, 150, 100, 50, 50, 50, 50)), 50), ncol=10, byrow=TRUE) * increase.range
X[451:500,1,1:10,3] <- profC4 + rnorm(length(profA4), mean=0, sd=40)
profD4 <-
Re: [R] Poor performance of Optim
Thank you for your response! But the problem is: when I estimate a model without knowing the true coefficients, how can I know which reltol is good enough, 1e-8 or 1e-10? Why can commercial packages automatically determine the right reltol, but R cannot?
Re: [R] Poor performance of Optim
What I tried is just a simple binary probit model. Create random data and use optim to maximize the log-likelihood function to estimate the coefficients. (E.g. u = 0.1 + 0.2*x + e, where e is standard normal, and y = (u > 0), y indicating a binary choice variable.) If I estimate the coefficient of x, I should be able to get a value close to 0.2 if the sample is large enough. Say I got 0.18. If I multiply x by two and reestimate the model, which coefficient should I get? 0.09, right? But with optim, I got something different. When I do the same thing in both Gauss and Matlab, I get exactly 0.09, which suggests the coefficient estimator there is reliable. But R's optim does not give me a reliable estimator.
[R] Multivariate Laplace density
Can anyone show how to calculate a multivariate Laplace density? Thanks.
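For the symmetric multivariate Laplace distribution in the Kotz/Kozubowski/Podgorski parameterization, the density involves a modified Bessel function of the second kind, which base R provides as besselK(). The sketch below is written from that parameterization; please check the formula against your own reference before relying on it, since "multivariate Laplace" means different things in different papers:

```r
# Hedged sketch: density of the symmetric multivariate Laplace ML(0, Sigma),
# f(x) = 2 (2*pi)^(-d/2) |Sigma|^(-1/2) (m/2)^(nu/2) K_nu(sqrt(2*m)),
# with m = x' Sigma^{-1} x and nu = (2 - d)/2.
dmvlaplace <- function(x, Sigma) {
  d  <- length(x)
  m  <- drop(t(x) %*% solve(Sigma) %*% x)   # Mahalanobis-type quadratic form
  nu <- (2 - d) / 2
  2 / ((2 * pi)^(d / 2) * sqrt(det(Sigma))) *
    (m / 2)^(nu / 2) * besselK(sqrt(2 * m), nu)
}

dmvlaplace(c(0.5, -0.2), diag(2))
```

Note the density is singular at x = 0 for d >= 2 (m = 0), so evaluate it away from the origin. Some contributed packages (e.g. LaplacesDemon, if I recall correctly) also ship multivariate Laplace densities, which would be worth comparing against.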
Re: [R] Poor performance of Optim
Oh, I think I got it. Commercial packages limit the number of decimals shown.
[R] plot: how to fix the ratio of the plot box?
Dear all, this should be trivial, but I couldn't figure out how to solve it... I would like to have a plot with a fixed aspect ratio of 1. Whenever I resize the Quartz window, the axes are extended so that the plot fills the whole window. However, if the axes are extended differently, the plot does not look like a square anymore (i.e., aspect ratio 1). The same of course happens if you print it to .pdf (the ultimate goal). How can I fix the plot box (formed by the axes) to have ratio 1, meaning that the plot box is a square no matter how I resize the Quartz window? I searched for this and found: http://tolstoy.newcastle.edu.au/R/help/05/04/2888.html It is more or less recommended to use lattice's xyplot for that. Is there no solution for base graphics? [I know that the extension is by default 4% and that's great, but the size of the Quartz window should not change this (which it does if you resize the window accordingly).] Cheers, Marius

Minimal example:

u <- runif(10)
pdf(width=5, height=5)
plot(u, u, asp=1, xlim=c(0,1), ylim=c(0,1), main="My title")
dev.off()
Re: [R] Poor performance of Optim
Ben Bolker sent me a private email rightfully correcting me: I was factually wrong when I wrote that ML /is/ a numerical method (I had written sloppily and under time pressure). He is of course right to point out that not all maximum likelihood estimators require numerical methods to solve. Further, only numerical optimization will show the behavior discussed in this post, for the given reasons. (I hope this post isn't yet another blooper of mine at 5 a.m.) Best, Daniel

Daniel Malter wrote:
With respect, your statement that R's optim does not give you a reliable estimator is bogus. As pointed out before, this depends on when optim believes it's good enough and stops optimizing. In particular, if you stretch out x, then it is plausible that the likelihood function will become flat enough earlier, so that the numerical optimization will stop earlier (i.e., optim will think that the slope of the likelihood function is flat enough to be considered zero and stop earlier than it would for more condensed data). After all, maximum likelihood is a numerical method and thus an approximation. I would venture to say that what you describe lies in the nature of this method. You could also follow the good advice given earlier, by increasing the number of iterations or decreasing the tolerance. However, check the example below: for all purposes it's really close enough and has nothing to do with optim being unreliable.

n <- 1000
x <- rnorm(n)
y <- 0.5*x + rnorm(n)
z <- ifelse(y > 0, 1, 0)
X <- cbind(1, x)
b <- matrix(c(0, 0), nrow=2)

# Probit
reg <- glm(z ~ x, family=binomial("probit"))

# Optim reproducing probit (with minor deviations due to difference in method)
LL <- function(b) { -sum(z*log(pnorm(X %*% b)) + (1 - z)*log(1 - pnorm(X %*% b))) }
optim(c(0, 0), LL)

# Multiply x by 2 and repeat optim
X[,2] <- 2*X[,2]
optim(c(0, 0), LL)

HTH, Daniel

yehengxin wrote:
What I tried is just a simple binary probit model. Create random data and use optim to maximize the log-likelihood function to estimate the coefficients. (E.g. u = 0.1 + 0.2*x + e, where e is standard normal, and y = (u > 0), y indicating a binary choice variable.) If I estimate the coefficient of x, I should be able to get a value close to 0.2 if the sample is large enough. Say I got 0.18. If I multiply x by two and reestimate the model, which coefficient should I get? 0.09, right? But with optim, I got something different. When I do the same thing in both Gauss and Matlab, I get exactly 0.09. But R's optim does not give me a reliable estimator.

-- View this message in context: http://r.789695.n4.nabble.com/Poor-performance-of-Optim-tp3862229p3864681.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Poor performance of Optim
And there I caught myself with the next blooper: it wasn't Ben Bolker, it was Bert Gunter who pointed that out. :)

Daniel Malter wrote:
Ben Bolker sent me a private email rightfully correcting me: I was factually wrong when I wrote that ML /is/ a numerical method (I had written sloppily and under time pressure). He is of course right to point out that not all maximum likelihood estimators require numerical methods to solve. Further, only numerical optimization will show the behavior discussed in this post, for the given reasons. (I hope this post isn't yet another blooper of mine at 5 a.m.) Best, Daniel

[earlier quoted text snipped]
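The tolerance point made above can be seen directly by tightening optim's stopping criterion. The sketch below (my own illustration, not from the thread) fits the same probit log-likelihood with the default Nelder-Mead settings and with BFGS under a much smaller reltol, then compares both to glm(); the tightened run should land closer to the glm() answer:

```r
set.seed(1)
n <- 10000
x <- rnorm(n)
u <- 0.1 + 0.2 * x + rnorm(n)
z <- as.numeric(u > 0)
X <- cbind(1, x)

# numerically stable negative probit log-likelihood (log-scale pnorm)
LL <- function(b) -sum(z * pnorm(X %*% b, log.p = TRUE) +
                       (1 - z) * pnorm(X %*% b, lower.tail = FALSE, log.p = TRUE))

o1 <- optim(c(0, 0), LL)                                            # default Nelder-Mead
o2 <- optim(c(0, 0), LL, method = "BFGS", control = list(reltol = 1e-12))

cbind(default = o1$par, tight = o2$par,
      glm = coef(glm(z ~ x, family = binomial("probit"))))
```

The remaining differences between the three columns are on the order of the optimizer's tolerance, which is the point: they reflect where optimization stops, not an unreliable estimator.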
[R] Ipad on R
Is it possible to install R on an iPad 2? -- Oscar Ramírez A. Universidad Nacional, Escuela de Ciencias Biológicas. M.Sc. en Conservación y Manejo de Vida Silvestre osorami...@gmail.com
Re: [R] plot: how to fix the ratio of the plot box?
On 10/02/2011 07:20 PM, Hofert Jan Marius wrote:
Dear all, this should be trivial, but I couldn't figure out how to solve it... I would like to have a plot with a fixed aspect ratio of 1. Whenever I resize the Quartz window, the axes are extended so that the plot fills the whole window. However, if the axes are extended differently, the plot does not look like a square anymore (i.e., aspect ratio 1). The same of course happens if you print it to .pdf (the ultimate goal). How can I fix the plot box (formed by the axes) to have ratio 1, meaning that the plot box is a square no matter how I resize the Quartz window? I searched for this and found: http://tolstoy.newcastle.edu.au/R/help/05/04/2888.html It is more or less recommended to use lattice's xyplot for that. Is there no solution for base graphics? Cheers, Marius

Minimal example:
u <- runif(10)
pdf(width=5, height=5)
plot(u, u, asp=1, xlim=c(0,1), ylim=c(0,1), main="My title")
dev.off()

Hi Marius,
Have you tried: par(pty = "s") after you open the device and before plotting?
Jim
Re: [R] plot: how to fix the ratio of the plot box?
ahh, perfect, thanks. Cheers, Marius

On 2011-10-02, at 13:08, Jim Lemon wrote:
Hi Marius,
Have you tried: par(pty = "s") after you open the device and before plotting?
Jim
[earlier quoted text snipped]
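Putting the thread's pieces together, the original minimal example with Jim's suggestion applied looks like this (the filename "square.pdf" is just an illustration):

```r
u <- runif(10)
pdf("square.pdf", width = 5, height = 5)
par(pty = "s")   # force a square plotting region, independent of device shape
plot(u, u, asp = 1, xlim = c(0, 1), ylim = c(0, 1), main = "My title")
dev.off()
```

par(pty = "s") keeps the plot region square no matter how the device or window is resized, while asp = 1 keeps one data unit the same physical length on both axes; together they give a square plot box with a true 1:1 aspect ratio.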
Re: [R] Ipad on R
2011/10/2 Oscar Ramírez osorami...@gmail.com: It is possible to install R on Ipad 2? This discussion predates the iPad 2, but the licensing restrictions likely still apply: http://www.r-statistics.com/2010/06/could-we-run-a-statistical-analysis-on-iphoneipad-using-r/ One-word answer: no. Two-word answer: Not legally. But do read the discussion at the above link. -- Sarah Goslee http://www.functionaldiversity.org
[R] subset in dataframes
I need help in subsetting a dataframe:

data1 <- data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004,2001,2002,2003,2004,2001,2002,2003,2004),
                    firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                    x=c(11,22,-32,25,-26,47,85,98,101,14,87,56,12,43,67,54),
                    y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540))
data1

I want to keep the firms where all x > 0 (where there are no negative values in x). So my output should be:

  year firm   x    y
1 2001    3 101 1010
2 2002    3  14  140
3 2003    3  87  870
4 2004    3  56  560
5 2001    4  12  120
6 2002    4  43  430
7 2003    4  67  670
8 2004    4  54  540

So I'm doing:

data2 <- data1[data1$firm %in% subset(data1, data1$x > 0), ]
data2

But the result is:

[1] year firm x    y
<0 rows> (or 0-length row.names)

Thank you for any help. Cecília Carmo (Universidade de Aveiro)
Re: [R] subset in dataframes
Hi,

On Sun, Oct 2, 2011 at 7:48 AM, Cecilia Carmo cecilia.ca...@ua.pt wrote:
I need help in subsetting a dataframe:
data1 <- data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004,2001,2002,2003,2004,2001,2002,2003,2004),
                    firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                    x=c(11,22,-32,25,-26,47,85,98,101,14,87,56,12,43,67,54),
                    y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540))

Thank you for providing a reproducible example.

data1
I want to keep the firms where all x > 0 (where there are no negative values in x). So my output should be:

  year firm   x    y
1 2001    3 101 1010
2 2002    3  14  140
3 2003    3  87  870
4 2004    3  56  560
5 2001    4  12  120
6 2002    4  43  430
7 2003    4  67  670
8 2004    4  54  540

So I'm doing:
data2 <- data1[data1$firm %in% subset(data1, data1$x > 0), ]
data2

What about finding which ones have negative values and should be deleted,

unique(data1$firm[data1$x <= 0])
[1] 1 2

And then deleting them?

data1[!(data1$firm %in% unique(data1$firm[data1$x <= 0])), ]
   year firm   x    y
9  2001    3 101 1010
10 2002    3  14  140
11 2003    3  87  870
12 2004    3  56  560
13 2001    4  12  120
14 2002    4  43  430
15 2003    4  67  670
16 2004    4  54  540

But the result is
[1] year firm x    y
<0 rows> (or 0-length row.names)

If you look at just the result of part of your code,

subset(data1, data1$x > 0)

it isn't giving at all what you need for the next step: the entire data frame for x > 0.

Sarah -- Sarah Goslee http://www.functionaldiversity.org
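An equivalent one-liner, offered as an alternative sketch (not from the thread): ave() computes a per-group summary aligned with the rows, so "keep firms where all x > 0" becomes "keep rows where the firm's minimum x is positive":

```r
data1 <- data.frame(
  year = rep(2001:2004, 4),
  firm = rep(1:4, each = 4),
  x = c(11, 22, -32, 25, -26, 47, 85, 98, 101, 14, 87, 56, 12, 43, 67, 54),
  y = c(110, 220, 302, 250, 260, 470, 850, 980, 1010, 140, 870, 560, 120, 430, 670, 540)
)

# per-row group minimum of x; a firm passes only if its minimum is > 0
data2 <- data1[ave(data1$x, data1$firm, FUN = min) > 0, ]
data2
```

This selects exactly the rows of firms 3 and 4, the same result as the two-step unique()/%in% approach, and scales fine to thousands of firms.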
[R] Difference between ~lp() or simply ~ in R's locfit?
As I think it is not spam but helpful, let me repeat my stats.stackexchange.com question here, from http://stats.stackexchange.com/questions/16346/difference-between-lp-or-simply-in-rs-locfit

I am not sure I see the difference between the examples for local logistic regression in the documentation of the gold-standard locfit package for R: http://cran.r-project.org/web/packages/locfit/locfit.pdf

I get starkly different results with

fit2 <- scb(closed_rule ~ lp(bl), deg=1, xlim=c(0,1), ev=lfgrid(100), family='binomial', alpha=cbind(0, 0.3), kern='parm')

from

fit2 <- scb(closed_rule ~ bl, deg=1, xlim=c(0,1), ev=lfgrid(100), family='binomial', alpha=cbind(0, 0.3), kern='parm')

What is the nature of the difference? Maybe that can help me phrase what I want. I had in mind an index linear in bl within a logistic link function predicting the probability of closed_rule. The documentation of lp says that it fits a local polynomial, which is great, but I thought that would happen even if I left it out. And in any case, the documentation has examples for local logistic regression either way.
Re: [R] Fitting 3 beta distributions
On Sat, 1 Oct 2011, Nitin Bhardwaj wrote:
Hi, I want to fit 3 beta distributions to my data which ranges between 0 and 1. What are the functions that I can easily call and specify that 3 beta distributions should be fitted? I have already looked at normalmixEM and fitdistr but they don't seem to be applicable (normalmixEM is only for fitting normal dist and fitdistr will only fit 1 distribution, not 3). Is that right?

From your description above, I guess that (a) you want to fit a _mixture_ of 3 beta distributions, and (b) have tried to use mixtools and MASS so far. Based on these assumptions: fitdistr() does not fit mixture models. mixtools does fit mixtures, and the accompanying paper has an example where a nonparametric model is applied to mixtures of beta distributions. Furthermore, the betareg package has a function betamix() which can fit mixtures of beta regression models (including the special case of no covariates). Both mixtools and betareg have been published in JSS, as indicated when calling citation("mixtools") and citation("betareg"):

http://www.jstatsoft.org/v32/i06/
http://www.jstatsoft.org/v34/i02/

The latter does not yet contain the betamix() function. As an example, one can use the artificial data generated in Section 5.2:

set.seed(123)
y1 <- c(rbeta(150, 0.3 * 4, 0.7 * 4), rbeta(50, 0.5 * 4, 0.5 * 4))
y2 <- c(rbeta(100, 0.3 * 4, 0.7 * 4), rbeta(100, 0.3 * 8, 0.7 * 8))
d  <- data.frame(y1, y2)
bm1 <- betamix(y1 ~ 1 | 1, data = d, k = 2)
bm2 <- betamix(y2 ~ 1 | 1, data = d, k = 2)

where one should note that, compared to R's parametrization of the beta distribution, two transformations are employed: from shape1/shape2 to mu/phi, and then adding logit/log link functions.

Also, my data has 26 million data points. What can I do to reduce the computation time with the suggested function?

I think all functions above will have problems with 26 million observations directly. One alternative - if the fitting function takes weights - would be to use a representative sample, or to compute weights on a possibly coarsened grid.

hth, Z
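The two alternatives mentioned for the 26-million-point problem can be sketched as follows (a hedged illustration; `y` below is a placeholder for the real data vector, and the subsample/bin sizes are arbitrary choices):

```r
set.seed(42)
y <- rbeta(1e6, 2, 5)   # placeholder standing in for the 26-million-point data

## (a) fit on a representative random subsample
ys <- sample(y, 1e5)

## (b) coarsen onto a grid and use bin counts as case weights,
##     if the fitting function accepts a weights argument
br  <- seq(0, 1, length.out = 1001)
mid <- (head(br, -1) + tail(br, -1)) / 2   # bin midpoints as pseudo-observations
w   <- tabulate(cut(y, br), nbins = 1000)  # counts per bin = weights
```

Either way the optimizer only sees 1e5 points (or 1000 weighted midpoints) instead of 26 million, which turns an infeasible fit into a fast one at the cost of some discretization or sampling error.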
Re: [R] subset in dataframes
Thank you very much. My dataframe has thousands of firms; how can I delete all of those with x <= 0 and keep another dataframe with the firms where all x > 0? Thank you again. Cecília Carmo (Universidade de Aveiro - Portugal)

-----Original message----- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Sunday, 2 October 2011 13:01 To: Cecilia Carmo Cc: r-help@r-project.org Subject: Re: [R] subset in dataframes

[earlier reply quoted; snipped]
Re: [R] subset in dataframes
Hi, On Sun, Oct 2, 2011 at 9:08 AM, Cecilia Carmo cecilia.ca...@ua.pt wrote: Thank you very much. My dataframe has thousands of firms, how can I delete all of those with x0 and keep another dataframe with firms where all x0? How does that differ from your original question? What doesn't work for you in the answer I already gave? Sarah Thank you again. Cecília Carmo (Universidade de Aveiro - Portugal) -Mensagem original- De: Sarah Goslee [mailto:sarah.gos...@gmail.com] Enviada: domingo, 2 de Outubro de 2011 13:01 Para: Cecilia Carmo Cc: r-help@r-project.org Assunto: Re: [R] subset in dataframes Hi, On Sun, Oct 2, 2011 at 7:48 AM, Cecilia Carmo cecilia.ca...@ua.pt wrote: I need help in subseting a dataframe: data1-data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004, 2001,2002,2003,2004,2001,2002,2003,2004), firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),x=c(11,22,-32,25,-26,47,85,98, 101,14,87,56,12,43,67,54), y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540)) Thank you for providing a reproducible example. data1 I want to keep the firms where all x0 (where there are no negative values in x) So my output should be: year firm x y 1 2001 3 101 1010 2 2002 3 14 140 3 2003 3 87 870 4 2004 3 56 560 5 2001 4 12 120 6 2002 4 43 430 7 2003 4 67 670 8 2004 4 54 540 So I'm doing: data2-data1[data1$firm%in%subset(data1,data1$x0),] data2 What about finding which ones have negative values and should be deleted, unique(data1$firm[data1$x = 0]) [1] 1 2 And then deleting them? data1[!(data1$firm %in% unique(data1$firm[data1$x = 0])),] year firm x y 9 2001 3 101 1010 10 2002 3 14 140 11 2003 3 87 870 12 2004 3 56 560 13 2001 4 12 120 14 2002 4 43 430 15 2003 4 67 670 16 2004 4 54 540 But the result is [1] year firm x y 0 rows (or 0-length row.names) If you look at just the result of part of your code, subset(data1,data1$x0) it isn't giving at all what you need for the next step: the entire data frame for x0. 
--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] subset in dataframes
Sarah,

Sorry for being ignorant. I was doing something wrong. It works perfectly. Thank you.

Cecília Carmo
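A compact alternative for the same task (a sketch, using the data1 defined in the thread): compute the per-firm "all x > 0" condition in a single step with ave().

```r
# data1 as posted in the thread
data1 <- data.frame(year = rep(2001:2004, 4),
                    firm = rep(1:4, each = 4),
                    x = c(11, 22, -32, 25, -26, 47, 85, 98,
                          101, 14, 87, 56, 12, 43, 67, 54),
                    y = c(110, 220, 302, 250, 260, 470, 850, 980,
                          1010, 140, 870, 560, 120, 430, 670, 540))
# per-row flag: TRUE when every x value in this row's firm is > 0
keep <- as.logical(ave(data1$x > 0, data1$firm, FUN = all))
data1[keep, ]   # firms 3 and 4 only
```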
Re: [R] Overlapping plot in lattice
Thanks Gabor, that was exactly what I needed.

On Sep 30, 9:00 pm, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> On Fri, Sep 30, 2011 at 3:01 AM, Kang Min ngokang...@gmail.com wrote:
>> Hi all, I was wondering if there's an equivalent to par(new=T) of the plot function in lattice. I'm plotting an xyplot, and I would like to highlight one point by plotting that one point again using a different symbol. For example, where 6 is highlighted:
>> plot(1:10, xlim=c(0,10), ylim=c(0,10))
>> par(new=T)
>> plot(6, 6, xlim=c(0,10), ylim=c(0,10), pch=16)
>
> Try this:
> library(lattice)
> xyplot(1:10 ~ 1:10, xlim=c(0,10), ylim=c(0,10))
> trellis.focus()
> panel.points(6, 6, pch = 6)
> trellis.unfocus()
>
> --
> Statistics Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
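The same highlight can also be drawn inside a custom panel function, which keeps everything in a single xyplot() call (a sketch along the lines of Gabor's answer):

```r
library(lattice)
xyplot(1:10 ~ 1:10, xlim = c(0, 10), ylim = c(0, 10),
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)                    # the ordinary points
         panel.points(6, 6, pch = 16, col = "red")  # the highlighted point
       })
```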
[R] Arimax First-Order Transfer Function
Dear list members,

I am a (very) recent convert to R and I am hoping you can help me with a problem I'm having. I'm trying to fit a first-order transfer function to an ARIMA intervention analysis using the arimax function. The data were obtained from McCleary & Hay (1980) (via Rob Hyndman's Time Series Library: http://robjhyndman.com/tsdldata/data/schizo.dat). The series has 120 time points with an intervention occurring at the 60th unit. So far I've been able to run a simple zero-order intervention model, which I've done like this:

Model1 <- arimax(x, order=c(0,1,1), xreg=Intv)

where Intv <- as.matrix(c(rep(0,60), rep(1,60))) (the dummy intervention variable).

I'd like to add a first-order transfer function in order to test for gradual, permanent effects. I understand this can be done by adding the xtransf and transfer arguments; however, after playing around with this I've been unsuccessful in replicating the results found in McCleary & Hay (1980). I've looked, in depth, at the 'airline' example; however, despite the guidance provided by Chan on this (see below), it's not immediately clear to me how the xtransf (i.e. I911=1*(seq(airmiles)==69)) and transfer (i.e. transfer=list(c(0,0),c(1,0))) arguments are generated, and what they consist of. I've looked extensively for further information on this, but to no avail. Is anyone able to offer any further advice/directions on how to go about this?

Best wishes,
David

Example provided by Chan (2008) (airline example):

air.m1 = arimax(log(airmiles), order=c(0,1,1),
                seasonal=list(order=c(0,1,1), period=12),
                xtransf=data.frame(I911=1*(seq(airmiles)==69),
                                   I911=1*(seq(airmiles)==69)),
                transfer=list(c(0,0), c(1,0)),
                xreg=data.frame(Dec96=1*(seq(airmiles)==12),
                                Jan97=1*(seq(airmiles)==13),
                                Dec02=1*(seq(airmiles)==84)),
                method='ML')
# Additive outliers are incorporated as dummy variables in xreg.
# Transfer function components are incorporated by the xtransf and transfer
# arguments.
# Here, the transfer function consists of two parts, omega0*P(t) and
# omega1/(1-omega2*B)P(t), where the inputs of the two transfer
# functions are identical and equal the dummy variable that is 1 at September
# 2001 (the 69th data point) and zero otherwise.
# xtransf is a matrix whose columns are the input variables.
# transfer is a list consisting of the pair of (AR order, MA order) of each
# transfer function, which in this example is (0,0) and (1,0).
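Following the pattern of the quoted airline example, a first-order transfer function for a step intervention might look like the sketch below. This is an untested outline, not a verified replication of McCleary & Hay: `x` is assumed to be the 120-point schizophrenia series from the post, and `c(1, 0)` follows the quoted example's convention for a transfer function with a first-order denominator (a gradual, permanent effect).

```r
library(TSA)  # provides arimax()
# step input: 0 before the intervention, 1 from t = 61 onwards
Step <- 1 * (seq_along(x) >= 61)
Model2 <- arimax(x, order = c(0, 1, 1),
                 xtransf = data.frame(Step = Step),
                 transfer = list(c(1, 0)))  # omega0/(1 - delta1*B) * Step(t)
```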
[R] Find all duplicate records
Hello,

In a data frame I want to identify ALL duplicate IDs in the example, to be able to examine OS and time.

(df <- data.frame(ID=c("userA", "userB", "userA", "userC"),
                  OS=c("Win", "OSX", "Win", "Win64"),
                  time=c("12:22", "23:22", "04:44", "12:28")))
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28

My desired output is that ALL records with the same IDs are found:
userA Win 12:22
userA Win 04:44
preferably by returning logical values (TRUE FALSE TRUE FALSE). Is there a simple way to do that?

With duplicated(df$ID) the output will be
[1] FALSE FALSE TRUE FALSE
i.e. not all userA records are found.
With unique(df$ID)
[1] userA userB userC
Levels: userA userB userC
i.e. one of each ID is found.

Erik Svensson
Re: [R] Find all duplicate records
On 02.10.2011 16:05, Erik Svensson wrote:
> In a data frame I want to identify ALL duplicate IDs in the example, to be able to examine OS and time. [...]
> My desired output is that ALL records with the same IDs are found:
> userA Win 12:22
> userA Win 04:44

See ?split or ?subset

Uwe Ligges
Re: [R] Poor performance of Optim
Hi,

You really need to study the documentation of optim carefully before you make broad generalizations. There are several algorithms available in optim. The default is a simplex-type algorithm called Nelder-Mead. I think this is an unfortunate choice as the default algorithm. Nelder-Mead is a robust algorithm that can work well for almost any kind of objective function (smooth or nasty). However, the trade-off is that it is very slow in terms of convergence rate. For simple, smooth problems such as yours, you should use BFGS (or L-BFGS-B if you have simple box constraints). Also, take a look at the optimx package and the most recent paper in the Journal of Statistical Software on optimx for a better understanding of the wide array of optimization options available in R.

Best,
Ravi.
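A small illustration of the difference on a smooth objective (a sketch; the exact counts will vary):

```r
f <- function(p) sum((p - c(1, 2))^2)        # smooth quadratic, minimum at (1, 2)
nm   <- optim(c(0, 0), f)                    # default method: Nelder-Mead
bfgs <- optim(c(0, 0), f, method = "BFGS")   # gradient-based quasi-Newton
nm$counts    # Nelder-Mead typically needs many more function evaluations
bfgs$counts  # BFGS converges in far fewer on problems like this
```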
[R] generating Venn diagram with 6 sets
Dear r-helpers,

Here I would like to have your kind help on generating a Venn diagram. There are some packages within R for this task, like venneuler, VennDiagram, and Vennerable. But Vennerable cannot be installed on my MacBook, VennDiagram does not seem to work on my data, and venneuler may have generated a wrong Venn diagram for me. Do you have any experience/expertise with those Venn diagram packages? Could you please give me any directions on that? Thanks in advance.

Best wishes,
Jian-Feng,

##
# (1) my code for venneuler (intersections are written with "&")
vd <- venneuler(c("A&B&C&D&E&F"=69604, "A&B&C&D&E"=426120, "A&B&C&D&F"=20297, "A&B&C&D"=123063, "A&B&C&E&F"=12695, "A&B&C&E"=115100, "A&B&C&F"=11667, "A&B&C"=95656, "A&B&D&E&F"=1755, "A&B&D&E"=20113, "A&B&D&F"=1903, "A&B&D"=19218, "A&B&E&F"=2831, "A&B&E"=38362, "A&B&F"=4950, "A&B"=68289, "A&C&D&E&F"=11657, "A&C&D&E"=107235, "A&C&D&F"=14883, "A&C&D"=193338, "A&C&E&F"=6284, "A&C&E"=79985, "A&C&F"=14710, "A&C"=271416, "A&D&E&F"=1069, "A&D&E"=17628, "A&D&F"=3152, "A&D"=71573, "A&E&F"=2786, "A&E"=57511, "A&F"=13684, "A"=475970, "B&C&D&E&F"=2722, "B&C&D&E"=30528, "B&C&D&F"=2740, "B&C&D"=30986, "B&C&E&F"=3579, "B&C&E"=55443, "B&C&F"=7789, "B&C"=101005, "B&D&E&F"=917, "B&D&E"=14894, "B&D&F"=1436, "B&D"=24972, "B&E&F"=3975, "B&E"=105527, "B&F"=16877, "B"=718570, "C&D&E&F"=1587, "C&D&E"=26289, "C&D&F"=4902, "C&D"=101947, "C&E&F"=3326, "C&E"=77289, "C&F"=20125, "C"=689330, "D&E&F"=892, "D&E"=22666, "D&F"=4661, "D"=200020, "E&F"=8518, "E"=521290, "F"=401622))
pdf("myvenn.pdf")
plot(vd)
dev.off()
#
# (2) the problem with the plot venneuler generated is that the sets (A,B,C,D,E,F)
# should share 69604 elements, but it illustrated nothing for this 6-set intersection.
#
# (3) I prepared my code for the Vennerable package, but it cannot be installed now.
myVenn <- Venn(SetNames = c("Norway", "Russia", "Iceland", "Scotland", "Austria", "North American"),
  Weight = c('111111'=69604, '111110'=426120, '111101'=20297, '111100'=123063, '111011'=12695, '111010'=115100, '111001'=11667, '111000'=95656, '110111'=1755, '110110'=20113, '110101'=1903, '110100'=19218, '110011'=2831, '110010'=38362, '110001'=4950, '110000'=68289, '101111'=11657, '101110'=107235, '101101'=14883, '101100'=193338, '101011'=6284, '101010'=79985, '101001'=14710, '101000'=271416, '100111'=1069, '100110'=17628, '100101'=3152, '100100'=71573, '100011'=2786, '100010'=57511, '100001'=13684, '100000'=475970, '011111'=2722, '011110'=30528, '011101'=2740, '011100'=30986, '011011'=3579, '011010'=55443, '011001'=7789, '011000'=101005, '010111'=917, '010110'=14894, '010101'=1436, '010100'=24972, '010011'=3975, '010010'=105527, '010001'=16877, '010000'=718570, '001111'=1587, '001110'=26289, '001101'=4902, '001100'=101947, '001011'=3326, '001010'=77289, '001001'=20125, '001000'=689330, '000111'=892, '000110'=22666, '000101'=4661, '000100'=200020, '000011'=8518, '000010'=521290, '000001'=401622))
pdf("myVenn.pdf")
plot(myVenn, doWeight = TRUE, type = "circles")
dev.off()
[R] regarding specifying criteria for Cointegration
Dear All,

I am learning R and Time Series Econometrics for the first time. I have a doubt regarding the cointegration specification criteria. The problem follows:

test1 <- ca.jo(data1, ecdet="const", type="trace", K=2, spec="transitory")  # when to specify "transitory"?
test1 <- ca.jo(data1, ecdet="const", type="trace", K=2, spec="longrun")     # when to specify "longrun"?

With regards,
Upananda
Re: [R] error while using shapiro.test()
On 2011-10-01 09:24, spicymchaggis101 wrote:
> Thank you very much! Your response solved my issue. I needed to determine the probability of normality for word types per page.

You may want to review just what the test does. It certainly does not give you the 'probability of normality'. A worthwhile exercise might be to test several other distributions on your data.

Peter Ehlers
Re: [R] On-line machine learning packages?
Hello Jay,

Did you find the answer to your question on incremental machine learning? If not, I found some links that might help.

It appears that you might be able to do streaming/incremental machine learning in Weka: http://moa.cs.waikato.ac.nz/details/classification/using-weka/
On the above page there is a link to a free online book on data stream mining: http://heanet.dl.sourceforge.net/project/moa-datastream/documentation/StreamMining.pdf
While Weka is a separate project from R, there is an R-to-Weka interface available at http://cran.r-project.org/web/packages/RWeka/index.html

Sadly, I didn't see any streaming/incremental machine learning packages on the CRAN machine learning task view. I would guess that your best bet is using Weka with the RWeka interface, but I'm a neophyte in the machine learning field, so please take this advice with a grain of salt.

Sincerely,
Jason

On 09/13/2011 02:35 AM, Jay wrote:
> "How does sequential classification differ from running a one-off classifier for each run?" - Because feedback from the previous round can and needs to be incorporated into the next round.
> http://lmgtfy.com/?q=R+machine+learning - That is a new low. I was hoping to get help; obviously I was wrong to use this forum in the hope that somebody had already battled these kinds of problems in R.
>
> On Sep 13, 1:52 am, Jason Edgecombe ja...@rampaginggeek.com wrote:
>> I already provided the link to the task view, which provides a list of the more popular machine learning algorithms for R. Do you have a particular algorithm or technique in mind? Does it have a name? How does sequential classification differ from running a one-off classifier for each run?
>>
>> On 09/12/2011 05:24 AM, Jay wrote:
>>> In my mind this sequential classification task with feedback is somewhat different from a completely offline, once-off classification. Am I wrong? However, it looks like the mentality on this topic is to refer me to cran/google in order to look for solutions myself.
>>> Obviously I know about these sources, and as I said, I used rseek.org among other sources to look for solutions. I did not start this topic for fun; I'm asking for help to find suitable machine learning packages that readily incorporate feedback loops and online learning. If somebody has experience with these kinds of problems in R, please respond. Or will "http://cran.r-project.org - Look for 'Task Views'" be my next piece of advice?
>>>
>>> On Sep 12, 11:31 am, Dennis Murphy djmu...@gmail.com wrote:
>>>> http://cran.r-project.org/web/views/ Look for 'machine learning'. Dennis
>>>>
>>>> On Sun, Sep 11, 2011 at 11:33 PM, Jay josip.2...@gmail.com wrote:
>>>>> If the answer is so obvious, could somebody please spell it out?
>>>>>
>>>>> On Sep 11, 10:59 pm, Jason Edgecombe ja...@rampaginggeek.com wrote:
>>>>>> Try this: http://cran.r-project.org/web/views/MachineLearning.html
>>>>>>
>>>>>> On 09/11/2011 12:43 PM, Jay wrote:
>>>>>>> Hi, I used the rseek search engine to look for suitable solutions; however, as I was unable to find anything useful, I'm asking for help. Anybody have experience with these kinds of problems? I looked into dynaTree, but as information is a bit scarce and, as I understand it, it might not be what I'm looking for..(?)
>>>>>>> BR, Jay
>>>>>>>
>>>>>>> On Sep 11, 7:15 pm, David Winsemius dwinsem...@comcast.net wrote:
>>>>>>>> On Sep 11, 2011, at 11:42 AM, Jay wrote:
>>>>>>>>> What R packages are available for performing classification tasks? That is, when the predictor has done its job on the dataset (based on the training set and a range of variables), feedback about the true label will be available and this information should be integrated for the next classification round.
>>>>>>>> You should look at CRAN Task Views. Extremely easy to find from the main R-project page.
>>>>>>>> -- David Winsemius, MD, West Hartford, CT
Re: [R] On-line machine learning packages?
Hi Jay,

I see this thread is a bit (ok, quite) old at this point, but I see you never really got an answer to your question that was satisfactory. I figured you might be interested to know that Dirk has started to wrap vowpal wabbit [1,2] into an R package, RVowpalWabbit [3,4]. The package itself is still rather bare-bones, but perhaps it can be useful to you in its current state, or perhaps the raw vowpal wabbit can be.

You might also consider the shogun toolbox [5]. As of its 1.0 release, I believe it has incorporated vowpal wabbit in some form or another to do online learning, but it might have other online learning algorithms baked in as well. It has its own flavor of an R interface (r_static or r_modular), which might work for you if you can get it to compile.

-steve

[1] Vowpal Wabbit (home page): http://hunch.net/~vw/
[2] Vowpal Wabbit (github): https://github.com/JohnLangford/vowpal_wabbit
[3] RVowpalWabbit (CRAN): http://cran.r-project.org/web/packages/RVowpalWabbit/index.html
[4] RVowpalWabbit (R-forge): https://r-forge.r-project.org/projects/rvowpalwabbit/
[5] The shogun toolbox: http://www.shogun-toolbox.org/

On Mon, Sep 12, 2011 at 5:24 AM, Jay josip.2...@gmail.com wrote:
> In my mind this sequential classification task with feedback is somewhat different from a completely offline, once-off classification. Am I wrong? However, it looks like the mentality on this topic is to refer me to cran/google in order to look for solutions myself. Obviously I know about these sources, and as I said, I used rseek.org among other sources to look for solutions. I did not start this topic for fun; I'm asking for help to find suitable machine learning packages that readily incorporate feedback loops and online learning. If somebody has experience with these kinds of problems in R, please respond. Or will "http://cran.r-project.org - Look for 'Task Views'" be my next piece of advice?
--
Steve
Re: [R] Keep ALL duplicate records
Erik Svensson wrote:
> In a data frame I want to identify ALL duplicate IDs in the example, to be able to examine OS and time.
> (df <- data.frame(ID=c("userA", "userB", "userA", "userC"),
>                   OS=c("Win", "OSX", "Win", "Win64"),
>                   time=c("12:22", "23:22", "04:44", "12:28")))
> My desired output is that ALL records with the same IDs are found, preferably by returning logical values (TRUE FALSE TRUE FALSE). Is there a simple way to do that?

How about ...

# All records
ALL_RECORDS <- df[df$ID == df$ID[duplicated(df$ID)], ]
print(ALL_RECORDS)

# Logical vector
TRUE_FALSE <- df$ID == df$ID[duplicated(df$ID)]
print(TRUE_FALSE)

HTH
Pete
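Note that the == comparison in Pete's code works here only because a single ID ("userA") is duplicated; with several distinct duplicated IDs the recycled comparison would misbehave. A more general sketch of the logical-vector answer combines duplicated() scanned from both ends:

```r
df <- data.frame(ID = c("userA", "userB", "userA", "userC"),
                 OS = c("Win", "OSX", "Win", "Win64"),
                 time = c("12:22", "23:22", "04:44", "12:28"))
# TRUE for every row whose ID occurs more than once, regardless of position
dup <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)
dup        # TRUE FALSE TRUE FALSE
df[dup, ]  # all userA records
```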
[R] difference between createPartition and createfold functions
Hello,

I'm trying to separate my dataset into 4 parts, with the 4th one as the test dataset and the other three used to fit a model. I've been searching for the difference between these two functions in the caret package, but the most I can get is this:

"A series of test/training partitions are created using createDataPartition, while createResample creates one or more bootstrap samples. createFolds splits the data into k groups."

Am I missing something here? What is the difference between createDataPartition and createFolds? I guess they wouldn't be equivalent.

Thank you.
Bonnie Yuan
Re: [R] Is the output of survfit.coxph survival or baseline survival?
On Sat, Oct 1, 2011 at 2:31 PM, koshihaku koshih...@gmail.com wrote:
> Dear all, I am confused with the output of survfit.coxph. Someone said that the survival given by summary(survfit.coxph) is the baseline survival S_0, but others said it is the survival S = S_0^exp(beta'x). Which one is correct?

The baseline hazard as estimated in survfit.coxph is the hazard when all covariates are equal to the sample mean (or the stratum mean for a stratified model). The means that it is using are available in the $means component of the coxph object. It is not the hazard extrapolated to all covariates equal to zero.

The centering at the sample mean is done for three reasons:
1/ it's computationally convenient
2/ it's numerically more stable
3/ it makes the baseline hazard more interpretable, since at least it is the hazard for a set of covariate values somewhere in the interior of your data.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
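A short sketch showing where those centering values live (using the lung data shipped with the survival package):

```r
library(survival)
fit <- coxph(Surv(time, status) ~ age, data = lung)
fit$means            # the sample mean of age used for centering
sf <- survfit(fit)   # survival curve at that mean age, not at age = 0
summary(sf)
```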
Re: [R] Advice on approach to weighting survey
On Sat, Oct 1, 2011 at 4:59 AM, Farley, Robert farl...@metro.net wrote:
> I'm about to add weights to a bus on-board survey dataset with ~150 variables and ~28,000 records. My intention is to weight (for each bus run) by boarding stop and alighting stop. I've seen the rake function of the survey package, but it seems that converting to a svydesign might be excessive for my purpose. My dataset has a huge number of unique Run-Boarding and Run-Alighting groups, each with a small number of records to expand. Would it be easier to manually implement Iterative Proportional Fitting (raking/Fratar/Furness) on the data? Or are there benefits to converting the data to a svydesign that would make it valuable?
> This traditional weighting expands what we call unlinked trips (based on each boarding). I'm thinking of also using IPF/raking to estimate linked trips (based on each individual). Would this change the consideration of using the svydesign process?

If you're planning to do any analysis afterwards it would be useful to have the data in a svydesign object, or if you end up needing to do weight trimming or bounding, or other slightly more complicated weight adjustments. Otherwise it might well just be easier to do your own IPF algorithm.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
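For the svydesign route, a minimal sketch of the raking step (all object and variable names here are hypothetical stand-ins for the poster's data):

```r
library(survey)
# onboard: the ~28,000-record data frame; board_stop/alight_stop: its margin variables
des <- svydesign(ids = ~1, data = onboard)
# board_totals and alight_totals: data frames of population counts,
# each holding the margin variable plus a Freq column
des_raked <- rake(des,
                  sample.margins     = list(~board_stop, ~alight_stop),
                  population.margins = list(board_totals, alight_totals))
```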
Re: [R] difference between createPartition and createfold functions
Hi,

On Sun, Oct 2, 2011 at 2:47 PM, bby2...@columbia.edu wrote:
> I'm trying to separate my dataset into 4 parts, with the 4th one as the test dataset and the other three to fit a model. [...] What is the difference between createDataPartition and createFolds?

Well -- you could always look at the source code to find out (enter the name of the function into your R console and hit return), but you can also do some experimentation. Using the data from the Examples section of caret::createFolds:

R> library(caret)
R> data(oil)
R> part <- createDataPartition(oilType, 2)
R> fold <- createFolds(oilType, 2)
R> length(Reduce(intersect, part))
[1] 27
R> length(Reduce(intersect, fold))
[1] 0

It looks like `createDataPartition` splits your data into smaller pieces but allows the same example to appear in different splits, while `createFolds` never puts the same example into more than one fold.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
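For the original goal (4 parts, the 4th held out as a test set), createFolds is the natural fit, since its folds are disjoint (a sketch with made-up outcome data):

```r
library(caret)
set.seed(42)
y <- factor(rep(c("a", "b"), each = 50))     # stand-in outcome vector
folds <- createFolds(y, k = 4)               # 4 disjoint sets of row indices
test_idx  <- folds[[4]]                      # hold out the 4th fold
train_idx <- setdiff(seq_along(y), test_idx) # fit the model on the other three
```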
[R] Scatterplot with the 3rd dimension = color?
I have 3 columns of data and want to plot each row as a point in a scatter plot, with one column represented as a color gradient (e.g. larger values being more red). Anyone know the command or package for this?

Thanks,
KB
Re: [R] is member
Dear all, I would like to thank you for you answers This worked for me Browse[1] match(Test,seq(1,C,FrN),nomatch=FALSE) [1] 1 0 2 3 0 0 4 0 0 5 0 0 6 7 0 0 8 0 [19] 0 9 0 10 11 0 0 12 0 0 13 14 0 15 0 16 0 0 [37] 17 18 19 0 0 20 21 22 23 0 0 24 0 25 0 0 26 0 [55] 0 27 29 0 30 31 0 32 0 0 33 34 0 0 37 0 38 0 [73] 0 0 39 0 40 0 41 0 42 43 46 47 0 48 0 0 49 51 [91] 0 0 52 0 53 0 0 54 55 0 0 56 0 57 58 59 0 0 [109] 60 61 0 0 62 63 64 65 67 68 69 70 71 72 73 74 75 0 [127] 76 77 79 0 80 0 81 82 83 84 85 86 0 87 0 88 89 90 [145] 0 0 91 92 93 94 0 95 0 96 97 98 99 0 0 100 0 0 [163] 101 102 0 0 103 0 0 0 104 0 0 105 0 0 106 0 107 0 [181] 108 0 109 110 111 0 0 112 0 113 0 114 0 115 116 117 118 119 [199] 120 121 122 123 124 125 126 127 129 130 131 132 133 134 135 0 136 137 [217] 0 138 0 139 140 141 0 142 0 0 143 144 0 0 145 0 146 0 [235] 0 147 0 148 149 150 0 151 152 153 0 0 154 156 157 158 0 159 [253] 160 161 162 163 164 165 166 167 0 168 169 170 171 172 173 0 0 174 [271] 0 175 176 177 178 179 180 181 182 183 184 185 0 186 187 0 188 0 [289] 189 190 191 192 0 193 194 195 196 197 198 199 200 What I want to do now is to keep all the vector elements (only numbers) without the zeros!. How I can do that? B.R Alex From: William Dunlap wdun...@tibco.com Sent: Saturday, October 1, 2011 12:11 AM Subject: RE: [R] is member Someone already suggested that you use match(), which does what I think you want. Read its help file for details. A - seq(1,113,4) match(c(9, 17, 18), A) [1] 3 5 NA Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com Sent: Friday, September 30, 2011 2:07 PM To: William Dunlap; R-help@r-project.org Subject: Re: [R] is member Thanks a lot! This works. Now I want to do the opposite let's say that I have one sequence for example check in image http://imageshack.us/photo/my-images/4/unleduso.png/ column A (this is a seq(1,113,4) and I want when I get the number 9 to say that this is the third number in the seq (1,113,4). 
Everything about seq(1,113,4) is known, and when I get one of the numbers of the sequence I want to say which is its position. How can I do that? B.R Alex

From: William Dunlap wdun...@tibco.com Sent: Friday, September 30, 2011 6:34 PM Subject: RE: [R] is member

is.element(myvector, seq(1, 800, 4))

or, if you like typing percent signs,

myvector %in% seq(1, 800, 4)

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Alaios Sent: Friday, September 30, 2011 9:26 AM To: R-help@r-project.org Subject: [R] is member

Dear all, I have a vector with numbers, some of which are part of seq(1,800,4). How can I check which of the numbers belong to seq(1,800,4)? Let's say the vector with the numbers is called myvector. Is there in R something like this?

is.member(myvector, seq(1, 800, 4))

I would like to thank you in advance for your help B.R Alex
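A minimal sketch of the zero-dropping follow-up asked in this thread. The names `myvector` and `idx` are illustrative, standing in for the poster's data and the vector returned by match():

```r
myvector <- c(9, 14, 17, 21, 100)            # example data, made up
idx <- match(myvector, seq(1, 800, 4), nomatch = 0)
idx[idx != 0]                                 # keep only the non-zero positions

# equivalently, letting match() return NA for misses:
idx <- match(myvector, seq(1, 800, 4))
idx[!is.na(idx)]
```

Simple logical subsetting (`idx[idx != 0]`) is all that is needed; no extra function is required.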
Re: [R] Scatterplot with the 3rd dimension = color?
Here is one: http://cran.r-project.org/web/packages/scatterplot3d/index.html In the future, consider first searching: http://finzi.psych.upenn.edu/search.html http://rseek.org/ etc... Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Sun, Oct 2, 2011 at 7:11 PM, Kerry kbro...@gmail.com wrote: I have 3 columns of data and want to plot each row as a point in a scatter plot and want one column to be represented as a color gradient (e.g. larger values being more red). Anyone know the command or package for this? Thanks, KB
Re: [R] difference between createPartition and createfold functions
Hi Steve, Thanks for the note. I did try the example and the result didn't make sense to me. For splitting a vector, what you describe is a big difference between them. For splitting a data frame, I now wonder if these 2 functions are the wrong choice: they seem to split the columns, at least in the few things I tried. Bonnie

Quoting Steve Lianoglou mailinglist.honey...@gmail.com: Hi, On Sun, Oct 2, 2011 at 2:47 PM, bby2...@columbia.edu wrote: Hello, I'm trying to separate my dataset into 4 parts, with the 4th one as the test dataset and the other three used to fit a model. I've been searching for the difference between these 2 functions in the caret package, but the most I can get is this: "A series of test/training partitions are created using createDataPartition while createResample creates one or more bootstrap samples. createFolds splits the data into k groups." Am I missing something here? What is the difference between createDataPartition and createFolds? I guess they wouldn't be equivalent.

Well -- you could always look at the source code to find out (enter the name of the function into your R console and hit return), but you can also do some experimentation to find out. Using the data from the Examples section of caret::createFolds:

R> library(caret)
R> data(oil)
R> part <- createDataPartition(oilType, 2)
R> fold <- createFolds(oilType, 2)
R> length(Reduce(intersect, part))
[1] 27
R> length(Reduce(intersect, fold))
[1] 0

Looks like createDataPartition splits your data into smaller pieces but allows the same example to appear in different splits; createFolds does not allow the same example to appear in more than one fold.
HTH, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Scatterplot with the 3rd dimension = color?
On 11-10-02 1:11 PM, Kerry wrote: I have 3 columns of data and want to plot each row as a point in a scatter plot and want one column to be represented as a color gradient (e.g. larger values being more red). Anyone know the command or package for this?

It's not a particularly effective display, but here's how to do it. Use rainbow(101) in place of rev(heat.colors(101)) if you like.

x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
colors <- rev(heat.colors(101))
zcolor <- colors[(z - min(z))/diff(range(z))*100 + 1]
plot(x, y, col = zcolor)

Duncan Murdoch
Re: [R] R Studio and Rcmdr/RcmdrPlugins
Hi Erin, The last I checked, it was not possible. However, the place to ask this is here: http://support.rstudio.org/help/discussions Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Sun, Oct 2, 2011 at 4:42 AM, Erin Hodgess erinm.hodg...@gmail.com wrote: Dear R People: Hope you're having a great weekend! Anyhow, I'm currently experimenting with R Studio on a web server, which is the best thing since sliced bread, Coca Cola, etc. My one question: there is a way to show plots. Is there a way to show Rcmdr or its Plugins, please? I tried, but it doesn't seem to work. Thanks so much, Sincerely, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com
Re: [R] difference between createPartition and createfold functions
Hi, On Sun, Oct 2, 2011 at 3:54 PM, bby2...@columbia.edu wrote: Hi Steve, Thanks for the note. I did try the example and the result didn't make sense to me. For splitting a vector, what you describe is a big difference between them. For splitting a data frame, I now wonder if these 2 functions are the wrong choice: they seem to split the columns, at least in the few things I tried.

Sorry, I'm a bit confused now as to what you are after. You don't pass a data.frame into any of the createFolds/createDataPartition functions from the caret package. You pass in a *vector* of labels, and these functions tell you which indices into the vector to use as examples to hold out (or keep, depending on the value you pass in for the `returnTrain` argument) in each fold/partition of your learning scenario (e.g. cross validation with createFolds). You would then use these indices to keep (or remove) the rows of a data.frame, if that is how you are storing your examples. Does that make sense? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] error while using shapiro.test()
Em 1/10/2011 13:24, spicymchaggis101 escreveu: Thank you very much! Your response solved my issue. I needed to determine the probability of normality for word types per page.

You need to ensure this assumption is reasonable for your problem domain, as word types per page looks like count data to me, and for this kind of data Gaussian distributions are at the very best last-resort approximations. -- Cesar Rabak
Re: [R] Keep ALL duplicate records
Here is a function I use to find all duplicate records:

allDup <- function (value) {
  duplicated(value) | duplicated(value, fromLast = TRUE)
}

x
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28

x[allDup(x$ID), ]
     ID  OS  time
1 userA Win 12:22
3 userA Win 04:44

On Sun, Oct 2, 2011 at 2:18 PM, Pete Brecknock peter.breckn...@bp.com wrote: Erik Svensson wrote: Hello, In a data frame I want to identify ALL duplicate IDs in the example to be able to examine OS and time.

(df <- data.frame(ID = c("userA", "userB", "userA", "userC"),
                  OS = c("Win", "OSX", "Win", "Win64"),
                  time = c("12:22", "23:22", "04:44", "12:28")))
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28

My desired output is that ALL records with the same IDs are found:

userA Win 12:22
userA Win 04:44

preferably by returning logical values (TRUE FALSE TRUE FALSE). Is there a simple way to do that? [-- With duplicated(df$ID) the output will be [1] FALSE FALSE TRUE FALSE, i.e. not all userA records are found. With unique(df$ID): [1] userA userB userC, Levels: userA userB userC, i.e. one of each ID is found --] Erik Svensson

How about ...

# All records
ALL_RECORDS <- df[df$ID == df$ID[duplicated(df$ID)], ]
print(ALL_RECORDS)

# Logical Records
TRUE_FALSE <- df$ID == df$ID[duplicated(df$ID)]
print(TRUE_FALSE)

HTH Pete -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
Re: [R] Find all duplicate records
On Sun, Oct 2, 2011 at 10:05 AM, Erik Svensson erik.b.svens...@gmail.com wrote: Hello, In a data frame I want to identify ALL duplicate IDs in the example to be able to examine OS and time.

(df <- data.frame(ID = c("userA", "userB", "userA", "userC"),
                  OS = c("Win", "OSX", "Win", "Win64"),
                  time = c("12:22", "23:22", "04:44", "12:28")))
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28

My desired output is that ALL records with the same IDs are found:

userA Win 12:22
userA Win 04:44

preferably by returning logical values (TRUE FALSE TRUE FALSE)

Try this:

ave(rownames(df), df$ID, FUN = length) > 1
[1] TRUE FALSE TRUE FALSE

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] difference between createPartition and createfold functions
Basically, createDataPartition is used when you need to make one or more simple two-way splits of your data. For example, if you want to make a training and test set and keep your classes balanced, this is what you could use. It can also make multiple splits of this kind (or leave-group-out CV, aka Monte Carlo CV, aka repeated training/test splits). createFolds is exclusively for k-fold CV. Their usage is similar when you use the returnTrain = TRUE option in createFolds. Max
-- Max
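The two-way-split versus k-fold distinction described above can be sketched on made-up data; the column names and sizes here are illustrative, not from the thread:

```r
library(caret)
set.seed(1)
df <- data.frame(y = factor(rep(c("a", "b"), 50)), x = rnorm(100))

## one stratified 75/25 training/test split:
## createDataPartition returns row indices for the training set
inTrain <- createDataPartition(df$y, p = 0.75, list = FALSE)
train <- df[inTrain, ]
test  <- df[-inTrain, ]

## 4-fold CV: each list element holds the row indices of one held-out fold,
## and every row appears in exactly one fold
folds <- createFolds(df$y, k = 4)
fold1_test  <- df[folds[[1]], ]
fold1_train <- df[-folds[[1]], ]
```

As Steve notes earlier in the thread, both functions take a vector of labels and return indices; subsetting the data.frame rows is done by the caller.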
Re: [R] Is the output of survfit.coxph survival or baseline survival?
Dear all, I am confused with the output of survfit.coxph. Some say that the survival given by summary(survfit.coxph) is the baseline survival S_0, but others say that it is the survival S = S_0^exp(beta*x). Which one is correct?

The "baseline survival", which is the survival for a hypothetical subject with all covariates = 0, may be useful mathematical shorthand when writing a book, but I cannot think of a single case where the resulting curve would be of any practical interest in medical data. For this reason my survival routines in R NEVER return it. (Ask yourself "what is the survival for someone with blood pressure = 0, cholesterol = 0, weight = 0, ...". The answer is that they are either non-existent or dead.) The intention with survfit is that you will give it a second data set containing one or more lines, each of which describes a subject whose predicted survival is of interest. If no such data is given, the survival for someone with all covariates equal to the mean is given. This is better than covariates = 0, but sometimes not by much. (What if sex were coded as a 0/1 numeric: do we get the survival of a hermaphrodite?) Your best approach is to forget the phrase "baseline survival" and focus on covariate sets of interest to you. Terry Therneau
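Terry's advice above, to pass survfit a second data set describing subjects of interest, can be sketched with the lung data shipped in the survival package (the covariate values below are arbitrary examples):

```r
library(survival)

## Cox model on the package's example data
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

## predicted survival curves for two concrete subjects of interest,
## rather than the all-covariates-at-the-mean default
newdat <- data.frame(age = c(50, 70), sex = c(1, 2))
sf <- survfit(fit, newdata = newdat)
summary(sf, times = c(100, 365))   # one survival column per row of newdat
```

Each row of `newdat` yields its own curve, which is exactly the "covariate set of interest" usage the reply recommends.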
[R] about the array transpose
Hi, all, I am a newbie to R. Would anyone help me understand how to transpose a 3x3x3 array of 1:27? E.g.

A <- array(1:27, c(3, 3, 3))
B <- aperm(A, c(3, 2, 1))

What is the logic of this transpose? I cannot picture how it rearranges the elements, so I cannot predict which number ends up where. Most importantly I want to be able to get the number I expect; if I cannot figure this out I will have a confused concept that will affect my future learning of 3D models in R. Highly appreciated and thanks. VD
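The logic asked about here can be checked directly: with perm = c(3, 2, 1), aperm builds B so that B[i, j, k] equals A[k, j, i], i.e. the first and third subscripts swap roles, just as t() swaps rows and columns of a matrix. A small sketch:

```r
A <- array(1:27, c(3, 3, 3))
B <- aperm(A, c(3, 2, 1))

## predict an element by hand: A is filled column-major, so
## A[i, j, k] = i + (j - 1) * 3 + (k - 1) * 9
A[1, 2, 3]                 # 1 + 3 + 18 = 22
B[3, 2, 1]                 # same element after the subscript swap: 22

## verify the rule for every element
all(B == aperm(A, c(3, 2, 1)))               # TRUE by construction
identical(A, aperm(B, c(3, 2, 1)))            # applying the swap twice restores A
```

Thinking of each slice A[, , k] as a page helps: perm = c(3, 2, 1) turns pages into rows while leaving the middle (column) index alone.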
Re: [R] Scatterplot with the 3rd dimension = color?
Yes, perfect! This I can work with. Thanks, KB
[R] patients.txt data
Please send me the patients.txt data. Thanks.
[R] patients.txt data
I'm new to learning R. I'm taking a course and will need access to the patients.txt data to be able to do the required exercises using this dataset. Thanks.
Re: [R] Scatterplot with the 3rd dimension = color?
Duncan Murdoch murdoch.duncan at gmail.com writes: It's not a particularly effective display, but here's how to do it. Use rainbow(101) in place of rev(heat.colors(101)) if you like.

x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
colors <- rev(heat.colors(101))
zcolor <- colors[(z - min(z))/diff(range(z))*100 + 1]
plot(x, y, col = zcolor)

or

d <- data.frame(x, y, z)
library(ggplot2)
qplot(x, y, colour = z, data = d)

I agree about the "not particularly effective display" comment, but if you have two continuous predictors and a continuous response you've got a tough display problem -- your choices are:
1. use color, size, or some other graphical characteristic (pretty far down on the Cleveland hierarchy)
2. use a perspective plot (hard to get the right viewing angle, often confusing)
3. use coplots/small multiples/faceting (requires discretizing one dimension)
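Choice 3 in the list above can be sketched in a few lines; the data here are made up for illustration:

```r
library(ggplot2)
set.seed(1)

## two continuous predictors plus a continuous response
d <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))

## discretize the third variable into bands, then facet on the bands
d$zbin <- cut(d$z, breaks = 3)
ggplot(d, aes(x, y)) + geom_point() + facet_wrap(~ zbin)
```

This trades the hard-to-read color gradient for one small panel per band of z, at the cost of losing z's exact values.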
[R] rolling regression
Dear all, I have spent the last few days on a seemingly simple and previously documented rolling regression. I have a 60-year data set organized in a ts matrix. The matrix has 5 columns: cash_ret, epy1, ism1, spread1, unemp1. Based on previous help threads I have been able to come up with the following, which seems to work fine. The trouble is that I get regression coefficients but need the immediate next-period forecast.

cash_fit <- rollapply(cash_data, width = 60,
  function(x) coef(lm(cash_ret ~ epy1 + ism1 + spread1 + unemp1,
                      data = as.data.frame(x))),
  by.column = FALSE, align = "right")
cash_fit

I tried to replace coef above with predict, but I get a whole bunch of results too big to display. I would be grateful if someone could guide me on how to get the next-period forecast after each regression. If there is also a way of getting the significance of each regressor and the standard error, in addition to R-squared, without spending the next week on it, that would be helpful as well. Many thanks, Darius
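One way to get the one-step-ahead forecast, sketched under the assumption that cash_data is the poster's ts matrix with the column names above, is to loop over the windows explicitly and call predict() on the row just after each window:

```r
## one-step-ahead forecast from each 60-period window;
## cash_data is assumed to be the poster's ts matrix
cash_df <- as.data.frame(cash_data)
n <- nrow(cash_df)
w <- 60

fc <- sapply(seq_len(n - w), function(i) {
  train <- cash_df[i:(i + w - 1), ]
  fit <- lm(cash_ret ~ epy1 + ism1 + spread1 + unemp1, data = train)
  predict(fit, newdata = cash_df[i + w, , drop = FALSE])  # forecast period i + w
})

## per-window diagnostics, if wanted: inside the function, use
##   s <- summary(fit); coef(s)       # estimates, std. errors, t, p
##   s$r.squared                      # R-squared
```

This is slower than a single rollapply call but makes the "next period" explicit, and the same window function can return the coefficient table and R-squared alongside the forecast.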
Re: [R] patients.txt data
Hi, On Sun, Oct 2, 2011 at 4:31 PM, Melhem, Nadine mel...@upmc.edu wrote: I'm new to learning R. I'm taking a course and will need access to the patients.txt data to be able to do the exercises required using this dataset.

Without more context, I'm doubtful that anybody will be able to help you. I reckon your best bet will be to ask your instructor where you can find this sample data. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
[R] How to format R superscript 2 followed by = value
I am trying to put an R-squared value on a plot, with the R2 formatted with a superscript 2, followed by "=" and the value. The first mtext below prints the R2 correctly formatted but follows it with round(summary(mylm)$r.squared,3) as literal text; the second prints "R^2 =" followed by the value of round(summary(mylm)$r.squared,3). How do I correctly write the expression to get a formatted R2 followed by the value?

x <- runif(10)
y <- runif(10)
summary(mylm <- lm(y ~ x))
plot(x, y)
abline(mylm)
mtext(expression(paste(R^2, "=", round(summary(mylm)$r.squared, 3))), 1)
mtext(paste(expression(R^2), "=", round(summary(mylm)$r.squared, 3)), 3)

thanks Nevil Amos
[R] function recode within sapply
Dear List, I am using function recode, from package car, within sapply, as follows:

L3 <- LETTERS[1:3]
(d <- data.frame(cbind(x = 1, y = 1:10),
                 fac1 = sample(L3, 10, replace = TRUE),
                 fac2 = sample(L3, 10, replace = TRUE),
                 fac3 = sample(L3, 10, replace = TRUE)))
str(d)
d[, c("fac1", "fac2")] <- sapply(d[, c("fac1", "fac2")], recode,
                                 "c('A','B') = 'XX'", as.factor.result = TRUE)
d[, "fac3"] <- recode(d[, "fac3"], "c('A','B') = 'XX'")
str(d)

However, the class of columns fac1 and fac2 is character as opposed to factor, even though I specify the option as.factor.result = TRUE; this option works fine with a single column. Any thoughts? Many thanks, Lara
Re: [R] How to format R superscript 2 followed by = value
Hi Nevil, Here is one option:

## function definition
r2format <- function(object, digits = 3, output, sub, expression = TRUE, ...) {
  if (inherits(object, "lm")) {
    x <- summary(object)
  } else if (inherits(object, "summary.lm")) {
    x <- object
  } else stop("object is an unmanageable class")
  out <- format(x$r.squared, digits = digits)
  if (!missing(output)) {
    output <- gsub(sub, out, output)
  } else {
    output <- out
  }
  if (expression) {
    output <- parse(text = output)
  }
  return(output)
}

## model
m <- lm(mpg ~ hp * wt, data = mtcars)

## demonstration
r2format(object = m, output = "R^2 == rval", sub = "rval", expression = TRUE)

## your problem
x <- runif(10)
y <- runif(10)
mylm <- lm(y ~ x)
plot(x, y)
abline(mylm)

## simplified version of demo
mtext(r2format(m, 3, "R^2 == rval", "rval"), 3)

The real key is using == instead of =. The lengthy response is because I have been toying with different stylers and formatters to try to facilitate getting output from R into publication format, so I was interested in playing with this and thinking about what might be useful abstractions. Anyway, more specific to your usage might be something like:

substitute(expression(R^2 == rval), list(rval = round(summary(mylm)$r.squared, 3)))

Cheers, Josh On Sun, Oct 2, 2011 at 9:49 PM, Nevil Amos nevil.a...@gmail.com wrote: I am trying to put an R2 value on a plot, with R2 formatted with a superscript 2, followed by "=" and the value. How do I correctly write the expression to get a formatted R2 followed by the value?
-- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
Re: [R] function recode within sapply
Hi Lara, Use lapply here instead of sapply, or specify simplify = FALSE. See ?sapply for details.

d[, c("fac1", "fac2")] <- lapply(d[, c("fac1", "fac2")], recode,
                                 "c('A','B') = 'XX'", as.factor.result = TRUE)
d[, "fac3"] <- recode(d[, "fac3"], "c('A','B') = 'XX'")
str(d)

Cheers, Josh -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/