Re: [R] modelling 4-parameter curve in R does not match data - how to proceed?
Oh, I got it! I was sending the fluorescence instead of the cycles x. Thank you ``` desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, max_fluorescence = 11839.8, back_fluorescence = -138.864) , 1:45) ``` On Wed, Mar 17, 2021 at 8:58 PM Duncan Murdoch wrote: > > On 17/03/2021 12:37 p.m., Luigi Marongiu wrote: > > sorry, I don't get it... > > Modify your rutledge function to print x, and you'll see the values of > high printed. x should be 1:45. > > Duncan Murdoch > > > > > On Wed, Mar 17, 2021 at 2:35 PM Duncan Murdoch > > wrote: > >> > >> On 17/03/2021 6:59 a.m., Luigi Marongiu wrote: > >>> yes, but in `rutledge` I model y as `y = (M / ( 1 + exp(-(x-m)/s)) ) + > >>> B`, with x being 1:45. Isn't that the equivalent of what I fed Desmos > >>> with? Tx > >> > >> No, it's not. > >> > >> Duncan Murdoch > >> > >>> > >>> On Wed, Mar 17, 2021 at 11:31 AM Duncan Murdoch > >>> wrote: > > On 17/03/2021 5:41 a.m., Luigi Marongiu wrote: > > Hello, > > I have a dataset from a polymerase chain reaction. I am using the > > equation given by Rutledge > > (https://pubmed.ncbi.nlm.nih.gov/15601990/) but the profile I get in R > > does not match the data. I ran the same thing in Desmos and instead > > the profile is correct (attached). > > Why do I not get the same matching model as in Desmos? I believe the > > formula in R is the same as the one in Desmos, and I am using the same > > parameters. > > Is there a procedure to debug models? > > Thanks > > > > Here is the code: > > ``` > > high <- c(120.64, 66.14, 34.87, 27.11, 8.87, -5.8, 4.52, -7.16, -17.39, > > -14.29, -20.26, -14.99, -21.05, -20.64, -8.03, -21.56, -1.28, > > 15.01, > > 75.26, 191.76, 455.09, 985.96, 1825.59, 2908.08, 3993.18, > > 5059.94, > > 6071.93, 6986.32, 7796.01, 8502.25, 9111.46, 9638.01, > > 10077.19, > > 10452.02, 10751.81, 11017.49, 11240.37, 11427.47, 11570.07, > > 11684.96, > > 11781.77, 11863.35, 11927.44, 11980.81, 12021.88) > > plot(1:45, high, type = "l") > > rutledge <- function(p, x) { > > m = p$half_fluorescence > > s = p$slope > > M = p$max_fluorescence > > B = p$back_fluorescence > > y = (M / ( 1 + exp(-(x-m)/s)) ) + B > > return(y) > > } > > desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, > >max_fluorescence = 11839.8, back_fluorescence > > = -138.864) , high) > > > > points(1:45, desmos, type="l", col="blue") > > > In your calculation of desmos, you are using the Y variable for x in the > formula. Calculate it this way instead: > > desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, > max_fluorescence = 11839.8, > back_fluorescence > = -138.864) , 1:45) > > Duncan Murdoch > > >>> > >>> > >> > > > > > -- Best regards, Luigi __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] modelling 4-parameter curve in R does not match data - how to proceed?
On 17/03/2021 12:37 p.m., Luigi Marongiu wrote: sorry, I don't get it... Modify your rutledge function to print x, and you'll see the values of high printed. x should be 1:45. Duncan Murdoch On Wed, Mar 17, 2021 at 2:35 PM Duncan Murdoch wrote: On 17/03/2021 6:59 a.m., Luigi Marongiu wrote: yes, but in `rutledge` I model y as `y = (M / ( 1 + exp(-(x-m)/s)) ) + B`, with x being 1:45. Isn't that the equivalent of what I fed Desmos with? Tx No, it's not. Duncan Murdoch On Wed, Mar 17, 2021 at 11:31 AM Duncan Murdoch wrote: On 17/03/2021 5:41 a.m., Luigi Marongiu wrote: Hello, I have a dataset from a polymerase chain reaction. I am using the equation given by Rutledge (https://pubmed.ncbi.nlm.nih.gov/15601990/) but the profile I get in R does not match the data. I ran the same thing in Desmos and instead the profile is correct (attached). Why do I not get the same matching model as in Desmos? I believe the formula in R is the same as the one in Desmos, and I am using the same parameters. Is there a procedure to debug models? Thanks Here is the code: ``` high <- c(120.64, 66.14, 34.87, 27.11, 8.87, -5.8, 4.52, -7.16, -17.39, -14.29, -20.26, -14.99, -21.05, -20.64, -8.03, -21.56, -1.28, 15.01, 75.26, 191.76, 455.09, 985.96, 1825.59, 2908.08, 3993.18, 5059.94, 6071.93, 6986.32, 7796.01, 8502.25, 9111.46, 9638.01, 10077.19, 10452.02, 10751.81, 11017.49, 11240.37, 11427.47, 11570.07, 11684.96, 11781.77, 11863.35, 11927.44, 11980.81, 12021.88) plot(1:45, high, type = "l") rutledge <- function(p, x) { m = p$half_fluorescence s = p$slope M = p$max_fluorescence B = p$back_fluorescence y = (M / ( 1 + exp(-(x-m)/s)) ) + B return(y) } desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, max_fluorescence = 11839.8, back_fluorescence = -138.864) , high) points(1:45, desmos, type="l", col="blue") In your calculation of desmos, you are using the Y variable for x in the formula. Calculate it this way instead: desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, max_fluorescence = 11839.8, back_fluorescence = -138.864) , 1:45) Duncan Murdoch __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] modelling 4-parameter curve in R does not match data - how to proceed?
sorry, I don't get it... On Wed, Mar 17, 2021 at 2:35 PM Duncan Murdoch wrote: > > On 17/03/2021 6:59 a.m., Luigi Marongiu wrote: > > yes, but in `rutledge` I model y as `y = (M / ( 1 + exp(-(x-m)/s)) ) + > > B`, with x being 1:45. Isn't that the equivalent of what I fed Desmos > > with? Tx > > No, it's not. > > Duncan Murdoch > > > > > On Wed, Mar 17, 2021 at 11:31 AM Duncan Murdoch > > wrote: > >> > >> On 17/03/2021 5:41 a.m., Luigi Marongiu wrote: > >>> Hello, > >>> I have a dataset from a polymerase chain reaction. I am using the > >>> equation given by Rutledge > >>> (https://pubmed.ncbi.nlm.nih.gov/15601990/) but the profile I get in R > >>> does not match the data. I ran the same thing in Desmos and instead > >>> the profile is correct (attached). > >>> Why do I not get the same matching model as in Desmos? I believe the > >>> formula in R is the same as the one in Desmos, and I am using the same > >>> parameters. > >>> Is there a procedure to debug models? > >>> Thanks > >>> > >>> Here is the code: > >>> ``` > >>> high <- c(120.64, 66.14, 34.87, 27.11, 8.87, -5.8, 4.52, -7.16, -17.39, > >>> -14.29, -20.26, -14.99, -21.05, -20.64, -8.03, -21.56, -1.28, > >>> 15.01, > >>> 75.26, 191.76, 455.09, 985.96, 1825.59, 2908.08, 3993.18, > >>> 5059.94, > >>> 6071.93, 6986.32, 7796.01, 8502.25, 9111.46, 9638.01, 10077.19, > >>> 10452.02, 10751.81, 11017.49, 11240.37, 11427.47, 11570.07, > >>> 11684.96, > >>> 11781.77, 11863.35, 11927.44, 11980.81, 12021.88) > >>> plot(1:45, high, type = "l") > >>> rutledge <- function(p, x) { > >>> m = p$half_fluorescence > >>> s = p$slope > >>> M = p$max_fluorescence > >>> B = p$back_fluorescence > >>> y = (M / ( 1 + exp(-(x-m)/s)) ) + B > >>> return(y) > >>> } > >>> desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, > >>> max_fluorescence = 11839.8, back_fluorescence > >>> = -138.864) , high) > >>> > >>> points(1:45, desmos, type="l", col="blue") > >> > >> > >> In your calculation of desmos, you are using the Y variable for x in the > >> formula. Calculate it this way instead: > >> > >> desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, > >> max_fluorescence = 11839.8, back_fluorescence > >>= -138.864) , 1:45) > >> > >> Duncan Murdoch > >> > > > > > -- Best regards, Luigi __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] modelling 4-parameter curve in R does not match data - how to proceed?
On 17/03/2021 6:59 a.m., Luigi Marongiu wrote: yes, but in `rutledge` I model y as `y = (M / ( 1 + exp(-(x-m)/s)) ) + B`, with x being 1:45. Isn't that the equivalent of what I fed Desmos with? Tx No, it's not. Duncan Murdoch On Wed, Mar 17, 2021 at 11:31 AM Duncan Murdoch wrote: On 17/03/2021 5:41 a.m., Luigi Marongiu wrote: Hello, I have a dataset from a polymerase chain reaction. I am using the equation given by Rutledge (https://pubmed.ncbi.nlm.nih.gov/15601990/) but the profile I get in R does not match the data. I ran the same thing in Desmos and instead the profile is correct (attached). Why do I not get the same matching model as in Desmos? I believe the formula in R is the same as the one in Desmos, and I am using the same parameters. Is there a procedure to debug models? Thanks Here is the code: ``` high <- c(120.64, 66.14, 34.87, 27.11, 8.87, -5.8, 4.52, -7.16, -17.39, -14.29, -20.26, -14.99, -21.05, -20.64, -8.03, -21.56, -1.28, 15.01, 75.26, 191.76, 455.09, 985.96, 1825.59, 2908.08, 3993.18, 5059.94, 6071.93, 6986.32, 7796.01, 8502.25, 9111.46, 9638.01, 10077.19, 10452.02, 10751.81, 11017.49, 11240.37, 11427.47, 11570.07, 11684.96, 11781.77, 11863.35, 11927.44, 11980.81, 12021.88) plot(1:45, high, type = "l") rutledge <- function(p, x) { m = p$half_fluorescence s = p$slope M = p$max_fluorescence B = p$back_fluorescence y = (M / ( 1 + exp(-(x-m)/s)) ) + B return(y) } desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, max_fluorescence = 11839.8, back_fluorescence = -138.864) , high) points(1:45, desmos, type="l", col="blue") In your calculation of desmos, you are using the Y variable for x in the formula. Calculate it this way instead: desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, max_fluorescence = 11839.8, back_fluorescence = -138.864) , 1:45) Duncan Murdoch __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] modelling 4-parameter curve in R does not match data - how to proceed?
yes, but in `rutledge` I model y as `y = (M / ( 1 + exp(-(x-m)/s)) ) + B`, with x being 1:45. Isn't that the equivalent of what I fed Desmos with? Tx On Wed, Mar 17, 2021 at 11:31 AM Duncan Murdoch wrote: > > On 17/03/2021 5:41 a.m., Luigi Marongiu wrote: > > Hello, > > I have a dataset from a polymerase chain reaction. I am using the > > equation given by Rutledge > > (https://pubmed.ncbi.nlm.nih.gov/15601990/) but the profile I get in R > > does not match the data. I ran the same thing in Desmos and instead > > the profile is correct (attached). > > Why do I not get the same matching model as in Desmos? I believe the > > formula in R is the same as the one in Desmos, and I am using the same > > parameters. > > Is there a procedure to debug models? > > Thanks > > > > Here is the code: > > ``` > > high <- c(120.64, 66.14, 34.87, 27.11, 8.87, -5.8, 4.52, -7.16, -17.39, > > -14.29, -20.26, -14.99, -21.05, -20.64, -8.03, -21.56, -1.28, 15.01, > > 75.26, 191.76, 455.09, 985.96, 1825.59, 2908.08, 3993.18, 5059.94, > > 6071.93, 6986.32, 7796.01, 8502.25, 9111.46, 9638.01, 10077.19, > > 10452.02, 10751.81, 11017.49, 11240.37, 11427.47, 11570.07, > > 11684.96, > > 11781.77, 11863.35, 11927.44, 11980.81, 12021.88) > > plot(1:45, high, type = "l") > > rutledge <- function(p, x) { > >m = p$half_fluorescence > >s = p$slope > >M = p$max_fluorescence > >B = p$back_fluorescence > >y = (M / ( 1 + exp(-(x-m)/s)) ) + B > >return(y) > > } > > desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, > > max_fluorescence = 11839.8, back_fluorescence > > = -138.864) , high) > > > > points(1:45, desmos, type="l", col="blue") > > > In your calculation of desmos, you are using the Y variable for x in the > formula. Calculate it this way instead: > > desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, >max_fluorescence = 11839.8, back_fluorescence > = -138.864) , 1:45) > > Duncan Murdoch > -- Best regards, Luigi __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] modelling 4-parameter curve in R does not match data - how to proceed?
On 17/03/2021 5:41 a.m., Luigi Marongiu wrote: Hello, I have a dataset from a polymerase chain reaction. I am using the equation given by Rutledge (https://pubmed.ncbi.nlm.nih.gov/15601990/) but the profile I get in R does not match the data. I ran the same thing in Desmos and instead the profile is correct (attached). Why do I not get the same matching model as in Desmos? I believe the formula in R is the same as the one in Desmos, and I am using the same parameters. Is there a procedure to debug models? Thanks Here is the code: ``` high <- c(120.64, 66.14, 34.87, 27.11, 8.87, -5.8, 4.52, -7.16, -17.39, -14.29, -20.26, -14.99, -21.05, -20.64, -8.03, -21.56, -1.28, 15.01, 75.26, 191.76, 455.09, 985.96, 1825.59, 2908.08, 3993.18, 5059.94, 6071.93, 6986.32, 7796.01, 8502.25, 9111.46, 9638.01, 10077.19, 10452.02, 10751.81, 11017.49, 11240.37, 11427.47, 11570.07, 11684.96, 11781.77, 11863.35, 11927.44, 11980.81, 12021.88) plot(1:45, high, type = "l") rutledge <- function(p, x) { m = p$half_fluorescence s = p$slope M = p$max_fluorescence B = p$back_fluorescence y = (M / ( 1 + exp(-(x-m)/s)) ) + B return(y) } desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, max_fluorescence = 11839.8, back_fluorescence = -138.864) , high) points(1:45, desmos, type="l", col="blue") In your calculation of desmos, you are using the Y variable for x in the formula. Calculate it this way instead: desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, max_fluorescence = 11839.8, back_fluorescence = -138.864) , 1:45) Duncan Murdoch __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] modelling 4-parameter curve in R does not match data - how to proceed?
Hello, I have a dataset from a polymerase chain reaction. I am using the equation given by Rutledge (https://pubmed.ncbi.nlm.nih.gov/15601990/) but the profile I get in R does not match the data. I ran the same thing in Desmos and instead the profile is correct (attached). Why do I not get the same matching model as in Desmos? I believe the formula in R is the same as the one in Desmos, and I am using the same parameters. Is there a procedure to debug models? Thanks Here is the code: ``` high <- c(120.64, 66.14, 34.87, 27.11, 8.87, -5.8, 4.52, -7.16, -17.39, -14.29, -20.26, -14.99, -21.05, -20.64, -8.03, -21.56, -1.28, 15.01, 75.26, 191.76, 455.09, 985.96, 1825.59, 2908.08, 3993.18, 5059.94, 6071.93, 6986.32, 7796.01, 8502.25, 9111.46, 9638.01, 10077.19, 10452.02, 10751.81, 11017.49, 11240.37, 11427.47, 11570.07, 11684.96, 11781.77, 11863.35, 11927.44, 11980.81, 12021.88) plot(1:45, high, type = "l") rutledge <- function(p, x) { m = p$half_fluorescence s = p$slope M = p$max_fluorescence B = p$back_fluorescence y = (M / ( 1 + exp(-(x-m)/s)) ) + B return(y) } desmos <- rutledge(list(half_fluorescence = 27.1102, slope = 2.76798, max_fluorescence = 11839.8, back_fluorescence = -138.864) , high) points(1:45, desmos, type="l", col="blue") ``` -- Best regards, Luigi __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Modelling non-Negative Time Series
https://cran.r-project.org/web/packages/tsintermittent/tsintermittent.pdf Best, -- GG [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Modelling non-Negative Time Series
Dear All, I am struggling to develop a model to forecast the daily expenses from a bank account. The daily time series consists (obviously) of non-negative numbers which can be zero in the days when no money is taken from the bank account. To give you an idea of the kind of series I am dealing with, please have a look at myts<-structure(c(5.5, 0, 126.93, 0, 0, 0, 0, 10, 0, 135.34, 0, 0, 0, 0, 98.21, 0, 112.38, 0, 0, 0, 0, 0, 1373.77, 151.83, 26.66, 205.5, 129.33, 172.5, 0, 10, 131.09, 0, 0, 0, 0, 0, 689, 0, 0, 0, 0, 0, 0, 60.6, 183, 98.21, 0, 0, 0, 0, 1433.79, 175.89, 0, 0, 0, 200, 134.33, 98.26, 112.21, 0, 0, 0, 0, 0, 0, 112.31, 0, 0, 0, 0, 120, 0, 350, 0, 0, 98.21, 0, 0, 0, 113.24, 0, 0, 0, 0, 15, 696.65, 321.87, 929, 210.58, 0, 0, 10), .Tsp = c(16563, 16654, 1), class = "ts") (the time origin is a bit funny, but what matters is that I have daily data). Do you know any R package to handle this kind of series? I think I am outside the domain of the ARIMA approach. I experimented with acp and tscount (to see if I could treat the series as an autoregressive Poisson series), but I did not get very far. Any suggestion is appreciated. Cheers Lorenzo __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Modelling categorical variables
Hi, I am a beginner with statistics and R and have no clue on how to model my data. I have collected information on seed traps (ID) that includes the habitat type (Hab) and different measures of distances. Also I have applied a modularity analysis, so that the seeds traps are grouped into modules. My dataset is as follow: *IDHab ModuleDistEdgeMeanDist1MeanDist2MeanDist3 F48F A 21.768 24.941 6.033 27.642F50F E 35.666** 60.505 149.927 * * 48.582 F52F B** 12.243** 103.041 72.908 * *102.375N02N B **58.681** 129.59 127.344 * * 131.383 N17N B** 62.829** 72.827 ** 76.736 * *77.644 N22N B** 89.207** 78.719 ** 75.005 * * 81.176N33N A** 23.288** 35.48** 25.317 * * 36.931N40N B** 36.734** 62.234 ** 30.68 * * 61.885N47N E ** 60.443** 66.367 ** 150.892 ** 55.097 * I am looking for a way to analyze if there is any correlation between the Module classification and the other variables. My difficulties here are: 1 - is there a way to model my data where Module is the response variable (something like Module~Hab*DistEdge*MeanDist1) ? If so, which model should I use (I only have a bit of experience with glm) and which distribution? 2 - Is that a problem if I have different types of predictor variable (factor and numerical)? Any help would be greatly appreciated, -- Jessica Lavabre-Micas [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Modelling categorical variables
Hi You probably wont get many answers because: 1 -you post in HTML (post in plain text) 2 -you provide data which are unreadable (copy output of dput(yourdata) instead) 3 -you ask statistical question which are rarely answered here (they are better suited to stackexchange list) Regarding your models nothing prevents you to test any of them - lm, glm, ... Or go through some available documents on CRAN like e.g. Using R for Data Analysis and Graphics - Introduction, Examples and Commentary” by John Maindonald (PDF, data sets and scripts are available at JM's homepage). “Practical Regression and Anova using R” by Julian Faraway (PDF, data sets and scripts are available at the book homepage). among many others to learn how to use R for modelling. Cheers Petr > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jessica > Lavabre > Sent: Tuesday, September 08, 2015 11:01 AM > To: r-help@r-project.org > Subject: [R] Modelling categorical variables > > Hi, > > I am a beginner with statistics and R and have no clue on how to model > my data. I have collected information on seed traps (ID) that includes > the habitat type (Hab) and different measures of distances. Also I have > applied a modularity analysis, so that the seeds traps are grouped into > modules. My dataset is as follow: > > > > *IDHab ModuleDistEdgeMeanDist1MeanDist2MeanDist3 > F48F A 21.768 24.941 6.033 > 27.642F50F E 35.666** 60.505 > 149.927 * > * 48.582 F52F B** 12.243** 103.041 > 72.908 * > *102.375N02N B **58.681** > 129.59 127.344 * > * 131.383 N17N B** 62.829** 72.827 ** > 76.736 * > *77.644 N22N B** 89.207** 78.719 ** > 75.005 * > * 81.176N33N A** 23.288** 35.48** > 25.317 * > * 36.931N40N B** 36.734** 62.234 ** > 30.68 * > * 61.885N47N E ** 60.443** 66.367 ** > 150.892 ** 55.097 * > > I am looking for a way to analyze if there is any correlation between > the Module classification and the other variables. My difficulties here > are: > 1 - is there a way to model my data where Module is the response > variable (something like Module~Hab*DistEdge*MeanDist1) ? If so, which > model should I use (I only have a bit of experience with glm) and which > distribution? > 2 - Is that a problem if I have different types of predictor variable > (factor and numerical)? > > Any help would be greatly appreciated, > > -- Jessica Lavabre-Micas > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a jsou určeny pouze jeho adresátům. Jestliže jste obdržel(a) tento e-mail omylem, informujte laskavě neprodleně jeho odesílatele. Obsah tohoto emailu i s přílohami a jeho kopie vymažte ze svého systému. Nejste-li zamýšleným adresátem tohoto emailu, nejste oprávněni tento email jakkoliv užívat, rozšiřovat, kopírovat či zveřejňovat. Odesílatel e-mailu neodpovídá za eventuální škodu způsobenou modifikacemi či zpožděním přenosu e-mailu. V případě, že je tento e-mail součástí obchodního jednání: - vyhrazuje si odesílatel právo ukončit kdykoliv jednání o uzavření smlouvy, a to z jakéhokoliv důvodu i bez uvedení důvodu. - a obsahuje-li nabídku, je adresát oprávněn nabídku bezodkladně přijmout; Odesílatel tohoto e-mailu (nabídky) vylučuje přijetí nabídky ze strany příjemce s dodatkem či odchylkou. - trvá odesílatel na tom, že příslušná smlouva je uzavřena teprve výslovným dosažením shody na všech jejích náležitostech. - odesílatel tohoto emailu informuje, že není oprávněn uzavírat za společnost žádné smlouvy s výjimkou případů, kdy k tomu byl písemně zmocněn nebo písemně pověřen a takové pověření nebo plná moc byly adresátovi tohoto emailu případně osobě, kterou adresát zastupuje, předloženy nebo jejich existence je adresátovi či osobě jím zastoupené známá. This e-mail and any documents attached to it may be confidential and are intended only for its intended recipients. If you received this e-mail by mistake, please immediately inform its sender. Delete the contents of this e-mail with all attachments and its copies from your system. If you are not the intended recipient of this e-mail, you are not author
[R] Modelling categorical and non-categorical datasets using Artifical Neural Networks
Hi, I have to do some Radon modelling and I have categorical and non categorical datasets. I have been considering Artificial Neural Networks to do this. I was wondering has anybody done anything like this before and have you any advice before I start and where there might be some good tutorials on this type of modelling. Thanks -- Shane [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Modelling categorical and non-categorical datasets using Artifical Neural Networks
On Wed, 29 May 2013, Shane Carey wrote: I have to do some Radon modelling and I have categorical and non categorical datasets. I have been considering Artificial Neural Networks to do this. I was wondering has anybody done anything like this before and have you any advice before I start and where there might be some good tutorials on this type of modelling. Shane, What questions do you want to answer? Rich -- Richard B. Shepard, Ph.D. | Have knowledge, will travel. Applied Ecosystem Services, Inc. | http://www.appl-ecosys.com Voice: 503-667-4517 Fax: 503-667-8863 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] modelling and R misconceptions; was: package installtion
This is hopeless, since you never seem to listen to our advice, therefore this will be my very last try: So you actually need local advice, both for statistical concepts and R related. No statistics software can estimate effects of variables that you observed to be constant (e.g. 0) all the time. If any software does, please delete it a once from your machine. Instead, ask a local statistician for advice on your problem. You certainly want to show the data and your model to the local expert - since you don't show us. And then you want to ask for local R course since reading the documentation seems not to help. Applying mtrace() in a non exiting object shows this straight away. Uwe Ligges On 17.11.2011 15:49, Scott Raynaud wrote: I believe the problem is a column of zeroes in my x matrix. I have tried the suggestions in the documentation, so now to try to confirm the probelm I'd like to run debug. Here's where I think the problem is: ###~~ Fitting the model using lmer funtion~~### (fitmodel- lmer(modelformula,data,family=binomial(link=logit),nAGQ=1)) mtrace(fitmodel) I added the mtrace to catch the error, but get the following: Error in mtrace(fitmodel) : Can't find fitmodel How can I debug this? - Original Message - From: Rolf Turnerrolf.tur...@xtra.co.nz To: Scott Raynaudscott.rayn...@yahoo.com Cc: r-help@r-project.orgr-help@r-project.org Sent: Wednesday, November 16, 2011 6:04 PM Subject: Re: [R] package installtion On 17/11/11 05:37, Scott Raynaud wrote: That might be an option if it weren't my most important predictor. I'm thinking my best bet is to use MLWin for the estimation since it will properly set fixed effects to 0. All my other sample size simulation programs use SAS PROC IML which I don't have/can't afford. I like R since it's free, but I can't work around the problem I'm currently having. This is the ``push every possible button until you get a result and to hell with what anything actually means'' approach to statistics. The probability of getting a *meaningful* result from this approach is close to zero. Why don't you try to *understand* what is going on, rather than wildly throwing every possible piece of software at the problem until one such piece runs? cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] modelling and R misconceptions; was: package installtion
My responses are in brackets below, plus a final note after the main text. - Original Message - From: Uwe Ligges lig...@statistik.tu-dortmund.de To: Scott Raynaud scott.rayn...@yahoo.com Cc: r-help@r-project.org r-help@r-project.org Sent: Thursday, November 17, 2011 9:16 AM Subject: Re: [R] modelling and R misconceptions; was: package installtion This is hopeless [That's a matter of perception-even concentration camp prisoners found a way to hope (see Viktor Frankl)], since you never [never is a strong word and many times leads to cognitive errors] seem to listen to our advice [It's possible that I misunderstood your recommedations (more likely), or that you communicated poorly (less likely)], therefore this will be my very last try: So you actually need local advice [Yes I need advice-that's why I post here!], both for statistical concepts and R related [I don't claim to be a statistical genius, but I can hold my own. Now, R is a different matter]. No statistics software can estimate effects of variables that you observed to be constant (e.g. 0) all the time [I think you misuderstood my intentions-I never wanted to estimate effects that are 0 all of the time]. If any software does, please delete it a once from your machine. Instead, ask a local statistician for advice on your problem. You certainly want to show the data and your model to the local expert - since you don't show us. [I gave a detailed explanation in a previous post which I repeat here: |OK, I'm using William Browne's MLPowSim to create an R script which will simulate samples for estimation of sample size in mixed models. I have subjects | nested in hospitals with hospitals treated as random and all of my covariates at level 1. My outcome is death, so it's binary and I'll have a fixed and |random intercept. My interest is in the relation of the covariates to the outcome. | |My most important variable is gestational age (GA) which my investigators divide thusly: 23-24, 25-26, 27-28, 29-30 and 31-32. I have recoded the | dummies for GA in the script according to the MLPowSim instructions to a random multinomial variable: | | macpred-rmultinom(n2,1,c(.1031,.1482,.2385,.4404,.0698)) | x[,3]-macpred[1,][l2id] | x[,4]-macpred[2,][l2id] | x[,5]-macpred[3,][l2id] | x[,6]-macpred[4,][l2id] | |GA 23-24 is the reference with p=.0698. I started with a structured sampling scheme of 20, 60, 100, 120 and 140 level 2 units. My level 2 units have |different sizes. So at 20 I had 5 hospitals with 100 patients, 4 with 280, 3 with 460, 3 with 640, 3 with 820 and 2 with 1000. Thus, at 60 hospitals, I have 15, |12, 9, 9, 9, 6 with the same cell sample sizes. | |According to the MLPowSim documentation, with small probablities it's possible to have a column of zeroes in the X matrix if there are not many units in |the random factor. R will choke on this but MLWin sets the associated fixed effects to 0. When R choked, I increased from 20 to 60 as my minimum as |suggested in the MLPowSim documentation. Still no luck. Since this is a simulation, I assume once and a while that by chance a coefficient could be 0. In fact, Browne mentions as much in his documentation. There is a bit more to my simulation, but I thought I'd try to keep it as simple as possible, at least at the outset.] And then you want to ask for local R course since reading the documentation seems not to help [You got that right!]. Applying mtrace() in a non exiting object shows this straight away. Uwe Ligges Apparently I misuderstood the prupose of mtrace after reading the documentation-I thought it was to debug problems of the sort I've encountered. Michael Weylandt provided appropriate direction in the previous post for which I am grateful. Not all of us can be intellectual superstars. That's why we ask for help. This much I did read and understand from the R posting guide: Responding to other posts: * Rudeness and ad hominem comments are not acceptable. Brevity is OK. It's a good lesson to learn. On 17.11.2011 15:49, Scott Raynaud wrote: I believe the problem is a column of zeroes in my x matrix. I have tried the suggestions in the documentation, so now to try to confirm the probelm I'd like to run debug. Here's where I think the problem is: ###~~ Fitting the model using lmer funtion ~~### (fitmodel- lmer(modelformula,data,family=binomial(link=logit),nAGQ=1)) mtrace(fitmodel) I added the mtrace to catch the error, but get the following: Error in mtrace(fitmodel) : Can't find fitmodel How can I debug this? - Original Message - From: Rolf Turnerrolf.tur...@xtra.co.nz To: Scott Raynaudscott.rayn...@yahoo.com Cc: r-help@r-project.orgr-help@r-project.org Sent: Wednesday, November 16, 2011 6:04 PM Subject: Re: [R] package installtion On 17/11/11 05:37, Scott Raynaud wrote
[R] Modelling poisson distribution with variance structure
I'm dealing with count data that's nested and has spatial dependence. I ran a glmm in lmer with a random factor for nestedness. Spatial dependence seems to have been accommodated by model. However I can't add a variance strcuture to this model (to accommodate heterogeneity). Is there a model that can have a poisson distribution *AND* a variance structure *AND* have AIC in output (for model comparison and selection)? Some we've looked at that can't: - glmmPQL - can add structures BUT can't have AIC (you can calculate it but it doesn't give correct AIC with this model) - glmm in lme4 (lmer) - won't allow variance structure - gls - can add variance but can't have Poisson Thanks so much, Karen Moore PhD Researcher, FORESTBIO, Department of Botany, Trinity College Dublin Ireland [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Modelling poisson distribution with variance structure
Karen Moore kmoore at tcd.ie writes: I'm dealing with count data that's nested and has spatial dependence. I ran a glmm in lmer with a random factor for nestedness. Spatial dependence seems to have been accommodated by model. However I can't add a variance strcuture to this model (to accommodate heterogeneity). Is there a model that can have a poisson distribution *AND* a variance structure *AND* have AIC in output (for model comparison and selection)? Some we've looked at that can't: - glmmPQL - can add structures BUT can't have AIC (you can calculate it but it doesn't give correct AIC with this model) - glmm in lme4 (lmer) - won't allow variance structure - gls - can add variance but can't have Poisson [Any further discussion should probably go to r-sig-mixed-mod...@r-project.org ...] I'm not sure I know what you mean by Poisson + variance structure -- if the data are really Poisson (not overdispersed in some way), then the variance structure is completely defined. If you want to deal with overdispersion, and have a well-defined AIC, you may be able to add a per-observation random effect in lme4. Alternatively, you could just use a weights= argument in glmmPQL to set some sensible mean-variance relationship, overlooking the fact that the data are discrete and positive rather than being normally distributed with an equivalent variance structure. http://glmm.wikidot.com/faq may also be useful. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Modelling survival with time-dependent covariates
1-Would informing the algorithm coxph which samples represents the same person (through the use of an Id for example) improve the ?efficiency? of the estimated model? And if so, how should i do that? Using strata()? No, it makes no change. The reason is that the (start, stop] is just a trick. At each death time the program needs to figure out what the covariates are for everyone else at that time; the start,stop lets it pick the right line for each subject. As long as there are no overlaps, i.e. (0,20], (15, 50], then there is only one copy of the person, and no 'correlated data' issue. (Overlap is wierd -- it corresponds to two copies of me being in the room at the same time.) If there are multiple events for a subject, then there is correlation (via a different mechanism), and addition of a cluster() term is needed. 2- He later suggests ?accommodating non-proportional hazards by building interactions between covariates and time into the Cox regression model? as follows: coxph(Surv(start, stop, arrest.time) ~fin + age + age:stop + prio, ... This trick ONLY works if a. the data set has been artificially divided (as your example has) into small uniform time increments, the same for each subject. b. the form of the non-ph is actally a linear change in beta over time. Use cox.zph on the original model to look at this. When I see non-ph (the plot from cox.zph is not horizontal) life is rarely so simple. Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Modelling survival with time-dependent covariates
Hi all, I am looking at the tutorial/appendix from John Fox on “Cox Proportional-Hazards Regression for Survival Data” available here: http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf I am particularly interested in modelling survival with time-dependent covariates (Section 4). The data look like this: Rossi.2[1:50,] start stop arrest.time week arrest fin age race wexp mar paro prio educ employed 0 1 0 20 1 0 27 1 0 0 1 3 3 0 1 2 0 20 1 0 27 1 0 0 1 3 3 0 ... 18 19 0 20 1 0 27 1 0 0 1 3 3 0 19 20 1 20 1 0 27 1 0 0 1 3 3 0 0 1 0 17 1 0 18 1 0 0 1 8 4 0 1 2 0 17 1 0 18 1 0 0 1 8 4 0 ... 15 16 0 17 1 0 18 1 0 0 1 8 4 0 16 17 1 17 1 0 18 1 0 0 1 8 4 0 0 1 0 25 1 0 19 0 1 0 1 13 3 0 1 2 0 25 1 0 19 0 1 0 1 13 3 0 ... 3.13 12 13 0 25 1 0 19 0 1 0 1 13 3 0 John suggests the following model: mod.allison.2 - coxph(Surv(start, stop, arrest.time) ~ + fin + age + race + wexp + mar + paro + prio + employed, + data=Rossi.2) 1-Would informing the algorithm coxph which samples represents the same person (through the use of an Id for example) improve the “efficiency” of the estimated model? And if so, how should i do that? Using strata()? 2- He later suggests “accommodating non-proportional hazards by building interactions between covariates and time into the Cox regression model” as follows: mod.allison.5 - coxph(Surv(start, stop, arrest.time) ~ + fin + age + age:stop + prio, + data=Rossi.2) I have read quite a lot of documentation to understand the meaning of “age + age:stop” in the formula, but I am unsure of what it means. If I wanted to visualise these variables which are entering the model, would it be something like: data.frame(Rossi.2$age,Rossi.2$age %in% Rossi.2$stop) I hope this make sense. Thanks for your help, Ben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Modelling Crystal Growth
Dear all, I would like to hear from anyone who has experience using R to simulate and visualise the formation and growth of crystals. Thank you. mpl -- View this message in context: http://r.789695.n4.nabble.com/Modelling-Crystal-Growth-tp2268746p2268746.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Modelling
Dear R-users, Suppose I have the followin g sample of data, 0 1 2 4 3 1 2 1 3 1 1 3 3 4 1 0 1 2 1 2 1 4 1 4 2 1 2 2 1 1 The first variable is the response variable where 0 is defective and 1 normal. The other four factors( x1,x2,x3,x4) that influence the outcome. I want to fit a binomial model in R . I want also to rder the factors based on their degree of influence the outcome. How do I do this in R. thanks in advance Ashta [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Modelling an incomplete Poisson distribution ?
Dear Ben, Le samedi 18 avril 2009 à 23:37 +, Ben Bolker a écrit : Emmanuel Charpentier charpent at bacbuc.dyndns.org writes: I forgot to add that yes, I've done my homework, and that it seems to me that answers pointing to zero-inflated Poisson (and negative binomial) are irrelevant ; I do not have a mixture of distributions but only part of one distribution, or, if you'll have it, a zero-deflated Poisson. An answer by Brian Ripley (http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar question leaves me a bit dismayed : if it is easy to compute the probability function of this zero-deflated RV (off the top of my head, Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm (still) able to use optim to ML-estimate lambda, using this to (correctly) model my problem set and test it amounts to re-writing some (large) part of glm. Furthermore, I'd be a bit embarrassed to test it cleanly (i. e. justifiably) : out of the top of my head, only the likelihood ration test seems readily applicable to my problem. Testing contrasts in my covariates ... hum ! So if someone has written something to that effect, I'd be awfully glad to use it. A not-so-cursory look at the existing packages did not ring any bells to my (admittedly untrained) ears... Of course, I could also bootstrap the damn thing and study the distribution of my contrasts. I'd still been hard pressed to formally test hypotheses on factors... I would call this a truncated Poisson distribution, related to hurdle models. You could probably use the hurdle function in the pscl package to do this, by ignoring the fitting to the zero part of the model. On the other hand, it might break if there are no zeros at all (adding some zeros would be a pretty awful hack/workaround). Indeed ... If you defined a dtpoisson() for the distribution of the truncated Poisson model, you could probably also use bbmle with the formula interface and the parameters argument. This I was not aware of... Thank you for the tip ! By the way, I never delved into stats4, relying (erroneously) on its one-liner description in CRAN, which is (more or less) an implementation of stats with S4 classes. Therefore mle escaped me also... May I suggest a better one-liner ? (This goes also for bbmle...) The likelihood ratio test seems absolutely appropriate for this case. Why not? Few data, therefore a bit far from the asymptotic condition... Anyway, a better look at my data after discovering a bag'o bugs in the original files led me to reconsider my model : I wanted to assess, among other things, the effect of a boolean (really a two-class variable). After the *right* graph, I now tend to think that my distribution is indeed zero-deflated Poisson in one group and ... zero-deflated negative binomial in the other (still might be zero-deflated Poisson with very small mean, hard to say graphically...). Which gives me some insights on the mechanisms at work (yippeee !!) but will require some nonparametric beast for assessing central value differences (yes, this still has a meaning...), such as bootstrap. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Modelling an incomplete Poisson distribution ?
Dear list, I have the following problem : I want to model a series of observations of a given hospital activity on various days under various conditions. among my outcomes (dependent variables) is the number of patients for which a certain procedure is done. The problem is that, when no relevant patient is hospitalized on said day, there is no observation (for which the number of patients item would be 0). My goal is to model this number of patients as a function of the various conditions described by my independant variables, mosty of them observed but uncontrolled, some of them unobservable (random effects). I am tempted to model them along the lines of : glm(NoP~X+Y+..., data=MyData, family=poisson(link=log)) or (accounting for some random effects) : lmer(NoP~X+Y+(X|Center)), data=Mydata, family=poisson(link=log)) While the preliminary analysis suggest that (the right part of) a Poisson distribution might be reasonable for all real observations, the lack of observations with count==0 bothers me. Is there a way to cajole glm (and lmer, by the way) into modelling these data to an incomplete Poisson model, i. e. with unobserved 0 values ? Sincerely, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Modelling an incomplete Poisson distribution ?
I forgot to add that yes, I've done my homework, and that it seems to me that answers pointing to zero-inflated Poisson (and negative binomial) are irrelevant ; I do not have a mixture of distributions but only part of one distribution, or, if you'll have it, a zero-deflated Poisson. An answer by Brian Ripley (http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar question leaves me a bit dismayed : if it is easy to compute the probability function of this zero-deflated RV (off the top of my head, Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm (still) able to use optim to ML-estimate lambda, using this to (correctly) model my problem set and test it amounts to re-writing some (large) part of glm. Furthermore, I'd be a bit embarrassed to test it cleanly (i. e. justifiably) : out of the top of my head, only the likelihood ration test seems readily applicable to my problem. Testing contrasts in my covariates ... hum ! So if someone has written something to that effect, I'd be awfully glad to use it. A not-so-cursory look at the existing packages did not ring any bells to my (admittedly untrained) ears... Of course, I could also bootstrap the damn thing and study the distribution of my contrasts. I'd still been hard pressed to formally test hypotheses on factors... Any ideas ? Emmanuel Charpentier Le samedi 18 avril 2009 à 19:28 +0200, Emmanuel Charpentier a écrit : Dear list, I have the following problem : I want to model a series of observations of a given hospital activity on various days under various conditions. among my outcomes (dependent variables) is the number of patients for which a certain procedure is done. The problem is that, when no relevant patient is hospitalized on said day, there is no observation (for which the number of patients item would be 0). My goal is to model this number of patients as a function of the various conditions described by my independant variables, mosty of them observed but uncontrolled, some of them unobservable (random effects). I am tempted to model them along the lines of : glm(NoP~X+Y+..., data=MyData, family=poisson(link=log)) or (accounting for some random effects) : lmer(NoP~X+Y+(X|Center)), data=Mydata, family=poisson(link=log)) While the preliminary analysis suggest that (the right part of) a Poisson distribution might be reasonable for all real observations, the lack of observations with count==0 bothers me. Is there a way to cajole glm (and lmer, by the way) into modelling these data to an incomplete Poisson model, i. e. with unobserved 0 values ? Sincerely, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Modelling an incomplete Poisson distribution ?
Emmanuel Charpentier charpent at bacbuc.dyndns.org writes: I forgot to add that yes, I've done my homework, and that it seems to me that answers pointing to zero-inflated Poisson (and negative binomial) are irrelevant ; I do not have a mixture of distributions but only part of one distribution, or, if you'll have it, a zero-deflated Poisson. An answer by Brian Ripley (http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar question leaves me a bit dismayed : if it is easy to compute the probability function of this zero-deflated RV (off the top of my head, Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm (still) able to use optim to ML-estimate lambda, using this to (correctly) model my problem set and test it amounts to re-writing some (large) part of glm. Furthermore, I'd be a bit embarrassed to test it cleanly (i. e. justifiably) : out of the top of my head, only the likelihood ration test seems readily applicable to my problem. Testing contrasts in my covariates ... hum ! So if someone has written something to that effect, I'd be awfully glad to use it. A not-so-cursory look at the existing packages did not ring any bells to my (admittedly untrained) ears... Of course, I could also bootstrap the damn thing and study the distribution of my contrasts. I'd still been hard pressed to formally test hypotheses on factors... I would call this a truncated Poisson distribution, related to hurdle models. You could probably use the hurdle function in the pscl package to do this, by ignoring the fitting to the zero part of the model. On the other hand, it might break if there are no zeros at all (adding some zeros would be a pretty awful hack/workaround). If you defined a dtpoisson() for the distribution of the truncated Poisson model, you could probably also use bbmle with the formula interface and the parameters argument. The likelihood ratio test seems absolutely appropriate for this case. Why not? Ben Bolker __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] modelling probabilities instead of binary data with logisticregression
Hi Joris, glm() handles proportions but will give you a warning (and not an error) about non-integer values. So if you get an error then there should be something wrong with the syntax, model or data. Can you provide us with a reproducible example? Cheers, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens joris meys Verzonden: dinsdag 24 maart 2009 20:30 Aan: R-help Mailing List Onderwerp: [R] modelling probabilities instead of binary data with logisticregression Dear all, I have a dataset where I reduced the dimensionality, and now I have a response variable with probabilities/proportions between 0 and 1. I wanted to do a logistic regression on those, but the function glm refuses to do that with non-integer values in the response. I also tried lrm, but that one interpretes the probabilities as different levels and gives for every level a different intercept. Not exactly what I want... Is there a way to specify that the response variable should be interpreted as a probability? Kind regards Joris [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] modelling probabilities instead of binary data with logisticregression
Hi Thierry, You're very right and I was very wrong. It's a warning indeed, and the outcome is close to what I become if I calculate the logit transformation myself (although not exactly the same, but I guess that has to do with the correction the logit function does when p=0 or p=1). I should have paid more attention to the course of categorical ;-) Kind regards Joris On Wed, Mar 25, 2009 at 10:32 AM, ONKELINX, Thierry thierry.onkel...@inbo.be wrote: Hi Joris, glm() handles proportions but will give you a warning (and not an error) about non-integer values. So if you get an error then there should be something wrong with the syntax, model or data. Can you provide us with a reproducible example? Cheers, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens joris meys Verzonden: dinsdag 24 maart 2009 20:30 Aan: R-help Mailing List Onderwerp: [R] modelling probabilities instead of binary data with logisticregression Dear all, I have a dataset where I reduced the dimensionality, and now I have a response variable with probabilities/proportions between 0 and 1. I wanted to do a logistic regression on those, but the function glm refuses to do that with non-integer values in the response. I also tried lrm, but that one interpretes the probabilities as different levels and gives for every level a different intercept. Not exactly what I want... Is there a way to specify that the response variable should be interpreted as a probability? Kind regards Joris [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] modelling probabilities instead of binary data with logistic regression
Dear all, I have a dataset where I reduced the dimensionality, and now I have a response variable with probabilities/proportions between 0 and 1. I wanted to do a logistic regression on those, but the function glm refuses to do that with non-integer values in the response. I also tried lrm, but that one interpretes the probabilities as different levels and gives for every level a different intercept. Not exactly what I want... Is there a way to specify that the response variable should be interpreted as a probability? Kind regards Joris [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] modelling probabilities instead of binary data with logistic regression
Thank you all for the very fast answers. My proportions come from a factor analysis on a number of binary variables, in order to avoid having to fit 12 logistic regressions on the same dataset. By scaling the obtained scores to 0 and 1, I get weighted averages of the response combinations I'm interested in. I tried the betareg function, but that one can't deal with probabilities 0 and 1 unfortunately. I'll have to manually do the logit transformation, I'm afraid. Thanks for the help. Kind regards Joris On Tue, Mar 24, 2009 at 8:48 PM, Kjetil Halvorsen kjetilbrinchmannhalvor...@gmail.com wrote: You did'nt say how your proportions have arisen! If each corresonds to one observation, you could simply simulate indicator variables with those proportions as prob's, fit glm, repeat many times, and average results! More seriously, you could transform the proportions to logits logit - log(p/(1-p)) and fit a linear regression. Kjetil On Tue, Mar 24, 2009 at 3:30 PM, joris meys jorism...@gmail.com wrote: Dear all, I have a dataset where I reduced the dimensionality, and now I have a response variable with probabilities/proportions between 0 and 1. I wanted to do a logistic regression on those, but the function glm refuses to do that with non-integer values in the response. I also tried lrm, but that one interpretes the probabilities as different levels and gives for every level a different intercept. Not exactly what I want... Is there a way to specify that the response variable should be interpreted as a probability? Kind regards Joris [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Modelling disease spread
Thanks Marcel, In addition to your program and the reference to simecol, someone had replied to my private email pointing out RLadyBug: An R package for stochastic epidemic models which is on CRAN and which seems one of the most relevant. I write it here as a reference for users doing a future search in this archive. Regards. Bio7 wrote: The simecol package is maybe what you want. http://hhbio.wasser.tu-dresden.de/projects/simecol/ http://hhbio.wasser.tu-dresden.de/projects/simecol/ Another possibility is to use a program i've written. Here is a flash presentation maybe also interesting for you. http://www.uni-bielefeld.de/biologie/Oekosystembiologie/bio7app/flashtut/animaterplot.htm http://www.uni-bielefeld.de/biologie/Oekosystembiologie/bio7app/flashtut/animaterplot.htm -- View this message in context: http://www.nabble.com/Modelling-disease-spread-tp15459834p15480798.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Modelling disease spread
I was at a lecture the other day and I saw a presentation of very neat (short) animation modeling epidemic disease spread over a map region. When I ask what software they used they mentioned SAS. Do you know if there are equivalent resources in R to model the spread of disease with animation output? My search in R-help and google didn't lead to any document (though I found a couple of documents for SAS). However my intuition telles me there must be a similar program in R. Thanks -- View this message in context: http://www.nabble.com/Modelling-disease-spread-tp15459834p15459834.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Modelling disease spread
The simecol package is maybe what you want. http://hhbio.wasser.tu-dresden.de/projects/simecol/ http://hhbio.wasser.tu-dresden.de/projects/simecol/ Another possibility is to use a program i've written. Here is a flash presentation maybe also interesting for you. http://www.uni-bielefeld.de/biologie/Oekosystembiologie/bio7app/flashtut/animaterplot.htm http://www.uni-bielefeld.de/biologie/Oekosystembiologie/bio7app/flashtut/animaterplot.htm With kind regards Marcel -- View this message in context: http://www.nabble.com/Modelling-disease-spread-tp15459834p15460869.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.