[R] newly install old and present R versions
Hi,

Moving to a new computer (Windows 7), but for reasons of reproducibility I would seriously like to also install my present R 2.8.1, along with all extension packages, on that machine (that is, besides R 2.14.1). What's the best way of doing so? My idea is: get the setup for 2.8.1 and install it; during installation select neither 'save version number in registry' nor 'associate RData files', to keep the registry clean. Then manually copy the extension packages from my present machine to the lib folder of 2.8.1. That should mean 2.8.1 will run.

Then install 2.14.1 and download the most up-to-date versions of the packages. How do I detect whether packages were e.g. renamed or merged? Also by manually copying my old packages to the lib dir and then running update.packages(checkBuilt=TRUE, ask=FALSE)? Will that work under these circumstances as well?

To make things more complicated, I am also using Rtools. Is it possible to have two parallel versions of Rtools? Sure, I can install both, but as Rtools sets some PATH variables, I suppose only one can actually be active at a time. Hence it might be smarter to avoid installing Rtools for 2.8.1 altogether, that is, live without it (I have enough trouble running Rtools for one version alongside MinGW, as the path to g++ can also conflict ...).

thanks, Thomas
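A minimal sketch of an alternative to copying library folders between versions (the file name and exact workflow are my own assumptions, not from the post): record the package list under the old R, then let the new R install current builds, which also reveals packages that were renamed or removed from CRAN.

## run under the old R (2.8.1):
pkgs <- rownames(installed.packages())
save(pkgs, file = "pkgs.RData")

## run under the new R (2.14.1):
load("pkgs.RData")
install.packages(pkgs)                          # fetch current builds
setdiff(pkgs, rownames(installed.packages()))   # packages that no longer install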
Re: [R] Checking dates for entry errors
On 1/11/2012 11:07 PM, Paul Miller wrote: Hello Everyone, I have a question about how best to check dates for entry errors.

Try using regular expression matching and the functions grep, strsplit, regexpr etc. If you are not familiar with regexes: getting into them is a bit of a bumpy road, but in the long term definitely worth the effort!

cheers
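As a hedged illustration of the regex route (made-up data; the real checks depend on the expected date format):

dates <- c("2011-03-15", "2011-13-01", "15/03/2011", "2011-02-30")
ok.format <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", dates)   # syntactic check
ok.date   <- !is.na(as.Date(dates, format = "%Y-%m-%d"))    # real calendar date?
dates[!(ok.format & ok.date)]                               # entries to review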
[R] multinomial regression model
Hi,

Consider the need for a regression model which can handle an ordered multinomial response variable. There are, for example, proportional odds / cumulative logit models, but the regression should also include random effects (a mixed model), and I am not aware of a multinomial regression model as part of lme4 (am I wrong here?). Further, the constraint of proportional odds models that predictors have the same relative impact across all levels most likely does not hold for the study in question.

I was wondering if an ordinary binomial mixed model can be turned into a multinomial one by preparing the input data.frame in a different way. Consider three response levels, A, B, C, ordered. I can accurately describe the occurrence of each of these three realizations using one to two Bernoulli random variables: let

P(X = A) = a
P(X in {B, C}) = 1 - a
P(X = B | X in {B, C}) = b,     so that P(X = B) = (1 - a) * b
P(X = C | X in {B, C}) = 1 - b, so that P(X = C) = (1 - a) * (1 - b)

So the first comparison checks whether A or either of B/C is the case, and the second, conditional on it being either B or C, checks which of these two holds - traversing the hierarchy of the ordered levels sequentially.

In terms of the likelihood of the desired model, these probabilities would be achieved as follows: if the random variable takes on the value A, use one input row with the response variable set to 0; otherwise use two input rows, the first with the response variable set to 1 (the random variate is either B or C), and a second row with the response equal to 0 if B is the case and 1 otherwise, that is, if C is the case. Certainly, degrees of freedom must be manually adjusted in inferences, as every measured response should provide only a single degree of freedom.

Question: Do I overlook something here, or is the way outlined above a valid method to yield an ordered multinomial mixed model by tweaking the input table in such a manner? See the sketch below.

many thanks and best, Thomas
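A minimal sketch (my own construction, not from the post) of the described expansion for ordered levels A < B < C, followed by a binomial mixed model; glmer() from lme4 is assumed, and the degrees-of-freedom caveat above still applies:

d  <- data.frame(y   = factor(c("A", "B", "C", "B"), levels = c("A", "B", "C")),
                 x   = rnorm(4),
                 grp = factor(c(1, 1, 2, 2)))

r1 <- transform(d, cmp = "AvsBC", resp = as.integer(y != "A"))   # A vs {B, C}
r2 <- transform(subset(d, y != "A"),
                cmp = "BvsC", resp = as.integer(y == "C"))       # B vs C
long <- rbind(r1, r2)

library(lme4)
## glmer(resp ~ cmp + x + (1 | grp), family = binomial, data = long)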
Re: [R] random effects in mixed model not that 'random'
HI,

Thanks for your response; yes, you are right, it's not fully on topic, but I chose this list not only because I am using R for all my stats and so read it anyway, but also because many statisticians read here too. Do you know another list where my question is more appropriate? For what it's worth, I haven't found a local statistician yet to really answer the question, but I'll continue searching ...

thanks, Thomas

On 12/13/2009 11:07 AM, Daniel Malter wrote: Hi, you are unlikely to (or lucky if you) get a response to your question from the list. This is a question that you should ask your local statistician with knowledge in stats and, optimally, your area of inquiry. The list is (mostly) concerned with solving R rather than statistical problems. Best of luck, Daniel

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Thomas Mang Sent: Friday, December 11, 2009 6:19 PM To: r-h...@stat.math.ethz.ch Subject: [R] random effects in mixed model not that 'random'

Hi, I have the following conceptual / interpretative question regarding random effects: A mixed effects model was fit on biological data, with observations coming from different species. There is a clear overall effect of certain predictors (entering the model as fixed effects), but as different species react slightly differently, the predictor also enters the model as a random effect with species as grouping variable. The resulting model is very fine. Now comes the tricky part, however: I can inspect not only the variance parameter estimate for the random effect, but also the 'coefficients' for each species. Suppose I find out that they make biological sense, and maybe actually more sense than they should: for each species vast biological knowledge is available, regarding traits etc. So I can link the random effect coefficients to that knowledge, see the deviation from the generic predictor impact (the fixed effect) and relate it to the traits of my species. However, I see the following problem with that approach: if I have no knowledge of the species traits, or the species names are anonymous to me, it makes sense to treat the species-specific deviations as realizations of a random variable (principle of exchangeability). Once I know the species used in the study, however, and have the biological knowledge at hand, it does not make so much sense any more; I can predict whether for that particular species the generic predictor impact will be amplified or not. That is, I can predict whether the draw from the assumed normal distribution of the random effects will more likely be > 0 or < 0 - which is of course completely contradictory and nonsense if I assume I have a random draw from a N(0, sigma) distribution. Integrating the biological knowledge as a fixed effect, however, might be tremendously difficult, as species traits can sometimes not readily be quantified in a numeric way. I could defer the issue to the species traits and say that once the species evolved, their traits were drawn randomly from a population. This however causes problems with ideas of evolution and phylogenetic relationships among the species. Maybe my question can be rephrased the following way: does it ever make sense to _interpret_ the coefficients of the random effects for each group and link them to properties of the grouping variable? The assumption of a realization of a random variable seems to render that quite problematic.
However, this means that the more ignorant I am, and the less knowledge I have, the more the random realization seems to become realistic - which is at odds with scientific investigations. Suppose the mixed model is one of the famous social sciences studies analysing pupil results on tests at different schools, with schools acting as grouping variable for a random intercept. If I have no knowledge about the schools, the random effect assumption makes sense. If I however investigate the schools in detail (either a priori or a posteriori), say the teaching quality of the teachers, the socio-economic status of the school area etc., it will probably make sense to predict which ones will have pupils performing above average, and which below average. But then these factors leading me to the predictions should probably enter the model as fixed effects, and maybe I don't need any school random effect any more at all. But this means the school deviation from the global mean is actually not the realization of a random variable, but instead the result of something quite deterministic, which is usually just unknown, or can only be measured with extreme, impractical effort. So the process might not be random; just because so little is known about the process, the results appear as if they were randomly drawn (from a larger population distribution). Again, is ignorance / lack of deeper knowledge the key to using random effects - and the more knowledge I have, the less?
Re: [R] random effects in mixed model not that 'random'
HI,

Thanks for your input; see below.

On 12/13/2009 4:41 PM, Robert A LaBudde wrote: I think what you are finding is that calling a grouping variable a random effect is not the same thing as it actually being a random effect. An effect is really only random when it is chosen randomly. Just because you don't want to deal with it as a fixed effect (e.g., too many levels) doesn't mean it qualifies as a random effect. This sloppiness is common in mixed modeling.

Well, to some degree the species were chosen randomly, so there isn't a big selection bias in there. I also argue they wouldn't qualify as a fixed effect (they might as a stand-alone fixed-effect factor, but definitely not in interaction with other predictors - there is no reason to believe the impact of predictors is totally independent across species). Sample size isn't the problem; I truly wouldn't want to include them as fixed effects based on expert knowledge.

In your example of student scores, you mentioned the schools were a random effect, because they were a grouping variable. This is not true. Schools have a strong fixed effect. They are also not chosen randomly in your study. How to resolve your problem? Two methods: 1) Stop modeling the grouping variable as a random effect, when it's not: model it as a fixed effect; 2) Do the experiment right: a) List the schools in their population. b) Choose the schools to be used by random sampling from that population. Then you will find schools really are a random effect.

1) does not seem to be the right solution. 2) is more interesting in terms of understanding: are you saying that it's just the random choice of why something was included in the sample that makes it qualify as a random effect? I thought it was the fact that it is the realization of a random variable (drawn from a N(0, sigma) distribution). These are two different things. Suppose I list all the schools in the population and randomly pick 15. IIUC, you would argue it now qualifies as a random effect. However, once I have chosen my schools, I could still investigate the estimated random effects coefficients, a posteriori investigate the schools and try to find out what discriminates those with students above average from those below average. Odds are, if I had the resources to make a thorough investigation, I would find something - in other words, because there is something deterministic behind it, I would have said they are not random realizations from a normal distribution - which was my understanding of the properties of random effects so far, but which might be wrong and hence the problem (although due to the complexity of this deterministic process, they might practically appear as random realizations). If I would pick a 16th school and then apply my knowledge from the investigations, I could probably say whether it will be above or below average - this is what, in my understanding of random effects, would actually not qualify it as a random effect, whereas according to you it would, if the school was chosen randomly. Is that correct? Suppose I have chosen randomly: does it make sense to investigate a posteriori why the estimates of the random effects are the way they are and gain insights into the system, or would it not make sense, as they are assumed to be completely random realizations of a random variable and can be anything? To some degree I think the issue can also be seen the following way: conditional on my extensive knowledge of the school properties, the schools are probably not distributed iid.
I could have this knowledge enter as a fixed effect. But since this knowledge is usually not available, the unconditional distribution might well make them iid N(0, sigma), and hence make the schools qualify as grouping variable for random effects (where of course it is assumed that the sampling was done randomly from the population). But what shall I do if I have a bit of the extensive knowledge available - maybe too much to stick to the completely unconditional iid assumption, but also not enough for a sensible conditional distribution to allow the specification of a fixed effect?

thanks, Thomas

What you have discovered is called selection bias. It is common in unrandomized studies.
[R] random effects in mixed model not that 'random'
Hi,

I have the following conceptual / interpretative question regarding random effects: A mixed effects model was fit on biological data, with observations coming from different species. There is a clear overall effect of certain predictors (entering the model as fixed effects), but as different species react slightly differently, the predictor also enters the model as a random effect with species as grouping variable. The resulting model is very fine.

Now comes the tricky part, however: I can inspect not only the variance parameter estimate for the random effect, but also the 'coefficients' for each species. Suppose I find out that they make biological sense, and maybe actually more sense than they should: for each species vast biological knowledge is available, regarding traits etc. So I can link the random effect coefficients to that knowledge, see the deviation from the generic predictor impact (the fixed effect) and relate it to the traits of my species.

However, I see the following problem with that approach: if I have no knowledge of the species traits, or the species names are anonymous to me, it makes sense to treat the species-specific deviations as realizations of a random variable (principle of exchangeability). Once I know the species used in the study, however, and have the biological knowledge at hand, it does not make so much sense any more; I can predict whether for that particular species the generic predictor impact will be amplified or not. That is, I can predict whether the draw from the assumed normal distribution of the random effects will more likely be > 0 or < 0 - which is of course completely contradictory and nonsense if I assume I have a random draw from a N(0, sigma) distribution. Integrating the biological knowledge as a fixed effect, however, might be tremendously difficult, as species traits can sometimes not readily be quantified in a numeric way. I could defer the issue to the species traits and say that once the species evolved, their traits were drawn randomly from a population. This however causes problems with ideas of evolution and phylogenetic relationships among the species.

Maybe my question can be rephrased the following way: does it ever make sense to _interpret_ the coefficients of the random effects for each group and link them to properties of the grouping variable? The assumption of a realization of a random variable seems to render that quite problematic. However, this means that the more ignorant I am, and the less knowledge I have, the more the random realization seems to become realistic - which is at odds with scientific investigations.

Suppose the mixed model is one of the famous social sciences studies analysing pupil results on tests at different schools, with schools acting as grouping variable for a random intercept. If I have no knowledge about the schools, the random effect assumption makes sense. If I however investigate the schools in detail (either a priori or a posteriori), say the teaching quality of the teachers, the socio-economic status of the school area etc., it will probably make sense to predict which ones will have pupils performing above average, and which below average. But then these factors leading me to the predictions should probably enter the model as fixed effects, and maybe I don't need any school random effect any more at all.
But this means the school deviation from the global mean is actually not the realization of a random variable, but instead the result of something quite deterministic, which is usually just unknown, or can only be measured with extreme, impractical effort. So the process might not be random; just because so little is known about the process, the results appear as if they were randomly drawn (from a larger population distribution). Again, is ignorance / lack of deeper knowledge the key to using random effects - and the more knowledge I have, the less?

many thanks, Thomas
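For what it's worth, the per-group quantities discussed in this thread can be inspected like this with lme4 (a sketch with made-up names; dat, y and pred are hypothetical, not from the post):

library(lme4)
m <- lmer(y ~ pred + (pred | species), data = dat)   # random slope per species
VarCorr(m)         # variance parameter estimates of the random effects
ranef(m)$species   # per-species deviations (the 'coefficients' discussed above)
coef(m)$species    # fixed effect plus deviation, per species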
[R] MCMC sampling question
Hello,

Consider MCMC sampling with Metropolis / Metropolis-Hastings proposals and a density function with a given valid parameter space. How are MCMC proposals performed if the parameter could be located at the very extreme of the parameter space, or even 'beyond' it?

An example to express what I mean by my very nontechnical 'beyond': The von Mises distribution is a circular distribution, describing directional trends. It has a concentration parameter kappa, with kappa > 0. The lower kappa, the flatter the distribution, and for kappa approaching 0 it converges to the uniform. Kappa shall be estimated [in a complex likelihood] through MCMC, with the problem that it is possible that there truly isn't any directional trend in the data at all, that is kappa -> 0; the latter would even constitute the H0. If I log-transform kappa to get it onto the real line, will the chain then ever fulfill convergence criteria? The values for logged kappa should, I suppose, just decrease on average all the time. But suppose it finds an almost flat plateau: how do I then test against the H0? By definition, I'll never get kappa = 0 exactly, so I can't compare against that.

One idea I had: define not only a parameter kappa, but also an indicator parameter which acts as a switch between a uniform and a von Mises distribution. Call that parameter d. I could then, for example, let d switch state with 50% probability and then make the usual acceptance tests. Is this approach realistic? Is it sound and solid, or nonsense / suboptimal? Is there a common solution to the problem mentioned before? [I suppose there is. Mixed effects models testing the variances of random effects against 0 should fall into the same kind of problem.]

cheers, Thomas
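To make the boundary behaviour concrete, here is a minimal random-walk Metropolis sketch for log(kappa) with a von Mises likelihood - entirely my own construction, base R only, and ignoring priors for brevity:

dvm <- function(theta, mu, kappa)    # von Mises density
  exp(kappa * cos(theta - mu)) / (2 * pi * besselI(kappa, 0))
loglik <- function(lk, theta, mu) sum(log(dvm(theta, mu, exp(lk))))

set.seed(1)
theta <- runif(50, -pi, pi)          # H0-like data: no directional trend
mu <- 0; lk <- 0; out <- numeric(2000)
for (i in seq_along(out)) {
  prop <- lk + rnorm(1, 0, 0.5)      # random walk on the log scale
  if (log(runif(1)) < loglik(prop, theta, mu) - loglik(lk, theta, mu))
    lk <- prop
  out[i] <- lk
}
plot(out, type = "l")                # drifts downward: the boundary problem

The indicator-switch idea then amounts to a model-choice move between the uniform and the von Mises model, i.e. a simple reversible-jump-type sampler.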
[R] ordinal response model with spatial autocorrelation
Hi,

[note: 4th posting attempt - apologies if the other ones ever show up...]

I have a (3-level) ordinal response data set which needs the integration of a spatial autocorrelation structure. What packages / functions are available to fit such a thing? The heterogeneous, cluster-like structuring of the autocorrelation makes a mixed effects model, with random factors capturing the different sampling regions, seem ideal. I failed, however, to find a function in nlme/lme4 or other packages capable of fitting ordinal data, and package 'DPpackage' uses an approach I am not really familiar with; the function DPMolmm also triggers errors, even for the provided examples. There is no sound a priori reason to believe that the proportional odds assumptions apply - they may, or may not. Anyway, I don't have any preference for that specific kind of model, so any type of ordinal response model would be much appreciated! I don't see a practical way of constructing a model out of 2 individual logistic regression models, but in case there is, I'd be happy to hear about it!

BTW, I have no idea why my posts don't make it through lately. Is there some 'testing' newsgroup I can send dummy posts to, just to see if it works?

thanks a lot and cheers, Thomas
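One candidate, sketched under the assumption that the 'ordinal' package (not mentioned in the post) is an option; its clmm() fits cumulative link mixed models, with proportional odds as the default link structure:

library(ordinal)
## resp: ordered 3-level factor; region: sampling region (made-up names)
m <- clmm(resp ~ x1 + x2 + (1 | region), data = dat)
summary(m)

A region-level random intercept captures the cluster-alike structure described above, though it is not a full spatial autocorrelation model.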
Re: [R] min frequencies of categorical predictor variables in GLM
Marc Schwartz wrote: On Aug 3, 2009, at 12:06 AM, Thomas Mang wrote: Hi, Suppose a binomial GLM with both continuous as well as categorical predictors (sometimes referred to as GLM-ANCOVA, if I remember correctly). For the categorical predictors = indicator variables, is there then a suggested minimum frequency for each level? Would such a rule/recommendation depend on the y-side too? Example: N is quite large, a bit over 100. Observed, however, are only 0/1s (so Bernoulli random variables, not binomial, because the covariates come from observations and are in general always different between observations). There are two categorical predictors, each with 2 levels. It would structurally probably also make sense to allow an interaction between those, yielding de facto a single categorical predictor with 4 levels. Is there then a minimum of observations falling into each of the 4 level categories (either absolute or relative), or that plus also considering the y-side?

Must be the day for sample size questions for logistic regression. A similar query is on MedStats today. The typical minimum sample size recommendation for logistic regression is based upon covariate degrees of freedom (or columns in the model matrix). The guidance is that there should be 10 to 20 *events* per covariate degree of freedom. So if you have 2 factors, each with two levels, that gives you two covariate degrees of freedom total (two columns in the model matrix). At the high end of the above range, you would need 40 events in your sample. If the event incidence in your sample is 10%, you would need 400 cases to observe 40 events to support the model with the two two-level covariates (Y ~ X1 + X2). An interaction term (in addition to the 2 main effect terms, Y ~ X1 * X2) in this case would add another column to the model matrix; thus, you would need an additional 20 events, or another 200 cases in your sample. So you could include the two two-level factors and the interaction term if you have 60 events, or in my example, about 600 cases.

Thanks for that. I suppose your term 'event' does not refer to a technical aspect of GLMs, so I assume that the number of observed 0s _or_ 1s has to be >= 10 / 20 for each df (since it is arbitrary which of them is the event and which the non-event). OK, two questions: The model also contains continuous predictors (call them W, so the model is Y ~ X1*X2 + W). Does the same apply here too - for each df of these, 10-20 more events? [If the answer to the former is yes, this question is now redundant:] If there are interactions between the continuous covariates and a categorical predictor (Y ~ X1 * (X2 + W)), how many more events do I need? Does the rule for the categorical predictors count, or that for the continuous covariates?

many thanks! Thomas

If you include the interaction term only, in the absence of the main effects (Y ~ X1:X2), that would yield 4 columns in the model matrix, requiring 80 events, or about 800 cases. Without more details (e.g. your underlying hypothesis), it is not clear to me that you gain anything here compared to the use of the main effects and, potentially, the interaction term together, and you certainly lose in terms of model interpretation and by requiring a notably larger sample size. Relative to a minimum sample size for each of the levels in the factor-based covariates, I am not aware of any specific guidance there, short of dealing with empty cells at the extreme.
However, there are methods to assess covariate complexity and to consider the collapsing of factor levels. For more details on these issues, I would refer you to Frank's book, Regression Modeling Strategies, specifically to chapters 4 and 10-12. The former focuses on general multivariable strategies and the latter focus on LR. More information here: http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS HTH, Marc Schwartz
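The events-per-df arithmetic in this exchange can be written down directly (a sketch with made-up names):

X  <- model.matrix(~ X1 * (X2 + W), data = dat)  # full design matrix
df <- ncol(X) - 1                                # covariate df, intercept dropped
events <- min(sum(dat$Y == 1), sum(dat$Y == 0))  # the rarer outcome counts
events / df                                      # aim for roughly 10 to 20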
[R] min frequencies of categorical predictor variables in GLM
Hi,

Suppose a binomial GLM with both continuous as well as categorical predictors (sometimes referred to as GLM-ANCOVA, if I remember correctly). For the categorical predictors = indicator variables, is there then a suggested minimum frequency for each level? Would such a rule/recommendation depend on the y-side too?

Example: N is quite large, a bit over 100. Observed, however, are only 0/1s (so Bernoulli random variables, not binomial, because the covariates come from observations and are in general always different between observations). There are two categorical predictors, each with 2 levels. It would structurally probably also make sense to allow an interaction between those, yielding de facto a single categorical predictor with 4 levels. Is there then a minimum of observations falling into each of the 4 level categories (either absolute or relative), or that plus also considering the y-side?

thanks! Thomas
[R] convert factor to indicator matrix
Hi,

I am looking for a function which can convert a single factor (a vector), or a list of multiple factors, into the indicator (dummy) matrix with 0/1 entries (or other contrasting schemes). I could cook it up myself, but I am actually sure this thing already exists - I have just forgotten the name and package. IIRC it was clearly different from model.matrix, because it had a nicer, more straightforward interface [I know how to use model.matrix; I just can't figure out the other function any more].

many thanks, Thomas
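Two candidates (the second may be the half-remembered function, though that is a guess):

f <- factor(c("a", "b", "c", "a"))
model.matrix(~ f - 1)   # base R: one 0/1 column per level, no intercept

library(nnet)
class.ind(f)            # nnet's direct factor-to-indicator-matrix helper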
[R] 'singularity' between fixed effect and random factor in mixed model
Hi,

I just came across the following issue regarding mixed effects models: In a longitudinal study, individuals (variable ind) are observed for some response variable. One explanatory variable, f, entering the model as a fixed effect, is a (2-level) factor. The expression of that factor is constant for each individual across time (say, the sex of the individual). ind enters the model as grouping variable for random effects. So in a simple form, the formula could look like:

y ~ f + ... + (1|ind)

[and in the simplest model, the ellipsis is simply nothing]. To me, this seems not to be an unusual design at all. However, the indicator matrix consisting of f and ind - say, if ind had entered the model as a fixed effect - shows a singularity (see the sketch below).

My question is now: what will this 'singularity' cause in a mixed-effects model? I admit I have never fully understood how the fitting of mixed-effects models happens internally (whether REML or ML) [so I am not even sure it can be called a 'singularity']. Specifically, does it make the fit numerically more unstable? Would the degree of this depend on other variables in the model? Is the issue of degrees of freedom - complicated enough anyway for mixed models - further inflated by that? Do statistical inferences regarding the fixed effect have to be treated more carefully? Is the general situation something that should be avoided?

many thanks in advance for any insights and cheers, Thomas
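The rank deficiency is easy to demonstrate for the fixed-effects version of the design (made-up data: each individual keeps one value of f):

d <- data.frame(ind = factor(rep(1:4, each = 3)),
                f   = factor(rep(c("m", "f"), each = 6)))
X <- model.matrix(~ f + ind, d)
qr(X)$rank < ncol(X)   # TRUE: the f column lies in the span of the ind dummies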
[R] MCMC validity question
Hello,

I have quite a tough problem, which might be solvable by MCMC. I am fairly new to MCMC (in the learning process) - so apologies if the answer is totally obvious, and any hints, links etc. are greatly appreciated. I'll illustrate the problem in a version cut down to the essentials - the real problem is considerably more complex.

Suppose I have a Markovian series of Poisson processes, which generate (or kill) some objects. At t0, say, I initially have x = 3 objects. In the next time step, t1, the total number of objects is Poisson-distributed, subject to a function of x at t0, covariates, and parameters. So x_t1 ~ Pois, with E(x_t1) = f(x_t0, covariates, parameters). Let's choose a very simple function for f, say just f = x * par1 * Covariate1. Now let this process be repeated, say, 6 times, always with the number of objects obtained in the previous step as input (x) for the next step.

The problem is that at all time steps the total number of objects remains unobservable, because objects are only detected with a certain, low probability (itself subject to covariates and parameters). So if you observe, say, 2 objects, the only thing you know is that the true figure must be >= 2. Presume, however, that the detection probability is equal and independent for all objects at a time step, so the observed number of objects is also Poisson-distributed. The likelihood function is then built upon that figure.

The main problem is the input to the function f: In the first step, I know what x is (or might not even know that, and have only a distribution). From that step on, I have only a Poisson distribution, and lack the concrete realization; all I know is a bare minimum value. In general f is not a simple thing, and it is quite impractical to pass in a distribution itself; moreover, because of the inflation of variance, the output could not be treated as the lambda of a Poisson distribution any more. So the Poisson distribution is lost, although physical knowledge tells you it really is one (and would be, if x were known precisely). The question is therefore: how can I work around the fact that x is always known only up to a distribution, without knowing the precise realization?

Ignoring the above issue, and keeping in mind that I have shown only a simplified version of the model, MCMC methods seem a reasonable choice. I had an idea which looks so strange that I have strong doubts whether it is valid: Say I have a present parameter set. Is it then possible to estimate the Poisson lambda for t1 using f, then draw a random number from that distribution, and now treat that drawn figure as (fixed) input for the calculation for t2, and so on, repeating it until t6? At the end, propose new parameters based on MCMC sampling, and repeat all over again. Will a long sequence of MCMC iterations homogenize these fake (simulated) realizations of the Poisson distributions, so that the chain converges to the posterior distribution of the parameters?

many many thanks for any inputs and thoughts, cheers, Thomas
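The proposal in the last paragraph corresponds to simulating the latent counts forward, i.e. treating them as auxiliary variables (data augmentation) rather than integrating them out. A minimal sketch of one such forward draw, with invented numbers:

f <- function(x, cov, par1) x * par1 * cov   # the simple f from above
set.seed(1)
par1 <- 1.2
covariate <- c(0.9, 1.1, 1.0, 0.8, 1.2, 1.0)
x <- 3                                       # known count at t0
for (t in 1:6) x <- rpois(1, f(x, covariate[t], par1))
x                                            # one realization of the chain at t6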
Re: [R] converting numeric to integer
Hi,

The problem is, x might be negative. If x == -6.999, the result should be -7, not -6. That's why my original proposal had the ifelse condition (one could alternatively write sign(x) * 0.5, BTW). I agree however that in my proposal the round(x) is redundant; one can use x itself as the left-hand argument of the sum. Is there however still a more 'elegant' way?

thanks, Thomas

Gabor Grothendieck wrote: Try: as.integer(x + 0.5) assuming the calculation error is less than 0.5. On Sat, May 16, 2009 at 2:49 PM, Thomas Mang thomas.m...@fiwi.at wrote: Hello, Suppose I have x, which is a variable of class numeric. The calculations performed to yield x imply that mathematically it should be an integer, but due to round-off errors it might not be (in either direction). The error is however small, so round(x) will yield the appropriate integer value. Moreover, this integer value is guaranteed to be representable by an 'integer' class, that is -2^31 < x < 2^31, and logically it is an integer anyway. So I want to convert x from class 'numeric' to 'integer'. What is the most elegant, but always correct, way to achieve this conversion? What comes to mind is of course something along the lines of: x = as.integer(round(x)). I am, however, not sure if this always works, because I do not know if the round function is guaranteed to return a numeric value which, in finite binary representation, is always exactly equal to the underlying mathematical integer. If that is guaranteed, this would of course be a simple + elegant solution. An alternative I came up with is: x = as.integer(round(x) + ifelse(x >= 0, 0.5, -0.5)), where I explicitly add a bit to ensure the finite binary representation must be at least the underlying integer in magnitude, and then truncate the decimal digits. IMO, this one is always guaranteed to work, at least within the numerical range integers are limited to anyway. What's your opinion on the issue? Any other solution? Thanks a lot in advance and cheers, Thomas
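A quick check (my own) that the two constructions agree when the error is below 0.5:

x <- c(6.999999, -6.999999, 7.000001, -7.000001)
a <- as.integer(round(x))
b <- as.integer(x + ifelse(x >= 0, 0.5, -0.5))   # shift-then-truncate variant
stopifnot(identical(a, b), identical(a, c(7L, -7L, 7L, -7L)))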
[R] converting numeric to integer
Hello,

Suppose I have x, which is a variable of class numeric. The calculations performed to yield x imply that mathematically it should be an integer, but due to round-off errors it might not be (in either direction). The error is however small, so round(x) will yield the appropriate integer value. Moreover, this integer value is guaranteed to be representable by an 'integer' class, that is -2^31 < x < 2^31, and logically it is an integer anyway. So I want to convert x from class 'numeric' to 'integer'. What is the most elegant, but always correct, way to achieve this conversion?

What comes to mind is of course something along the lines of:

x = as.integer(round(x))

I am, however, not sure if this always works, because I do not know if the round function is guaranteed to return a numeric value which, in finite binary representation, is always exactly equal to the underlying mathematical integer. If that is guaranteed, this would of course be a simple + elegant solution. An alternative I came up with is:

x = as.integer(round(x) + ifelse(x >= 0, 0.5, -0.5))

where I explicitly add a bit to ensure the finite binary representation must be at least the underlying integer in magnitude, and then truncate the decimal digits. IMO, this one is always guaranteed to work, at least within the numerical range integers are limited to anyway. What's your opinion on the issue? Any other solution?

Thanks a lot in advance and cheers, Thomas
[R] changing function argument
Hi,

I wonder if the following is possible in R: Suppose a function takes an argument and wants to modify this argument so that the change is visible _at the call side_. It would be what is known, for example, as pass-by-reference-to-non-const in C++.

test <- function(x)
{
    x <- 10
    ...
    return(somethingElse)  # I do NOT want to have to return x
}

number = 5
test(number)
stopifnot(number == 10)

Is there some trick with scoping rules, and maybe <<-, to achieve this?

thanks, Thomas
Re: [R] changing function argument
Dieter Menne wrote: Thomas Mang thomas.mang at fiwi.at writes: I wonder if the following is possible in R: Suppose a function takes an argument and wants to modify this argument so that the change is visible _at the call side_. It would be what is known, for example, as pass-by-reference-to-non-const in C++. test <- function(x) { x <- 10; ...; return(somethingElse) # I do NOT want to have to return x } number = 5; test(number); stopifnot(number == 10). Is there some trick with scoping rules, and maybe <<-, to achieve this?

Yes, it is possible. No, don't do it (coming from the C++ world, I understand your temptation). Try looking for assign to see the attached flames.

Hi,

Well, the point is I really really want to do it - and believe me, I know what I am doing ;) (for that matter, this is done solely in implementation code - of course I wouldn't do such a thing in a publicly used interface). I know assign, and I suppose the trick is to pass the name of the variable at the call side, as well as the environment of the call side, as additional arguments to my test function, so it knows what to change where. Is that correct? Is there some more short-hand / direct / elegant way of achieving my goal?

thanks, Thomas
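A minimal sketch of the substitute()/assign() route discussed above (standard R, subject to the caveats already given):

test <- function(x) {
  nm <- deparse(substitute(x))            # name of the argument at the call side
  assign(nm, 10, envir = parent.frame())  # write into the caller's environment
  "somethingElse"                         # return value stays independent of x
}
number <- 5
test(number)
stopifnot(number == 10)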
Re: [R] difference between assignment syntax <- vs =
Hi,

thanks for the link. In the bottom part of the relevant section, you say: Standard advice is to avoid using '=' when you mean '<-'. Is this formal, generally accepted (R community) advice, or does it reflect your personal opinion? Note I am not asking this question to criticize by any means; I just want to know for my own work (which will be partially released to others) whether the '<-' style is the preferred one (personally, with a strong background in other programming languages, I have always used '=' so far).

thanks, Thomas

Patrick Burns wrote: 'The R Inferno' page 78 is one source you can look at. Patrick Burns patr...@burns-stat.com +44 (0)20 8525 0696 http://www.burns-stat.com (home of The R Inferno and A Guide for the Unwilling S User)

Thomas Mang wrote: Hi, Both operators <- and = can be used to make an assignment. My question is: is there a semantic difference between the two? Some time ago I remember reading that, for some reason, one should be given preference over the other - but I cannot remember the source, nor the argument, nor which operator was the preferred one. What is the present state? Is one version still better than the other, or is it only a matter of taste which to use? thanks Thomas
[R] difference between assignment syntax <- vs =
Hi,

Both operators <- and = can be used to make an assignment. My question is: is there a semantic difference between the two? Some time ago I remember reading that, for some reason, one should be given preference over the other - but I cannot remember the source, nor the argument, nor which operator was the preferred one. What is the present state? Is one version still better than the other, or is it only a matter of taste which to use?

thanks Thomas
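One genuine semantic difference (standard R behaviour, shown here for context): inside a function call, '=' names an argument while '<-' performs an assignment as a side effect:

median(x = 1:5)    # x is only the argument name; no variable x is created
median(x <- 1:5)   # same result, but assigns 1:5 to x in the workspace
exists("x")        # TRUE only after the second call (in a fresh session)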
[R] which test-statistic to use for quasibinomial GLMs?
Hi,

I have fitted a quasibinomial GLM [glm(y ~ ..., family = quasibinomial)] to a binary response variable; quasibinomial because there were clear signs of underdispersion in a 'simple' binomial GLM, so the dispersion is a free parameter in the model. My question is now: in a quasibinomial model with a binary-only response variable, what are the most appropriate tests to compare different models? I have studied Faraway's book (Extending the Linear Model with R) and concluded that a likelihood-ratio test seems to be inappropriate, as does the Wald test. In chapter 7 an F-test is suggested, but this refers to an example with a beta-distributed response. Can I conclude that the following code example will be fine in my case:

anova(model1, model2, test = "F")

? Moreover, the summary of the GLM, including parameters of the predictors, shall be presented. The summary method however does not conduct an F-test; so, in line with my ideas above, shall I also use F-tests for the individual predictors (personally I would, but as I am not sure, I ask here...)?

Thanks a lot, Thomas
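The two F-based tests asked about would look like this (model names as in the post; drop1() is my addition as the usual route to per-term tests):

anova(model1, model2, test = "F")  # nested whole-model comparison
drop1(model2, test = "F")          # single-term F tests respecting marginality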
[R] generalized mixed model + mcmcsamp
Hi,

I have fitted a generalized linear mixed effects model using lmer (library lme4) with family = quasibinomial. I have tried to obtain an MCMC sample, but on calling mcmcsamp(model1, 1000) I get the following error, which I don't understand at all:

Error in .local(object, n, verbose, ...) : Update not yet written

traceback() delivers:

4: .Call(mer_MCMCsamp, ans, object)
3: .local(object, n, verbose, ...)
2: mcmcsamp(model1, n = 1000, verbose = FALSE)
1: mcmcsamp(model1, n = 1000, verbose = FALSE)

which again doesn't particularly help me. R is 2.8.1 under Windows; lme4 was clean-installed just today. Before the model is fitted I just read in data and transform some variables. No other library is loaded. Any ideas?

thanks, Thomas
[R] bootstrapping in regression
Hi,

Please excuse me if my questions sound somewhat 'stupid' to the trained and experienced statisticians among you. I am also not sure whether I have used all terms correctly; if not, corrections are welcome.

I have asked myself the following question regarding bootstrapping in regression: Say, for whatever reason, one does not want to take the p-values for regression coefficients from the established test statistic distributions (t-distribution for individual coefficients, F-values for whole-model comparisons), but instead apply a more robust approach by bootstrapping. In the simple linear regression case, one possibility is to randomly rearrange the X/Y data pairs, estimate the model and take the beta1 coefficient. Do this many, many times, and so derive the null distribution for beta1. Finally, compare beta1 for the observed data against this null distribution.

What I now wonder is how the situation looks in the multiple regression case. Assume there are two predictors, X1 and X2. Is it then possible to do the same, but rearranging the values of only one predictor (the one of interest) at a time? Say I want again to test beta1. Is it then valid to repeatedly rearrange the X1 data randomly (keeping Y and X2 as observed), fit the model, take the beta1 coefficient, and finally compare the beta1 of the observed data against the distribution of these beta1s? For X2, do the same: randomly rearrange X2 each time while keeping Y and X1 as observed, etc. Is this valid?

Second, if this is valid for the 'normal', fixed-effects-only regression, is it also valid to derive null distributions for the regression coefficients of the fixed effects in a mixed model this way? Or does the quite different parameter estimation calculation forbid this approach (forbid in the sense of bogus outcomes)?

Thanks, Thomas
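A minimal sketch (invented data) of the simple-regression scheme described above: shuffle y to break the pairing, refit, and collect slopes:

set.seed(1)
x <- rnorm(50); y <- 2 + 0.3 * x + rnorm(50)
obs  <- coef(lm(y ~ x))[2]
null <- replicate(999, coef(lm(sample(y) ~ x))[2])
mean(abs(null) >= abs(obs))        # two-sided permutation p-value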
Re: [R] bootstrapping in regression
Greg Snow wrote: What you are describing is actually a permutation test rather than a bootstrap (related concepts, but with a subtle yet important difference). The way to do a permutation test with multiple x's is to fit the reduced model (using all x's other than x1 if you want to test x1) on the original data and store the fitted values and the residuals. Permute the residuals (randomize their order) and add them back to the fitted values, then fit the full model (including x1 this time) to the permuted data set. Do this a bunch of times and it will give you the sampling distribution of the slope on x1 (or whatever your set of interest is) when the null hypothesis that it is 0, given the other variables in the model, is true.

Hi,

Thanks to you and Tom for the correction regarding bootstrapping vs. permutation, and to Chuck for the cool link. Yes, of course I described a permutation. I have a question here: I am not sure I understand your 'fit the full model ... to the permuted data set'. Am I correct to suppose that, once the residuals of the reduced-model fit have been permuted and added back to the fitted values, the values obtained this way (fitted + permuted residuals) now constitute the new y-values to which the full model is fitted? Is that correct? Do you know if this procedure is also valid for a mixed-effects model?

thanks a lot, Thomas

Permuting just x1 only works if x1 is orthogonal to all the other predictors; otherwise the permuting destroys the relationship with the other predictors and does not perform the test you want. Bootstrapping depends on sampling with replacement, not permuting, and is used more for confidence intervals than for tests (the reference by John Fox given to you in another reply can help if that is the approach you want to take). Hope this helps,
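A minimal sketch (invented data) of the permute-residuals scheme Greg describes, for testing beta1 in y ~ x1 + x2:

set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * x2 + rnorm(n)                # H0 (beta1 = 0) is true here

reduced <- lm(y ~ x2)                        # model without x1
fit <- fitted(reduced); res <- resid(reduced)

beta1.null <- replicate(999, {
  y.star <- fit + sample(res)                # permuted residuals as new y-values
  coef(lm(y.star ~ x1 + x2))["x1"]
})
beta1.obs <- coef(lm(y ~ x1 + x2))["x1"]
mean(abs(beta1.null) >= abs(beta1.obs))      # permutation p-value for beta1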
[R] .C and 'temporaries'
Hello,

Before I get into trouble, I ask here: I make a call to a C function using .C. What I would like to know is whether the arguments in the .C call can also be 'temporary' objects. An example will illustrate this:

# here we don't have a temporary
Obj1 = 5
.C("Func", as.integer(Obj1), ...)  # certainly works

# here we do have a temporary
Obj2 = 5
.C("Func", as.integer(Obj2 + 100), ...)  # is this guaranteed to work?

Is the second call valid? Is it also valid if DUP = FALSE, or only if DUP = TRUE?

Thanks, Thomas
[R] timezone attribute lost during operations
Hi,

I was just *highly* surprised to find out that R 2.8.0 (Windows XP) changes the timezone interpretation during operations on time data, as apparently the timezone attribute is lost, and for the next interpretation of the timezone the system settings are then used. Here is sample code (executed on a platform with the system timezone, managed by Windows, set to CET; note however that as the input data is GMT, I also want all the calculations to occur in GMT):

# input data
Time = as.POSIXct(strptime(c("2007-12-12 14:30:15", "2008-04-14 15:31:34", "2008-04-14 15:31:34"), format = "%Y-%m-%d %H:%M:%S", tz = "GMT"))
Time                  # OK, time zone is GMT
attr(Time, "tzone")   # OK, time zone is GMT

TApply = tapply(1:3, Time, max)
names(TApply)         # wrong, names are converted to the time zone of the system

UTime = unique(Time)
UTime                 # wrong, again the time zone of the system is used
attr(UTime, "tzone")  # == NULL

Now the issue is not that I wouldn't know how to solve the problem (by setting TZ to GMT prior to executing the calculations), but I wonder why R does this at all. Why is it not able to maintain the timezone information stored in my original vector Time? Is this behavior supposed to be a feature, or is it a plain bug?

Thanks, Thomas
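For completeness, the workaround alluded to above (standard functions; the save-and-restore logic is my own sketch):

UTime <- unique(Time)
attr(UTime, "tzone") <- "GMT"      # reattach the lost attribute; the underlying
UTime                              # seconds are unchanged, only printing differs

old <- Sys.getenv("TZ", unset = NA)
Sys.setenv(TZ = "GMT")             # or: run the whole computation under TZ=GMT
names(tapply(1:3, Time, max))
if (is.na(old)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old)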