[R] newly install old and present R versions

2012-02-03 Thread Thomas Mang

Hi,

I am moving to a new computer (Windows 7), but for reasons of 
reproducibility I would like to install my present R 2.8.1, along with 
all its extension packages, on that machine as well (that is, besides R 
2.14.1).


What's the best way of doing so?

My idea is:
Get the setup for 2.8.1 and install it; during installation select 
neither 'save version number in registry' nor 'associate RData files', 
to keep the registry clean.
Manually copy the extension packages from my present machine to the lib 
folder of 2.8.1.

With that, 2.8.1 should run.

Then install 2.14.1 and download the most up-to-date versions of the 
packages. How do I detect whether packages were e.g. renamed or merged? 
Or should I also manually copy my old packages to the lib dir and then 
run update.packages(checkBuilt=TRUE, ask=FALSE)? Will that work under 
these circumstances as well?
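A small sketch of how the renamed/removed check could be scripted (old_lib is a placeholder for the 2.8.1 library path; here it points at the base library so the snippet runs anywhere):

```r
# Compare the package sets of two library paths to spot packages that
# are absent from the new installation (e.g. renamed, merged, archived).
old_lib <- .Library   # placeholder; use e.g. "C:/R/R-2.8.1/library"
new_lib <- .Library   # placeholder; the 2.14.1 library path
old_pkgs <- rownames(installed.packages(lib.loc = old_lib))
new_pkgs <- rownames(installed.packages(lib.loc = new_lib))
missing_pkgs <- setdiff(old_pkgs, new_pkgs)
missing_pkgs
# After copying the old packages into the 2.14.1 library, rebuild them:
# update.packages(checkBuilt = TRUE, ask = FALSE)
```

Anything listed in missing_pkgs would then need a manual check on CRAN for a rename or merge.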


To make things more complicated, I am also using Rtools. Is it possible 
to have two parallel versions of Rtools? Sure, I can install both, but 
as Rtools sets some PATH variables, I suppose only one can actually be 
active at a time. Hence it might be smarter to avoid installing Rtools 
for 2.8.1 altogether, that is, live without it (I have enough trouble 
running Rtools for one version alongside MinGW, as the paths to g++ can 
also conflict ...).


thanks,
Thomas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Checking dates for entry errors

2012-01-17 Thread Thomas Mang

On 1/11/2012 11:07 PM, Paul Miller wrote:

Hello Everyone,

I have a question about how best to check dates for entry errors.


Try regular expression matching and the functions grep, strsplit, 
regexpr etc.
If you are not familiar with regexes: it is a bit of a bumpy road 
getting into them, but in the long term they are definitely worth the 
effort!
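A minimal sketch of the idea for ISO-style dates (the example vector is invented):

```r
# Check dates for entry errors: first a regex on the format, then a
# parse check that catches impossible calendar dates.
dates <- c("2012-01-17", "2012-13-01", "17/01/2012", "2012-1-7")
ok_format <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", dates)
parsed <- as.Date(dates, format = "%Y-%m-%d")  # NA for impossible dates
valid <- ok_format & !is.na(parsed)
data.frame(dates, ok_format, valid)
```

Only the first entry passes both checks; the regex alone would miss the impossible month in the second entry, and the parse alone would accept the unpadded fourth.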


cheers



[R] multinomial regression model

2011-05-10 Thread Thomas Mang

Hi,

Consider the need for a regression model which can handle an ordered 
multinomial response variable. There are, for example, proportional odds 
/ cumulative logit models, but the regression should actually include 
random effects (a mixed model), and I am not aware of a multinomial 
regression model as part of lme4 (am I wrong here?). Further, the 
constraint of proportional odds models that predictors have the same 
relative impact across all levels most likely does not hold for the 
study in question.


I was wondering if an ordinary binomial mixed model can be turned into a 
multinomial one by preparing the input data.frame in a different way:
Consider three response levels, A, B, C, ordered. I can accurately 
describe the occurrence of each of these three realizations using one or 
two Bernoulli random variables:


Let b = P(X == B | X in {B, C}). Then:

P(X == A) = a
P(X in {B, C}) = 1 - a
P(X == B) = (1 - a) * b
P(X == C) = (1 - a) * (1 - b)

so the first comparison checks whether A or either of B/C is the case, 
and the second, conditional on its being B or C, checks which of these 
two holds - sort of traversing the hierarchy of the ordered levels 
sequentially.
In terms of the likelihood of the desired model, the probabilities on 
the right-hand side would be achieved exactly as follows: if the random 
variable takes on the value A, I use one input row and assign the 
response variable the value 0; in the other cases I use two input table 
rows, the first one having response value 1 (so the random variate is 
either B or C), and the second one having response 0 if B is the case 
and 1 otherwise, that is, if C is the case.
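The proposed expansion can be sketched as follows (the function and column names are my own, for illustration):

```r
# Expand an ordered response A < B < C into sequential binary rows
# (continuation-ratio style coding, as described in the text).
expand_ordinal <- function(y) {
  rows <- lapply(seq_along(y), function(i) {
    if (y[i] == "A") {
      # one row: the A-vs-{B,C} comparison, response 0
      data.frame(obs = i, split = "A_vs_BC", resp = 0)
    } else {
      # two rows: not-A (response 1), then B-vs-C conditional comparison
      rbind(
        data.frame(obs = i, split = "A_vs_BC", resp = 1),
        data.frame(obs = i, split = "B_vs_C",
                   resp = ifelse(y[i] == "B", 0, 1))
      )
    }
  })
  do.call(rbind, rows)
}

d_expanded <- expand_ordinal(factor(c("A", "B", "C")))
d_expanded
```

The split column would then enter the binomial mixed model as a covariate (or interact with the predictors), so that the two comparisons get their own parameters.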


Certainly, degrees of freedom must be adjusted manually in inferences, 
as every measured response should provide only a single degree of freedom.


Question: Am I overlooking something here, or is the way outlined above 
a valid method to obtain an ordered multinomial mixed model by tweaking 
the input table in such a manner?


many thanks and best,
Thomas



Re: [R] random effects in mixed model not that 'random'

2009-12-13 Thread Thomas Mang

Hi,

Thanks for your response; yes, you are right, it's not fully on topic, 
but I chose this list not only because I use R for all my stats and read 
it anyway, but also because many statisticians read it too.

Do you know another list where my question would be more appropriate?
For what it's worth, I haven't found a local statistician yet to really 
answer the question, but I'll continue searching ...


thanks,
Thomas

On 12/13/2009 11:07 AM, Daniel Malter wrote:

Hi, you are unlikely to (or lucky if you) get a response to your question
from the list. This is a question that you should ask your local
statistician with knowledge in stats and, optimally, your area of inquiry.
The list is (mostly) concerned with solving R rather than statistical
problems.

Best of luck,
Daniel

-
cuncta stricte discussurus
-

Re: [R] random effects in mixed model not that 'random'

2009-12-13 Thread Thomas Mang

Hi,

Thanks for your input; see below

On 12/13/2009 4:41 PM, Robert A LaBudde wrote:

I think what you are finding is that calling a grouping variable a
random effect is not the same thing as it actually being a random effect.

An effect is really only random when it is chosen randomly. Just because
you don't want to deal with it as a fixed effect (e.g., too many levels)
doesn't mean it qualifies as a random effect. This sloppiness is common
in mixed modeling.


Well, to some degree the species were chosen randomly, so there isn't a 
big selection bias in there. I also argue they wouldn't qualify as a 
fixed effect (they might as a stand-alone fixed-effect factor, but 
definitely not in interaction with other predictors - there is no reason 
to believe the impact of the predictors is totally independent across 
species).
Sample size isn't the problem; I truly wouldn't want to include them as 
a fixed effect, based on expert knowledge.




In your example of student scores, you mentioned the schools were a
random effect because they were a grouping variable. This is not true.
Schools have a strong fixed effect. They were also not chosen randomly in
your study.

How to resolve your problem? Two methods: 1) Stop modeling the grouping
variable as a random effect when it's not: model it as a fixed effect;
2) Do the experiment right: a) List the schools in the population. b)
Choose the schools to be used by random sampling from that population.
Then you will find schools really are a random effect.


1) does not seem to be the right solution.
2) is more interesting in terms of understanding:
Are you saying that it is just the random choice of whether something 
was included in the sample that makes it qualify as a random effect? I 
thought it was the fact that it is the realization of a random variable 
(drawn from a N(0, sigma) distribution). These are two different things.


Suppose I list all the schools in the population and randomly pick 15. 
IIUC, you would argue that it now qualifies as a random effect. However, 
once I have chosen my schools, I could still investigate the estimated 
random-effects coefficients, examine the schools a posteriori, and try 
to find out what discriminates those with students above average from 
those below average. Odds are, if I had the resources for a thorough 
investigation, I would find something - in other words, because there is 
something deterministic behind it, I would have said they are not random 
realizations from a normal distribution. That was my understanding of 
the properties of random effects so far, but it might be wrong, and 
hence the problem (although, due to the complexity of this deterministic 
process, they might practically appear as random realizations). If I 
picked a 16th school and then applied my knowledge from the 
investigations, I could probably say whether it will be above or below 
average - which, in my understanding of random effects, would actually 
disqualify it as a random effect, whereas according to you it would 
qualify, if the school was chosen randomly. Is that correct?


Suppose I have chosen randomly: does it make sense to investigate a 
posteriori why the estimates of the random effects are the way they are, 
and gain insights into the system? Or does it not make sense, because 
they are assumed to be completely random realizations of a random 
variable and can be anything?
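This a-posteriori inspection can be sketched in a small simulation (using lme from the recommended package nlme; the hidden 'ses' driver and all numbers are invented): even when the group deviations are fully deterministic, the fitted random effects recover the hidden driver.

```r
library(nlme)  # ships with standard R installations
set.seed(1)

# 15 schools, 20 pupils each; school intercepts are driven entirely by
# a hidden deterministic covariate (think socio-economic status).
ses <- rnorm(15)
d <- data.frame(school = factor(rep(1:15, each = 20)))
d$score <- 50 + 2 * ses[as.integer(d$school)] + rnorm(nrow(d))

# random-intercept model, ignorant of the hidden driver
fit <- lme(score ~ 1, random = ~ 1 | school, data = d)
re <- ranef(fit)[["(Intercept)"]]

# a posteriori, the estimated 'random' effects track the hidden driver
cor(re, ses)
```

The high correlation illustrates the dilemma in the thread: the fitted random effects are perfectly predictable once the school covariate is known.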


To some degree I think the issue can also be seen the following way:
Conditional on my extensive knowledge of the school properties, the 
schools are probably not distributed iid. I could have this knowledge 
enter as a fixed effect. But since this knowledge is usually not 
available, the unconditional distribution might well make them iid 
N(0, sigma), and hence make the schools qualify as a grouping variable 
for random effects (where it is of course assumed that the sampling was 
done randomly from the population).
But what shall I do if I have a bit of the extensive knowledge available 
- maybe too much to stick to the completely unconditional iid 
assumption, but not enough for a sensible conditional distribution that 
would allow the specification of a fixed effect?


thanks
Thomas







What you have discovered is called selection bias. It is common in
unrandomized studies.



[R] random effects in mixed model not that 'random'

2009-12-11 Thread Thomas Mang

Hi,

I have the following conceptual / interpretative question regarding 
random effects:


A mixed-effects model was fit to biological data, with observations 
coming from different species. There is a clear overall effect of 
certain predictors (entering the model as fixed effects), but as 
different species react slightly differently, the predictor also enters 
the model as a random effect, with species as grouping variable. The 
resulting model fits very well.


Now comes the tricky part, however: I can inspect not only the variance 
parameter estimate for the random effect, but also the 'coefficients' 
for each species. Suppose that when I do this, I find that they make 
biological sense, and maybe actually more sense than they should:
For each species, vast biological knowledge is available regarding 
traits etc. So I can link the random-effect coefficients to that 
knowledge, see the deviation from the generic predictor impact (the 
fixed effect) and relate it to the traits of my species.
However, I see the following problem with that approach: if I have no 
knowledge of the species traits, or the species names are anonymous to 
me, it makes sense to treat the species-specific deviations as 
realizations of a random variable (principle of exchangeability). Once I 
know the species used in the study, however, and have the biological 
knowledge at hand, it does not make so much sense any more; I can 
predict whether for a particular species the generic predictor impact 
will be amplified or not. That is, I can predict whether the draw from 
the assumed normal distribution of the random effects is more likely to 
be > 0 or < 0 - which is of course completely contradictory and nonsense 
if I assume I have a random draw from a N(0, sigma) distribution. 
Integrating the biological knowledge as a fixed effect, however, might 
be tremendously difficult, as species traits can sometimes not readily 
be quantified in a numeric way.
I could defer the issue to the species traits and say that once the 
species evolved, their traits were drawn randomly from a population. 
This, however, causes problems with ideas of evolution and phylogenetic 
relationships among the species.


Maybe my question can be rephrased the following way:
Does it ever make sense to _interpret_ the coefficients of the random 
effects for each group and link them to properties of the grouping 
variable? The assumption of a realization of a random variable seems to 
render that quite problematic. It also means that the more ignorant I 
am, and the less knowledge I have, the more realistic the random 
realization seems to become - which is at odds with scientific 
investigation.
Suppose the mixed model is one of the famous social-science studies 
analysing pupils' test results at different schools, with school acting 
as grouping variable for a random intercept. If I have no knowledge 
about the schools, the random-effect assumption makes sense. If I 
however investigate the schools in detail (either a priori or a 
posteriori) - say, the teaching quality of the teachers, the 
socio-economic status of the school area, etc. - it will probably make 
sense to predict which ones will have pupils performing above average, 
and which below average. But then these factors leading me to the 
predictions should probably enter the model as fixed effects, and maybe 
I don't need a school random effect any more at all. This, however, 
means that the school's deviation from the global mean is not the 
realization of a random variable, but instead the result of something 
quite deterministic, which is usually just unknown, or can only be 
measured with extreme, impractical effort. So the process might not be 
random; just because so little is known about the process, the results 
appear as if they were randomly drawn (from a larger population 
distribution). Again, is ignorance / lack of deeper knowledge the key to 
using random effects - and the more knowledge I have, the less?


many thanks,
Thomas



[R] MCMC sampling question

2009-08-12 Thread Thomas Mang

Hello,


Consider MCMC sampling with Metropolis / Metropolis-Hastings proposals 
and a density function with a given valid parameter space. How are MCMC 
proposals performed if the parameter could be located at the very 
extreme of the parameter space, or even 'beyond' it? An example to 
illustrate this and my very nontechnical 'beyond': the von Mises 
distribution is a circular distribution describing directional trends. 
It has a concentration parameter kappa, with kappa > 0. The lower kappa, 
the flatter the distribution; as kappa approaches 0, it converges to the 
uniform. Kappa shall be estimated [in a complex likelihood] through 
MCMC, with the problem that there may truly not be any directional trend 
in the data at all, that is kappa -> 0; the latter would even constitute 
the H0.
If I log-transform kappa to get it onto the real line, will the chain 
then ever fulfill convergence criteria? The values of logged kappa 
should, I suppose, keep decreasing on average. But suppose the chain 
finds an almost flat plateau. How do I then test against the H0? By 
definition, I'll never get kappa = 0 exactly, so I can't compare against 
that.


One idea I had: define not only a parameter kappa, but also an indicator 
parameter which acts as a switch between a uniform and a von Mises 
distribution. Call that parameter d. I could then, for example, let d 
switch state with 50% probability and then apply the usual acceptance 
tests.
Is this approach realistic? Is it sound and solid, or nonsense / 
suboptimal? Is there a common solution to the aforementioned problem?
[I suppose there is. Mixed-effects models testing whether the variance 
of a random effect is 0 should pose the same kind of problem.]
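For the log-transform part of the question, here is a minimal random-walk Metropolis sketch in base R (simulated data; a flat prior on log(kappa) and a mean direction fixed at 0 are my own assumptions, not a recommendation):

```r
# Metropolis sampling of the von Mises concentration kappa on the log
# scale, so proposals always stay inside the valid space kappa > 0.
set.seed(42)
theta <- rnorm(200, 0, 0.8)  # simulated directional data around 0

# von Mises log-likelihood with mean direction 0:
# sum(kappa * cos(theta)) - n * log(2 * pi * I0(kappa))
loglik <- function(lk) {
  k <- exp(lk)  # back-transform from the log scale
  sum(k * cos(theta)) - length(theta) * log(2 * pi * besselI(k, 0))
}

n_iter <- 2000
lk <- numeric(n_iter)  # chain of log(kappa), started at 0
for (i in 2:n_iter) {
  prop <- lk[i - 1] + rnorm(1, 0, 0.3)       # random-walk proposal
  acc <- loglik(prop) - loglik(lk[i - 1])    # flat prior on log(kappa)
  lk[i] <- if (log(runif(1)) < acc) prop else lk[i - 1]
}

kappa_draws <- exp(lk[-(1:500)])  # drop burn-in
summary(kappa_draws)
```

With genuinely uniform data the chain would indeed drift toward ever smaller kappa instead of converging, which is exactly the problem raised above; the indicator-switch idea amounts to a model-choice move between the uniform and the von Mises model.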


cheers,
Thomas



[R] ordinal response model with spatial autocorrelation

2009-08-10 Thread Thomas Mang

Hi,

[note: 4th posting attempt - apologies if the other ones ever show up ...]


I have a (3-level) ordinal response data set which needs the integration 
of a spatial autocorrelation structure. What packages / functions are 
available to fit such a thing?


The heterogeneous, cluster-like structure of the autocorrelation makes a 
mixed-effects model, with random factors capturing the different 
sampling regions, seem ideal. However, I failed to find a function in 
nlme/lme4 or other packages capable of fitting ordinal data, and package 
'DPpackage' uses an approach I am not really familiar with; the function 
DPMolmm also triggers errors, even for the provided examples.


There is no sound a-priori reason to believe that the proportional odds 
assumption applies - it may, or may not. Anyway, I don't have any 
preference for that specific kind of model, so any type of ordinal 
response model would be much appreciated!
I don't see a practical way of constructing a model out of two 
individual logistic regression models, but in case there is one, I'd be 
happy to hear about it!



BTW, I have no idea why my posts haven't made it through lately. Is 
there some 'testing' newsgroup I can send dummy posts to, just to see if 
it works?



thanks a lot and cheers,
Thomas



Re: [R] min frequencies of categorical predictor variables in GLM

2009-08-04 Thread Thomas Mang

Marc Schwartz wrote:

On Aug 3, 2009, at 12:06 AM, Thomas Mang wrote:


Hi,

Suppose a binomial GLM with both continuous as well as categorical 
predictors (sometimes referred to as GLM-ANCOVA, if I remember 
correctly). For the categorical predictors = indicator variables, is 
then there a suggested minimum frequency of each level ? Would such a 
rule/ recommendation be dependent on the y-side too ?


Example: N is quite large, a bit > 100. Observed however are only 0/1s 
(so Bernoulli random variables, not Binomial, because the covariates 
are from observations and in general always different between 
observations). There are two categorical predictors, each with 2 
levels. It would structurally probably also make sense to allow an 
interaction between those, yielding de facto a single categorical 
predictor with 4 levels. Is then there a minimum of observations 
falling in each of the 4 level category (either absolute or relative), 
or also that plus also considering the y-side ?


Must be the day for sample size questions for logistic regression. A 
similar query is on MedStats today.


The typical minimum sample size recommendation for logistic regression 
is based upon covariate degrees of freedom (or columns in the model 
matrix). The guidance is that there should be 10 to 20 *events* per 
covariate degree of freedom.


So if you have 2 factors, each with two levels, that gives you two 
covariate degrees of freedom total (two columns in the model matrix). At 
the high end of the above range, you would need 40 events in your sample.


If the event incidence in your sample is 10%, you would need 400 cases 
to observe 40 events to support the model with the two two-level 
covariates (Y ~ X1 + X2).


An interaction term (in addition to the 2 main effect terms, Y ~ X1 * 
X2) in this case would add another column to the model matrix, thus, you 
would need an additional 20 events, or another 200 cases in your sample.


So you could include the two two-level factors and the interaction term 
if you have 60 events, or in my example, about 600 cases.


Thanks for that. I suppose your term 'event' does not refer to a 
technical concept of GLMs, so I assume that the numbers of observed 0s 
_and_ 1s both have to be >= 10 / 20 per df (since it is arbitrary which 
of them is the event and which the non-event).


OK, two questions: the model also contains continuous predictors (call 
them W, so the model is Y ~ X1*X2 + W). Does the same apply here too - 
for each df of these, 10-20 more events? [If the answer to the former is 
yes, this question is now redundant:] If there are interactions between 
the continuous covariates and a categorical predictor (Y ~ X1 * (X2 + 
W)), how many more events do I need? Does the rule for the categorical 
predictors count, or the one for the continuous covariates?


many thanks !
Thomas




If you include the interaction term only in the absence of the main 
effects (Y ~ X1:X2), that would yield 4 columns in the model matrix, 
requiring 80 events, or about 800 cases. Without more details (e.g. your 
underlying hypothesis), it is not clear to me that you gain anything 
here as compared to using the main effects and, potentially, the 
interaction term together, and you certainly lose in terms of model 
interpretation and the notably larger sample size required.


Relative to a minimum sample size for each of the levels in the factor 
based covariates, I am not aware of any specific guidance there, short 
of dealing with empty cells at the extreme. However, there are methods 
to assess covariate complexity and the consideration for the collapsing 
of factor levels. For more details on these issues, I would refer you to 
Frank's book, Regression Modeling Strategies, specifically to chapters 4 
and 10-12. The former focuses on general multivariable strategies and 
the latter focuses on LR. More information here:


  http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS

HTH,

Marc Schwartz





[R] min frequencies of categorical predictor variables in GLM

2009-08-02 Thread Thomas Mang

Hi,

Suppose a binomial GLM with both continuous and categorical predictors 
(sometimes referred to as a GLM-ANCOVA, if I remember correctly). For 
the categorical predictors = indicator variables, is there then a 
suggested minimum frequency for each level? Would such a rule / 
recommendation depend on the y-side too?


Example: N is quite large, a bit > 100. Observed, however, are only 0/1s 
(so Bernoulli random variables, not binomial, because the covariates 
come from observations and are in general always different between 
observations). There are two categorical predictors, each with 2 levels. 
It would structurally probably also make sense to allow an interaction 
between them, yielding de facto a single categorical predictor with 4 
levels. Is there then a minimum number of observations falling into each 
of the 4 level categories (either absolute or relative), or that plus 
consideration of the y-side as well?


thanks !
Thomas



[R] convert factor to indicator matrix

2009-07-01 Thread Thomas Mang

Hi,

I am looking for a function which converts a single factor (a vector), 
or a list of multiple factors, into an indicator (dummy) matrix with 0/1 
entries (or other contrast schemes). I could cook it up myself, but I am 
actually sure this thing already exists - I just forgot the name and 
package. IIRC it was clearly different from model.matrix, because it had 
a nicer, more straightforward interface [I know how to use model.matrix; 
I just can't recall the other function any more].
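For reference, the model.matrix route without an intercept already yields the plain 0/1 indicator matrix; the forgotten function might be class.ind() in package nnet, but that is only a guess.

```r
# One 0/1 column per factor level: drop the intercept with "- 1"
f <- factor(c("a", "b", "c", "a"))
m <- model.matrix(~ f - 1)
m  # 4 x 3 matrix; exactly one 1 per row
```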


many thanks,
Thomas



[R] 'singularity' between fixed effect and random factor in mixed model

2009-07-01 Thread Thomas Mang

Hi,

I just came across the following issue regarding mixed-effects models:
In a longitudinal study, individuals (variable ind) are observed for 
some response variable. One explanatory variable, f, entering the model 
as a fixed effect, is a (2-level) factor. The expression of that factor 
is constant for each individual across time (say, the sex of the 
individual). ind enters the model as grouping variable for random 
effects. So in a simple form, the formula could look like:

y ~ f + ... + (1|ind)
[and in the simplest model, the ellipsis is simply nothing]

To me, this does not seem to be an unusual design at all.

However, the indicator matrix consisting of f and ind - say, if ind had 
entered the model as a fixed effect - shows a singularity. My question 
is now: what will this 'singularity' cause in a mixed-effects model? I 
admit I have never fully understood how the fitting of mixed-effects 
models happens internally (whether REML or ML) [so I am not even sure it 
can be called a 'singularity'].
Specifically, does it make the fit numerically more unstable? Would the 
degree of this depend on other variables in the model? Is the issue of 
degrees of freedom - complicated enough anyway for mixed models - 
further inflated by it? Do statistical inferences regarding the fixed 
effect have to be treated more carefully? Is the general situation 
something that should be avoided?
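The singularity in question can be made visible directly with base R (toy data; ind entered as a fixed-effect factor):

```r
# f is constant within each individual, so the f dummy column is a
# linear combination of the ind dummies: the design matrix is
# rank-deficient.
d <- data.frame(
  ind = factor(rep(1:4, each = 3)),          # 4 individuals, 3 obs each
  f   = factor(rep(c("m", "f"), each = 6))   # constant within individual
)
X <- model.matrix(~ f + ind, data = d)
c(columns = ncol(X), rank = qr(X)$rank)  # rank < columns: singular
```

In a mixed model the (1|ind) term is penalized (shrunken) rather than estimated freely, so the fit itself does not break down the way the fixed-effects version would; but the confounding shown here is why between-individual information about f is limited.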


many thanks in advance for any insights and cheers,
Thomas



[R] MCMC validity question

2009-06-10 Thread Thomas Mang

Hello,

I have quite a tough problem which might be solvable by MCMC. I am 
fairly new to MCMC (still in the learning process) - so apologies if the 
answer is totally obvious; any hints, links etc. are greatly appreciated.


I'll illustrate the problem in a version cut down to the essentials - 
the real problem is far more complex.
Suppose I have a Markovian series of Poisson processes which generate 
(or kill) some objects. At t0, say I initially have x = 3 objects. In 
the next time step, t1, the total number of objects is 
Poisson-distributed, subject to a function of x at t0, covariates, and 
parameters. So x_t1 ~ Pois, with E(x_t1) = f(x_t0, covariates, 
parameters). Let's choose a very simple function for f, say just f = x * 
par1 * Covariate1. Now let this process be repeated, say, 6 times, 
always with the number of objects obtained in the previous step as the 
input (x) for the next step.
The problem is that at all time steps the total number of objects 
remains unobservable, because objects are only detected with a certain, 
low probability (itself subject to covariates and parameters). So if you 
observe say 2 objects, the only thing you know is that the true figure 
must be >= 2. Presume however that the detection probability is equal 
and independent for all objects at a time step, so the observed number 
of objects is also Poisson-distributed. The likelihood function is then 
built upon that figure.
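For concreteness, the generating process described above can be sketched 
in a few lines of R (all parameter names and values here are purely 
illustrative assumptions, not taken from the real model):

```r
set.seed(1)
n.steps  <- 6
par1     <- 1.2                        # assumed growth parameter
covar1   <- runif(n.steps, 0.8, 1.2)   # Covariate1 at each time step
p.detect <- 0.3                        # assumed detection probability

x <- numeric(n.steps + 1)
x[1] <- 3                              # x at t0
for (t in 1:n.steps)                   # x_t ~ Pois(f(x_{t-1}, ...))
    x[t + 1] <- rpois(1, lambda = x[t] * par1 * covar1[t])

# each object is detected independently with probability p.detect,
# so the observed counts y are a thinned version of the latent x
y <- rbinom(n.steps, size = x[-1], prob = p.detect)
```

The inference problem is then to recover par1 (and the detection 
parameters) from y alone, with the latent series x unobserved.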


The main problem is the input to function f: in the first step, I know 
what x is (or I might not even know that - it might again come from a 
distribution). From that step on, I only have a Poisson distribution and 
lack the concrete realization; all I know is a bare minimum value. In 
general f is not a simple thing, and it is quite impracticable to input 
a distribution itself; moreover, because of the inflation of variance, 
the output could not be treated as the lambda of a Poisson distribution 
any more. So the Poisson distribution is lost, although physical 
knowledge tells you it really is one (and would be, if x were known 
precisely).


The question is therefore: how can I work around the fact that x is 
only ever known to come from a distribution, without knowing the precise 
realization?


Ignoring the above issue, and keeping in mind that I have shown only a 
simplified version of the model, MCMC methods seem a reasonable choice. 
I had an idea which looks so strange that I have strong doubts whether 
it is valid: say I have a present parameter set. Is it then possible to 
estimate the Poisson lambda for t1 using f, then draw a random number 
from that distribution, and treat that drawn figure as (fixed) input 
for the calculation for t2, and so on, repeating until t6? At the end, 
propose new parameters based on MCMC sampling, and repeat all over again.
Will a long sequence of MCMC iterations average over these fake 
(simulated) realizations of the Poisson distributions, so that the chain 
converges to the posterior distribution of the parameters?



many many thanks for any inputs and thoughts,
cheers,
Thomas



Re: [R] converting numeric to integer

2009-05-17 Thread Thomas Mang

Hi,

The problem is that x might be negative. If x == -6.999, the result 
should be -7, not -6. That's why my original proposal had the 
ifelse condition (one could alternatively write sign(x) * 0.5, BTW).
I agree however that in my proposal the round(x) is redundant; one can 
use x itself as the left-hand argument of the sum operation. Is there 
however still a more 'elegant' way?
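For what it's worth, the variants discussed in this thread can be 
compared side by side (a small spot check on a few values, not an 
exhaustive test):

```r
x <- c(6.999999, -6.999999, 2.000001, -2.000001)

r1 <- as.integer(round(x))                  # round first, then coerce
r2 <- as.integer(x + sign(x) * 0.5)         # the sign() variant
r3 <- as.integer(x + ifelse(x >= 0, 0.5, -0.5))

r1                              # 7 -7 2 -2
identical(r1, r2) && identical(r2, r3)   # TRUE for these values
```

All three agree here because as.integer() truncates toward zero, so 
adding 0.5 away from zero before truncating lands on the intended 
integer for both signs.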


thanks,
Thomas



Gabor Grothendieck wrote:

Try:

as.integer(x + 0.5)

assuming the calculation error is less than 0.5 .

On Sat, May 16, 2009 at 2:49 PM, Thomas Mang thomas.m...@fiwi.at wrote:

Hello,

Suppose I have x, which is a variable of class numeric. The calculations
performed to yield x imply that mathematically it should be an integer, but
due to round-off errors it might not be (in either direction). The
error is however small, so round(x) will yield the appropriate integer
value. Moreover, this integer value is guaranteed to be representable by an
'integer' class, that is -2^31 < x < 2^31, and logically it is an integer
anyway. So I want to convert x from class 'numeric' to 'integer'. What is
the most elegant, but always correct, way to achieve this conversion?

What comes to mind is of course something along:

x = as.integer(round(x))

I am, however, not sure if this always works, because I do not know if the
round function is guaranteed to return a numeric value which, in finite
binary representation, is always >= the underlying mathematical integer. If
that is guaranteed, this would of course be a simple + elegant one.

An alternative I came up with is:

x = as.integer(round(x) + ifelse(x >= 0, 0.5, -0.5))
Where I explicitly add a bit to ensure the finite binary representation must
be >= the underlying integer, and then truncate the decimal digits.
IMO, this one is always guaranteed to work, at least within the numerical
range that integers are limited to anyway.


What's your opinion on the issue ?
Any other solution ?

Thanks a lot in advance and cheers,
Thomas



[R] converting numeric to integer

2009-05-16 Thread Thomas Mang

Hello,

Suppose I have x, which is a variable of class numeric. The calculations 
performed to yield x imply that mathematically it should be an integer, 
but due to round-off errors it might not be (in either direction). The 
error is however small, so round(x) will yield the appropriate integer 
value. Moreover, this integer value is guaranteed to be representable 
by an 'integer' class, that is -2^31 < x < 2^31, and logically it is an 
integer anyway. So I want to convert x from class 'numeric' to 
'integer'. What is the most elegant, but always correct, way to achieve 
this conversion?


What comes to mind is of course something along:

x = as.integer(round(x))

I am, however, not sure if this always works, because I do not know if 
the round function is guaranteed to return a numeric value which, in 
finite binary representation, is always >= the underlying mathematical 
integer. If that is guaranteed, this would of course be a simple + 
elegant one.


An alternative I came up with is:

x = as.integer(round(x) + ifelse(x >= 0, 0.5, -0.5))
Where I explicitly add a bit to ensure the finite binary representation 
must be >= the underlying integer, and then truncate the decimal digits.
IMO, this one is always guaranteed to work, at least within the 
numerical range that integers are limited to anyway.



What's your opinion on the issue ?
Any other solution ?

Thanks a lot in advance and cheers,
Thomas



[R] changing function argument

2009-03-13 Thread Thomas Mang

Hi,

I wonder if the following is possible in R:

Suppose a function takes an argument, and wants to modify this argument 
so that the change is visible _at the call side_. It would be what is 
for example known as pass-by-reference-to-non-const in C++.


test <- function(x)
{
 x <- 10
 ...
 return (somethingElse) # I do NOT want to have to return x
}

number = 5
test(number)
stopifnot(number == 10)


Is there some trick with scoping rules and maybe <<- to achieve this ?

thanks,
Thomas



Re: [R] changing function argument

2009-03-13 Thread Thomas Mang

Dieter Menne wrote:

Thomas Mang thomas.mang at fiwi.at writes:


I wonder if the following is possible in R:

Suppose a function takes an argument, and wants to modify this argument 
so that the change is visible _at the call side_. It would be what is 
for example known as pass-by-reference-to-non-const in C++.


test <- function(x)
{
  x <- 10
  ...
  return (somethingElse) # I do NOT want to have to return x
}

number = 5
test(number)
stopifnot(number == 10)

Is there some trick with scoping rules and maybe <<- to achieve this ?


Yes, it is possible. No, don't do it (coming from the c++ world, I understand
your temptation). Try to look for assign to see the attached flames.


Hi,
Well, the point is I really, really want to do it - and believe me, I 
know what I am doing ;) (for that matter, this is done solely in 
implementation code - of course I wouldn't do such a thing in a publicly 
used interface).
I know assign, and I suppose the trick is to pass the name of the 
variable at the call side, as well as the environment of the call side, 
as additional arguments to my test function, so it knows what to change 
where. Is that correct? Is there some more short-hand / direct / elegant 
way of achieving my goal?
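One common sketch of this trick uses substitute() to capture the 
variable's name and parent.frame() to reach the caller's environment 
(set_arg is an illustrative name, not an existing function):

```r
set_arg <- function(x, value) {
    name <- deparse(substitute(x))     # name of the variable at the call side
    assign(name, value, envir = parent.frame())
    invisible(value)
}

number <- 5
set_arg(number, 10)
number   # now 10
```

This only rebinds the name in the immediate caller; whether it counts as 
'elegant' is debatable, which is why most R code simply returns the new 
value instead.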


thanks,
Thomas



Re: [R] difference between assignment syntax <- vs =

2009-02-23 Thread Thomas Mang

Hi,

thanks for the link.

In the bottom part of the relevant section, you say:
"Standard advice is to avoid using '=' when you mean '<-'"
Is this a formal, generally accepted (R community) piece of advice, or 
does it reflect your personal opinion?
Note I am not asking this question to criticize by any means; I just 
want to know for my own work (which will be partially released to 
others) whether the '<-' style is the preferred one (personally, with a 
strong background in other programming languages, I have always used 
'=' so far).
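One concrete difference worth keeping in mind: '=' also serves as the 
argument-passing syntax, so inside a function call only '<-' performs an 
assignment (a small demonstration, assuming no prior variable x):

```r
x <- "unchanged"
median(x = 1:5)    # '=' here names the function argument; variable x untouched
x                  # still "unchanged"
median(x <- 1:5)   # '<-' first assigns x in the calling environment
x                  # now 1:5
```

This is the usual argument for '<-' in assignments: it is unambiguous in 
every context, while '=' changes meaning at the top level of a call.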


thanks,
Thomas



Patrick Burns wrote:

'The R Inferno' page 78 is one source you can
look at.


Patrick Burns
patr...@burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of The R Inferno and A Guide for the Unwilling S User)

Thomas Mang wrote:

Hi,

Both operators <- and = can be used to make an assignment. My question 
is: is there a semantic difference between these two? Some time ago I 
remember reading that, for some reason, one should be given preference 
over the other - but I cannot remember the source, nor the argument, 
nor which operator was preferred.


What is the present state ?
Is one version still better than the other, or is it only a matter of 
taste which to use ?


thanks
Thomas



[R] difference between assignment syntax <- vs =

2009-02-21 Thread Thomas Mang

Hi,

Both operators <- and = can be used to make an assignment. My question 
is: is there a semantic difference between these two? Some time ago I 
remember reading that, for some reason, one should be given preference 
over the other - but I cannot remember the source, nor the argument, 
nor which operator was preferred.


What is the present state ?
Is one version still better than the other, or is it only a matter of 
taste which to use ?


thanks
Thomas



[R] which test-statistic to use for quasibinomial GLMs?

2009-02-18 Thread Thomas Mang

Hi,

I have fitted a quasibinomial GLM [glm(y ~ ..., family = quasibinomial)] 
to a binary response variable; quasibinomial, because there were clear 
signs of underdispersion in a 'simple' binomial GLM, so the dispersion 
is a free parameter in the model.


My question is now: in a quasibinomial model with a binary-only 
response variable, what are the most appropriate tests to compare 
different models? I have studied Faraway's book (Extending the Linear 
Model with R) and concluded that a likelihood-ratio test seems 
inappropriate, as does the Wald test. In chapter 7 an F-test is 
suggested, but this refers to an example with a beta-distributed response.

Can I conclude that the following code example will be fine in my case:
anova(model1, model2, test = "F") ?

Moreover, the summary of the GLM, including the parameters of the 
predictors, shall be presented. The summary method however does not 
conduct an F-test; so, in line with my ideas above, shall I also use 
F-tests for the individual predictors (personally I would, but as I am 
not sure, I ask here...)?
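A self-contained illustration of both comparisons, on simulated data 
(the data and the models here are assumptions purely for demonstration):

```r
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- rbinom(100, 1, plogis(0.5 * d$x1))   # binary response

m1 <- glm(y ~ x1,      family = quasibinomial, data = d)
m2 <- glm(y ~ x1 + x2, family = quasibinomial, data = d)

anova(m1, m2, test = "F")   # nested model comparison via F-test
drop1(m2, test = "F")       # per-predictor F-tests, complementing summary()
```

Note that test = "F" (a quoted string) is required; the bare test = F 
would be parsed as test = FALSE.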


Thanks a lot,
Thomas



[R] generalized mixed model + mcmcsamp

2009-02-11 Thread Thomas Mang

Hi,

I have fitted a generalized linear mixed-effects model using lmer 
(library lme4) with family = quasibinomial. I have tried to obtain an 
MCMC sample, but on calling mcmcsamp(model1, 1000) I get the following 
error, which I don't understand at all:


Error in .local(object, n, verbose, ...) : Update not yet written

traceback() delivers:
4: .Call(mer_MCMCsamp, ans, object)
3: .local(object, n, verbose, ...)
2: mcmcsamp(model1, n = 1000, verbose = FALSE)
1: mcmcsamp(model1, n = 1000, verbose = FALSE)

which again doesn't particularly help me.


R is 2.8.1 under Windows, lme4 clean installed just today.
Before the model is fitted I just read in data, and transform some 
variables. No other library is loaded.


Any ideas ?

thanks,
Thomas



[R] bootstrapping in regression

2009-01-29 Thread Thomas Mang

Hi,

Please forgive me if my question sounds somewhat 'stupid' to the trained 
and experienced statisticians among you. I am also not sure whether I 
have used all terms correctly; if not, corrections are welcome.


I have asked myself the following question regarding bootstrapping in 
regression:
Say for whatever reason one does not want to take the p-values for 
regression coefficients from the established test statistics 
distributions (t-distr for individual coefficients, F-values for 
whole-model-comparisons), but instead apply a more robust approach by 
bootstrapping.


In the simple linear regression case, one possibility is to randomly 
rearrange the X/Y data pairs, estimate the model, and take the 
beta1 coefficient. Do this many, many times, and so derive the null 
distribution for beta1. Finally, compare the beta1 for the observed 
data against this null distribution.


What I now wonder is how the situation looks in the multiple 
regression case. Assume there are two predictors, X1 and X2. Is it then 
possible to do the same, but rearranging the values of only one 
predictor (the one of interest) at a time? Say I want again to test 
beta1. Is it then valid to repeatedly rearrange the X1 data (keeping Y 
and X2 as observed), fit the model, take the beta1 coefficient, and 
finally compare the beta1 of the observed data against the distribution 
of these beta1s?
For X2, do the same: randomly rearrange X2 each time while keeping Y 
and X1 as observed, etc.

Is this valid ?

Second, if this is valid for the 'normal', fixed-effects-only 
regression, is it also valid to derive null distributions for the 
regression coefficients of the fixed effects in a mixed model this way? 
Or does the quite different parameter-estimation calculation forbid 
this approach (forbid in the sense of a bogus outcome)?


Thanks, Thomas



Re: [R] bootstrapping in regression

2009-01-29 Thread Thomas Mang

Greg Snow wrote:

What you are describing is actually a permutation test rather than a bootstrap 
(related concepts but with a subtle but important difference).

The way to do a permutation test with multiple x's is to fit the reduced model 
(use all x's other than x1 if you want to test x1) on the original data and 
store the fitted values and the residuals.

Permute the residuals (randomize their order) and add them back to the fitted 
values and fit the full model (including x1 this time) to the permuted data 
set.  Do this a bunch of times and it will give you the sampling distribution 
for the slope on x1 (or whatever your set of interest is) when the null 
hypothesis that it is 0 given the other variables in the model is true.


Hi,

Thanks to you and Tom for the correction regarding bootstrapping vs 
permutation, and to Chuck for the cool link. Yes of course I described a 
permutation.


I have a question here: I am not sure I understand your 'fit the full 
model ... to the permuted data set'. Am I correct in supposing that once 
the residuals of the reduced-model fit have been permuted and added back 
to the fitted values, the values obtained this way (fitted + permuted 
residuals) constitute the new y-values to which the full model is 
fitted? Is that correct?
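In other words, the scheme as I understand it would look like this in R 
(a sketch with simulated data; perm_test_x1 is an illustrative name, 
not from the thread):

```r
# Residual-permutation test for the slope on x1, x2 kept in the model
perm_test_x1 <- function(y, x1, x2, n.perm = 999) {
    reduced <- lm(y ~ x2)                     # reduced model: x1 omitted
    f <- fitted(reduced)
    r <- resid(reduced)
    b.obs <- coef(lm(y ~ x1 + x2))["x1"]      # observed slope on x1
    b.perm <- replicate(n.perm, {
        y.star <- f + sample(r)               # permuted residuals as new y
        coef(lm(y.star ~ x1 + x2))["x1"]
    })
    (1 + sum(abs(b.perm) >= abs(b.obs))) / (1 + n.perm)  # two-sided p-value
}

set.seed(1)
x1 <- rnorm(50); x2 <- rnorm(50)
y  <- 1 + 0.8 * x1 + 0.3 * x2 + rnorm(50)
perm_test_x1(y, x1, x2)   # small p-value expected with this strong effect
```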

Do you know if this procedure is also valid for a mixed-effects model ?

thanks a lot,
Thomas



Permuting just x1 only works if x1 is orthogonal to all the other predictors, 
otherwise the permuting destroys the relationship with the other predictors and 
does not do the test you want.

Bootstrapping depends on sampling with replacement, not permuting, and is used 
more for confidence intervals than for tests (the reference by John Fox given 
to you in another reply can help if that is the approach you want to take).

Hope this helps,





[R] .C and 'temporaries'

2008-12-23 Thread Thomas Mang

Hello,

Before I get into troubles I ask here:

I make a call to a C function using .C. What I would like to know is 
whether the arguments in the .C call can also be 'temporary' objects.


An example will illustrate this:

# here we don't have a temporary
Obj1 = 5
.C("Func", as.integer(Obj1), ...) # certainly works

# here we do have a temporary
Obj2 = 5
.C("Func", as.integer(Obj2 + 100), ...) # is this guaranteed to work?


Is the second call valid?
Is it also valid if DUP = FALSE, or only if DUP = TRUE ?

Thanks,
Thomas



[R] timezone attribute lost during operations

2008-11-21 Thread Thomas Mang

Hi,

I was just *highly* surprised to find out that R 2.8.0 (Windows XP) 
changes the timezone interpretation during operations on time data, as 
apparently the timezone attribute is lost and, for the next 
interpretation of the timezone, the system settings are used.


Here is sample code (executed on a platform with the system timezone, 
managed by Windows, set to CET; note that as the input data is in GMT 
I also want all the calculations to occur in GMT):


# input data
Time = as.POSIXct(strptime(c("2007-12-12 14:30:15", "2008-04-14 15:31:34",
    "2008-04-14 15:31:34"), format = "%Y-%m-%d %H:%M:%S", tz = "GMT"))

Time  # OK, time zone is GMT
attr(Time, "tzone")  # OK, time zone is GMT



TApply = tapply(1:3, Time, max)
names(TApply) # wrong, names are converted to time zone of system

UTime = unique(Time)
UTime  # wrong, again time zone of system is used
attr(UTime, "tzone")  # == NULL


Now, the issue is not that I wouldn't know how to solve the problem (by 
setting TZ to "GMT" prior to executing the calculations), but I wonder 
why R is making this mess at all. Why is it not able to maintain the 
timezone information stored in my original vector Time?

Is this behavior supposed to be a feature, or is it a plain bug ?
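For reference, the two workarounds alluded to above, sketched in base R 
(behavior of unique() on the attribute may differ between R versions, so 
re-attaching it explicitly is the defensive option):

```r
Sys.setenv(TZ = "GMT")    # workaround 1: pin the session's default zone

Time <- as.POSIXct("2007-12-12 14:30:15", tz = "GMT")
UTime <- unique(Time)              # may drop the "tzone" attribute
attr(UTime, "tzone") <- "GMT"      # workaround 2: re-attach it explicitly
format(UTime)                      # printed in GMT again
```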

Thanks,
Thomas
