I know it only causes a shift and scale transformation of the distribution, but it always gets my goat when I see people using the sum as a statistic of interest when the mean has a more general application (i.e. some other researcher obtains a different number of observations per subject) and a more meaningful interpretation (i.e. the mean is the answer you'd expect, on average, from the participant on any given question; the sum is ... ?)
An acceptable method of fitting empirical data to theoretical distributions is maximum likelihood estimation. Sometimes there are analytic solutions, sometimes you have to use a search algorithm like optimize() (for 1 parameter distributions) or optim() (for multiple parameter distributions) to find the solution. Say you hypothesized that the red data came from an exponential distribution, you would define a function that obtains the sum log likelihood of the data given a specific rate parameter, then search a reasonable range of rate parameters for that which maximuzes the sum log liklihood: exp.sll = function(rate,x){ log.lik = dexp(x,rate,log=TRUE) sum.log.lik = sum(log.lik) print(c(rate,sum.log.lik)) return(sum.log.lik) } a=rexp(150,rate=10) hist(a,probability=TRUE) best.rate = optimize( exp.sll ,x=a ,lower=0 ,upper=1/mean(a)*10 ,maximum=TRUE )$maximum curve(dexp(x,rate=best.rate),col='red',add=T) On Wed, Mar 4, 2009 at 12:01 PM, Simone Gabbriellini <simone.gabbriell...@gmail.com> wrote: > Dear R-Experts, > > I have an empirical dataset with 150 subjects for 24 observations. > In each observation, each subject can have a score in the range 0:3. > > I made then a simple index making the sum of the values in each row, > so each subject have a score between 0 and 72. > > I was thinking about what kind of theoretical distribution such an > index should have, so I try to make things random like: > > data<-array(0,c(150,24)) > data2<-array(0, c(150, 100)) > for (prova in 1:100){ > for (i in 1:24){ > for (q in 1:150){ > data[q,i]<-sample(0:3, 1) > } > } > for (riga in 1:150){ > data2[riga,prova]<-sum(data[riga,]) > } > } > > now here you can find the plotted theoretical values (black) against > the empirical ones (red): > > http://www.digitaldust.it/papers/indice.png > > How can I estimate the density of both distribution? because the red > one looks like a pareto distribution, but I am not an expert... > > many thanks, > Simone > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Mike Lawrence Graduate Student Department of Psychology Dalhousie University Looking to arrange a meeting? Check my public calendar: http://tinyurl.com/mikes-public-calendar ~ Certainty is folly... I think. ~ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.