Re: [R] [Q] BIC as a goodness-of-fit stat

2006-03-08 Thread Vincent Zoonekynd
The following article gives some motivation for the BIC and
some rules of thumb for interpreting a difference of BICs (if I
recall correctly: over 10, very strong evidence; between 6 and 10,
strong evidence; between 2 and 6, positive evidence):

  A.E. Raftery, Bayesian model selection in social research
  http://www.stat.washington.edu/tech.reports/bic.ps
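
As a rough illustration (my own sketch, with made-up data, not part of
Raftery's paper), such a difference can be computed directly in R with
the built-in BIC() function; note that, if I remember correctly, the
mclust functions report BIC with the sign flipped, so that larger
values indicate a better model there.

  # Hypothetical example: compare two linear models and read the
  # BIC difference against the scale above.
  set.seed(1)
  x  <- rnorm(100)
  y  <- 1 + 2 * x + rnorm(100)
  m1 <- lm(y ~ x)        # model with the predictor
  m0 <- lm(y ~ 1)        # intercept-only model
  BIC(m0) - BIC(m1)      # well over 10: very strong evidence for m1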

Regards,

-- Vincent

On 06/03/06, Young-Jin Lee <[EMAIL PROTECTED]> wrote:
> Dear R-List
>
> I have a question about how to interpret BIC as a goodness-of-fit statistic.
> I was trying to use "EMclust" and other functions in the "mclust" library
> and found that BIC is used as a goodness-of-fit statistic.
> Although I know that a smaller BIC indicates a better fit, it is not clear
> to me how good a fit is just from reading a single BIC value. Is there a
> standard way of interpreting a BIC value?
>
> Thanks in advance.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Statistics with R

2005-08-28 Thread Vincent ZOONEKYND
Dear list,

One year ago, some of you asked for an English version
of my web page "Statistiques avec R". The translation is now
complete. Like the French version, this document is still
unfinished and probably full of mistakes, but it is amply
illustrated.

For those of you who have not browsed through the previous
version, these are merely the notes I took while discovering
statistics and using R, with as many pictures as possible
(over a thousand).

  http://zoonek2.free.fr/UNIX/48_R/all.html

Regards,

-- Vincent

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Re: density estimation

2005-04-23 Thread Vincent ZOONEKYND
The command
  help.search("density")
(which you should have tried if you had read the posting guide) returns,
among other things, "kde2d" in package MASS.
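
For two variables, a minimal sketch (my own, with made-up data) would be:

  library(MASS)
  set.seed(1)
  x <- rnorm(500)
  y <- 0.5 * x + rnorm(500)
  d <- kde2d(x, y, n = 50)   # bivariate kernel density estimate on a 50 x 50 grid
  contour(d)                 # or image(d), persp(d)

Note that kde2d only handles the bivariate case; for f(x,y,z) you will
need to look elsewhere.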

-- Vincent


On 4/22/05, Bernard Palagos <[EMAIL PROTECTED]> wrote:
> hello
> sorry for my English
> I would like to estimate the density of a multivariate variable (f(x,y),
> f(x,y,z) for example), in order to compute mutual information.
> How is this possible with R?
> thanks
> Bernard
> 
> Bernard Palagos
> Unité Mixte de Recherche Cemagref - Agro.M - CIRAD
> Information et Technologie pour les Agro-Procédés
> Cemagref - BP 5095
> 34033 MONTPELLIER Cedex 1
> France
> http://www.montpellier.cemagref.fr/teap/default.htm
> Tel: 04 67 04 63 13
> Fax: 04 67 04 37 82
> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Re: an interesting qqnorm question

2005-04-23 Thread Vincent ZOONEKYND
If I understand your problem, you are computing the difference between
your data and the quantiles of a standard gaussian variable -- in
other words, the difference between the data and the red line in the
following picture.

  N <- 100  # Sample size
  m <- 1# Mean
  s <- 2# dispersion
  x <- m + s * rt(N, df=2)  # Non-gaussian data

  qqnorm(x)
  abline(0,1, col="red") 

And what you are computing is

  y <- sort(x) - qnorm(ppoints(N))
  hist(y)

This is probably not the right line: not only is your mean 1 rather
than 0, but the slope is wrong as well (if the data were gaussian,
you could estimate the slope with the standard deviation).

You can use the "qqline" function to get the line passing through the
first and third quartiles, which is probably closer to what you have
in mind.

  qqnorm(x)
  abline(0,1, col="red") 
  qqline(x, col="blue")

The differences are 

  x1 <- quantile(x, .25)
  x2 <- quantile(x, .75)
  b <- (x2-x1) / (qnorm(.75)-qnorm(.25))
  a <- x1 - b * qnorm(.25)
  y <- sort(x) - (a + b * qnorm(ppoints(N)))
  hist(y)
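
As a quick sanity check (my own addition, not in the original reply),
you can overlay this hand-computed line on the normal plot: it should
coincide with the blue line drawn by qqline.

  qqnorm(x)
  qqline(x, col = "blue")
  abline(a, b, col = "green", lty = 2)   # same line, recomputed from the quartiles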

And you want to know when the differences cease to be "significantly"
different from zero.

  plot(y)
  abline(h=0, lty=3)

You can use the plot to fix a threshold, but unless you have a model
describing how non-gaussian your data are, this will remain empirical.
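
For instance (this is just one arbitrary choice, not a principled rule),
you could take a robust estimate of the spread of the differences and
flag the points beyond a few multiples of it:

  threshold <- 3 * mad(y)          # arbitrary: three median absolute deviations
  outside   <- abs(y) > threshold
  plot(y, col = ifelse(outside, "red", "black"))
  abline(h = c(-threshold, 0, threshold), lty = 3)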

You will note that, in those simulations, the differences (either
yours or those from the line through the first and third quartiles)
are not gaussian at all.
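
One quick way to see this (my addition) is to apply the same
diagnostics to the differences themselves:

  qqnorm(y)           # the differences are far from gaussian
  shapiro.test(y)     # formal normality test on the differences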

-- Vincent


On 4/22/05, WeiWei Shi <[EMAIL PROTECTED]> wrote:
> hope it is not because of some central limit theorem effect, otherwise my
> initial plan will fail :)
> 
> On 4/22/05, WeiWei Shi <[EMAIL PROTECTED]> wrote:
> > Hi, r-gurus:
> > 
> > I happened to have a question in my work:
> > 
> > I have a dataset, which has only one dimension, like
> > 0.99037297527605
> > 0.991179836732708
> > 0.995635340631367
> > 0.997186769599305
> > 0.991632565640424
> > 0.984047197106486
> > 0.99225943762649
> > 1.00555642128421
> > 0.993725402926564
> > 
> > 
> > the data is saved in a file called f392.txt.
> > 
> > I used the following codes to play around :)
> > 
> > k<-read.table("f392.txt", header=F)# read into k
> > kk<-k[[1]]
> > l<-qqnorm(kk)
> > diff=c()
> > lenk<-length(kk)
> > i=1
> > while (i<=lenk){
> > diff[i]=l$y[i]-l$x[i]   # save the difference between the sample quantile
> >                         # and the theoretical quantile; remember, my
> >                         # sample mean is around 1 while the theoretical
> >                         # one is 0
> > i<-i+1
> > }
> > hist(diff, breaks=300)  # analyze the distr of such diff
> > qqnorm(diff)
> > 
> > my question is:
> > from l<-qqnorm(kk), I wanted to know from which point (or cut) the
> > sample points start to move away from the theoretical ones. That's the
> > reason I played around with the "diff" list, which gives me the
> > differences. To my surprise, the diff is perfectly normal. I tried some
> > test vectors like kk<-c(1, 2, 5, -1, ...), and concluded that it must be
> > something about the distribution my sample follows that gives this finding.
> > 
> > So, any suggestions on the distribution of my sample? I think there
> > might be some mathematical argument that leads to this observation,
> > but I am not quite sure.
> > 
> > btw,
> > > fitdistr(kk, 't')
> >         m              s              df
> >      9.65e-01     7.630770e-03   3.742244e+00
> >  (5.317674e-05) (5.373884e-05) (8.584725e-02)
> > 
> > btw2, can anyone suggest a way to find the "cut" or "threshold" in
> > my sample to discretize it into 3 groups: two tail groups and one
> > main group (my main focus)?
> > 
> > Thanks,
> > 
> > Ed
> >
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html