[R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-12 Thread Jason Rupert
By any chance is anyone aware of a website, book, paper, etc., or combination 
of those sources, that shows plots of different distributions?

After reading a pretty good whitepaper I became aware of the benefit of doing 
Q-Q plots and histograms to help assess a distribution. The whitepaper is 
called:
"Univariate Analysis and Normality Test Using SAS, Stata, and SPSS", Hun Myoung 
Park, © 2002-2008 The Trustees of Indiana University

Unfortunately the whitepaper does not provide many example distributions 
plotted as Q-Q plots and histograms, so I am curious whether there is a 
"portfolio"-type website or other whitepaper that shows examples of various 
types of distributions. 

It would be helpful to see a bunch of Q-Q plots and their associated histograms 
to get an idea of how the distribution looks in comparison against the 
Gaussian. 

I think seeing the plot really helps. 

Thank you for any insights.


  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-12 Thread Juliet Hannah
You may find the qreference function in the DAAG package helpful. It
makes several QQ plots to give a sense of what kind of fluctuations
can be expected.

You can also construct a series of any plots you are interested in
(using different distributions), and modifying the code in this
function may help with this.
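
A minimal base-R sketch of that idea, for readers who want to see the kind of
display being described without installing DAAG. It is not the qreference
code itself; my_data is a hypothetical stand-in for the data being assessed,
and the 2 x 3 layout is an arbitrary choice. The first panel is the data, and
the other five are genuinely normal samples of the same size, so they show
how much a Q-Q plot can wobble by chance alone.

```r
# Hypothetical stand-in for the data being assessed (an assumption,
# not something from this thread)
set.seed(1)
my_data <- rexp(50)

n <- length(my_data)
old_par <- par(mfrow = c(2, 3))

# Panel 1: normal Q-Q plot of the data itself
qqnorm(my_data, main = "my_data")
qqline(my_data)

# Panels 2-6: Q-Q plots of truly normal samples of the same size,
# matched to the mean and sd of the data
reference <- replicate(5, rnorm(n, mean = mean(my_data), sd = sd(my_data)),
                       simplify = FALSE)
for (i in seq_along(reference)) {
  qqnorm(reference[[i]], main = paste("normal reference", i))
  qqline(reference[[i]])
}
par(old_par)
```

If the first panel looks no stranger than the reference panels, the plot
gives little reason to doubt normality.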



Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-12 Thread Jason Rupert
Thank you for the guidance.

I gave qreference a try, but I guess I don't get it.

What are the other plots that are generated?  It is my understanding that 
qreference only produces normal Q-Q plots, so should the takeaway be that if 
the plot of my data (the first plot) isn't similar to any of the other Q-Q 
plots, then my data is not normal?

I was hoping to compare the Q-Q plot of my data against Q-Q plots of other 
distributions.  That is, I would like to determine whether my distribution 
more closely resembles an exponential, Weibull, normal, log-normal, etc. using 
the Q-Q plot.  I am not sure whether there is an existing package or function 
to do this, or whether trial and error is the best approach.  From your reply, 
it sounds like I would have to construct a series of Q-Q plots and manually 
compare them against my distribution.
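
One way to sketch the comparison described above in base R is to plot the
sorted data against theoretical quantiles of each candidate, one panel per
candidate. Everything specific here is an assumption for illustration:
my_data is simulated, the four candidates are arbitrary, and the parameters
are crude guesses (the Weibull ones are simply taken as known) rather than
proper fits.

```r
set.seed(42)
my_data <- rweibull(200, shape = 1.5, scale = 2)  # hypothetical positive data

probs <- ppoints(length(my_data))  # plotting positions in (0, 1)

# Candidate quantile functions; parameters are rough guesses chosen only
# so that the panels are on a comparable scale
candidates <- list(
  normal      = function(p) qnorm(p, mean(my_data), sd(my_data)),
  exponential = function(p) qexp(p, rate = 1 / mean(my_data)),
  lognormal   = function(p) qlnorm(p, mean(log(my_data)), sd(log(my_data))),
  weibull     = function(p) qweibull(p, shape = 1.5, scale = 2)  # assumed known
)

old_par <- par(mfrow = c(2, 2))
for (name in names(candidates)) {
  theo <- candidates[[name]](probs)
  qqplot(theo, my_data, main = name,
         xlab = "theoretical quantiles", ylab = "sample quantiles")
  abline(0, 1)  # points near this line suggest a reasonable match
}
par(old_par)
```

The panel whose points hug the reference line most closely is the most
plausible candidate among those tried, which is not the same as being the
true distribution.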

Thank you again for any further insights. 
  

     



Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-13 Thread Gabor Grothendieck
You can readily create a dynamic display using qqplot and similar functions
in conjunction with either the playwith or TeachingDemos packages.

For example, to investigate the effect of the shape parameter in the skew
normal distribution on its qqplot relative to the normal distribution:

   library(playwith)  # interactive plot window with parameter sliders
   library(sn)        # skew-normal distribution functions (rsn, qsn, ...)
   # redraw a normal QQ plot of 100 skew-normal draws whenever shape changes
   playwith(qqnorm(rsn(100, shape = shape)),
            parameters = list(shape = seq(-3, 3, .1)))

Now move the slider located at the bottom of the window that
appears and watch the plot change in response to changing
the shape value.

You can find more distributions here:
http://cran.r-project.org/web/views/Distributions.html



Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-13 Thread Jason Rupert
Thank you very much for the suggestion.  I will give that a shot, and I guess 
I've got my work cut out for me: I counted 45 different distributions.  

Is the best way to get a Q-Q plot of each to produce a data set for each 
distribution, use the qqplot function to get a Q-Q plot of it, and then 
compare that with my data distribution? 

As you can tell I am not a trained statistician, so any guidance or suggested 
further reading is greatly appreciated.  

I am pretty sure my data is not normally distributed, based on some of the 
empirical "Goodness of Fit" tests and on comparing the Q-Q plot of my data 
against the Q-Q plot of a normal distribution with the same number of points.  
I guess the next step is to figure out which distribution my data most closely 
matches.  

Also, I guess I could fool around and take the log, sqrt, etc. of my data and 
see whether it then more closely resembles a normal distribution.   

Thank you again for assisting this novice data analyst who is trying to gain a 
better understanding of the techniques using this powerful software package.  






Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-13 Thread Gabor Grothendieck
You might also want to look at the idealized situation:

library(playwith)
library(sn)

playwith(qqnorm(qsn(1:99/100, shape = shape)),
   parameters = list(shape = seq(-3, 3, .1)))
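
The same "idealized" idea can be sketched in base R without playwith or sn:
feed exact quantiles, rather than random draws, to qqnorm, so the shape of
each curve is free of sampling noise. The four reference shapes and their
parameters below are arbitrary choices for illustration, not part of the
example above.

```r
probs <- ppoints(99)  # evenly spaced probabilities in (0, 1)

# Exact quantiles of a few reference shapes (parameter choices are arbitrary)
shapes <- list(
  "normal"                 = qnorm(probs),
  "right-skewed (exp)"     = qexp(probs),
  "heavy-tailed (t, 3 df)" = qt(probs, df = 3),
  "light-tailed (uniform)" = qunif(probs)
)

old_par <- par(mfrow = c(2, 2))
for (name in names(shapes)) {
  qqnorm(shapes[[name]], main = name)
  qqline(shapes[[name]])
}
par(old_par)
```

Real samples from these distributions scatter around the idealized curves,
more so for small samples.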




Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-13 Thread David Winsemius
This is probably the right time to issue a warning about the error of  
making transformations on the dependent variable before doing your  
analysis. The classic error that newcomers to statistics commit is to  
decide that they want to "make their data normal". The assumption of  
most regression methods is that the *errors* need to have the desired  
relationship between means and variance, and not that the dependent  
variable be "normal". Many times the apparent non-normality will be  
"explained" or "captured" by the regression model. Other methods of  
modeling non-linear dependence are also available.
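
A small simulation can make the point above concrete. This is only a sketch
under made-up assumptions (a skewed predictor, a true linear relationship,
and normal errors with sd 0.5): the response itself is far from normal, yet
the residuals from lm are well behaved.

```r
set.seed(123)
n <- 500
x <- rexp(n)                         # strongly skewed explanatory variable
y <- 2 + 3 * x + rnorm(n, sd = 0.5)  # linear signal plus normal errors

fit <- lm(y ~ x)

old_par <- par(mfrow = c(1, 2))
qqnorm(y, main = "response y")          # inherits the skew of x
qqline(y)
qqnorm(resid(fit), main = "residuals")  # errors were simulated as normal
qqline(resid(fit))
par(old_par)
```

Here the non-normality of y is "captured" by the model, so transforming y to
look normal would damage a relationship that is already linear.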


I found Harrell's book "Regression Modeling Strategies" to be an  
excellent source for alternatives. My copy of V&R's MASS is only the  
second edition, but chapters 5 & 6 in that edition, on linear models,  
also had examples of using QQ plots on residuals. Checking that text's  
website, I see that chapter 6 at least is probably similar. They  
include the scripts from their chapters along with the MASS package  
(installed as part of the VR bundle). My copy is entitled "ch06.R" and  
resides in the scripts subdirectory:
/Library/Frameworks/R.framework/Versions/2.8/Resources/library/MASS/scripts/ch06.R


--
David Winsemius



Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-13 Thread Greg Snow
Why do you care what distribution your data comes from?

That is a serious question: the more we know about what your actual 
question/goal is, the more we can help.  It is a common mistake for people who 
know enough statistics to be dangerous to focus on the distribution of the data 
rather than the question of interest.

Many of the traditional statistical tests/models based on the assumption of 
normality are still useful when the data does not follow a normal distribution 
as long as the sample size is large enough.  If the above does not hold, there 
are often alternative tests/methods that don't rely on a specific known 
distribution.  A simple transformation may get you close enough.
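
The sample-size point above can be illustrated with a short simulation. This
is a sketch with arbitrary choices (exponential data, samples of 50, 2000
replications): individual observations are strongly skewed, but means of
moderately sized samples are already close to normal, and many classical
procedures depend on that.

```r
set.seed(2009)
n_per_sample <- 50

# 2000 sample means, each from 50 exponential (strongly skewed) observations
sample_means <- replicate(2000, mean(rexp(n_per_sample, rate = 1)))

raw <- rexp(2000, rate = 1)  # individual observations, for comparison

old_par <- par(mfrow = c(1, 2))
qqnorm(raw, main = "raw exponential data")
qqline(raw)
qqnorm(sample_means, main = "means of 50 observations")
qqline(sample_means)
par(old_par)
```

How large a sample is "large enough" depends on how skewed or heavy-tailed
the data are, so the value 50 here carries no general meaning.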

David mentioned that in some cases it is the distribution of the errors, not 
the original data, that matters.  Some people mistakenly think that the 
explanatory variables need to be normal in a regression as well, but that is 
not needed.

For finding transformations that get you closer to normal, look at the boxcox 
function in the MASS package and possibly the vis.boxcox and vis.boxcoxu 
functions in the TeachingDemos package (and the paper referenced there).
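
A minimal sketch of using boxcox from MASS, on simulated data rather than
anything from this thread. The data frame dat and its coefficients are
made up so that log(y) is linear in x, in which case the profile likelihood
would be expected to peak near lambda = 0 (the log transformation).

```r
library(MASS)  # provides boxcox()

set.seed(7)
dat <- data.frame(x = runif(200, 1, 10))
# Hypothetical response: log(y) is linear in x with normal errors
dat$y <- exp(0.5 + 0.3 * dat$x + rnorm(200, sd = 0.2))

# Profile log-likelihood over a grid of lambda values (also draws the plot)
bc <- boxcox(y ~ x, data = dat, lambda = seq(-2, 2, by = 0.05))

lambda_hat <- bc$x[which.max(bc$y)]  # lambda with the highest log-likelihood
lambda_hat
```

The plot also marks an approximate 95% interval for lambda; in practice a
convenient value inside that interval (0, 0.5, 1) is often preferred to the
exact maximizer.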

If you really want to know the distribution of the data, you should start with 
the science, not the data and examples.  Random chance can make data from one 
distribution look like it comes from a different but similar one.  Start with 
the nature of the problem (without looking at the data).  Will the values be 
discrete or continuous?  Is there a lower/upper limit on the values possible?  
Is it likely to be skewed (have extreme values in one direction)?  What 
distributions are commonly used in this area?  Etc.  Answering those questions 
can narrow down the candidates.

Hope this helps,


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-13 Thread davidr
Jason,
Just to answer your direct question, there is mathworld.wolfram.com,
where there are 87 continuous distributions listed.
I have also used the book Statistical Distributions, 2nd ed., by Merran Evans 
et al., which has most of the usual distributions with pictures and 
relationships.

Of course, all of the advice about really thinking about what you are trying 
to accomplish is right on target.
HTH,
-- David



Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-14 Thread Jason Rupert
Many thanks to Greg L. Snow and David Winsemius for their responses.  

First off I can safely say I don't know enough statistics to be dangerous, but 
hopefully I will get to that point :) 

Regarding the goal - ultimately I would like to use linear regression (I am 
constrained to linear regression at this point) on my data.  I thought the 
requirements for using linear regression were the following (I pulled this 
list from 
www.utexas.edu/courses/schwab/sw318_spring_2004/SolvingProblems/Class27_RegressionNCorrHypoTest.ppt):

The assumptions required for utilizing a regression equation are the same as 
the assumptions for the test of significance of a correlation coefficient:
(1) Both variables are interval level.
(2) Both variables are normally distributed.
(3) The relationship between the two variables is linear.
(4) The variance of the values of the dependent variable is uniform for all 
values of the independent variable (equality of variance).

Thus, I was going to attempt to (1) identify which distribution my data most 
closely resembles, (2) transform my data so that it is normal, and (3) then 
use linear regression on the data.  

However, if 
"The assumption of most regression methods is that the *errors* need to have 
the desired relationship between means and variance, and not that the dependent 
variable be "normal". Many times the apparent non-normality will be "explained" 
or "captured" by the regression model."

does this mean I can just "do" linear regression without transforming my data 
and it will be okay?  

Note that I was using "lm" from R to access the errors; however, I have not 
had an opportunity to do much analysis of those results to determine whether 
they are Gaussian.   

I guess I am going to try to track down the following books:
(1) Statistical Distributions (Paperback)
by Merran Evans, Nicholas Hastings, and Brian Peacock
ISBN-10: 0471371246
ISBN-13: 978-0471371243

(2) Regression Modeling Strategies (Hardcover)
by Frank E. Harrell, Jr.
ISBN-10: 0387952322
ISBN-13: 978-0387952321

Maybe electronic versions of those books are available.  My wife is already 
giving me a hard time about the volume of books around.   

Thank you again for all your feedback and insights.  


--- On Fri, 2/13/09, David Winsemius  wrote:
From: David Winsemius 
Subject: Re: [R] Website, book, paper, etc. that shows example plots of  
distributions?
To: jasonkrup...@yahoo.com
Cc: "Gabor Grothendieck" , R-help@r-project.org
Date: Friday, February 13, 2009, 9:10 AM

This is probably the right time to issue a warning about the error of making
transformations on the dependent variable before doing your analysis. The
classic error that newcomers to statistics commit is to decide that they want to
"make their data normal". The assumption of most regression methods
is that the *errors* need to have the desired relationship between means and
variance, and not that the dependent variable be "normal". Many times
the apparent non-normality will be "explained" or "captured"
by the regression model. Other methods of modeling non-linear dependence are
also available.
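
The point above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the sample size, the exponential predictor, the coefficients, and the helper name skew() are all invented for illustration and do not come from the thread. The dependent variable y is clearly skewed, yet the residuals from lm() are close to symmetric because the skewness comes from the predictor.

```r
# Illustrative simulation; every number and name here is an assumption.
set.seed(1)
n <- 500
x <- rexp(n, rate = 1)             # strongly right-skewed predictor
y <- 2 + 3 * x + rnorm(n, sd = 1)  # linear signal plus Gaussian errors

fit <- lm(y ~ x)
res <- residuals(fit)

# Sample skewness computed by hand (hypothetical helper, no extra packages)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_y   <- skew(y)    # skewness of the raw dependent variable
skew_res <- skew(res)  # skewness of the regression residuals
print(c(y = skew_y, residuals = skew_res))

# Draw the two Q-Q plots side by side when run in an interactive session
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  qqnorm(y, main = "Q-Q plot of raw y"); qqline(y)
  qqnorm(res, main = "Q-Q plot of residuals"); qqline(res)
  par(op)
}
```

In a run like this the Q-Q plot of y bends away from the reference line while the Q-Q plot of the residuals stays near it, which is the sense in which the model "captures" the apparent non-normality.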

I found Harrell's book "Regression Modeling Strategies" to be an
excellent source for alternatives. My copy of V&R's MASS is only the
second edition but chapters 5 & 6 in that edition on linear models also had
examples of using QQ plots on residuals. Checking that text's website I see
that chapter 6 at least is probably similar. They include the scripts from
their chapters along with the MASS package (installed as part of the VR bundle).
My copy is entitled "ch06.R" and resides in the scripts subdirectory:
/Library/Frameworks/R.framework/Versions/2.8/Resources/library/MASS/scripts/ch06.R

--David Winsemius


On Feb 13, 2009, at 8:11 AM, Jason Rupert wrote:

> Thank you very much.  Thank you again regarding the suggestion below.  I
> will give that a shot and I guess I've got my work counted out for me.  I
> counted 45 different distributions.
> 
> Is the best way to get a QQPlot of each, to run through producing a data
> set for each distribution and then using the qqplot function to get a QQplot of
> the distribution and then compare it with my data distribution?
> 
> As you can tell I am not a trained statistician, so any guidance or
> suggested further reading is greatly appreciated.
> 
> I guess I am pretty sure my data is not a normal distribution due to doing
> some of the empirical "Goodness of Fit" tests and comparing the QQplot
> of my data against the QQPlot of a normal distribution with the same number of
> points.  I guess the next step is to figure out which distribution my data most
> closely matches.
> 
> Also, I guess I could also fool around and take the log, sqrt, etc. of my
> data and see if it will then more

Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-14 Thread Gabor Grothendieck
The regression book by John Fox:
http://socserv.mcmaster.ca/jfox/Books/Companion/index.html
has a section on regression diagnostics and everything is done
in R which might make it particularly suitable.


Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-15 Thread David Winsemius

On Feb 14, 2009, at 6:48 PM, Jason Rupert wrote:

> Many thanks to Greg L. Snow and David Winsemius for their responses.
>
> First off I can safely say I don't know enough statistics to be  
> dangerous, but hopefully I will get to that point:)
>
> Regarding the goal - ultimately I would like to use linear  
> regression (constrained for using linear regression at this point)  
> for my data.  I thought the requirements for using linear regression  
> was the following (I pulled this list from 
> www.utexas.edu/courses/schwab/sw318_spring_2004/SolvingProblems/Class27_RegressionNCorrHypoTest.ppt)
>  
> :
>
> The assumptions required for utilizing a regression equation are the  
> same as the assumptions for the test of significance of a  
> correlation coefficient.
> Both variables are interval level.
> Both variables are normally distributed.
> The relationship between the two variables is linear.
> The variance of the values of the dependent variable is uniform for  
> all values of the independent variable (equality of variance).
>
> Thus, I was going to attempt to (1) identify which distribution my  
> data most closely represents, (2) translate my data so that it is  
> normal, and (3) then use linear regression on the data.
>
> However, if
> "The assumptions of most regression methods is that the *errors*  
> need to have the desired relationship between means and variance,  
> and not that the dependent variable be "normal". Many times the  
> apparent non-normality will be "explained" or "captured" by the  
> regression model."
>
> Does this mean I can just "do" linear regression without translating  
> my data and it will be okay?

Not exactly. It does mean that you can "just do" linear regression but  
then check to see if "it was OK". The model will have the residuals in  
the regression object and these can be displayed with a scatterplot  
(versus the individual predictor variables)  or as a QQ plot.
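
A minimal sketch of that check follows. The built-in cars data set and the model dist ~ speed are stand-ins chosen only for illustration (an assumption, not the poster's data); the calls themselves are base R.

```r
# 'cars' is only a stand-in for your own data frame (assumption).
fit <- lm(dist ~ speed, data = cars)

# The residuals are stored in the fitted object; residuals(fit)
# returns the same values as fit$residuals.
res <- residuals(fit)
print(summary(res))

# Residuals versus the predictor, and a normal Q-Q plot of the residuals,
# drawn when run in an interactive session
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  plot(cars$speed, res, xlab = "speed", ylab = "residual")
  abline(h = 0, lty = 2)   # reference line at zero
  qqnorm(res); qqline(res)
  par(op)
}
```

A funnel shape or a curve in the first plot suggests unequal variance or a non-linear relationship; strong departures from the line in the second suggest skewed or heavy-tailed errors.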

>
>
> Note that I was using "lm" from R to access the errors, however, I  
> had not an opportunity to do much analysis of those results to  
> determine if they are Gaussian or not.
>
> I guess I am going to try to track down the following documents:
> (1) Statistical Distributions (Paperback)
> by Merran Evans (Author), Nicholas Hastings (Author), Brian Peacock  
> (Author)
> # ISBN-10: 0471371246
> # ISBN-13: 978-0471371243
>
> (2) Regression Modeling Strategies (Hardcover)
> by Frank E. Jr. Harrell (Author)
> # ISBN-10: 0387952322
> # ISBN-13: 978-0387952321
>
> Maybe electronic versions of those documents are available.  My wife  
> is already giving me a hard time the volume of books around.

Frank Harrell's website has a lot of material that he makes available  
online:
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS

snipped remainder

-- David Winsemius


Re: [R] Website, book, paper, etc. that shows example plots of distributions?

2009-02-16 Thread Greg Snow
I had a Murphy's law calendar a while back with many different laws in it.  One 
of those laws was along the lines of:

An easily understood, simple falsehood is often more useful than a complicated, 
often misunderstood truth

(though the original was probably much better phrased than my memory of it).

Many rules in textbooks and classes follow this principle, especially when 
outside pressures force teachers to cover 4-6 hours of material in a 3-hour 
course.  The set of assumptions you list below is of this type.  They are a 
good simple place to start, and good enough for an introductory class, but a 
full discussion of the truth would take more time than is reasonable for an 
intro class.

Yes, the theory on which linear models are based was originally derived using 
the assumptions of normality, but linear models are amazingly robust, meaning 
that if the normality assumptions don't hold, the results (p-values, confidence 
intervals) will still usually be "close enough".  How "close" and whether it is 
"enough" depend on sample size, how nonnormal the residuals are, and the 
specific question(s).

For regression, start by "doing" the regression, but then look at the 
diagnostic plots of the residuals (see ?plot.lm).  If your sample size is large 
and the residuals do not show strong skewness/outliers, then you are probably 
safe using the output of lm as is (Central Limit Theorem, but still check other 
assumptions and make sure that what you are seeing/saying makes sense).  If 
there is more skewness/outliers than you are comfortable with, then there are 
robust methods that will be more helpful here.
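
The workflow described above might look like the following sketch. The cars data set is again only a stand-in, and MASS::rlm (Huber M-estimation) is one possible robust method picked for illustration; the message does not name a specific one.

```r
# 'cars' is a stand-in data set (assumption); substitute your own data frame.
fit <- lm(dist ~ speed, data = cars)

# Standard residual diagnostics documented in ?plot.lm,
# drawn when run in an interactive session
if (interactive()) {
  op <- par(mfrow = c(2, 2))
  plot(fit, which = 1:4)   # residuals vs fitted, Q-Q, scale-location, Cook's distance
  par(op)
}

# One possible robust alternative (assumption: the thread names no method).
# rlm() from the recommended MASS package down-weights large residuals.
fit_rob <- MASS::rlm(dist ~ speed, data = cars)

# Side-by-side coefficients: large differences hint that outliers
# are influencing the ordinary least-squares fit
coef_compare <- rbind(lm = coef(fit), rlm = coef(fit_rob))
print(coef_compare)
```

If the two sets of coefficients agree closely, the plain lm output is probably not being driven by a few extreme points.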


Also note that if you know enough to find and use the lm function in R, then 
you know enough statistics to be dangerous (unless you are not allowed to make 
any decisions or communicate with anyone else (coma patients maybe)).  The 
goal now is to learn to use that power to do good; posting/reading here and 
Frank's book are a good start in that direction.

Hope this helps,
  

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Jason Rupert
> Sent: Saturday, February 14, 2009 4:48 PM
> To: David Winsemius
> Cc: R-help@r-project.org
> Subject: Re: [R] Website, book, paper, etc. that shows example plots of
> distributions?
> 
> Many thanks to Greg L. Snow and David Winsemius for their responses.
> 
> First off I can safely say I don't know enough statistics to be
> dangerous, but hopefully I will get to that point:)
> 
> Regarding the goal - ultimately I would like to use linear regression
> (constrained for using linear regression at this point) for my data.  I
> thought the requirements for using linear regression was the following
> (I pulled this list from
> www.utexas.edu/courses/schwab/sw318_spring_2004/SolvingProblems/Class27
> _RegressionNCorrHypoTest.ppt):
> 
> The assumptions required for utilizing a regression equation are the
> same as the assumptions for the test of significance of a correlation
> coefficient.
> Both variables are interval level.
> Both variables are normally distributed.
> The relationship between the two variables is linear.
> The variance of the values of the dependent variable is uniform for all
> values of the independent variable (equality of variance).
> 
> Thus, I was going to attempt to (1) identify which distribution my data
> most closely represents, (2) translate my data so that it is normal,
> and (3) then use linear regression on the data.
> 
> However, if
> "The assumptions of most regression methods is that the *errors* need
> to have the desired relationship between means and variance, and not
> that the dependent variable be "normal". Many times the apparent non-
> normality will be "explained" or "captured" by the regression model."
> 
> Does this mean I can just "do" linear regression without translating my
> data and it will be okay?
> 
> Note that I was using "lm" from R to access the errors, however, I had
> not an opportunity to do much analysis of those results to determine if
> they are Gaussian or not.
> 
> I guess I am going to try to track down the following documents:
> (1) Statistical Distributions (Paperback)
> by Merran Evans (Author), Nicholas Hastings (Author), Brian Peacock
> (Author)
> # ISBN-10: 0471371246
> # ISBN-13: 978-0471371243
> 
> (2) Regression Modeling Strategies (Hardcover)
> by Frank E. Jr. Harrell (Author)
> # ISBN-10: 0387952322
> # ISBN-13: 978-0387952321
> 
> Maybe electronic versions of those documents are available.  My wife is
> already giving