Re: probability definition

2001-03-03 Thread Richard A. Beldin

I'm glad to hear that somebody has his eye on the ball. Unfortunately, a
designation of a region like "western Puerto Rico" means so many
different things to so many different people, that I disbelieve its
utility. With the definition you quote, we should have a 100% chance of
precipitation almost every day.




=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Fisher's z-transformation

2001-03-03 Thread Donald Burrill

On Sat, 3 Mar 2001, Arenson, Ethan wrote:

 Would someone please remind me the formula for Fisher's 
 z-transformation of correlation coefficients? 

Z = 0.5 log[(1 + r)/(1 - r)]   (using the natural logarithm).

Its standard error is   1/sqrt(n - 3)  ("sqrt" = "square root of").

To convert back:   r = (exp(2Z) - 1)/(exp(2Z) + 1)
 ("exp(2Z)" is the natural antilogarithm of 2Z, aka  e to the power 2Z).

Equivalently,   Z = inverse tanh(r)   and   r = tanh(Z)
 ("tanh" = hyperbolic tangent).
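
The formulas above can be sketched in a few lines of Python. This is only an illustration: the function names and the values r = 0.5 and n = 28 are made up for the example and do not come from the original question.

```python
import math

def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient r (natural log)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def inverse_fisher_z(z):
    """Convert a Fisher z value back to a correlation coefficient."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

def z_standard_error(n):
    """Standard error of Fisher's z for a sample of n pairs."""
    return 1 / math.sqrt(n - 3)

# Hypothetical inputs, chosen only to exercise the functions.
r, n = 0.5, 28
z = fisher_z(r)
print("z =", z)
print("standard error =", z_standard_error(n))
print("back-transformed r =", inverse_fisher_z(z))
print("agrees with atanh:", math.isclose(z, math.atanh(r)))
```

The last line checks the hyperbolic-tangent form against the logarithmic form using the standard library's math.atanh.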
-- Don.
 --
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,            [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                       (603) 535-2597
 Department of Mathematics, Boston University      [EMAIL PROTECTED]
 111 Cummington Street, room 261, Boston, MA 02215 (617) 353-5288
 184 Nashua Road, Bedford, NH 03110                (603) 471-7128






power,beta, etc.

2001-03-03 Thread dennis roberts

when we discuss things like power, beta, type I error, etc. ... we often
show a 2 by 2 table ... similar to

          null true         null false

retain    correct           type II, beta

reject    type I, alpha     power


i think that we need a bit of overhaul to this typical way of doing things ... 

1. each cell needs to have a name ... label ... that reflects the
consequence of the decision (retain, reject) that was made

i propose something along the lines of

          null true             null false

retain    type I correct, 1C    type II error, 2E

reject    type I error, 1E      type II correct, 2C


then, we have names or symbols for probabilities attached to each cell

          null true                      null false

retain    WHAT NAME/SYMBOL FOR THIS??    beta

reject    alpha                          power


DOES ANYONE HAVE SOME SUGGESTION AS TO HOW THE UPPER LEFT CELL MIGHT BE
REFERRED TO via A SYMBOL??? OR, SOME NAME THAT IS DIFFERENT FROM POWER BUT
... STILL GIVES THE FLAVOR THAT A CORRECT DECISION HAS BEEN MADE (better
than making an error)?

2. i think it would be helpful to first identify each cell with a
distinctive label ... describing the decision (correct, error) and ... the
type ... 1 or 2

3. i think it would be helpful to have a system where there are names for
EACH cell (why should the poor upper left be "left" out in the cold??) ...
FIRST ... then some OTHER name/symbol for the probability associated with
that cell

confusions that might be avoided would be like:

a. saying type II error is the same as beta ... 
b. saying that power is NOT a name for a decision but, rather, THE
probability of making some particular decision

we have special names for errors of the first and second kind ... type I
and type II ... and we have symbols of alpha and beta to represent their
associated probabilities

we have power which is supposed to be the probability of making a certain
kind of decision ... but, no special name for THAT cell like we have given
to differentiate the two kinds of errors one can make ...

any support out there to try to right this somewhat ambiguous ship? 
==
dennis roberts, penn state university
educational psychology, 8148632401
http://roberts.ed.psu.edu/users/droberts/drober~1.htm





Final CFP: WI-2001 (Web Intelligence)

2001-03-03 Thread Ning Zhong

[Apologies if you receive this more than once]

---
 FINAL CALL FOR PAPERS: WI-2001
 The First Asia-Pacific Conference on Web Intelligence 

SPONSORED BY
 ACM SIGART
Maebashi Institute of Technology
---

  Maebashi TERRSA, Maebashi City, Japan
  October 23-26, 2001 
Home Page: http://kis.maebashi-it.ac.jp/wi01
  Mirror Page: http://cs.uregina.ca/~wi01/

 Paper Submission Deadline: March 20, 2001 

 IN COOPERATION WITH
ACM SIGCHI, ACM SIGWEB
  Japanese Society for Artificial Intelligence (JSAI)
  JSAI SIGFAI, JSAI SIGKBS, IEICE SIGKBSE
 
   CORPORATE SPONSORS
   Maebashi Convention Bureau
Maebashi City Government
   Gunma Prefecture Government
   The Japan Research Institute, Limited
  US AFOSR/AOARD and US Army Research Office in Far East

  WI-2001 will be jointly held with 
 The Second Asia-Pacific Conference on 
 Intelligent Agent Technology (IAT-2001)
  (One registration may attend both IAT-2001 and WI-2001) 
  ===

   WI-2001 and IAT-2001 Joint Keynote Speakers:
 Edward A. Feigenbaum (Turing Award Winner), Stanford University 
 Benjamin Wah (2001 IEEE CS President), U. Illinois at Urbana-Champaign

   WI-2001 Invited Speakers:
   James Hendler (DARPA/ISO, USA)
   W. Lewis Johnson (University of Southern California, USA)
   Riichiro Mizoguchi (Osaka University, Japan)
   Prabhakar Raghavan (Verity Inc., USA)
   Patrick S. P. Wang (Northeastern University, USA)

The 21st century is the age of the Internet and the World Wide Web. The Web
revolutionizes the way we gather, process, and use information. At the
same time, it also redefines the meanings and processes of business,
commerce, marketing, finance, publishing, education, research,
development, as well as other aspects of our daily life.  Although
individual Web-based information systems are constantly being
deployed, advanced issues and techniques for developing and for
benefiting from Web intelligence still remain to be systematically
studied.

Broadly speaking, Web Intelligence (WI) exploits AI and advanced
information technology on the Web and Internet.  It is the key and the
most urgent research field of IT for business intelligence.

The Asia-Pacific Conference on Web Intelligence (WI) is an
international forum for researchers and practitioners

(1) to present the state-of-the-art in the development of Web intelligence; 
(2) to examine performance characteristics of various approaches in 
Web-based intelligent information technology;
(3) to cross-fertilize ideas on the development of 
Web-based intelligent information systems among different domains. 

By idea-sharing and discussions on the underlying foundations and the
enabling technologies of Web intelligence, WI-2001 is expected to
stimulate the future development of new models, new methodologies, and
new tools for building a variety of embodiments of Web-based
intelligent information systems.

The Asia-Pacific Conference on Web Intelligence (WI) is
a high-quality, high-impact biennial conference series.
It will be jointly held with the Asia-Pacific Conference on 
Intelligent Agent Technology (IAT).

TOPICS
==
WI-2001 welcomes submissions of original papers. The technical issues to 
be addressed include, but are not limited to:
 
* Web-Based Applications:
  - Business Intelligence
  - Computational Societies and Markets
  - Conversational Systems
  - Customer Relationship Management (CRM)
  - Direct Marketing
  - Electronic Commerce and Electronic Business
  - Electronic Library
  - Information Markets
  - Price Dynamics and Pricing Algorithms
  - Measuring and Analyzing Web Merchandising
  - Web-Based Decision Support Systems
  - Web-Based Distributed Information Systems
  - Web-Based EDI
  - Web-Based Learning Systems
  - Web Marketing
  - Web Publishing

* Web Human-Media Engineering:
  - Art of Web Page Design
  - Multimedia Information Representation
  - Multimedia Information Processing
  - Visualization of Web Information
  - Web-Based Human Computer Interface

* Web Information Management:
  - Data Quality Management
  - Information Transformation
  - Internet and Web-Based Data Management
  - Multi-Dimensional Web Databases and OLAP
  - Multimedia Information Management
  - New Data Models for the Web
  - Object Oriented Web Information Management
  - Personalized Information Management
  - Semi-Structured Data Management
  - Use and Management of Metadata 
  - Web Knowledge Management
  - Web Page Automatic Generation and Updating
  - Web Security, Integrity, Privacy and Trust

* Web Information Retrieval:
  - Approximate Retrieval
  - Conceptual Information Extraction
  - 

Re: basic stats question

2001-03-03 Thread Milo Schield

But what does this (in)dependence really mean?
Can it change on conditioning?
Suppose that we take into account a plausible confounder: defective
equipment.  Suppose blacks are more likely to have "defective equipment"
(broken light, etc.).  Suppose we find that the percentage who are black among
those stopped for defective equipment is the same as the percentage who are
black among those having defective equipment.  Now we have independence at
one level and non-independence at another.

This seems related to Simpson's paradox.
In any event, it seems that independence can be conditional.
Is this so?  If so, where is this discussed in more detail?
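
A small numerical sketch may make the question concrete. All counts below are invented for illustration, and the function name is hypothetical. They are chosen so that stop rates depend only on equipment status (20% if defective, 2% otherwise), while the rate of defective equipment differs between the two groups.

```python
from fractions import Fraction

# Hypothetical counts, invented for illustration only.
# (race, equipment) -> (number of drivers, number of drivers stopped)
COUNTS = {
    ("black", "defective"): (600, 120),
    ("black", "ok"): (1400, 28),
    ("other", "defective"): (800, 160),
    ("other", "ok"): (7200, 144),
}

def share_black(which, equipment=None):
    """Proportion black among all drivers (which=0) or stopped drivers (which=1).

    If equipment is given, only that equipment stratum is counted.
    """
    rows = {k: v for k, v in COUNTS.items() if equipment in (None, k[1])}
    black = sum(v[which] for k, v in rows.items() if k[0] == "black")
    total = sum(v[which] for v in rows.values())
    return Fraction(black, total)

# Ignoring equipment: the two proportions differ (dependence).
print("p(black)           =", share_black(0))
print("p(black | stopped) =", share_black(1))

# Within each equipment stratum: the two proportions agree
# (independence, conditional on equipment status).
for equipment in ("defective", "ok"):
    print(equipment, share_black(0, equipment), share_black(1, equipment))
```

With these made-up numbers, being black and being stopped are dependent overall but independent within each equipment stratum, which is the situation described above.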
"Lise DeShea" [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]...
Re probability/independence, I've found that the most effective way to
communicate this concept to my students (College of Education, not heavily
math-oriented) is the following:
SNIP
Then you can move to an example of racial profiling.  Out of all the people
in your city who  drive, what proportion are African-American?
[p(African-American).] Now, GIVEN that you look only at drivers who are
pulled over, what proportion of these people are African American?
[p(African-American|pulled over).]  If being black and being pulled over are
independent events, then the probabilities should be equal.

You can illustrate this graphically by drawing a  large box to represent all
the drivers, then mark the proportion representing African-American drivers.
Then draw a smaller box representing the people being pulled over, with a
proportion of the box marked to represent the African-American drivers who
are pulled over.  If the proportions of each box are equal, then the events
are independent.
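
The two-box comparison can also be written as a short calculation. The counts here are hypothetical, picked only so that the arithmetic is easy to follow; they are not data from any real city.

```python
from fractions import Fraction

# Hypothetical counts, invented purely for illustration.
drivers_total = 10_000   # the large box: all drivers
drivers_black = 2_000    # marked portion of the large box
stopped_total = 500      # the smaller box: drivers pulled over
stopped_black = 100      # marked portion of the smaller box

p_black = Fraction(drivers_black, drivers_total)
p_black_given_stopped = Fraction(stopped_black, stopped_total)

# The events are independent exactly when the two proportions match.
independent = p_black == p_black_given_stopped

print("p(African-American)               =", p_black)
print("p(African-American | pulled over) =", p_black_given_stopped)
print("independent:", independent)
```

Changing stopped_black to any value other than 100 makes the proportions differ, which is what the unequal boxes would show.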

So now,  I would welcome comments from the more mathematically/statistically
rigorous list members among us!

~~~
Lise DeShea, Ph.D.
Assistant Professor
Educational and Counseling Psychology Department
University of Kentucky
245 Dickey Hall
Lexington KY 40506
Email:  [EMAIL PROTECTED]
Phone:  (859) 257-9884










Re: Trend analysis question

2001-03-03 Thread Donald Burrill

On Sun, 4 Mar 2001, Philip Cozzolino wrote in part:

 However, after the cubic non-significant finding, the 4th and 5th 
 order trends are significant. 
 
 Intuitively, it seems that if there is no cubic trend of significance, 
 there will not be any higher order trend, but this is relatively new 
 to me.
Your intuition is, in this case, incorrect.  The five 
trends are mutually independent in the sense that any combination of them 
may be operating.  (I am for the moment accepting the implied premise 
that a power function of the IV is a reasonable function to try to fit to 
your data.  In most instances I know of, this is not "really" the case, 
and the power function is more usefully thought of as an approximation 
to whatever the "real" functionality is.)  This may be seen by 
considering the following relationships between Y and X (think of them as 
DV and IV if you wish):

I.   [Plot of Y against X: the points lie along a symmetric parabola,
     with no overall slope.]

II.  [Plot of Y against X: the points lie along an S-shaped curve with
     two bends, with no overall slope and no overall curvature.]

In I. above, the linear trend is approximately zero, and the quadratic 
component of X accounts for nearly all the variation in Y.  A "rule" 
that claimed "If the linear trend is insignificant there can be no 
significant quadratic trend" is clearly false in this case.
In II. above, both the linear and quadratic components of trend are 
virtually zero -- certainly insignificant -- and the cubic component 
accounts for nearly all the variation in Y.  Similar situations can be 
imagined, where only the quartic, or only the quintic, or only the 
linear, quadratic, and quartic, or any other arbitrary combination of 
the basic trends are significant, and other components are not.
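
As a rough numerical check of cases I and II, here is a sketch that assumes six equally spaced levels of X and uses the standard orthogonal polynomial coefficients for six levels. The Y values are hypothetical stand-ins for the two patterns, not the poster's data.

```python
# Orthogonal polynomial contrast coefficients for six equally spaced levels.
LINEAR = [-5, -3, -1, 1, 3, 5]
QUADRATIC = [5, -1, -4, -4, -1, 5]
CUBIC = [-5, 7, 4, -4, -7, 5]

def component(contrast, y):
    """Contrast value: sum of coefficient times Y over the six levels."""
    return sum(c * v for c, v in zip(contrast, y))

x = [-5, -3, -1, 1, 3, 5]

# Hypothetical Y values: a symmetric parabola (case I) and a curve
# proportional to the cubic contrast itself (case II).
pattern_i = [25 - xi * xi for xi in x]
pattern_ii = list(CUBIC)

for name, y in (("I", pattern_i), ("II", pattern_ii)):
    print(name,
          "linear:", component(LINEAR, y),
          "quadratic:", component(QUADRATIC, y),
          "cubic:", component(CUBIC, y))
```

In case I only the quadratic contrast is nonzero, and in case II only the cubic contrast is nonzero, so a zero lower-order trend says nothing about the higher-order ones.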

If you are carrying out your trend analysis by using orthogonal 
polynomials (as you probably should be), try constructing the model 
derived from your linear + quadratic fit only, and plot those as 
predicted values against X;  then construct the model derived from linear 
+ quadratic + quartic + quintic, and plot those predicted values against 
X.  You may find it illuminating also to plot the residuals in each case 
against X, especially if you force the same vertical scale on the two 
sets of residuals.
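
The suggested comparison might look something like the following sketch. It assumes six equally spaced levels and one mean per level. The means are hypothetical, constructed so that the cubic component is exactly zero while the quartic and quintic components are not. Plotting is left out, and the predicted values and residuals are simply printed.

```python
from fractions import Fraction

# Orthogonal polynomial contrast coefficients for six equally spaced levels.
CONTRASTS = {
    "linear": [-5, -3, -1, 1, 3, 5],
    "quadratic": [5, -1, -4, -4, -1, 5],
    "cubic": [-5, 7, 4, -4, -7, 5],
    "quartic": [1, -3, 2, 2, -3, 1],
    "quintic": [-1, 5, -10, 10, -5, 1],
}

def coefficients(means):
    """Least-squares weight of each trend component."""
    return {
        name: Fraction(sum(c * m for c, m in zip(coefs, means)),
                       sum(c * c for c in coefs))
        for name, coefs in CONTRASTS.items()
    }

def fitted(means, terms):
    """Predicted values from the grand mean plus the chosen components."""
    grand = Fraction(sum(means), len(means))
    weights = coefficients(means)
    return [grand + sum(weights[t] * CONTRASTS[t][i] for t in terms)
            for i in range(len(means))]

# Hypothetical level means (not the poster's data).
means = [21, 15, 9, 31, 11, 33]

for terms in (["linear", "quadratic"],
              ["linear", "quadratic", "quartic", "quintic"]):
    predicted = fitted(means, terms)
    residuals = [m - p for m, p in zip(means, predicted)]
    print(terms)
    print("  predicted:", [str(p) for p in predicted])
    print("  residuals:", [str(r) for r in residuals])
```

Because the components are orthogonal, dropping or adding one does not change the weights of the others, so the two sets of predicted values can be compared directly.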

I note in passing that you haven't stated how much of the variance of Y 
is accounted for by each of the significant components, nor how much 
residual variance there is after each component is entered.  That also 
might be illuminating.
-- DFB.
 --
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,            [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                       (603) 535-2597
 Department of Mathematics, Boston University      [EMAIL PROTECTED]
 111 Cummington Street, room 261, Boston, MA 02215 (617) 353-5288
 184 Nashua Road, Bedford, NH 03110                (603) 471-7128






Re: power,beta, etc.

2001-03-03 Thread Donald Burrill

On Sat, 3 Mar 2001, dennis roberts wrote:

 when we discuss things like power, beta, type I error, etc. ... we 
 often show a 2 by 2 table ... similar to

           null true         null false

 retain    correct           type II, beta

 reject    type I, alpha     power

Similar, but not the same.
I usually present a table of "states of affairs", without probabilities:

    correct            error:  Type II
    error:  Type I     correct

(And usually with the rows interchanged, so that "Type I error" LOOKS 
like the first kind of error one encounters.)  It seems to me that to 
include the probabilities in the same 2x2 table as the "states of 
affairs" would be actively to invite rampant (or at least, and more 
alliteratively, couchant) confusion of the concepts.

I have another problem with writing "power" in the lower right cell, 
apart from the fact that it's a probability and not a state of affairs. 
I'm aware that many people think of power as a conditional probability 
(of rejecting the null when it's false);  but I came to understand it as 
an UNconditional probability (of rejecting the null, period).  This 
definition permits drawing power curves that include the parameter value 
specified by the null hypothesis:  the power at that point (or, in that 
case) is alpha.  For a symmetric two-sided alternative, this is also the 
minimum value of power.  Since the value of power approaches alpha as the 
parameter value approaches the value specified in the null hypothesis, it 
seems a little silly to omit that one point from the continuous curve.
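
To illustrate the point about the power curve, here is a sketch for one simple case: a two-sided z-test with known standard deviation. That test, the function name, and the grid of values are assumptions made for the example. The argument delta is the true parameter's distance from the null value in standard-error units, (mu - mu0) * sqrt(n) / sigma.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_sided(delta, alpha=0.05):
    """Probability of rejecting the null in a two-sided z-test.

    delta = (mu - mu0) * sqrt(n) / sigma; delta = 0 is the null value itself.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Reject when the test statistic falls in either tail.
    return STD_NORMAL.cdf(-z_crit - delta) + STD_NORMAL.cdf(-z_crit + delta)

for delta in (-3, -2, -1, 0, 1, 2, 3):
    print(f"delta = {delta:+d}   P(reject) = {power_two_sided(delta):.3f}")
```

At delta = 0 the function returns alpha, and that is the lowest point of the curve, so the null value fits on the same continuous curve as every other parameter value.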

 i think that we need a bit of overhaul to this typical way of doing 
 things ... 

 1. each cell needs to have a name ... label ... that reflects the
 consequence of the decision (retain, reject) that was made

 i propose something along the lines of

           null true             null false

 retain    type I correct, 1C    type II error, 2E

 reject    type I error, 1E      type II correct, 2C

I've long been persuaded of the need to distinguish between the two 
different kinds of errors.  That there are two distinct kinds is not at 
all obvious, evidently;  some folks seem never to master the distinction. 
But I am not convinced that we need to distinguish between two kinds of 
correct decision.  After all, the decisions themselves are different:  
to reject, or to retain (though some folks prefer "accept" to "retain"). 
Knowing the decision, and that it is (at least hypothetically) correct, 
is surely all one needs to know.  "Correct rejection" or "correct 
retention" (or "acceptance") of the hypothesis being tested seems to me 
easier to handle and apprehend than "a Type I correct decision" or "a 
Type II correct decision".

 then, we have names or symbols for probabilities attached to each cell

           null true                      null false

 retain    WHAT NAME/SYMBOL FOR THIS??    beta

 reject    alpha                          power

If you want to construct such a table, I'd recommend including the 
marginal row, showing the column totals to be 1 (or, if one prefers, 
100%).  That helps to emphasize the conditional nature of the 
probabilities being displayed:  conditional on the state of nature, not 
on the decision.  And consistent with my understanding of power, I'd 
present such a table thus:

                    State of nature
              null true     null false

P{retain}     1 - alpha     beta

Power         alpha         1 - beta
              ------------------------
 Total        1             1

Sometime along about now one really ought to point out that a 2x2 table 
like this is grossly oversimplified.  Beta (and therefore power) cannot 
be evaluated for "null false".  It can be evaluated only for a specified 
particular value of the parameter that is different from the value 
specified in the null hypothesis.  And, ceteris paribus, the farther that 
parameter value is from the null-hypothetical value, the smaller is beta 
(and the larger is power).  This leads more or less directly to the idea 
of a power curve, and then to the variations in such a curve as a 
function of alpha and sample size.
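
A sketch of how beta depends on the specified parameter value, on alpha, and on sample size, again assuming a two-sided z-test with known standard deviation. The test, the function name, and all numbers (mu0 = 100, sigma = 10, and so on) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def beta(mu, mu0, sigma, n, alpha):
    """Probability of retaining the null when the true mean is mu."""
    delta = (mu - mu0) * sqrt(n) / sigma
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Retain when the test statistic falls between the two critical values.
    return STD_NORMAL.cdf(z_crit - delta) - STD_NORMAL.cdf(-z_crit - delta)

# Hypothetical setting: null value 100, known standard deviation 10.
mu0, sigma = 100, 10
for n in (25, 100):
    for alpha in (0.05, 0.10):
        for mu in (101, 102, 104):
            b = beta(mu, mu0, sigma, n, alpha)
            print(f"n={n:3d} alpha={alpha:.2f} mu={mu}: "
                  f"beta={b:.3f} power={1 - b:.3f}")
```

Beta can only be computed once a particular value of mu is supplied. It shrinks as mu moves away from mu0, as n grows, and as alpha is made larger.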

 DOES ANYONE HAVE SOME SUGGESTION AS TO HOW THE UPPER LEFT CELL MIGHT BE 
 REFERRED TO via A SYMBOL??? OR, SOME NAME THAT IS DIFFERENT FROM POWER 
 BUT ... STILL GIVES THE FLAVOR THAT A CORRECT DECISION HAS BEEN MADE 
 (better than making an error)?

Do you have a reasoned objection to "1 - alpha"?  In other contexts we 
routinely use, e.g., "1 - Rsq" for the proportion of variance unexplained 
by the model being considered.  The "1 minus" construction shows the 
logical and arithmetical connection between two quantities, which can 
easily get lost if one uses very different-looking terms for those 
quantities.

 2. i think it would be helpful to first identify each cell with a