IEEE Data Mining 2001: Final Call for Participation

2001-11-17 Thread Ning Zhong

[Apologies if you receive this more than once]

IEEE Data Mining 2001: Final Call for Participation
===

The 2001 IEEE International Conference on Data Mining
Doubletree Hotel, San Jose, California, USA
November 29 - December 2, 2001

On-line registration at
http://www.cs.uvm.edu/~xwu/icdm/reg-01.html

Hotel reservation information at
http://www.cs.uvm.edu/~xwu/icdm/hotel-01.shtml

Conference program and other information at
http://www.cs.uvm.edu/~xwu/icdm-01.html

With the support of both world-renowned experts and new researchers
from the international data mining community, ICDM '01 has received an
overwhelming response compared to any other data mining related
conference this year: 365 paper submissions, 8 workshop proposals, and
29 tutorial proposals.

* Invited Speakers: 

  - Jerome H. Friedman, Stanford University, USA
  - Jim Gray, Microsoft Research, USA
    (The 1999 Turing Award Winner)
  - Pat Langley, Institute for the Study of Learning and Expertise, USA
  - Benjamin W. Wah, University of Illinois, Urbana-Champaign, USA
    (President, IEEE Computer Society)

* ICDM '01 Tutorials (November 29, 2001):

  - Text and Data Mining for Bioinformatics, by Hinrich Schuetze
    ([EMAIL PROTECTED])
  - Mining Time Series Data, by Eamonn Keogh ([EMAIL PROTECTED])

* ICDM '01 Workshops (November 29, 2001):

  - Text Mining (TextDM '2001)
    (http://www-ai.ijs.si/DunjaMladenic/TextDM01/)
  - Integrating Data Mining and Knowledge Management
    (http://cui.unige.ch/~hilario/icdm-01/cfp.html)

* Paper Presentations (November 30 - December 2, 2001): Out of 365
  paper submissions, the IEEE ICDM '01 Program Committee accepted 72
  papers for regular presentation, and an additional 39 papers for
  poster presentation.


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Fwd: Re: diff in proportions

2001-11-17 Thread Rich Strauss

This is true.  I simulated the null distributions, those obtained when the
null hypothesis is true, which is what the centered t-distribution
represents.  I didn't look at the sampling distributions for different
effect sizes.

Date: Sat, 17 Nov 2001 00:19:06 -0600
From: jim clark [EMAIL PROTECTED]
Subject: Re: diff in proportions
Organization: The University of Winnipeg

Hi

On 16 Nov 2001, Rich Strauss wrote:
 I've just done some quick simulations in Matlab, constructing randomized
 null distributions of the t-statistic under both scenarios: (1) sample
 variances based on sample means vs. (2) variances about the pooled mean.
 Assuming I've done everything correctly, the result is that the null
 distribution of the t-statistic in the second case consistently
 approximates the theoretical t-distribution more closely than that of the
 first case.  This seems to be true regardless of sample sizes and of
 whether the two sample sizes are identical or different.  This result
 implies that the t-statistic should indeed be calculated about a pooled
 estimate of the common mean, as Jerry Dallal suggested.
 
 I could pass on the details of my simulation if anyone is interested, but
 mostly I'd appreciate it if someone could repeat this simulation
 independently of mine to see whether it holds up.

This simply cannot be generally true.  It probably only applies
when the null is in fact true, which may be the case for your
simulations.  To appreciate the illogical nature of this
recommendation, consider creating a real difference of x between
your population means, then 2x, then 3x, and so on.  By the
common mean approach, you are treating the variability between
groups as though it were noise (i.e., a component in your
estimate of sigma^2, the variance about the null-hypothesis of
a common mean).  It is critical to keep in mind that the null
hypothesis is in fact just that, a hypothesis that may or may
not be correct.  Computing the within-group variance about the
group means is the correct way to estimate sigma^2, however,
irrespective of whether the Ho about the means is true or not.
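
A minimal Python sketch of the point above (not Rich Strauss's Matlab code, which was not posted). It computes the two-sample t-statistic both ways and simulates true differences of 0, x, 2x and 3x. The function names t_within and t_common_mean, the normal populations with sigma = 1, the sample sizes, and the use of n1 + n2 - 2 as the divisor for the pooled-mean variance are all assumptions made for illustration. Because the pooled-mean version folds the between-group sum of squares into its estimate of sigma^2, its absolute value can never exceed sqrt(n1 + n2 - 2), however large the real difference becomes.

```python
import math
import random
import statistics


def t_within(x, y):
    """Pooled-variance two-sample t: squared deviations are taken about each
    group's own mean, so a real difference between the means does not
    inflate the estimate of sigma^2."""
    n1, n2 = len(x), len(y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    ss_within = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    sp2 = ss_within / (n1 + n2 - 2)
    return (mx - my) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def t_common_mean(x, y):
    """Variant discussed in the thread: squared deviations are taken about the
    grand (pooled) mean, so the between-group sum of squares is counted as
    noise.  Dividing by n1 + n2 - 2 is an assumption of this sketch; with it,
    |t| can never exceed sqrt(n1 + n2 - 2)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    grand = statistics.fmean(combined)
    ss_total = sum((v - grand) ** 2 for v in combined)
    s2 = ss_total / (n1 + n2 - 2)
    return (mx - my) / math.sqrt(s2 * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    rng = random.Random(2001)      # fixed seed so reruns match
    n1, n2, reps = 10, 15, 2000    # hypothetical sample sizes and replications
    bound = math.sqrt(n1 + n2 - 2)
    # True difference of 0, x, 2x, 3x with x = 1 and sigma = 1 (assumed).
    for shift in (0.0, 1.0, 2.0, 3.0):
        tw, tc = [], []
        for _ in range(reps):
            x = [rng.gauss(shift, 1.0) for _ in range(n1)]
            y = [rng.gauss(0.0, 1.0) for _ in range(n2)]
            tw.append(t_within(x, y))
            tc.append(t_common_mean(x, y))
        print(f"shift={shift}: mean t_within={statistics.fmean(tw):6.2f}  "
              f"mean t_common_mean={statistics.fmean(tc):6.2f}  "
              f"(bound {bound:.2f})")
```

With shift = 0 both statistics are centered near zero; as the shift grows, t_within keeps growing with the real difference, while t_common_mean grows much more slowly and stays below the bound.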

Best wishes
Jim


James M. Clark (204) 786-9757
Department of Psychology   (204) 774-4134 Fax
University of Winnipeg 4L05D
Winnipeg, Manitoba  R3B 2E9   [EMAIL PROTECTED]
CANADA http://www.uwinnipeg.ca/~clark







Re: Most Frequently Used Clustering Algorithm

2001-11-17 Thread Kurt Watzka

Chia C Chong [EMAIL PROTECTED] writes:

Hi!

I am new to this area. I wonder which clustering algorithm is the most
frequently used, and perhaps the most robust?

This question has many levels, ranging from the decision between agglomerative
and seed-based methods, through the choice of an appropriate measure of
distance or similarity, to the decision on a method to
form clusters.

I will try to answer the question of choosing one of the classic
agglomeration methods for agglomerative hierarchical cluster analysis.
The choice may depend on what you are trying to achieve.

If you want to detect outliers, single linkage is the method of choice.
Observations that are joined very late, and at rather high levels of
dissimilarity, are potential candidates for further inspection
(probable outliers). The disadvantage of single linkage is that two
groups can be joined at an early stage if there is a single observation
that forms a bridge between them.

The complete linkage method has a tendency to form small homogeneous
clusters at an early stage, but because the distance between clusters
is defined as the distance between their most dissimilar members,
clusters that are in fact quite similar can stay separate until
quite a late stage of the agglomeration process.

Ward's method will stress the demand for homogeneity within a cluster,
but it will probably not be your tool of choice if you are interested
in detecting structures in your data that go beyond mere within-cluster
homogeneity.

Average linkage will be computationally expensive, which may or may not
be a point to take into consideration depending on the size of your
data set, but it avoids some disadvantages of the other methods, depending
on what you are trying to achieve.
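
To make the differences among the linkage rules concrete, here is a naive Python sketch of agglomerative clustering with single, complete and average linkage (Ward's method is left out). It is an illustration only: its cost grows roughly with the cube of the number of observations, so it is unsuitable for real data sets, and the function names and toy data are hypothetical. The example puts a bridging point at x = 3 between two groups on a line, plus one distant point at x = 20. Single linkage chains the two groups together at the lowest possible height and joins the distant point last, at a high level of dissimilarity.

```python
import math
from itertools import combinations


def linkage_distance(a, b, points, method):
    """Distance between clusters a and b (lists of point indices)."""
    d = [math.dist(points[i], points[j]) for i in a for j in b]
    if method == "single":      # closest pair of members
        return min(d)
    if method == "complete":    # most dissimilar pair of members
        return max(d)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(d) / len(d)
    raise ValueError(f"unknown method: {method}")


def agglomerate(points, method):
    """Naive agglomerative clustering; returns the merge history as a list of
    (merge height, sorted member indices of the new cluster) tuples."""
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        (i, j), h = min(
            (((i, j), linkage_distance(clusters[i], clusters[j], points, method))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        merged = clusters[i] + clusters[j]
        history.append((h, sorted(merged)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


if __name__ == "__main__":
    # Two groups on a line, a bridging point at x = 3, and a distant point at x = 20.
    pts = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (20, 0)]
    for method in ("single", "complete", "average"):
        heights = [h for h, _ in agglomerate(pts, method)]
        print(method, heights)
```

Comparing the printed merge heights shows the trade-offs described above: single linkage merges everything at height 1 before the distant point arrives, while complete and average linkage report larger heights as the groups grow.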

Maybe the most important point to make about cluster analysis was made
by Fowlkes et al. (1987, Variable selection in clustering and other contexts):
"In the murky area of cluster analysis, where there is so little guiding
theory, informal graphical approaches which can be used in a highly
interactive manner are not only very useful but perhaps even essential
for getting the job done."

There is no silver bullet for detecting clusters. The important thing
is to look at your results in connection with your data. A useful 
technique is to use a graphical display of your data to visualize and
evaluate different approaches to detect clusters.

Kurt

-- 
| Kurt Watzka 
| [EMAIL PROTECTED]





F distribution

2001-11-17 Thread Myles Gartland

In an F distribution, the critical value for the lower tail is the
reciprocal of the critical value for the upper tail (with the
degrees of freedom switched).

Why? I understand how to calculate it, but do not get why the math
works.
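
One way to see why, using the standard construction of F as a ratio of independent chi-square variables, each divided by its degrees of freedom. The notation F_{d1,d2;p} for the p-quantile of the F distribution with d1 numerator and d2 denominator degrees of freedom is introduced here only for this derivation.

```latex
\begin{align*}
F &= \frac{U/d_1}{V/d_2} \sim F(d_1, d_2),
  \qquad U \sim \chi^2_{d_1},\; V \sim \chi^2_{d_2} \text{ independent},\\
\frac{1}{F} &= \frac{V/d_2}{U/d_1} \sim F(d_2, d_1),\\[4pt]
\alpha &= P\bigl(F \le F_{d_1,d_2;\alpha}\bigr)
        = P\Bigl(\tfrac{1}{F} \ge \tfrac{1}{F_{d_1,d_2;\alpha}}\Bigr)
  \quad\Longrightarrow\quad \frac{1}{F_{d_1,d_2;\alpha}} = F_{d_2,d_1;1-\alpha},\\[4pt]
F_{d_1,d_2;\alpha} &= \frac{1}{F_{d_2,d_1;1-\alpha}}.
\end{align*}
```

In words: taking the reciprocal swaps numerator and denominator, which swaps the degrees of freedom, and because 1/x is decreasing for x > 0 it turns the lower tail of one distribution into the upper tail of the other.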





NJ Stat Conference: Deming Applied Statistics, Dec 10-13

2001-11-17 Thread Alfred Barron

  ANNOUNCING...

  The 57th Annual Deming Conference 
 on Applied Statistics
   Atlantic City, New Jersey
  December 10-13, 2001
  
 For details, registration costs, etc. see

  http://nimbus.ocis.temple.edu/~kghosh/deming01/

 The Program will include:
==
 • Regression Modeling Strategies
   Professor Frank E. Harrell Jr. 
   University of Virginia
 
 • Modeling Variance and Covariance Structure 
   in Mixed Linear Models 
   Professor Ramon C. Littell 
   University of Florida 
 
1:00-4:00 
 • Bayesian Computation and its Application  
   to Non-linear Classification and Regression 
   Professor Bani K. Mallick
   Texas A&M University

 • Analysis of Covariance: Repeated Measures 
   and Some Other Interesting Applications 
   Professor George A. Milliken 

 • Statistical Methods for Clinical Trials
   Mark X. Norleans, M.D., Ph.D. 
   The National Cancer Institute
 
 • Experiments: Planning, Analysis and Parameter   
   Design Optimization 
   Professor Jeff Wu 
   University of Michigan 
 
1:00-4:00 
 • Sequential Clinical Trials: Design,
   Monitoring & Analysis
   Vlad Dragalin, PhD 
   GlaxoSmithKline

 • Multiple Comparisons for Making Decisions 
   Professor Jason C. Hsu 
   Ohio State University  
 
 • Simultaneous Monitoring and Adjustment
   Professor J. Stuart Hunter 
   Princeton University 
 
 • Applied Logistic Regression 
   Professor Stanley A. Lemeshow 
   Ohio State University 
 
1:00-4:00 
 • Permutation Methods: A Distance 
   Function Approach 
   Professor Paul W. Mielke, Jr. 
   Colorado State University 

 • Approaches to the Analysis of Microarray Data 
   and Related Issues 
   Profs Elisabetta Manduchi and Warren Ewens 
   University of Pennsylvania  
 
 • Experimental Design and the Statistical Analysis 
   of Spotted Microarrays 
   Professor Kathleen Kerr 
   University of Washington 

 • Challenges Posed by the Human Genome Project 
   Professor Warren Gish 
   Washington University in St. Louis
 
 • Measurement Error in Nonlinear Models
   Professor David Ruppert 
   Cornell University 
 
 



