IEEE Data Mining 2001: Final Call for Participation
[Apologies if you receive this more than once]

The 2001 IEEE International Conference on Data Mining
Doubletree Hotel, San Jose, California, USA
November 29 - December 2, 2001

On-line registration: http://www.cs.uvm.edu/~xwu/icdm/reg-01.html
Hotel reservation information: http://www.cs.uvm.edu/~xwu/icdm/hotel-01.shtml
Conference program and other information: http://www.cs.uvm.edu/~xwu/icdm-01.html

With the support of both world-renowned experts and new researchers from the international data mining community, ICDM '01 has received an overwhelming response, larger than that of any other data-mining-related conference this year: 365 paper submissions, 8 workshop proposals, and 29 tutorial proposals.

* Invited Speakers:
  - Jerome H. Friedman, Stanford University, USA
  - Jim Gray, Microsoft Research, USA (winner of the 1998 Turing Award)
  - Pat Langley, Institute for the Study of Learning and Expertise, USA
  - Benjamin W. Wah, University of Illinois, Urbana-Champaign, USA (President, IEEE Computer Society)

* ICDM '01 Tutorials (November 29, 2001):
  - Text and Data Mining for Bioinformatics, by Hinrich Schuetze ([EMAIL PROTECTED])
  - Mining Time Series Data, by Eamonn Keogh ([EMAIL PROTECTED])

* ICDM '01 Workshops (November 29, 2001):
  - Text Mining (TextDM '2001) (http://www-ai.ijs.si/DunjaMladenic/TextDM01/)
  - Integrating Data Mining and Knowledge Management (http://cui.unige.ch/~hilario/icdm-01/cfp.html)

* Paper Presentations (November 30 - December 2, 2001):
  Out of 365 submissions, the IEEE ICDM '01 Program Committee accepted 72 papers for regular presentation and an additional 39 papers for poster presentation.

= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Fwd: Re: diff in proportions
This is true. I simulated the null distributions, those obtained when the null hypothesis is true, which is what the centered t-distribution represents. I didn't look at the sampling distributions for different effect sizes.

Date: Sat, 17 Nov 2001 00:19:06 -0600
From: jim clark [EMAIL PROTECTED]
Subject: Re: diff in proportions
To: [EMAIL PROTECTED]
Organization: The University of Winnipeg

Hi

On 16 Nov 2001, Rich Strauss wrote:

> I've just done some quick simulations in Matlab, constructing randomized
> null distributions of the t-statistic under both scenarios: (1) sample
> variances based on sample means vs. (2) variances about the pooled mean.
> Assuming I've done everything correctly, the result is that the null
> distribution of the t-statistic in the second case consistently
> approximates the theoretical t-distribution more closely than that of
> the first case. This seems to be true regardless of sample sizes and of
> whether the two sample sizes are identical or different. This result
> implies that the t-statistic should indeed be calculated about a pooled
> estimate of the common mean, as Jerry Dallal suggested. I could pass on
> the details of my simulation if anyone is interested, but mostly I'd
> appreciate it if someone could repeat this simulation independently of
> mine to see whether it holds up.

This simply cannot be generally true. It probably only applies when the null is in fact true, which may be the case for your simulations. To appreciate the illogical nature of this recommendation, consider creating a real difference of x between your population means, then 2x, then 3x, and so on.
By the common-mean approach, you are treating the variability between groups as though it were noise (i.e., a component in your estimate of sigma^2, the variance about the null-hypothesis common mean). It is critical to keep in mind that the null hypothesis is in fact just that, a hypothesis that may or may not be correct. Computing the within-group variance about the group means, however, is the correct way to estimate sigma^2, irrespective of whether the Ho about the means is true or not.

Best wishes
Jim

James M. Clark                     (204) 786-9757
Department of Psychology           (204) 774-4134 Fax
University of Winnipeg             4L05D
Winnipeg, Manitoba R3B 2E9         [EMAIL PROTECTED]
CANADA                             http://www.uwinnipeg.ca/~clark
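Jim's argument is easy to check numerically. Below is a minimal plain-Python sketch (an editorial illustration, not Rich's original Matlab code; the sample size and effect size are made up). When there is a real difference delta between the population means, the variance about the group means still estimates sigma^2 = 1, while the variance about the common (pooled) mean is inflated by roughly (delta/2)^2:

```python
import random

random.seed(1)

def variance_about(xs, center):
    # Mean squared deviation about a given center (not df-corrected,
    # which is enough to show the bias).
    return sum((x - center) ** 2 for x in xs) / len(xs)

n = 5000
delta = 3.0  # a real difference between the two population means
a = [random.gauss(0.0, 1.0) for _ in range(n)]
b = [random.gauss(delta, 1.0) for _ in range(n)]

mean_a = sum(a) / n
mean_b = sum(b) / n
pooled_mean = (sum(a) + sum(b)) / (2 * n)

# (1) variance about each group's own mean: estimates sigma^2 = 1
within = (variance_about(a, mean_a) + variance_about(b, mean_b)) / 2

# (2) variance about the common mean: inflated by the true difference,
# roughly sigma^2 + (delta/2)^2 here
common = (variance_about(a, pooled_mean) + variance_about(b, pooled_mean)) / 2

print(round(within, 2), round(common, 2))
```

As delta grows, `common` grows with it while `within` stays near 1, which is exactly the "difference of x, then 2x, then 3x" thought experiment above.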
Re: Most Frequently Used Clustering Algorithm
Chia C Chong [EMAIL PROTECTED] writes:

> Hi! I am new in this area.. I wonder which clustering algorithm is the
> most frequently used and maybe the most robust??

This question has many levels, ranging from the decision between agglomerative and seed-based methods, through the choice of an appropriate measure of distance or similarity, to the decision on a method to form clusters. I will try to answer the question of choosing one of the classic agglomeration methods for agglomerative hierarchical cluster analysis.

The choice may depend on what you are trying to achieve. If you want to detect outliers, single linkage is the method of choice. Observations that are joined very late, and at rather high levels of dissimilarity, are potential candidates for further inspection (probable outliers). The disadvantage of single linkage is that two groups can be joined at an early stage if there is a single observation that forms a bridge between them.

The complete linkage method has a tendency to form small homogeneous clusters at an early stage, but because the distance between clusters is defined as the distance between their most dissimilar members, clusters that are in fact quite similar can stay separate until quite a late stage of the agglomeration process.

Ward's method will stress the demand for homogeneity within a cluster, but it will probably not be your tool of choice if you are interested in detecting structures in your data that go beyond mere within-cluster homogeneity.

Average linkage is computationally expensive, which may or may not be a point to take into consideration depending on the size of your data set, but it avoids some disadvantages of the other methods, depending on what you are trying to achieve.

Maybe the most important point to make about cluster analysis was made by Fowlkes et al.
(1987, "Variable selection in clustering and other contexts"):

> In the murky area of cluster analysis, where there is so little guiding
> theory, informal graphical approaches which can be used in a highly
> interactive manner are not only very useful but perhaps even essential
> for getting the job done.

There is no silver bullet for detecting clusters. The important thing is to look at your results in connection with your data. A useful technique is to use a graphical display of your data to visualize and evaluate different approaches to detect clusters.

Kurt
--
| Kurt Watzka
| [EMAIL PROTECTED]
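The behavioral differences Kurt describes are easy to see on a toy example. The sketch below (plain Python; the 1-D data and the stop-at-two-clusters rule are made up for illustration) implements naive agglomerative clustering with selectable linkage. Note how the in-between point 3.1 ends up in different clusters under single vs. complete linkage:

```python
def dist(a, b):
    return abs(a - b)  # 1-D Euclidean distance

def cluster_distance(c1, c2, linkage):
    pair_dists = [dist(x, y) for x in c1 for y in c2]
    if linkage == "single":    # nearest members
        return min(pair_dists)
    if linkage == "complete":  # most dissimilar members
        return max(pair_dists)
    if linkage == "average":   # mean over all member pairs
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(linkage)

def agglomerate(points, k, linkage):
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]],
                                            clusters[ij[1]], linkage),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

data = [1.0, 1.2, 1.4, 5.0, 5.1, 3.1]  # 3.1 sits between the two groups

print(agglomerate(data, 2, "single"))    # 3.1 chains onto the low group
print(agglomerate(data, 2, "complete"))  # 3.1 joins the high group
```

This naive version is O(n^3) and omits Ward's method (which needs cluster means and sizes rather than pairwise member distances); for real data one would use a library implementation such as scipy.cluster.hierarchy.
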
F distribution
In an F distribution, the critical value for the lower tail is the reciprocal of the critical value of the upper tail (with the degrees of freedom switched). Why? I understand how to calculate it, but I do not get why the math works.
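A sketch of the reason, added here editorially in standard notation (it is not part of the original question): an F variable is a ratio of two independent scaled chi-square variables, so its reciprocal is again F-distributed with the degrees of freedom swapped.

```latex
% With U ~ chi^2_{d_1} and V ~ chi^2_{d_2} independent:
%   F = (U/d_1)/(V/d_2) ~ F(d_1, d_2),
% so 1/F = (V/d_2)/(U/d_1) ~ F(d_2, d_1).
% For the lower-tail critical value c with P(F <= c) = alpha:
\[
\alpha = P(F \le c)
       = P\!\left(\frac{1}{F} \ge \frac{1}{c}\right)
       = P\!\left(F_{d_2,d_1} \ge \frac{1}{c}\right),
\]
% so 1/c is the upper-tail alpha critical value of F(d_2, d_1), i.e.
\[
F_{\mathrm{lower},\alpha}(d_1, d_2) = \frac{1}{F_{\mathrm{upper},\alpha}(d_2, d_1)}.
\]
```

In words: asking for the value that cuts off the lowest alpha of F(d1, d2) is the same question as asking for the value that cuts off the highest alpha of its reciprocal, and the reciprocal is an F variable with the degrees of freedom switched.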
NJ Stat Conference: Deming Applied Statistics, Dec 10-13
ANNOUNCING...

The 57th Annual Deming Conference on Applied Statistics
Atlantic City, New Jersey
December 10-13, 2001

For details, registration costs, etc. see http://nimbus.ocis.temple.edu/~kghosh/deming01/

The Program will include:

- Regression Modeling Strategies, by Professor Frank E. Harrell Jr., University of Virginia
- Modeling Variance and Covariance Structure in Mixed Linear Models, by Professor Ramon C. Littell, University of Florida
- Bayesian Computation and its Application to Non-linear Classification and Regression (1:00-4:00), by Professor Bani K. Mallick, Texas A&M University
- Analysis of Covariance: Repeated Measures and Some Other Interesting Applications, by Professor George A. Milliken
- Statistical Methods for Clinical Trials, by Mark X. Norleans, M.D., Ph.D., The National Cancer Institute
- Experiments: Planning, Analysis and Parameter Design Optimization, by Professor Jeff Wu, University of Michigan
- Sequential Clinical Trials: Design, Monitoring and Analysis (1:00-4:00), by Vlad Dragalin, PhD, GlaxoSmithKline
- Multiple Comparisons for Making Decisions, by Professor Jason C. Hsu, Ohio State University
- Simultaneous Monitoring and Adjustment, by Professor J. Stuart Hunter, Princeton University
- Applied Logistic Regression, by Professor Stanley A. Lemeshow, Ohio State University
- Permutation Methods: A Distance Function Approach (1:00-4:00), by Professor Paul W. Mielke, Jr., Colorado State University
- Approaches to the Analysis of Microarray Data and Related Issues, by Profs. Elisabetta Manduchi and Warren Ewens, University of Pennsylvania
- Experimental Design and the Statistical Analysis of Spotted Microarrays, by Professor Kathleen Kerr, University of Washington
- Challenges Posed by the Human Genome Project, by Professor Warren Gish, Washington University in St. Louis
- Measurement Error in Nonlinear Models, by Professor David Ruppert, Cornell University