CFP: IEEE Data Mining 2002 (new)
[Apologies if you receive this more than once] !!! NOTE: The Conference Date Changed to December 9-12, 2002 !!! - ICDM '02: The 2002 IEEE International Conference on Data Mining Sponsored by the IEEE Computer Society - Maebashi TERRSA, Maebashi City, Japan December 9 - 12, 2002 Home Page: http://kis.maebashi-it.ac.jp/icdm02 Mirror Page: http://www.wi-lab.com/icdm02 CORPORATE SPONSORS: AdIn Research, Inc. The Japan Research Institute, Limited Maebashi Convention Bureau Maebashi City Government Gunma Prefecture Government Maebashi Institute of Technology US AOARD, AROFE Call for Papers *** The 2002 IEEE International Conference on Data Mining (IEEE ICDM '02) provides a leading international forum for the sharing of original research results and practical development experiences among researchers and application developers from different data mining related areas such as machine learning, automated scientific discovery, statistics, pattern recognition, knowledge acquisition, soft computing, databases and data warehousing, data visualization, and knowledge-based systems. The conference seeks solutions to challenging problems facing the development of data mining systems, and shapes future directions of research by promoting high quality, novel and daring research findings. As an important part of the conference, the workshops program will focus on new research challenges and initiatives. Topics of Interest == Topics related to the design, analysis and implementation of data mining theory, systems and applications are of interest. 
These include, but are not limited to the following areas: - Foundations and principles of data mining - Data mining algorithms and methods in traditional areas (such as classification, clustering, probabilistic modeling, and association analysis), and in new areas - Data and knowledge representation for data mining - Modeling of structured, textual, temporal, spatial, multimedia and Web data to support data mining - Complexity, efficiency, and scalability issues in data mining - Data pre-processing, data reduction, feature selection and feature transformation - Statistics and probability in large-scale data mining - Soft computing (including neural networks, fuzzy logic, evolutionary computation, and rough sets) and uncertainty management for data mining - Integration of data warehousing, OLAP and data mining - Man-machine interaction in data mining and visual data mining - Artificial intelligence contributions to data mining - High performance and distributed data mining - Machine learning, pattern recognition and automated scientific discovery - Quality assessment and interestingness metrics of data mining results - Process centric data mining and models of data mining process - Security and social impact of data mining - Emerging data mining applications, such as electronic commerce, bioinformatics, Web intelligence, and intelligent learning database systems Conference Publications and ICDM Best Paper Awards == High quality papers in all data mining areas are solicited. Papers exploring new directions will receive a careful and supportive review. There are two different types of paper submission for IEEE ICDM '02: (1) main track submissions and (2) industry track submissions. For the main track submission, all submitted papers should be limited to a maximum of 6,000 words (approximately 20 A4 pages), and will be reviewed by the Program Committee on the basis of technical quality, relevance to data mining, originality, significance, and clarity. 
Accepted papers will be published in the conference proceedings by the IEEE Computer Society Press. All main track paper submissions will be handled electronically. Please use the Submission Form (for main track) at the ICDM '02 webpage: http://kis.maebashi-it.ac.jp/icdm02 to submit your paper (the due date is June 5, 2002). For the industry track submission, please first check the following conditions before your submission: (a) At least one author of each industry track paper should be from a company (rather than a university), and the paper should be about industrial or other real-world applications of data mining. (b) The authors accepted as industry track papers need both oral presentations and system demos at the conference. All papers submitted to the Industry Track will be reviewed by the mini Industry Track Program Committee, and each acc
ANN: New Online Master of Science in Data Mining at CCSU
CCSU Launches Online Master of Science in Data Mining

Central Connecticut State University (CCSU) announces the launch of an online Master of Science program in Data Mining, the first such program to be offered online.

Data mining is the search for interesting patterns and trends in large databases using statistical methods. The MIT Technology Review chose data mining as one of ten emerging technologies that will change the world. Data mining expertise is the most sought after among information technology professionals, according to the 1999 Information Week National Salary Survey. In a 2001 KDNuggets survey, 27% of data mining professionals earned more than $100,000 (US) annually.

All courses in the data mining MS program are offered online. This means that class is as close as your computer, whether you live in Beijing, New York, Singapore, or Canton. Further, all courses are asynchronous, meaning that students can work when they want to work, whether at 3:00 in the afternoon or 3:00 in the morning.

The 33-credit program, which can be completed in two years, consists of courses in data mining, artificial intelligence, statistical analysis, and computer science. The MS in data mining is fully licensed by the State of Connecticut Department of Higher Education.

The program stresses the solution of real-world problems through applications and case studies, while giving students a deep appreciation of the underlying models. These applications include customer relationship management, credit-card fraud, and profit/cost optimization. Students will apply methodologies such as decision trees, market basket analysis, neural networks, association rules, and cluster detection. Students will gain strong exposure to state-of-the-art software such as the Clementine data mining suite from SPSS.
Courses available online, starting in January, include Introduction to Data Mining, Data Mining Methods, Linear Models, Foundations of Computer Science, Database Concepts, and Mathematical Statistics II. Some prerequisite courses are also offered online. To register for these courses, proceed to OnlineCSU at http://onlinecsu.ctstateu.edu/. For more information about the data mining program, including how to apply, please visit www.ccsu.edu/datamining, or contact Program Director Daniel T. Larose, Ph.D. at [EMAIL PROTECTED] or 860-832-2862. = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
data mining course- feb 28- Palo Alto
Short course: Statistical learning and data mining
Trevor Hastie and Robert Tibshirani, Stanford Univ.
Sheraton Hotel Palo Alto, Ca., Feb 28 - Mar 1, 2002

This two-day course gives a detailed overview of statistical models for data mining, inference and prediction. With the rapid developments in internet technology, genomics and other high tech industries, we rely increasingly on data analysis and statistical models to exploit the vast amounts of data at our fingertips.

This sequel to our popular Modern Regression and Classification course covers many new areas of unsupervised learning and data mining, and gives an in-depth treatment of some of the hottest tools in supervised learning. The first course is not a pre-requisite for this new course.

Day one focusses on state-of-the-art methods for supervised learning including PRIM, boosting and support vector machines. Day two covers unsupervised learning including clustering, principal components, principal curves and self-organizing maps. Many applications will be discussed, including DNA expression arrays. This is one of the hottest new areas in biology!

Much of the material is based on the new book: Elements of Statistical Learning: data mining, inference and prediction (Hastie, Tibshirani & Friedman, Springer-Verlag, 2001). A copy of this book will be given to all attendees.

Go to the site http://www-stat.stanford.edu/~hastie/mrc.html for more information and online registration. Please email me if you have specific questions ([EMAIL PROTECTED]).

--
Rob Tibshirani, Dept of Health Research & Policy and Dept of Statistics
HRP Redwood Bldg, Stanford University, Stanford, California 94305-5405
phone: HRP: 650-723-7264 (Voice mail), Statistics 650-723-1185
FAX 650-725-8977
[EMAIL PROTECTED]
http://www-stat.stanford.edu/~tibs
IEEE Data Mining 2001: Final Call for Participation
[Apologies if you receive this more than once]

IEEE Data Mining 2001: Final Call for Participation

The 2001 IEEE International Conference on Data Mining
Doubletree Hotel, San Jose, California, USA
November 29 - December 2, 2001

On-line registration at http://www.cs.uvm.edu/~xwu/icdm/reg-01.html
Hotel reservation information at http://www.cs.uvm.edu/~xwu/icdm/hotel-01.shtml
Conference program and other information at http://www.cs.uvm.edu/~xwu/icdm-01.html

With the support of both world-renowned experts and new researchers from the international data mining community, ICDM '01 has received an overwhelming response compared to any other data mining related conference this year: 365 paper submissions, 8 workshop proposals, and 29 tutorial proposals.

* Invited Speakers:
  - Jerome H. Friedman, Stanford University, USA
  - Jim Gray, Microsoft Research, USA (The 1999 Turing Award Winner)
  - Pat Langley, Institute for the Study of Learning and Expertise, USA
  - Benjamin W. Wah, University of Illinois, Urbana-Champaign, USA (President, IEEE Computer Society)

* ICDM '01 Tutorials (November 29, 2001):
  - Text and Data Mining for Bioinformatics, by Hinrich Schuetze ([EMAIL PROTECTED])
  - Mining Time Series Data, by Eamonn Keogh ([EMAIL PROTECTED])

* ICDM '01 Workshops (November 29, 2001):
  - Text Mining (TextDM '2001) (http://www-ai.ijs.si/DunjaMladenic/TextDM01/)
  - Integrating Data Mining and Knowledge Management (http://cui.unige.ch/~hilario/icdm-01/cfp.html)

* Paper Presentations (November 30 - December 2, 2001):
  Out of 365 paper submissions, the IEEE ICDM '01 Program Committee accepted 72 papers for regular presentation, and an additional 39 papers for poster presentation.
ANN: Book: Principles of Data Mining
I thought readers of sci.stat.edu might be interested in this book. For more information please visit http://mitpress.mit.edu/026208290X Principles of Data Mining David J. Hand, Heikki Mannila, and Padhraic Smyth The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing. David J. Hand is Professor of Statistics, Department of Mathematics, Imperial College, London. Heikki Mannila is Research Fellow at Nokia Research Center and Professor, Department of Computer Science and Engineering, Helsinki University of Technology. Padhraic Smyth is Associate Professor, Department of Information and Computer Science, the University of California, Irvine. 8 x 9, 425 pp. 
cloth ISBN 0-262-08290-X Adaptive Computation and Machine Learning series A Bradford Book
CFP: IEEE Data Mining 2002
[Apologies if you receive this more than once] - ICDM '02: The 2002 IEEE International Conference on Data Mining Sponsored by the IEEE Computer Society -- Maebashi TERRSA, Maebashi City, Japan November 26 - 29, 2002 Home Page: http://kis.maebashi-it.ac.jp/icdm02 Mirror Page: http://www.wi-lab.com/icdm02 Call for Papers *** The 2002 IEEE International Conference on Data Mining (IEEE ICDM '02) provides a leading international forum for the sharing of original research results and practical development experiences among researchers and application developers from different data mining related areas such as machine learning, automated scientific discovery, statistics, pattern recognition, knowledge acquisition, soft computing, databases and data warehousing, data visualization, and knowledge-based systems. The conference seeks solutions to challenging problems facing the development of data mining systems, and shapes future directions of research by promoting high quality, novel and daring research findings. As an important part of the conference, the workshops program will focus on new research challenges and initiatives. Topics of Interest == Topics related to the design, analysis and implementation of data mining theory, systems and applications are of interest. 
These include, but are not limited to the following areas: - Foundations and principles of data mining - Data mining algorithms and methods in traditional areas (such as classification, clustering, probabilistic modeling, and association analysis), and in new areas - Data and knowledge representation for data mining - Modeling of structured, textual, temporal, spatial, multimedia and Web data to support data mining - Complexity, efficiency, and scalability issues in data mining - Data pre-processing, data reduction, feature selection and feature transformation - Statistics and probability in large-scale data mining - Soft computing (including neural networks, fuzzy logic, evolutionary computation, and rough sets) and uncertainty management for data mining - Integration of data warehousing, OLAP and data mining - Man-machine interaction in data mining and visual data mining - Artificial intelligence contributions to data mining - High performance and distributed data mining - Machine learning, pattern recognition and automated scientific discovery - Quality assessment and interestingness metrics of data mining results - Process centric data mining and models of data mining process - Security and social impact of data mining - Emerging data mining applications, such as electronic commerce, bioinformatics, Web mining and intelligent learning database systems Conference Publications and ICDM Best Paper Awards == High quality papers in all data mining areas are solicited. Papers exploring new directions will receive a careful and supportive review. There are two different types of paper submission for IEEE ICDM '02: (1) main track submissions and (2) industry track submissions. All submitted papers should be limited to a maximum of 6,000 words (approximately 20 A4 pages), and will be reviewed on the basis of technical quality, relevance to data mining, originality, significance, and clarity. 
Accepted papers will be published in the conference proceedings by the IEEE Computer Society Press. A selected number of IEEE ICDM '02 accepted papers will be expanded and revised for possible inclusion in the Knowledge and Information Systems journal (http://kais.mines.edu/~kais/) by Springer-Verlag. IEEE ICDM Best Paper Awards will be conferred on the authors of the best papers at the conference. Important Dates === June 5, 2002 Main track paper submissions Industry track paper submissions June 30, 2002 Tutorial submissions Panel submissions Workshop proposals August 9, 2002 Paper acceptance notices September 2, 2002 Final camera-readies November 26-29, 2002 Conference All paper submissions will be handled electronically. Detailed instructions are provided on the conference home page at http://kis.maebashi-it.ac.jp/icdm02 and http://www.wi-lab.com/icdm02 Honorary Chair: === Setsuo Ohsuga, Waseda University, Japan Conference Chairs: == Ning Zhong, Maebashi Institute of Technology, Japan ([EMAIL PROTECTED]) Philip S. Yu, IBM T.J. Watson Research Center, USA ([EMAIL PROTECTED]) Program Committee Chairs: = Vipin
IEEE Data Mining 2001: Call for Participation
[Apologies if you receive this more than once]

IEEE Data Mining 2001: Call for Participation

The 2001 IEEE International Conference on Data Mining
Doubletree Hotel, San Jose, California, USA
November 29 - December 2, 2001

* On-line registration (and other information) at http://www.cs.uvm.edu/~xwu/icdm-01.html (Register by November 6 to save $100!)
* Be sure to book hotel rooms by November 7 for discounted rates! (http://www.cs.uvm.edu/~xwu/icdm/hotel-01.shtml)

The 2001 IEEE International Conference on Data Mining (ICDM '01) provides a forum for the sharing of original research results and practical development experiences among researchers and application developers from different data mining related areas such as machine learning, automated scientific discovery, statistics, pattern recognition, knowledge acquisition, soft computing, databases and data warehousing, data visualization, and knowledge-based systems. The conference seeks solutions to challenging problems facing the development of data mining systems, and shapes future directions of research by promoting high quality, novel and daring research findings. As an important part of the conference, the workshops program will focus on new research challenges and initiatives.

With the support of both world-renowned experts and new researchers from the international data mining community, ICDM '01 has received an overwhelming response compared to any other data mining related conference this year: 365 paper submissions, 8 workshop proposals, and 29 tutorial proposals.

* Invited Speakers:
  - Jerome H. Friedman, Stanford University, USA
  - Jim Gray, Microsoft Research, USA (The 1999 Turing Award Winner)
  - Pat Langley, Institute for the Study of Learning and Expertise, USA
  - Benjamin W. Wah, University of Illinois, Urbana-Champaign, USA (President, IEEE Computer Society)

* ICDM '01 Tutorials (November 29, 2001):
  - Text and Data Mining for Bioinformatics, by Hinrich Schuetze ([EMAIL PROTECTED])
  - Mining Time Series Data, by Eamonn Keogh ([EMAIL PROTECTED])

* ICDM '01 Workshops (November 29, 2001):
  - Text Mining (TextDM '2001) (http://www-ai.ijs.si/DunjaMladenic/TextDM01/)
  - Integrating Data Mining and Knowledge Management (http://cui.unige.ch/~hilario/icdm-01/cfp.html)

* Paper Presentations (November 30 - December 2, 2001):
  Out of 365 paper submissions, the IEEE ICDM '01 Program Committee accepted 72 papers for regular presentation, and an additional 37 papers for poster presentation.
IEEE Data Mining 2001: Call for Participation
[Apologies if you receive this more than once]

IEEE Data Mining 2001: Call for Participation

The 2001 IEEE International Conference on Data Mining
Doubletree Hotel, San Jose, California, USA
November 29 - December 2, 2001

On-line registration (and other information) at http://www.cs.uvm.edu/~xwu/icdm-01.html

The 2001 IEEE International Conference on Data Mining (ICDM '01) provides a forum for the sharing of original research results and practical development experiences among researchers and application developers from different data mining related areas such as machine learning, automated scientific discovery, statistics, pattern recognition, knowledge acquisition, soft computing, databases and data warehousing, data visualization, and knowledge-based systems. The conference seeks solutions to challenging problems facing the development of data mining systems, and shapes future directions of research by promoting high quality, novel and daring research findings. As an important part of the conference, the workshops program will focus on new research challenges and initiatives.

With the support of both world-renowned experts and new researchers from the international data mining community, ICDM '01 has received an overwhelming response compared to any other data mining related conference this year: 365 paper submissions, 8 workshop proposals, and 29 tutorial proposals.

* Invited Speakers:
  - Jerome H. Friedman, Stanford University, USA
  - Jim Gray, Microsoft Research, USA (The 1999 Turing Award Winner)
  - Pat Langley, Institute for the Study of Learning and Expertise, USA
  - Benjamin W. Wah, University of Illinois, Urbana-Champaign, USA (President, IEEE Computer Society)

* ICDM '01 Tutorials (November 29, 2001):
  - Text and Data Mining for Bioinformatics, by Hinrich Schuetze ([EMAIL PROTECTED])
  - Mining Time Series Data, by Eamonn Keogh ([EMAIL PROTECTED])

* ICDM '01 Workshops (November 29, 2001):
  - Text Mining (TextDM '2001) (http://www-ai.ijs.si/DunjaMladenic/TextDM01/)
  - Integrating Data Mining and Knowledge Management (http://cui.unige.ch/~hilario/icdm-01/cfp.html)

* Paper Presentations (November 30 - December 2, 2001):
  Out of 365 paper submissions, the IEEE ICDM '01 Program Committee accepted 72 papers for regular presentation, and an additional 39 papers for poster presentation.
Re: Fundamental differences between Statistics and Data Mining?
T.S. Lim wrote:
> I'm attempting to compile an online list of the fundamental differences
> between our field Statistics and Data Mining. Several online references
> that touch on the topic include
>
> http://www-stat.stanford.edu/~jhf/ftp/dm-stat.ps
> http://www.acm.org/sigkdd/explorations/issue1-1/contents.htm#Hand
> http://www.acm.org/sigkdd/explorations/issue1-2/contents.htm#mannila
>
> Let me know your point of view or opinion. Thanks much.

Can I add the magnificent

Greater and Lesser Statistics: A Choice for Future Research
J. M. Chambers

from http://www.wavelet.org/who/jmc/pub.html

I find it almost unbearably sad. It certainly suggests that, while the statistical community might have the knowledge and skills to address `data mining' style problems, its value-system makes it unwilling to do so -- or to value the work if it is attempted. There are of course many honourable exceptions, including Chambers himself.

Peter
Re: Fundamental differences between Statistics and Data Mining?
Here are two other sources that may be relevant:

"Putting Data Mining in its Place" by D. Pyle (used to be at http://www.vldb.com/articles/Pyle/pyle.html; can't access it at the moment)

"Data Mining from a Statistical Perspective" by J. Maindonald (http://wwwmaths.anu.edu.au/~johnm/dm/dmpaper.html)

As a user of statistical and/or other DM methods at best, rather than providing an amateur opinion, I can only thank you for the references you have provided.

Gaj Vidmar
Univ. of Ljubljana, Dept. of Psychology
Re: Fundamental differences between Statistics and Data Mining?
In article <8v087k$tm5$[EMAIL PROTECTED]>, T.S. Lim <[EMAIL PROTECTED]> wrote:
> I'm attempting to compile an online list of the fundamental differences
> between our field Statistics and Data Mining. Several online references
> that touch on the topic include
>
> http://www-stat.stanford.edu/~jhf/ftp/dm-stat.ps
> http://www.acm.org/sigkdd/explorations/issue1-1/contents.htm#Hand
> http://www.acm.org/sigkdd/explorations/issue1-2/contents.htm#mannila
>
> Let me know your point of view or opinion. Thanks much.

More references have been posted at http://www.recursive-partitioning.com/dcforum/DCForumID4/2.html

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
_
Get paid to write reviews! http://recursive-partitioning.epinions.com

Sent via Deja.com http://www.deja.com/
Before you buy.
Re: Fundamental differences between Statistics and Data Mining?
>> I'm attempting to compile an online list of the fundamental differences
>> between our field Statistics and Data Mining. Several online references
>> that touch on the topic include

It's very simple. Data Mining is everything they taught you _not_ to do when you took statistics.

--
Robert M. Hamer 732 235 4218
Use my last name @rci.rutgers.edu
"Mit der Dummheit kaempfen Goetter selbst vergebens" -- Schiller
Re: Fundamental differences between Statistics and Data Mining?
Hi,

my opinion is that data mining is just a marketing name, because data mining techniques are a part of statistics. Maybe an exception to this is neural networks, but I believe that good neural networks also use statistics.

Francois.

"T.S. Lim" wrote:
> I'm attempting to compile an online list of the fundamental differences
> between our field Statistics and Data Mining. Several online references
> that touch on the topic include
>
> http://www-stat.stanford.edu/~jhf/ftp/dm-stat.ps
> http://www.acm.org/sigkdd/explorations/issue1-1/contents.htm#Hand
> http://www.acm.org/sigkdd/explorations/issue1-2/contents.htm#mannila
>
> Let me know your point of view or opinion. Thanks much.

--
Francois Bergeret
Statistician and Six Sigma Black Belt
Motorola, Device Engineering, MOS20
tel (work): 33-561191205
[EMAIL PROTECTED]
Fundamental differences between Statistics and Data Mining?
I'm attempting to compile an online list of the fundamental differences between our field Statistics and Data Mining. Several online references that touch on the topic include

http://www-stat.stanford.edu/~jhf/ftp/dm-stat.ps
http://www.acm.org/sigkdd/explorations/issue1-1/contents.htm#Hand
http://www.acm.org/sigkdd/explorations/issue1-2/contents.htm#mannila

Let me know your point of view or opinion. Thanks much.

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
Re: Data Mining
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (Kuldeep Kumar) wrote:
> Colleagues
> I am looking for some data base dealing with patient records for any
> disease preferably diabetes or cancer. This is basically for exercise in
> statistical modelling to see which factors are significant and to classify
> whether the patient has the disease or not. Any other data base where
> logistic model can be applied will be also useful. I am sure this kind of
> data will be available somewhere on the web. Any help will be appreciated.
> Thanks.
> Deep

Visit the "Data Sets" section of http://www.kdcentral.com

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
Data Mining
Colleagues, I am looking for a database of patient records for any disease, preferably diabetes or cancer. This is basically for an exercise in statistical modelling, to see which factors are significant and to classify whether the patient has the disease or not. Any other database where a logistic model can be applied would also be useful. I am sure this kind of data is available somewhere on the web. Any help will be appreciated. Thanks. Deep
Re: data mining
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (Richard M. Barton) wrote: > I have been trying to search for reviews of data mining software (e.g., > MineSet, Clementine) with little success. In the past, some of you have had > recommendations/advice about stat packages; I wonder if you might share your > views on data mining: Specifically, > > 1) Any feelings (+ or -) on data mining in general? > > 2) Any views (+ or -) on available software? > > 3) Any suggestions on where else I might look for info? > > Thanks for your help. > > rick > > Richard Barton, Statistical Consultant > > Dartmouth College > > Peter Kiewit Computing Services > > 6224 Baker/Berry > > Hanover, NH 03755 > > (603)-646-0255 Visit http://www.kdcentral.com and browse the Tutorials section. BTW, Data Mining is Statistics reborn with a new name. :) -- T.S. Lim [EMAIL PROTECTED] www.Recursive-Partitioning.com
Re: data mining
a few links i spotted ... hope these help ... could have listed more but, here is enough to shake a stick at!! http://www.spss.com/datamine/ http://www.spss.com/datamine/techniques.htm http://www.dci.com/events/datamin1/ http://www.dci.com/events/datamin2/ http://www.cs.bham.ac.uk/~anp/TheDataMine.html http://www.galaxy.gmu.edu/stats/syllabi/DMLIST.html http://www3.shore.net/~kht/ http://www.almaden.ibm.com/cs/quest/ http://www.dmbenchmarking.com/ http://datamining.itsc.uah.edu/ http://www.ncdm.uic.edu/ At 02:39 PM 9/25/00 -0400, you wrote: >I have been trying to search for reviews of data mining software (e.g., >MineSet, Clementine) with little success. In the past, some of you have >had recommendations/advice about stat packages; I wonder if you might >share your views on data mining: Specifically, > >1) Any feelings (+ or -) on data mining in general? >2) Any views (+ or -) on available software? >3) Any suggestions on where else I might look for info?
data mining
I have been trying to search for reviews of data mining software (e.g., MineSet, Clementine) with little success. In the past, some of you have had recommendations/advice about stat packages; I wonder if you might share your views on data mining: Specifically, 1) Any feelings (+ or -) on data mining in general? 2) Any views (+ or -) on available software? 3) Any suggestions on where else I might look for info? Thanks for your help. rick Richard Barton, Statistical Consultant Dartmouth College Peter Kiewit Computing Services 6224 Baker/Berry Hanover, NH 03755 (603)-646-0255
IEEE Data Mining 2001: Call for Papers
[Apologies if you receive this more than once] -- ICDM '01: The 2001 IEEE International Conference on Data Mining Sponsored by the IEEE Computer Society -- Silicon Valley, California, USA November 29 - December 2, 2001 Home Page: http://kais.mines.edu/~xwu/icdm/icdm-01.html Call for Papers *** The 2001 IEEE International Conference on Data Mining (ICDM '01) provides a forum for the sharing of original research results and practical development experiences among researchers and application developers from different data mining related areas such as machine learning, automated scientific discovery, statistics, pattern recognition, knowledge acquisition, soft computing, databases and data warehousing, data visualization, and knowledge-based systems. The conference seeks solutions to challenging problems facing the development of data mining systems, and shapes future directions of research by promoting high quality, novel and daring research findings. As an important part of the conference, the workshops program will focus on new research challenges and initiatives. Topics of Interest == Topics related to the design, analysis and implementation of data mining theory, systems and applications are of interest. 
These include, but are not limited to the following areas: - Foundations and principles of data mining - Data mining algorithms and methods in traditional areas (such as classification, clustering, probabilistic modeling, and association analysis), and in new areas - Data and knowledge representation for data mining - Modeling of structured, textual, temporal, spatial, multimedia and Web data to support data mining - Complexity, efficiency, and scalability issues in data mining - Data pre-processing, data reduction, feature selection and feature transformation - Statistics and probability in large-scale data mining - Soft computing (including neural networks, fuzzy logic, evolutionary computation, and rough sets) and uncertainty management for data mining - Integration of data warehousing, OLAP and data mining - Man-machine interaction in data mining and visual data mining - Artificial intelligence contributions to data mining - High performance and distributed data mining - Machine learning, pattern recognition and automated scientific discovery - Quality assessment and interestingness metrics of data mining results - Process centric data mining and models of data mining process - Security and social impact of data mining - Emerging data mining applications, such as electronic commerce, Web mining and intelligent learning database systems Conference Publications and ICDM Best Paper Awards == High quality papers in all data mining areas are solicited. Papers exploring new directions will receive a careful and supportive review. All submitted papers should be limited to a maximum of 6,000 words (approximately 20 A4 pages), and will be reviewed on the basis of technical quality, relevance to data mining, originality, significance, and clarity. Accepted papers will be published in the conference proceedings by the IEEE Computer Society Press. 
A selected number of ICDM '01 accepted papers will be expanded and revised for possible inclusion in the Knowledge and Information Systems journal (http://kais.mines.edu/~kais/) by Springer-Verlag. ICDM Best Paper Awards will be conferred on the authors of the best papers at the conference. Important Dates === June 15, 2001: Paper submissions. July 31, 2001: Acceptance notices. August 31, 2001: Final camera-readies. Nov 29 - Dec 2, 2001: Conference. Detailed instructions for paper submissions will be provided on the conference home page at http://kais.mines.edu/~xwu/icdm/icdm-01.html. Conference Chair: = Xindong Wu, Colorado School of Mines, USA ([EMAIL PROTECTED]) Program Committee Chairs: = Nick Cercone, University of Waterloo, Canada ([EMAIL PROTECTED]) T.Y. Lin, San Jose State University, USA ([EMAIL PROTECTED]) ICDM '01 Workshops Chair: = Johannes Gehrke, Cornell University, USA ([EMAIL PROTECTED]) ICDM '01 Tutorials Chair: = Chris Clifton, MITRE, USA ([EMAIL PROTECTED]) ICDM '01 Panels Chair: == Ramamohanarao Kot
Re: Data Mining blooper
I believe the author that Ellen quoted was referring to MCS and, if so, I agree with that author. IMHO, there will eventually be MCS software that will allow high school students to run circles around what today's PhDs do with closed form solutions. William Chambers wrote: > Ellen, > > It amazes me to read the self-righteous judgements of people on this > thread.. a number of whom have made incompetent criticisms of corresponding > correlations with the same arrogance and stupidity that they attribute to > the data mining boys, When the purpose becomes making money and not > pursuing truth, then we see such arrogance, lies and eventual evil. Real > people get hurt. Truth itself begins to appear to be an illusion, What ever > sells appears to be the good, When people are not willing to learn and to > discuss; to test and to discover, then we get the sort of ridiculous > examples of the pot calling the kettle black making up this thread. I like > your call for discernment. Good luck. > > Bill Chambers > > Ellen Hertz wrote in message <[EMAIL PROTECTED]>... > >I looked up one and copied it: > > > > "For the first time, thanks to the increased power of computers, new > >methods replace the skill of the statistical artisan with > massive-computational > >methods, obtaining equal or better results in far less time without > requiring > >any specialised knowledge." > > > >In all fairness, I haven't read the whole paper and if he is referring > purely to > >computations such as generating maximum likelihood estimates or inverting > >matrices, he is quite right that computers beat pencils. If he means to > just run > >programs without knowing what they mean and generate GIGO, that certainly > is > >dangerous. > > > >Ellen Hertz > > > >Zubin wrote: > > > >> Can you be more specific on what the misleading statements are? And why > you > >> think they are misleading. > >> > >> T.S. Lim wrote in message > >> [EMAIL PROTECTED]">news:[EMAIL PROTECTED]... 
> >> > While hunting for URLs for KDCentral.com, I encountered several > >> > misleading statements about Statistics made by Data Mining people. > >> > I've posted 3 of them to my bulletin board. If you encounter other > >> > wrong remarks, I invite you to post them to the board too at > >> > > >> >http://www.recursive-partitioning.com/forums > >> > > >> > Thanks. > >> > > >> > > >> > > >> > > >> > -- > >> > T.S. Lim > >> > [EMAIL PROTECTED] > >> > www.Recursive-Partitioning.com > >> > > >> > > >> > > >> > > >> > Get paid to write review! http://recursive-partitioning.epinions.com > > === This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
Megaputer ships PolyAnalyst 4.1 - the first data mining tool supporting OLE DB for Data Mining
Bloomington, IN, May 2, 2000 -- Megaputer Intelligence Inc. today announced the release of PolyAnalyst 4.1, the next version of the leading data mining system, featuring support for an innovative SQL-based protocol, OLE DB for Data Mining. The new system also implements a powerful Data Import Wizard and a special architecture for dealing with very large databases. A non-traditional Decision Tree algorithm, a limited-time free addition to PolyAnalyst 4.1 scheduled for release on May 20th, further enriches the suite of eleven unique machine learning techniques offered by the system. PolyAnalyst 4.1 and PolyAnalyst Knowledge Server 4.1 from Megaputer are the first commercial data mining applications shipping with built-in support for all major functions of OLE DB for Data Mining. This new standard simplifies communication and provides deep integration of data mining applications with data storage and management tools. OLE DB for Data Mining was introduced recently by Microsoft Corporation and is backed by a dozen leading data mining vendors. The beta specification for OLE DB for Data Mining is currently available at http://www.microsoft.com/data/oledb/ and will be open for public review until May 15, 2000. "The implementation for OLE DB for Data Mining support in PolyAnalyst 4.1 is the first commercial illustration of the deep integration of RDBMS and Data Mining applications provided by this new protocol", says Steve Murchie, SQL Server group product manager, Microsoft Corporation. "Megaputer Intelligence delivered this exciting new version of PolyAnalyst in record time." The support for OLE DB for Data Mining built into the new data mining solution from Megaputer represents a crucial step toward making data mining functionality available to any business analyst or developer, without the need for specialized knowledge of a particular data mining tool. 
An integration of this functionality with the broadest set of machine learning algorithms offered by PolyAnalyst 4.1 allows the users to readily address any data mining task. An evaluation copy of PolyAnalyst 4.1 accompanied by a tutorial illustrating the use of OLE DB for Data Mining is available for downloading at http://www.megaputer.com. Founded in 1993, Megaputer is a leader in software for Business Intelligence. The company provides a complete family of innovative solutions that help customers make better business decisions. Megaputer offers best-of-breed tools for data mining, semantic text analysis, and information management. Megaputer and PolyAnalyst are registered trademarks of Megaputer Intelligence Inc. in the United States and/or other countries. The names of other companies and products mentioned herein may be the trademarks of their respective owners. For more information: Sergei Ananyan, Megaputer, (812) 330-0110, [EMAIL PROTECTED] Note to editors: If you are interested in viewing additional information on Megaputer, please visit the Megaputer Web page at http://www.megaputer.com/.
Re: Data Mining blooper
William Chambers wrote: "It amazes me to read the self-righteous judgements of people on this thread.. a number of whom have made incompetent criticisms of corresponding correlations with the same arrogance and stupidity that they attribute to the data mining boys, ..." As one of the 'data mining boys', I would assert that the data mining field is littered with exactly the kind of mistaken thinking which has been described in this thread. I don't know that data miners (as a group) are a whole lot worse than statisticians (as a group), but there are certainly problems. I will leave questions of self-righteousness to others, but I would point out that whether or not the critics are incompetent, etc. has nothing to do with whether their claims of data mining incompetence are true. Will Dwinnell [EMAIL PROTECTED]
Re: Data Mining blooper and Related Subjects
"Data Mining" is a loosely and vaguely defined term that refers to things that people do to understand and explore data. It means different things when used by different people. It may mean one of the following: 1. Classical data analysis/statistical modeling such as linear regression. 2. AI stuff (neural networks, fuzzy logic, MARS, nearest neighbor, etc) 3. Database architecture and data warehousing. More like computer science instead of statistics. One good example is the the "Data mining Review" magazine. Check out their web site and you will realize that it has little to do with statistics: http://www.dmreview.com/ K. Freeman Debasmit Mohanty wrote: > I think, now is the time when we have to decide "Do we accept DATA MINING as a > part of statistics or do we keep neglecting this field as before". > > I am sure there would be few statistics students like me who feel that Data > Mining is very much the part of statistics. > > Thanks > Debasmit > The contents of this message express only the sender's opinion. This message does not necessarily reflect the policy or views of my employer, Merck & Co., Inc. All responsibility for the statements made in this Usenet posting resides solely and completely with the sender. === This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
Re: Data Mining blooper
An example of specialized knowledge: Last Friday, a colleague showed me how he was using a data mining program to cluster over 1000 genes using 5 variables. After clustering, he used the program to generate a pretty, spinnable 3-D plot of his data on 3 of the original variables. It had color-coded clusters; and one could also click on a plotted point and its id # and variable values would pop up. Some problems: 1) Four of the variables were measured on a scale of 0-2, the 5th was on a scale of 0-107. He had no idea that with the distance measure he was using (Euclidean) that his clustering could be dominated by that 5th variable. 2) He chose a final cluster solution of two clusters simply because the program suggested that was the best solution (not indicating why). But he was using k-means clustering, and was setting his initial estimate of number of clusters to 2. 3) He clearly had some outliers in his data set that were being masked. 4) He didn't realize that a different choice of 3 variables for plotting could result in a very different picture of his data. 5) He had chosen his plotting symbol to be large enough that, when points had similar coordinates, some points were hidden. I pointed out some of these issues, we played with the data and the output, and it was a learning experience for both of us: he gained some knowledge of stats and I got to see some of the advantages/disadvantages of a data mining program. I suppose the programs can be useful tools in the right hands; this comes from someone who, as a kid, didn't know that a hatchet was not the preferred tool for chopping ice off a roof. rick --- "Donald F. Burrill" wrote: Thanks, Ellen. Evocative quote, isn't it? It's that "without requiring *any* (!) specialized knowledge" that will be the dangerous part, if read too literally by the naive. Interesting that you could get to Lim's URL at all. 
When _I_ tried it, several days ago, the system seemed to be trying to tell me that the /forums part of the URL wasn't accessible. But perhaps the problem was only temporary. -- Don. On Sun, 30 Apr 2000, Ellen Hertz wrote: > I looked up one and copied it: > > "For the first time, thanks to the increased power of computers, > new methods replace the skill of the statistical artisan with > massive-computational methods, obtaining equal or better results in far > less time without requiring any specialised knowledge." > > In all fairness, I haven't read the whole paper and if he is referring > purely to computations such as generating maximum likelihood estimates > or inverting matrices, he is quite right that computers beat pencils. > If he means to just run programs without knowing what they mean ... "untouched by the human mind", as Heidi Kass used to put it ... > and generate GIGO, that certainly is dangerous. Ayuh. -- DFB. > Ellen Hertz Donald F. Burrill [EMAIL PROTECTED] 348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED] MSC #29, Plymouth, NH 03264 603-535-2597 184 Nashua Road, Bedford, NH 03110 603-471-7128 --- end of quote ---
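An illustrative aside on problem 1) in rick's message (four variables on a 0-2 scale, a fifth on a 0-107 scale, Euclidean distance). This is a minimal Python sketch on made-up uniform random data, not the colleague's gene data; the function names and the data are assumptions for illustration only. It measures what share of the total squared Euclidean distance between all pairs of rows comes from the fifth variable, before and after standardizing each column to mean 0 and standard deviation 1.

```python
import random
import statistics


def share_of_last_variable(rows):
    """Fraction of the summed squared Euclidean distance (over all row pairs)
    that is contributed by the last column."""
    total = 0.0
    last = 0.0
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            squared_diffs = [(a - b) ** 2 for a, b in zip(rows[i], rows[j])]
            total += sum(squared_diffs)
            last += squared_diffs[-1]
    return last / total


def standardize(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical data: four variables on 0-2, a fifth on 0-107.
data = [[rng.uniform(0, 2) for _ in range(4)] + [rng.uniform(0, 107)]
        for _ in range(200)]

raw_share = share_of_last_variable(data)
std_share = share_of_last_variable(standardize(data))

print(f"share of distance from 5th variable, raw scales:   {raw_share:.4f}")
print(f"share of distance from 5th variable, standardized: {std_share:.4f}")
```

On the raw scales nearly all of the distance comes from the fifth variable, so any distance-based clustering is effectively clustering on that one variable. After standardizing, each of the five columns contributes an equal fifth. Whether standardizing is the right choice for a given dataset is a separate judgment call; the sketch only shows the size of the effect.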
Re: Data Mining blooper
Ellen, It amazes me to read the self-righteous judgements of people on this thread.. a number of whom have made incompetent criticisms of corresponding correlations with the same arrogance and stupidity that they attribute to the data mining boys, When the purpose becomes making money and not pursuing truth, then we see such arrogance, lies and eventual evil. Real people get hurt. Truth itself begins to appear to be an illusion, What ever sells appears to be the good, When people are not willing to learn and to discuss; to test and to discover, then we get the sort of ridiculous examples of the pot calling the kettle black making up this thread. I like your call for discernment. Good luck. Bill Chambers Ellen Hertz wrote in message <[EMAIL PROTECTED]>... >I looked up one and copied it: > > "For the first time, thanks to the increased power of computers, new >methods replace the skill of the statistical artisan with massive-computational >methods, obtaining equal or better results in far less time without requiring >any specialised knowledge." > >In all fairness, I haven't read the whole paper and if he is referring purely to >computations such as generating maximum likelihood estimates or inverting >matrices, he is quite right that computers beat pencils. If he means to just run >programs without knowing what they mean and generate GIGO, that certainly is >dangerous. > >Ellen Hertz > >Zubin wrote: > >> Can you be more specific on what the misleading statements are? And why you >> think they are misleading. >> >> T.S. Lim wrote in message >> [EMAIL PROTECTED]">news:[EMAIL PROTECTED]... >> > While hunting for URLs for KDCentral.com, I encountered several >> > misleading statements about Statistics made by Data Mining people. >> > I've posted 3 of them to my bulletin board. If you encounter other >> > wrong remarks, I invite you to post them to the board too at >> > >> >http://www.recursive-partitioning.com/forums >> > >> > Thanks. 
>> > -- >> > T.S. Lim >> > [EMAIL PROTECTED] >> > www.Recursive-Partitioning.com
Re: Data Mining blooper
Thanks, Ellen. Evocative quote, isn't it? It's that "without requiring *any* (!) specialized knowledge" that will be the dangerous part, if read too literally by the naive. Interesting that you could get to Lim's URL at all. When _I_ tried it, several days ago, the system seemed to be trying to tell me that the /forums part of the URL wasn't accessible. But perhaps the problem was only temporary. -- Don. On Sun, 30 Apr 2000, Ellen Hertz wrote: > I looked up one and copied it: > > "For the first time, thanks to the increased power of computers, > new methods replace the skill of the statistical artisan with > massive-computational methods, obtaining equal or better results in far > less time without requiring any specialised knowledge." > > In all fairness, I haven't read the whole paper and if he is referring > purely to computations such as generating maximum likelihood estimates > or inverting matrices, he is quite right that computers beat pencils. > If he means to just run programs without knowing what they mean ... "untouched by the human mind", as Heidi Kass used to put it ... > and generate GIGO, that certainly is dangerous. Ayuh. -- DFB. > Ellen Hertz Donald F. Burrill [EMAIL PROTECTED] 348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED] MSC #29, Plymouth, NH 03264 603-535-2597 184 Nashua Road, Bedford, NH 03110 603-471-7128
Re: Data Mining blooper
I looked up one and copied it: "For the first time, thanks to the increased power of computers, new methods replace the skill of the statistical artisan with massive-computational methods, obtaining equal or better results in far less time without requiring any specialised knowledge." In all fairness, I haven't read the whole paper and if he is referring purely to computations such as generating maximum likelihood estimates or inverting matrices, he is quite right that computers beat pencils. If he means to just run programs without knowing what they mean and generate GIGO, that certainly is dangerous. Ellen Hertz Zubin wrote: > Can you be more specific on what the misleading statements are? And why you > think they are misleading. > > T.S. Lim wrote in message > [EMAIL PROTECTED]">news:[EMAIL PROTECTED]... > > While hunting for URLs for KDCentral.com, I encountered several > > misleading statements about Statistics made by Data Mining people. > > I've posted 3 of them to my bulletin board. If you encounter other > > wrong remarks, I invite you to post them to the board too at > > > >http://www.recursive-partitioning.com/forums > > > > Thanks. > > > > -- > > T.S. Lim > > [EMAIL PROTECTED] > > www.Recursive-Partitioning.com
Re: Data Mining blooper and Related Subjects
- Forwarded message from Frank E Harrell Jr - I'd like to make a somewhat related point. There are many educational tools that I've found have a great effect on non-statisticians. One of these is to take one of their datasets, randomly permute the column of Y-values, go through their data mining procedure, and see what it finds. The more that it finds, the more the client becomes properly afraid of the technique and respectful of the statistician's careful approach. -Frank Harrell - End of forwarded message from Frank E Harrell Jr - That's a nice example, though I would not have the confidence that they would not see it as a wonderful way to discover even more "relationships"!-) You could also try sorting the X and Y columns independently to boost R^2. Some of my students just supplied another example. I was ill and emailed class cancellation well in advance. Some of these folks seem to practice procrastination as a religion, and put off the experimental design assignment due the day I was out and did it with the time series assignment due at the next class. As a result, some of them introduced a "trend" variable into the experimental design data, and got a "significant" p-value. I have not had the opportunity yet to ask them what "trend" measures in this context, or what value I should plug in for it if I want to make a prediction of sales when commission is 5% in Division C. I used to have a sheet with four residual plots per side that I asked students to interpret on exams. One plot was a big smiley face. The answer I hoped for was, "There seems to be a pattern here, but I don't think any of the techniques we have studied would be appropriate to deal with it." I wonder what the data mining software would do with it? Smile back maybe? -- Robert W. Hayden, Department of Mathematics, Plymouth State College MSC#29, Plymouth, New Hampshire 03264 USA; Rural Route 1, Box 10, Ashland, NH 03217-9702; (603) 968-9914 (home); [EMAIL PROTECTED]; fax (603) 535-2943 (work)
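A rough Python sketch of the permuted-Y check Frank Harrell describes in the forwarded message. Everything specific here is an assumption for illustration: the data are simulated, and the "data mining procedure" is a toy stand-in (pick whichever of 200 candidate columns has the largest absolute correlation with Y), not any particular product's algorithm. Only the first column is truly related to Y; after the Y column is shuffled, nothing is.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mine_best_predictor(columns, y):
    """Toy 'data mining procedure' (hypothetical): return the index and |r|
    of the candidate column most strongly correlated with y."""
    scores = [abs(pearson_r(col, y)) for col in columns]
    best = max(range(len(columns)), key=scores.__getitem__)
    return best, scores[best]


rng = random.Random(1)  # fixed seed so the run is repeatable
n_rows, n_cols = 20, 200

# Simulated data: 200 noise columns; y depends only on column 0.
columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_cols)]
y = [5 * x + rng.gauss(0, 1) for x in columns[0]]

orig_idx, orig_r = mine_best_predictor(columns, y)

# Harrell's check: break any real X-Y link by shuffling the Y column,
# then run exactly the same procedure again.
y_perm = y[:]
rng.shuffle(y_perm)
perm_idx, perm_r = mine_best_predictor(columns, y_perm)

print(f"original Y: best column {orig_idx}, |r| = {orig_r:.2f}")
print(f"permuted Y: best column {perm_idx}, |r| = {perm_r:.2f}")
```

With many candidate columns and few rows, the search still reports a respectable-looking correlation on the shuffled Y, even though no relationship exists there by construction. Repeating the shuffle many times would give a reference distribution for what the procedure "finds" by chance alone.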
Re: Data Mining blooper and Related Subjects
I'd like to make a somewhat related point. There are many educational tools that I've found have a great effect on non-statisticians. One of these is to take one of their datasets, randomly permute the column of Y-values, go through their data mining procedure, and see what it finds. The more that it finds, the more the client becomes properly afraid of the technique and respectful of the statistician's careful approach. -Frank Harrell "Silvert, Henry" wrote: > I respectfully disagree with Michael Wyatt. I come from an academic > background and now work outside of academia, except for the occasional > course here or there. I too report to a manager or managers, depending on > the circumstances. But my experiences have not been the same as his. I am > constantly urged to use all my skills as a statistician and a research > methodologist by "my managers." (Horrid!!!) > > Henry M. Silvert PHD > Research Statistician > The Conference Board > 845 3rd. Avenue > New York, NY 10022 > Tel. No.: (212) 339-0438 > Fax No.: (212) 836-3825 > > > -Original Message- > > From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]] > > Sent: Friday, April 28, 2000 7:52 AM > > To: [EMAIL PROTECTED] > > Subject: Re: Data Mining blooper and Related Subjects > > > > ...And it extends even further. Many of us who toil in areas outside of > > academia have our work and productivity "supervised" by managers or > > directors who have little or no training in statistics, beyond a survey > > course. They receive the flashy brochures and read the ads that promise > > analytical software that will provide significant information, without > > the bother of formulating one of those fancy-shmancy hypotheses. > > > > The higher-ups come to view data mining, decision support, outcomes > > analysis, & etc. as requiring no more skill than the ability to use a PC. > > I call it "The Myth of the Statistical Meat Grinder". 
The push of a > > button or two will generate the answer to all corporate questions, plus a > > few neat-o graphs for the board of directors packets. > > > > Michael T. Wyatt, Ph.D. > > (Embittered) Healthcare Analyst > > Quality Improvement Dept. > > DCH Regional Medical Center > > Tuscaloosa, AL > > > > > > > > On Wed, 26 Apr 2000 11:38:28 -0400 dennis roberts <[EMAIL PROTECTED]> > > writes: > > > At 07:57 AM 4/26/00 -0500, Herman Rubin wrote: > > > > > > > > > >It does not surprise me one bit. The typical statistics > > > >course teaches statistical methods and pronouncements, with > > > >no attempt to achieve understanding. snip of more > > > > > > this is something i happen to agree with herman about ... but, it is > > > a much > > > broader problem than can be attributed to what happens in one course > > > > > > it is an attitude about what higher education is all about ... and > > > what the > > > goals are for it > > > > > > 'going to college' ... be it undergraduate level or graduate level > > > ... has > > > become a much more hit and miss experience, residence has little > > > meaning > > > ... that is being tailored more and more to the convenience of > > > students ... > > > and to what is 'user' friendly (or it won't SELL). studying > > > principles in > > > disciplines is hard work ... NOT user friendly ... so, less and less > > > is > > > being required in the way of diligent study. > > > > > > take graduate school for example ... there was a time, was there not > > > ... > > > where doctoral students were REALLY expected to be responsible for > > > their > > > dissertations AND were expected to be the experts in that particular > > > area > > > of inquiry ... AND to be competent enough to have done the work > > > him/herself > > > ... and to UNDERSTAND it .. ie, BE ABLE TO DEFEND ALL OF IT > > > > > > but, what i have noticed over many years is that dissertations are > > > becoming > > > more of a committee effort ... 
yes, the student MAY have had the > > > idea > > > (though not necessarily) but, from there ... he/she gets help with > > > the > > > design ... has someone else do the analysis (because he/she did not > > > take > > > any/sufficient work in analytic methods to understand what is going > > > on) ... > > > gets help in writing and editin
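The permuted-Y check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the "data mining procedure" here is a hypothetical stand-in (`best_predictor`, which picks the column most correlated with Y), and the data are pure noise generated for the example, not anyone's real dataset.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_predictor(columns, y):
    """Hypothetical stand-in for a mining procedure: strongest |r| wins."""
    scores = [abs(pearson(col, y)) for col in columns]
    best = max(range(len(columns)), key=scores.__getitem__)
    return best, scores[best]


def permuted_scores(columns, y, n_perm=200, seed=0):
    """Rerun the procedure on shuffled copies of Y and keep what it 'finds'."""
    rng = random.Random(seed)
    found = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # breaks any real link between X and Y
        found.append(best_predictor(columns, y_perm)[1])
    return found


# Assumed toy data: 50 candidate predictors, 30 rows, all unrelated to Y.
rng = random.Random(1)
n_rows, n_cols = 30, 50
columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_cols)]
y = [rng.gauss(0, 1) for _ in range(n_rows)]

observed = best_predictor(columns, y)[1]
null = permuted_scores(columns, y)
# Share of permuted runs that look at least as good as the real run.
p_value = (1 + sum(s >= observed for s in null)) / (1 + len(null))
print(f"best |r| on real Y: {observed:.2f}")
print(f"smallest best |r| on permuted Y: {min(null):.2f}")
print(f"permutation p-value: {p_value:.3f}")
```

Even the weakest permuted run turns up a "best" predictor with a sizeable correlation, which is the point of the demonstration: the procedure finds something whether or not anything is there.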
Re: Data Mining blooper and Related Subjects (fwd)
- Forwarded message from Debasmit Mohanty -

I think, now is the time when we have to decide "Do we accept DATA MINING as a part of statistics or do we keep neglecting this field as before". I am sure there would be a few statistics students like me who feel that Data Mining is very much a part of statistics.

- End of forwarded message from Debasmit Mohanty -

It may be a disagreement over words. Much of the work Tukey et al. did in the 60s, called exploratory data analysis, had to do with looking at data and trying to detect patterns. However, if you sift through data you will find many "patterns" that are just flukes of chance. How do you avoid taking these seriously? This was a criticism directed at Tukey then, and even more so at what goes on today under the name of "Data Mining". But I have a sense that Tukey had a much deeper awareness of the underlying statistical issues than most of the miners have!-)

Robert W. Hayden
Department of Mathematics
Plymouth State College MSC#29
Plymouth, New Hampshire 03264 USA
Rural Route 1, Box 10
Ashland, NH 03217-9702
(603) 968-9914 (home)
[EMAIL PROTECTED]
fax (603) 535-2943 (work)
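To put a rough number on how easily sifting produces flukes, here is a small sketch. It assumes the looks at the data are independent tests, each run at the same level, which real sifting rarely satisfies exactly; the function names are made up for the example.

```python
def chance_of_a_fluke(alpha, n_looks):
    """P(at least one 'significant' pattern) when nothing real is there,
    assuming n_looks independent tests each at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_looks


def bonferroni_level(alpha, n_looks):
    """Per-look level that keeps the overall chance of a fluke near alpha."""
    return alpha / n_looks


for k in (1, 5, 20, 100):
    print(k, round(chance_of_a_fluke(0.05, k), 3), bonferroni_level(0.05, k))
```

With 20 looks at the usual 5% level, the chance of at least one spurious "pattern" is already about 64%.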
Re: Data Mining blooper and Related Subjects
I have been following the discussion on Data Mining blooper for a while. Being a first year graduate student in statistics, my comments on this issue might sound premature. Nevertheless, I would put forward my observations. What I have learnt so far from my interaction with statisticians in academia as well as in industry is the following:

1) Many statisticians still feel that "Data Mining" as a discipline should be left to the people in computer science. Of course, I don't agree with this statement at all. If you read the paper "Data Mining and Statistics" by Dr. J. Friedman, you would realize how statisticians have neglected this emerging field over the last few years.

2) There are few statistics graduate programs which emphasize "Data Mining" research. Of course, there are a few, like Carnegie Mellon. But overall, we are yet to give it the attention it needs.

I think, now is the time when we have to decide "Do we accept DATA MINING as a part of statistics or do we keep neglecting this field as before". I am sure there would be a few statistics students like me who feel that Data Mining is very much a part of statistics.

Thanks
Debasmit

--
Debasmit Mohanty
Graduate Student - Statistics
http://bama.ua.edu/~mohan001/
RE: Data Mining blooper and Related Subjects
I respectfully disagree with Michael Wyatt. I come from an academic background and now work outside of academia, except for the occasional course here or there. I too report to a manager or managers, depending on the circumstances. But my experiences have not been the same as his. I am constantly urged to use all my skills as a statistician and a research methodologist by "my managers." (Horrid!!!)

Henry M. Silvert PHD
Research Statistician
The Conference Board
845 3rd. Avenue
New York, NY 10022
Tel. No.: (212) 339-0438
Fax No.: (212) 836-3825

> -Original Message-
> From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
> Sent: Friday, April 28, 2000 7:52 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Data Mining blooper and Related Subjects
>
> ...And it extends even further. Many of us who toil in areas outside of
> academia have our work and productivity "supervised" by managers or
> directors who have little or no training in statistics, beyond a survey
> course. They receive the flashy brochures and read the ads that promise
> analytical software that will provide significant information, without
> the bother of formulating one of those fancy-shmancy hypotheses.
>
> The higher-ups come to view data mining, decision support, outcomes
> analysis, etc. as requiring no more skill than the ability to use a PC.
> I call it "The Myth of the Statistical Meat Grinder". The push of a
> button or two will generate the answer to all corporate questions, plus a
> few neat-o graphs for the board of directors packets.
>
> Michael T. Wyatt, Ph.D.
> (Embittered) Healthcare Analyst
> Quality Improvement Dept.
> DCH Regional Medical Center
> Tuscaloosa, AL
Re: Data Mining blooper and Related Subjects
...And it extends even further. Many of us who toil in areas outside of academia have our work and productivity "supervised" by managers or directors who have little or no training in statistics, beyond a survey course. They receive the flashy brochures and read the ads that promise analytical software that will provide significant information, without the bother of formulating one of those fancy-shmancy hypotheses.

The higher-ups come to view data mining, decision support, outcomes analysis, etc. as requiring no more skill than the ability to use a PC. I call it "The Myth of the Statistical Meat Grinder". The push of a button or two will generate the answer to all corporate questions, plus a few neat-o graphs for the board of directors packets.

Michael T. Wyatt, Ph.D.
(Embittered) Healthcare Analyst
Quality Improvement Dept.
DCH Regional Medical Center
Tuscaloosa, AL
Re: Data Mining blooper
They are amazingly misleading. A basic stat book will explain why. One example is the statement that traditional statistical methods assume that predictor variables are uncorrelated with each other - incredible!

Zubin wrote:
> Can you be more specific on what the misleading statements are? And why you
> think they are misleading.

--
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
Re: Data Mining blooper
Can you be more specific on what the misleading statements are? And why you think they are misleading.

T.S. Lim wrote in message news:[EMAIL PROTECTED]...
> While hunting for URLs for KDCentral.com, I encountered several
> misleading statements about Statistics made by Data Mining people.
> I've posted 3 of them to my bulletin board. If you encounter other
> wrong remarks, I invite you to post them to the board too at
>
> http://www.recursive-partitioning.com/forums
>
> Thanks.
>
> --
> T.S. Lim
> [EMAIL PROTECTED]
> www.Recursive-Partitioning.com
Re: Data Mining blooper and Related Subjects
At 07:57 AM 4/26/00 -0500, Herman Rubin wrote:

>It does not surprise me one bit. The typical statistics
>course teaches statistical methods and pronouncements, with
>no attempt to achieve understanding.

snip of more

this is something i happen to agree with herman about ... but, it is a much broader problem than can be attributed to what happens in one course

it is an attitude about what higher education is all about ... and what the goals are for it

'going to college' ... be it undergraduate level or graduate level ... has become a much more hit and miss experience, residence has little meaning ... that is being tailored more and more to the convenience of students ... and to what is 'user' friendly (or it won't SELL). studying principles in disciplines is hard work ... NOT user friendly ... so, less and less is being required in the way of diligent study.

take graduate school for example ... there was a time, was there not ... where doctoral students were REALLY expected to be responsible for their dissertations AND were expected to be the experts in that particular area of inquiry ... AND to be competent enough to have done the work him/herself ... and to UNDERSTAND it .. ie, BE ABLE TO DEFEND ALL OF IT

but, what i have noticed over many years is that dissertations are becoming more of a committee effort ... yes, the student MAY have had the idea (though not necessarily) but, from there ... he/she gets help with the design ... has someone else do the analysis (because he/she did not take any/sufficient work in analytic methods to understand what is going on) ... gets help in writing and editing .. and, even gets help in terms of what their results MEAN ...

gives new meaning to the term: "cooperative learning"
Re: Data Mining blooper and Related Subjects
In article <002601bfaf29$cfbaa9a0$[EMAIL PROTECTED]>, David A. Heiser <[EMAIL PROTECTED]> wrote:
>- Original Message -
>From: T.S. Lim
>To: <[EMAIL PROTECTED]>
>Sent: Tuesday, April 25, 2000 10:49 AM
>Subject: Data Mining blooper

>> While hunting for URLs for KDCentral.com, I encountered several
>> misleading statements about Statistics made by Data Mining people.
>> I've posted 3 of them to my bulletin board. If you encounter other
>> wrong remarks, I invite you to post them to the board too at
>>http://www.recursive-partitioning.com/forums
>> Thanks.

>This essentially supports my argument over the last few years. The
>commercial selling of overpriced black boxes generates so much profit for
>these companies that they can make any claim whatsoever, and people will buy
>it, just like in politics.

>The basic selling line is, "you may be stupid, with absolutely no knowledge
>of anything, but if you buy my overpriced $20,000 software, you become a
>noted expert in anything. You don't have to know anything to use my software
>(or vote for me, or)". It amazes me that college graduates buy this
>hook, line and sinker. Then they ask questions on edstat about what does the
>output mean.

>DAH

It does not surprise me one bit. The typical statistics course teaches statistical methods and pronouncements, with no attempt to achieve understanding.

How many coming out of such a course are cognizant that a significance statement is a statement about the probability BEFORE the observations are taken that the null hypothesis will be rejected? How many understand what the likelihood function means, and why one should even consider the likelihood principle?

If students come out of a statistics course believing that statistics is a black box into which one puts the data, with no assumptions, and it spews out the state of the universe, or at least the "statistical conclusions", how could it be expected that they NOT consider what is offered as just a better black box.

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
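The point about a significance statement being a probability fixed BEFORE the data are seen can be illustrated by simulation. This is a rough sketch under assumptions chosen for the example: a two-sided z-test with known sigma, a null hypothesis that is actually true, and hypothetical helper names.

```python
import random
from statistics import NormalDist, mean


def rejects_null(sample, mu0=0.0, sigma=1.0, alpha=0.05):
    """Two-sided z-test with known sigma; True if H0: mu = mu0 is rejected."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def rejection_rate(n_experiments=4000, n=25, alpha=0.05, seed=0):
    """Repeat the whole experiment many times with the null hypothesis true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        hits += rejects_null(sample, alpha=alpha)
    return hits / n_experiments


rate = rejection_rate()
print(f"share of experiments rejecting a true null: {rate:.3f}")
```

The long-run rejection rate sits near the chosen 5%. That number describes the procedure across hypothetical repetitions, not the truth of the null after any one dataset has been observed.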
Re: Data Mining blooper and Related Subjects
- Original Message -
From: T.S. Lim
To: <[EMAIL PROTECTED]>
Sent: Tuesday, April 25, 2000 10:49 AM
Subject: Data Mining blooper

> While hunting for URLs for KDCentral.com, I encountered several
> misleading statements about Statistics made by Data Mining people.
> I've posted 3 of them to my bulletin board. If you encounter other
> wrong remarks, I invite you to post them to the board too at
>
>http://www.recursive-partitioning.com/forums
>
> Thanks.

This essentially supports my argument over the last few years. The commercial selling of overpriced black boxes generates so much profit for these companies that they can make any claim whatsoever, and people will buy it, just like in politics.

The basic selling line is, "you may be stupid, with absolutely no knowledge of anything, but if you buy my overpriced $20,000 software, you become a noted expert in anything. You don't have to know anything to use my software (or vote for me, or)". It amazes me that college graduates buy this hook, line and sinker. Then they ask questions on edstat about what does the output mean.

DAH
Data Mining blooper
While hunting for URLs for KDCentral.com, I encountered several misleading statements about Statistics made by Data Mining people. I've posted 3 of them to my bulletin board. If you encounter other wrong remarks, I invite you to post them to the board too at

http://www.recursive-partitioning.com/forums

Thanks.

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com

Get paid to write review! http://recursive-partitioning.epinions.com
Re: Data Mining
( how did we get to HERE, from Data Mining?)

On 15 Apr 2000 17:50:05 GMT, [EMAIL PROTECTED] (Radford Neal) wrote:

> In article <[EMAIL PROTECTED]>,
> Rich Ulrich <[EMAIL PROTECTED]> wrote:
>
> >One thing that remains true about stock investment schemes: There may
> >be some overall growth, somewhere, but in a specific, narrow
> >perspective, the whole market makes up a zero-sum game. If someone
> >wins, someone else has to lose.
>
> The above is internally contradictory, but the final statement is
> clearly false.

Hey, the final statement is a DEFINITION of zero-sum game. Where is YOUR mind wandering to? I have no objection to wise investments, and that is why I tried to specify a different context, that is, "schemes."
- Sorry that I

> Of course, short-term "day trading" is largely a zero-sum game, as the
> return to be expected over such a short time period is very small.

- much of it only becomes zero-sum, when the time period is LONG. There are fortunes made on a soaring market.
- actually, I expect there are a few Wise Guys who will extract most of the profit, so techno-stocks will be negative-sum for most investors.

There is a LONG history like that: In the 1830s and 1840s investors poured money into building canals in the U.S. and England. The countries benefitted from canals; a few manipulators got rich; most of the companies went broke and most of the investors lost money. Railroads followed the same pattern in the second half of that century. In the 1910s, the "wireless telegraph" had the investors flocking -- the U.S. government got involved in prosecuting traders for fraudulent offerings. But I don't know if that was as big as Railroads, in terms of dollars.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: Data Mining
In article <[EMAIL PROTECTED]>, Rich Ulrich <[EMAIL PROTECTED]> wrote:

>One thing that remains true about stock investment schemes: There may
>be some overall growth, somewhere, but in a specific, narrow
>perspective, the whole market makes up a zero-sum game. If someone
>wins, someone else has to lose.

The above is internally contradictory, but the final statement is clearly false.

Consider a pharmaceutical company with a research program. Suppose the general, well-founded opinion is that this program is not likely to produce much. The company's stock is low. But it turns out that they get lucky, and discover a marvelous drug, that will save millions of lives, and make them lots of money. The company's stock goes up. The owners of the stock win. And it may be that nobody else loses. (It could be that owners of stock in a competing company lose, but if the drug is much better than previous drugs, they'll tend to lose less than the winners win. And perhaps there was NO drug for the disease before, in which case there were no competitors.)

Aside from wins due to such surprises, there is indeed a general positive rate of return, stemming from the fact that capital actually is a useful factor in production, and there is also the possibility of an overall gain or loss as a result of a shift in the general degree of preference for consumption now over consumption later (expressed in terms of interest rates).

Of course, short-term "day trading" is largely a zero-sum game, as the return to be expected over such a short time period is very small.

Radford Neal
Re: Data Mining
On 12 Apr 2000 15:21:21 -0700, [EMAIL PROTECTED] (Paul Bernhardt) wrote:

> I suspect in this forum, almost as bad as the F-word or N-word are the
> DM-words... Data Mining... I agree, but wonder about criteria.

- since IBM started touting a product by that name, it is hard to ignore the new environment

It is still possible that someone will start with a small amount of information, and "torture the data until it confesses." But online data collection produces databases with millions of sales events, organized by date, store, etc. What can be learned?

> Often in our various research domains we have no choice but to use
> retrospective data. A classic example might be validating an investment
> approach by examining historical data, which some call backtesting.
>
> What are the criteria, how can we know when we have chance findings?
>

Try to look for "independence" so that you have an N that gives you increasing confidence; use something more extreme than 5% -- though you may be fooling yourself if you think that your reported level below the 0.1% level is really accurate.

> I've argued that if the model is based on an a priori hypothesis, or can
> be justified by previously established theories, the possibility of data
> mining may be ignored. When the pre-existing theory is less substantial,

- How substantial is "less substantial" or how substantial was the PRIOR? If you are sure something is there, maybe you don't need much more evidence, okay. Right, more shoppers on a sunny day. On a payday.

> one may ask if the discovered model fits data not included in the
> original model (data which occurs after the model was discovered, or data
> which precedes the data originally used to create the model).
>
> I'd like to hear the views of people on this forum.
>
> The specific situation I'm referring to is an investment model called the
> Foolish Four (http://www.fool.com/school/dowinvesting/dowinvesting.htm)
> which was found to beat the S&P500 and Dow 30 over the period from 1973
> through 1993. Since that date, and further backtested to 1961, it has not
> similarly beaten those traditional benchmark indexes, but also has not
> performed worse (both of which could be due to lack of power). The
> Foolish Four is based on a reasonable hypothesis that the worse

< snip >

One thing that remains true about stock investment schemes: There may be some overall growth, somewhere, but in a specific, narrow perspective, the whole market makes up a zero-sum game. If someone wins, someone else has to lose.

IF there is an amount of regression-to-the-mean that you once were able to count on, then AFTER it is publicized, it can't keep on working for very long. If too many people try to cash in at once, strict application of the formula can suddenly become a big loser.

Okay, you can work around the edges, and try to figure what stocks really *ought* to have been the ones in that group, before eager anticipation drove their prices up.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
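The question of chance findings in a backtest can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not anything taken from the thread: the number of strategies, the 20-year window, the 15% yearly standard deviation, and the function name simulate_backtests are all made up for illustration. Every simulated strategy has a true excess return of zero, so whatever edge the best one shows in the backtest is pure selection, and the held-out years show how much of it survives. The Bonferroni-style threshold at the end is one conventional way to "use something more extreme than 5%"; it is named here as an illustration, not as the method anyone in the thread proposed.

```python
import random
import statistics


def simulate_backtests(n_strategies=200, n_years=20, seed=0):
    """Simulate yearly excess returns for many strategies with no real edge.

    Hypothetical setup: each strategy's excess return is drawn from a normal
    distribution with mean 0 and standard deviation 0.15, independently for
    the backtest years and for an equally long held-out period.
    Returns a list of (backtest_mean, held_out_mean) pairs.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_strategies):
        backtest = [rng.gauss(0.0, 0.15) for _ in range(n_years)]
        held_out = [rng.gauss(0.0, 0.15) for _ in range(n_years)]
        results.append((statistics.mean(backtest), statistics.mean(held_out)))
    return results


results = simulate_backtests()

# "Data mining": keep only the strategy that looked best in the backtest.
best_backtest, best_held_out = max(results, key=lambda pair: pair[0])
print(f"best backtested mean excess return: {best_backtest:+.3f}")
print(f"same strategy on held-out years:    {best_held_out:+.3f}")

# If 200 candidates were searched, a per-test level of 5% is far too lenient.
# A Bonferroni-style correction divides the level by the number searched.
alpha_per_test = 0.05 / len(results)
print(f"per-test significance level after correction: {alpha_per_test:.5f}")
```

The design choice worth noting is that the held-out years are never used for selection, which mirrors the suggestion in the original post to check a discovered model against data that was not used to find it.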
Re: Data Mining
Paul Bernhardt wrote:

> I am not affiliated with the Motley Fool (where this investment strategy
> is touted) nor am I advertising for them. It is just an interesting
> practical problem which raises a question I think many statisticians face,
> how to explain when someone has conducted data mining and when they might
> have sussed out a valid truth.
>
> Paul Bernhardt
> University of Utah
> Department of Educational Psychology

Looks like it tries to capitalize on regression to the mean. RttM only applies where something is made up of a true score plus a random component. By focussing on volatile stocks, they seem to be attempting to choose stocks with relatively high random components.

Thom
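The "true score plus a random component" point can be sketched in a few lines. This is an illustrative simulation only: the function name rttm_demo, the sample size, and the normal distributions are assumptions, not anything from the post. Items are measured twice with independent noise, the worst 5% on the first measurement are selected, and their average is compared across the two measurements.

```python
import random
import statistics


def rttm_demo(n_items=1000, noise_sd=1.0, seed=1):
    """Return (mean on 1st measurement, mean on 2nd) for the worst 5% of items.

    Each observed value is a fixed true score plus fresh random noise, so
    selecting on an extreme first measurement also selects unlucky noise.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    first = [t + rng.gauss(0.0, noise_sd) for t in true_scores]
    second = [t + rng.gauss(0.0, noise_sd) for t in true_scores]

    # Select the items that looked worst the first time around.
    n_selected = n_items // 20
    worst = sorted(range(n_items), key=lambda i: first[i])[:n_selected]

    mean_first = statistics.mean([first[i] for i in worst])
    mean_second = statistics.mean([second[i] for i in worst])
    return mean_first, mean_second


mean_first, mean_second = rttm_demo()
print(f"selected group, first measurement:  {mean_first:+.2f}")
print(f"selected group, second measurement: {mean_second:+.2f}")
```

Setting noise_sd to 0.0 removes the random component, and the two means then coincide, which is the sense in which regression to the mean needs noise to operate.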
Re: Data Mining
"T.S. Lim" wrote:

> Data Mining = Statistics reborn with a new name.
>
> You ask the wrong crowd. Go to
>
> http://www.kdcentral.com
>
> and subscribe to datamine-l mailing list.

That's debatable. The poster's question has as much to do with regression to the mean as with modeling, and anyway data mining has everything to do with statistics.

-Frank Harrell
Re: Data Mining
Data Mining = Statistics reborn with a new name.

You ask the wrong crowd. Go to

http://www.kdcentral.com

and subscribe to datamine-l mailing list.

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...
>
>I suspect in this forum, almost as bad as the F-word or N-word are the DM-words... Data Mining... I agree, but wonder about criteria.
>
>Often in our various research domains we have no choice but to use retrospective data. A classic example might be validating an investment approach by examining historical data, which some call backtesting.
>
>What are the criteria, how can we know when we have chance findings?
>
>I've argued that if the model is based on an a priori hypothesis, or can be justified by previously established theories, the possibility of data mining may be ignored. When the pre-existing theory is less substantial, one may ask if the discovered model fits data not included in the original model (data which occurs after the model was discovered, or data which precedes the data originally used to create the model).
>
>I'd like to hear the views of people on this forum.
>
>The specific situation I'm referring to is an investment model called the Foolish Four (http://www.fool.com/school/dowinvesting/dowinvesting.htm) which was found to beat the S&P500 and Dow 30 over the period from 1973 through 1993. Since that date, and further backtested to 1961, it has not similarly beaten those traditional benchmark indexes, but also has not performed worse (both of which could be due to lack of power). The Foolish Four is based on a reasonable hypothesis that the worse performing Dow Jones Industrial Average companies are poised to turn around because they are simply too great to fail over the long term. The judgement on poor performance is based on the stock yield (a high yielding stock has a relatively high dividend payment compared to price), therefore a reasonable hypothesis is used to justify this approach. Selection of 4 of the 5 worst performing Dow companies (the worst is excluded because often these companies are in actual long term financial trouble) is what makes up the Foolish Four.
>
>I am not affiliated with the Motley Fool (where this investment strategy is touted) nor am I advertising for them. It is just an interesting practical problem which raises a question I think many statisticians face, how to explain when someone has conducted data mining and when they might have sussed out a valid truth.
>
>Paul Bernhardt
>University of Utah
>Department of Educational Psychology

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
__
Get paid to write a review! http://recursive-partitioning.epinions.com
Data Mining
I suspect in this forum, almost as bad as the F-word or N-word are the DM-words... Data Mining... I agree, but wonder about criteria.

Often in our various research domains we have no choice but to use retrospective data. A classic example might be validating an investment approach by examining historical data, which some call backtesting.

What are the criteria? How can we know when we have chance findings?

I've argued that if the model is based on an a priori hypothesis, or can be justified by previously established theories, the possibility of data mining may be ignored. When the pre-existing theory is less substantial, one may ask if the discovered model fits data not included in the original model (data which occurs after the model was discovered, or data which precedes the data originally used to create the model).

I'd like to hear the views of people on this forum.

The specific situation I'm referring to is an investment model called the Foolish Four (http://www.fool.com/school/dowinvesting/dowinvesting.htm) which was found to beat the S&P500 and Dow 30 over the period from 1973 through 1993. Since that date, and further backtested to 1961, it has not similarly beaten those traditional benchmark indexes, but also has not performed worse (both of which could be due to lack of power). The Foolish Four is based on a reasonable hypothesis that the worse performing Dow Jones Industrial Average companies are poised to turn around because they are simply too great to fail over the long term. The judgement on poor performance is based on the stock yield (a high yielding stock has a relatively high dividend payment compared to price), therefore a reasonable hypothesis is used to justify this approach. Selection of 4 of the 5 worst performing Dow companies (the worst is excluded because often these companies are in actual long term financial trouble) is what makes up the Foolish Four.

I am not affiliated with the Motley Fool (where this investment strategy is touted) nor am I advertising for them. It is just an interesting practical problem which raises a question I think many statisticians face: how to explain when someone has conducted data mining and when they might have sussed out a valid truth.

Paul Bernhardt
University of Utah
Department of Educational Psychology
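The selection rule as described in this post (rank by yield, take the five highest, drop the very highest) can be written down as a short sketch. This follows only the description above and may well differ from the formula the Motley Fool actually published. The tickers, dividends, and prices are invented for illustration, and select_high_yield is a hypothetical helper name.

```python
def select_high_yield(stocks, n_candidates=5, skip_top=1):
    """Pick stocks by dividend yield, following the rule described in the post.

    stocks: dict mapping ticker -> (annual_dividend, price).
    The n_candidates highest-yielding stocks are ranked, and the skip_top
    highest of those are excluded as possibly being in real trouble.
    """
    yields = {
        ticker: dividend / price
        for ticker, (dividend, price) in stocks.items()
    }
    ranked = sorted(yields, key=yields.get, reverse=True)
    return ranked[skip_top:n_candidates]


# Invented example data: ticker -> (annual dividend, share price).
example_index = {
    "AAA": (2.00, 40.0),   # yield 5%
    "BBB": (1.50, 50.0),   # yield 3%
    "CCC": (3.00, 50.0),   # yield 6%
    "DDD": (1.00, 100.0),  # yield 1%
    "EEE": (2.40, 60.0),   # yield 4%
    "FFF": (0.90, 45.0),   # yield 2%
    "GGG": (3.50, 50.0),   # yield 7%, highest, so excluded
}

picks = select_high_yield(example_index)
print(picks)
```

Writing the rule out makes its free parameters visible: the number of candidates and the number skipped were both choices, and each choice tried against the same historical data is one more opportunity for a chance finding.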
Re: data mining and spatial correlation
Hi,

Can anyone tell me how to research the size of the total advertising budgets in the developed countries?

Thanks
data mining and spatial correlation
Could somebody help me with references on data mining and spatial correlation?

Thank you very much
[job] Positions in machine learning, statistics, and data mining
Athene Software, Inc.
Positions in Machine Learning, Statistics, and Data Mining

Athene Software, based in Boulder, Colorado, has immediate openings for professionals in machine learning, statistics, and data mining. We are seeking qualified candidates to develop and enhance models of subscriber behavior for telecommunications companies. Responsibilities include: statistical investigation of large data sets, building predictive and decision-making models using the latest advances in machine learning techniques, developing and tuning data representations, and presentation of results to internal and external customers.

Candidates must hold a Ph.D. in Computer Science, Statistics, Electrical Engineering, or related field. The ideal candidate will have experience in pattern recognition or mathematical modeling on real world problems, familiarity with experimental design and data analysis, and some background in relational database systems. Strong communication skills are extremely important.

Athene has a long-term commitment to cultivating a dynamic, stimulating environment for its Ph.D. research staff. The group is slated to double over the next few years. Athene encourages publication of research results and active participation in the research community. Athene has also established a research advisory board consisting of leaders in machine learning, including Dr. Satinder Singh Baveja (AT&T Labs - Research), Prof. Geoffrey Hinton (University College London), Prof. John Moody (OGI), Prof. Andrew Moore (CMU), and Prof. Michael Mozer (Boulder).

Send applications to:
Dr. Robert Dodier
Athene Software, Inc.
2060 Broadway, Suite 300
Boulder, CO 80302
email: [EMAIL PROTECTED]
company URL: www.athenesoft.com

=== This list is open to everyone. Occasionally, people lacking respect for other members of the list send messages that are inappropriate or unrelated to the list's discussion topics.
Please just delete the offensive email. For information concerning the list, please see the following web page: http://jse.stat.ncsu.edu/ ===
Re: Textbooks for a course in data mining for scientists and engineers
In article <86nrjb$ljd$[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...
>
>Can anyone suggest a good textbook for a course in data mining? The
>students would be graduate students in science and engineering with the
>typical background being one or two undergraduate courses in
>probability and statistics.
>
>--
>Brian Borchers [EMAIL PROTECTED]
>Department of Mathematics http://www.nmt.edu/~borchers/
>New Mexico Tech Phone: 505-835-5813
>Socorro, NM 87801 FAX: 505-835-5366

There's no perfect book, as usual. You may need to combine several books. Or, you may choose a book and supplement it with notes from other books. Go to www.recursive-partitioning.com/books.html for some recent data mining and machine learning books. You also need to take into account software to try the various methods.

--
Tjen-Sien Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
__
Get paid to write a review! http://recursive-partitioning.epinions.com
Re: Textbooks for a course in data mining for scientists and engineers
One very good book is "How to Find Noise in Data" by "I. Ben Fooled".

Sorry - I couldn't resist.

Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat

Brian Borchers wrote:

> Can anyone suggest a good textbook for a course in data mining? The
> students would be graduate students in science and engineering with the
> typical background being one or two undergraduate courses in
> probability and statistics.
>
> --
> Brian Borchers [EMAIL PROTECTED]
> Department of Mathematics http://www.nmt.edu/~borchers/
> New Mexico Tech Phone: 505-835-5813
> Socorro, NM 87801 FAX: 505-835-5366
Textbooks for a course in data mining for scientists and engineers
Can anyone suggest a good textbook for a course in data mining? The students would be graduate students in science and engineering with the typical background being one or two undergraduate courses in probability and statistics.

--
Brian Borchers [EMAIL PROTECTED]
Department of Mathematics http://www.nmt.edu/~borchers/
New Mexico Tech Phone: 505-835-5813
Socorro, NM 87801 FAX: 505-835-5366
PolyAnalyst 4.0 - Final Release of the Leading Data Mining Solution
Megaputer Intelligence
www.megaputer.com

Megaputer announces the final release of PolyAnalyst 4.0, the newest version of the leading data mining solution. The Megaputer development team extends many thanks to the numerous beta-testers who helped perfect the system. An evaluation copy of PolyAnalyst 4.0 can be downloaded from www.megaputer.com/html/webshop.html

Version 4.0 represents a major upgrade of the system, positioning PolyAnalyst as the most comprehensive and versatile suite of data mining algorithms available today. PolyAnalyst now utilizes Distributed Component Object Model (DCOM) technology, features ten unique machine learning algorithms, and furnishes versatile data manipulation, visualization, and scoring capabilities. In addition to clustering, predicting, dependency detecting, and yes/no classifying, PolyAnalyst 4.0 solves tasks of explicit modeling, detection of association rules in transactional data, and classification to multiple categories.

An open DCOM architecture makes PolyAnalyst 4.0 easily extendable, upgradable and customizable. It provides the user with an option to purchase only the necessary machine learning algorithms as individual modules and utilize these modules as an integral part of their data storage and management system. A DCOM-based PolyAnalyst Knowledge Server can support several client stations on a local network.

New features of PolyAnalyst 4.0 include:

* Unique Market Basket Analysis algorithm for processing transactional data. Groups of products that sell well together and the corresponding directed association rules are identified an order of magnitude faster than by traditional algorithms.
* New Memory Based Reasoning algorithm based on a combination of Nearest Neighbor and Genetic Algorithms. The new method is used efficiently for classification into multiple categories, as well as prediction of numerical variables.
* Implementation of the DCOM architecture. Now individual PolyAnalyst algorithms can be easily utilized in the form of ActiveX modules in external decision support or data management applications. New PolyAnalyst machine learning modules can be easily added and upgraded.
* Support for the analysis of large datasets. The maximum volume of data accepted by the system has been significantly increased and new mechanisms for dealing with large datasets have been implemented.
* Redesigned user interface. The project contents are organized in a Windows-standard tree-like style, while preserving all the best features of the traditional PolyAnalyst interface.
* Dynamic HTML reports for exploration engines. The new interactive reports can be customized, saved, printed, or copied in a standard portable format, and exchanged with external applications.
* Mouse-driven Rule Assistant for simple creation of user defined transformation rules.
* Lift and Gain Charts for interpreting the results obtained by machine learning algorithms, especially valuable for direct marketing tasks. These features are very useful for measuring the extra profit reaped by a marketer making decisions based on the knowledge discovered by PolyAnalyst.
* New Snake Chart for convenient visual comparison of different data sets.
* New Chart Designer for the development of custom charts and advanced data visualization capabilities.
* New PA Scheduler enabling batch process data mining in PolyAnalyst. The user can record a sequence of actions and schedule the created script to be run by PolyAnalyst at a specified time and on the specified datasets.
* Sampling capabilities, allowing random selection of records from a dataset.
* Direct data exchange with Oracle Express and IBM Visual Warehouse.

"By offering these new features, Megaputer has clearly positioned PolyAnalyst 4.0 as the number one modern business solution for data analysis," says Sergei Arseniev, CEO of Megaputer Intelligence. Now PolyAnalyst combines the broadest selection of versatile machine learning algorithms with the convenience and flexibility of DCOM architecture.

The new cutting-edge technology helped Megaputer significantly increase its DM market share during the last year. A number of Fortune 100 companies, such as Allstate Insurance, Boeing, and DuPont Dow Elastomers, have already switched to PolyAnalyst 4.0.

Our customers are very enthusiastic about the system:

"Analytical engines do an excellent job of finding relations amongst many fields without overfitting."
-- Timothy E. Nagle, Nycor Group, Consulting Scientist to 3M

"We chose PolyAnalyst because it offered broad analytic functionality and ease of use beyond any other product."
-- Carl Cozine, Principal, CACTUS Strategies

"The software provides a unique and powerful set of tools for data mining applications, including promotion response analysis, customer segmentation and profiling, and cross-selling analysis."
-- Raymond Burke, Chair of BA, Kelley School of Business, Indiana University

Platforms: Windows NT/95/98/2000