CFP: IEEE Data Mining 2002 (new)

2002-02-09 Thread icdm02


[Apologies if you receive this more than once]

!!! NOTE: The Conference Date Changed to December 9-12, 2002 !!!

-
   ICDM '02: The 2002 IEEE International Conference on Data Mining
 Sponsored by the IEEE Computer Society
-
 Maebashi TERRSA, Maebashi City, Japan
  December 9 - 12, 2002 
   Home Page: http://kis.maebashi-it.ac.jp/icdm02
 Mirror Page: http://www.wi-lab.com/icdm02   

CORPORATE SPONSORS: 
AdIn Research, Inc.
 The Japan Research Institute, Limited
 Maebashi Convention Bureau 
  Maebashi City Government 
 Gunma Prefecture Government 
   Maebashi Institute of Technology 
 US AOARD, AROFE

 Call for Papers
 ***

The 2002 IEEE International Conference on Data Mining (IEEE ICDM '02)
provides a leading international forum for the sharing of original
research results and practical development experiences among
researchers and application developers from different data mining
related areas such as machine learning, automated scientific
discovery, statistics, pattern recognition, knowledge acquisition,
soft computing, databases and data warehousing, data visualization,
and knowledge-based systems.  The conference seeks solutions to
challenging problems facing the development of data mining systems,
and shapes future directions of research by promoting high quality,
novel and daring research findings.  As an important part of the
conference, the workshops program will focus on new research
challenges and initiatives.

Topics of Interest
==

Topics related to the design, analysis and implementation of data
mining theory, systems and applications are of interest.  These
include, but are not limited to, the following areas:

  - Foundations and principles of data mining 
  - Data mining algorithms and methods in traditional areas (such as
classification, clustering, probabilistic modeling, and
association analysis), and in new areas
  - Data and knowledge representation for data mining 
  - Modeling of structured, textual, temporal, spatial, multimedia and
Web data to support data mining
  - Complexity, efficiency, and scalability issues in data mining
  - Data pre-processing, data reduction, feature selection and feature
transformation
  - Statistics and probability in large-scale data mining
  - Soft computing (including neural networks, fuzzy logic,
evolutionary computation, and rough sets) and uncertainty
management for data mining
  - Integration of data warehousing, OLAP and data mining 
  - Man-machine interaction in data mining and visual data mining 
  - Artificial intelligence contributions to data mining 
  - High performance and distributed data mining 
  - Machine learning, pattern recognition and automated scientific
discovery
  - Quality assessment and interestingness metrics of data mining
results
  - Process centric data mining and models of data mining process 
  - Security and social impact of data mining 
  - Emerging data mining applications, such as electronic commerce,
bioinformatics, Web intelligence, and intelligent learning database 
systems

Conference Publications and ICDM Best Paper Awards
==

High quality papers in all data mining areas are solicited.  Papers
exploring new directions will receive a careful and supportive review.

There are two different types of paper submission for IEEE ICDM '02:
(1) main track submissions and (2) industry track submissions.

For the main track submission, all submitted papers should be limited
to a maximum of 6,000 words (approximately 20 A4 pages), and will be
reviewed by the Program Committee on the basis of technical quality,
relevance to data mining, originality, significance, and
clarity. Accepted papers will be published in the conference
proceedings by the IEEE Computer Society Press. All main track paper
submissions will be handled electronically. Please use the Submission
Form (for main track) at the ICDM '02 webpage: 
http://kis.maebashi-it.ac.jp/icdm02 
to submit your paper (the due date is June 5, 2002).

For the industry track submission, please first check the following
conditions before your submission: 

(a) At least one author of each industry track paper should be
from a company (rather than a university), and the paper should be
about industrial or other real-world applications of data mining.
(b) Authors of accepted industry track papers must give both an oral
presentation and a system demo at the conference.

All papers submitted to the Industry Track will be reviewed by the
mini Industry Track Program Committee, and each acc

ANN: New Online Master of Science in Data Mining at CCSU

2001-12-21 Thread Ann Wang

CCSU Launches Online Master of Science in Data Mining

Central Connecticut State University (CCSU) announces the launching of
an online Master of Science program in Data Mining, the first such
program to be offered online.

Data mining is the search for interesting patterns and trends in large
databases using statistical methods.  The MIT Technology Review chose
data mining as one of ten emerging technologies that will change the
world.  Data mining expertise is the most sought after among
information technology professionals, according to the 1999
Information Week National Salary Survey.  In a 2001 KDNuggets survey,
27% of data mining professionals earned more than $100,000 (US)
annually.

All courses in the data mining MS program are offered online.  This
means that class is as close as your computer, whether you live in
Beijing, New York, Singapore, or Canton.  Further, all courses are
asynchronous, meaning that students can work when they want to work,
whether at 3:00 in the afternoon, or 3:00 in the morning.

The 33-credit program, which can be completed in two years, consists
of courses in data mining, artificial intelligence, statistical
analysis, and computer science.  The MS in data mining is fully
licensed by the State of Connecticut Department of Higher Education.

The program stresses the solution of real-world problems, using
applications and case studies, while gaining a deep appreciation of
the underlying models.  These applications include customer
relationship management, credit-card fraud, and profit/cost
optimization.  Students will apply methodologies such as decision
trees, market basket analysis, neural networks, association rules, and
cluster detection.  Students will gain strong exposure to
state-of-the-art software such as the Clementine data mining suite
from SPSS.

Courses available online, starting in January, include Introduction to
Data Mining, Data Mining Methods, Linear Models, Foundations of
Computer Science, Database Concepts, and Mathematical Statistics II. 
Some prerequisite courses are also offered online.  To register for
these courses, proceed to OnlineCSU at http://onlinecsu.ctstateu.edu/.

For more information about the data mining program, including how to
apply, please visit www.ccsu.edu/datamining, or contact Program
Director Daniel T. Larose, Ph.D. at [EMAIL PROTECTED] or 860-832-2862.


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



data mining course- feb 28- Palo Alto

2001-11-27 Thread Rob Tibshirani


Short course: Statistical learning and data mining

Trevor Hastie and Robert Tibshirani, Stanford Univ.


Sheraton Hotel
Palo Alto, Ca., Feb 28- Mar 1, 2002 


This two-day course gives a detailed overview of statistical
models for data mining, inference and prediction.
With the rapid developments in internet technology, genomics and
other high-tech industries, we rely increasingly on data analysis
and statistical models to exploit the vast amounts of data
at our fingertips.

This sequel to our popular Modern Regression and Classification course
covers many new areas of unsupervised learning and data mining,
and gives an in-depth treatment of some of the hottest tools
in supervised learning.

The first course is not a pre-requisite for this new course.

Day one focuses on state-of-the-art methods for supervised
learning, including PRIM, boosting and support vector machines.
Day two covers unsupervised learning, including clustering,
principal components, principal curves and self-organizing maps.
Many applications will be discussed, including DNA expression arrays,
one of the hottest new areas in biology!

###
Much of the material is based on the new book:

  Elements of Statistical Learning: data mining, inference and prediction 

(Hastie, Tibshirani & Friedman, Springer -Verlag, 2001).
 A copy of this book will be given to all attendees. 

###


Go to the site

http://www-stat.stanford.edu/~hastie/mrc.html

for more information and online registration.


Please email me if you have specific questions
([EMAIL PROTECTED]).




-- 
**
Rob Tibshirani, Dept of Health Research & Policy
 and Dept of Statistics
HRP Redwood Bldg
Stanford University
Stanford, California 94305-5405

phone: HRP: 650-723-7264 (Voice mail),  Statistics 650-723-1185
FAX 650-725-8977
[EMAIL PROTECTED]
http://www-stat.stanford.edu/~tibs





IEEE Data Mining 2001: Final Call for Participation

2001-11-17 Thread Ning Zhong

[Apologies if you receive this more than once]

IEEE Data Mining 2001: Final Call for Participation
===

The 2001 IEEE International Conference on Data Mining
Doubletree Hotel, San Jose, California, USA
November 29 - December 2, 2001

On-line registration at
http://www.cs.uvm.edu/~xwu/icdm/reg-01.html

Hotel reservation information at
http://www.cs.uvm.edu/~xwu/icdm/hotel-01.shtml

Conference program and other information at
http://www.cs.uvm.edu/~xwu/icdm-01.html

With the support of both world-renowned experts and new researchers
from the international data mining community, ICDM '01 has received an
overwhelming response compared to any other data mining related
conference this year: 365 paper submissions, 8 workshop proposals, and
29 tutorial proposals.

* Invited Speakers: 

  - Jerome H. Friedman, Stanford University, USA
  - Jim Gray, Microsoft Research, USA
(The 1999 Turing Award Winner)
  - Pat Langley, Institute for the Study of Learning and Expertise, USA
  - Benjamin W. Wah, University of Illinois, Urbana-Champaign, USA
(President, IEEE Computer Society)

* ICDM '01 Tutorials (November 29, 2001):

  - Text and Data Mining for Bioinformatics, by Hinrich Schuetze
([EMAIL PROTECTED])
  - Mining Time Series Data, by Eamonn Keogh ([EMAIL PROTECTED])

* ICDM '01 Workshops (November 29, 2001):

  - Text Mining (TextDM '2001)
(http://www-ai.ijs.si/DunjaMladenic/TextDM01/)
  - Integrating Data Mining and Knowledge Management 
(http://cui.unige.ch/~hilario/icdm-01/cfp.html)

* Paper Presentations (November 30 - December 2, 2001): Out of 365
  paper submissions, the IEEE ICDM '01 Program Committee accepted 72
  papers for regular presentation, and an additional 39 papers for
  poster presentation.





ANN: New Online Master of Science in Data Mining

2001-11-12 Thread Daniel Larose

CCSU Launches Online Master of Science in Data Mining

Central Connecticut State University (CCSU) announces the launching of
an online Master of Science program in Data Mining, the first such
program to be offered online.

Data mining is the search for interesting patterns and trends in large
databases using statistical methods.  The MIT Technology Review chose
data mining as one of ten emerging technologies that will change the
world.  Data mining expertise is the most sought after among
information technology professionals, according to the 1999
Information Week National Salary Survey.  In a 2001 KDNuggets survey,
27% of data mining professionals earned more than $100,000 (US)
annually.

All courses in the data mining MS program are offered online.  This
means that class is as close as your computer, whether you live in
Beijing, New York, Singapore, or Canton.  Further, all courses are
asynchronous, meaning that students can work when they want to work,
whether at 3:00 in the afternoon, or 3:00 in the morning.

The 33-credit program, which can be completed in two years, consists
of courses in data mining, artificial intelligence, statistical
analysis, and computer science.  The MS in data mining is fully
licensed by the State of Connecticut Department of Higher Education.

The program stresses the solution of real-world problems, using
applications and case studies, while gaining a deep appreciation of
the underlying models.  These applications include customer
relationship management, credit-card fraud, and profit/cost
optimization.  Students will apply methodologies such as decision
trees, market basket analysis, neural networks, association rules, and
cluster detection.  Students will gain strong exposure to
state-of-the-art software such as the Clementine data mining suite
from SPSS.

Courses available online, starting in January, include Introduction to
Data Mining, Data Mining Methods, Linear Models, Foundations of
Computer Science, Database Concepts, and Mathematical Statistics II. 
Some prerequisite courses are also offered online.  To register for
these courses, proceed to OnlineCSU at http://onlinecsu.ctstateu.edu/.

For more information about the data mining program, including how to
apply, please visit www.ccsu.edu/datamining, or contact Program
Director Daniel T. Larose, Ph.D. at [EMAIL PROTECTED] or 860-832-2862.





ANN: Book: Principles of Data Mining

2001-10-23 Thread Jud Wolfskill

I thought readers of sci.stat.edu might be interested in this book.  For
more information please visit http://mitpress.mit.edu/026208290X

Principles of Data Mining
David J. Hand, Heikki Mannila, and Padhraic Smyth

The growing interest in data mining is motivated by a common problem
across disciplines: how does one store, access, model, and ultimately
describe and understand very large data sets? Historically, different
aspects of data mining have been addressed independently by different
disciplines. This is the first truly interdisciplinary text on data
mining, blending the contributions of information science, computer
science, and statistics.

The book consists of three sections. The first, foundations, provides a
tutorial overview of the principles underlying data mining algorithms
and their application. The presentation emphasizes intuition rather than
rigor. The second section, data mining algorithms, shows how algorithms
are constructed to solve specific problems in a principled manner. The
algorithms covered include trees and rules for classification and
regression, association rules, belief networks, classical statistical
models, nonlinear models such as neural networks, and local
"memory-based" models. The third section shows how all of the preceding
analysis fits together when applied to real-world data mining problems.
Topics include the role of metadata, how to handle missing data, and
data preprocessing.

David J. Hand is Professor of Statistics, Department of Mathematics,
Imperial College, London. Heikki Mannila is Research Fellow at Nokia
Research Center and Professor, Department of Computer Science and
Engineering, Helsinki University of Technology. Padhraic Smyth is
Associate Professor, Department of Information and Computer Science, the
University of California, Irvine.

8 x 9, 425 pp.
cloth ISBN 0-262-08290-X
Adaptive Computation and Machine Learning series
A Bradford Book





CFP: IEEE Data Mining 2002

2001-10-16 Thread icdm02

[Apologies if you receive this more than once]

-
   ICDM '02: The 2002 IEEE International Conference on Data Mining
Sponsored by the IEEE Computer Society
--
Maebashi TERRSA, Maebashi City, Japan
  November 26 - 29, 2002 
   Home Page: http://kis.maebashi-it.ac.jp/icdm02
 Mirror Page: http://www.wi-lab.com/icdm02   

  Call for Papers
  ***

The 2002 IEEE International Conference on Data Mining (IEEE ICDM '02)
provides a leading international forum for the sharing of original
research results and practical development experiences among
researchers and application developers from different data mining
related areas such as machine learning, automated scientific
discovery, statistics, pattern recognition, knowledge acquisition,
soft computing, databases and data warehousing, data visualization,
and knowledge-based systems.  The conference seeks solutions to
challenging problems facing the development of data mining systems,
and shapes future directions of research by promoting high quality,
novel and daring research findings.  As an important part of the
conference, the workshops program will focus on new research
challenges and initiatives.

Topics of Interest
==

Topics related to the design, analysis and implementation of data
mining theory, systems and applications are of interest.  These
include, but are not limited to, the following areas:

  - Foundations and principles of data mining 
  - Data mining algorithms and methods in traditional areas (such as
classification, clustering, probabilistic modeling, and
association analysis), and in new areas
  - Data and knowledge representation for data mining 
  - Modeling of structured, textual, temporal, spatial, multimedia and
Web data to support data mining
  - Complexity, efficiency, and scalability issues in data mining
  - Data pre-processing, data reduction, feature selection and feature
transformation
  - Statistics and probability in large-scale data mining
  - Soft computing (including neural networks, fuzzy logic,
evolutionary computation, and rough sets) and uncertainty
management for data mining
  - Integration of data warehousing, OLAP and data mining 
  - Man-machine interaction in data mining and visual data mining 
  - Artificial intelligence contributions to data mining 
  - High performance and distributed data mining 
  - Machine learning, pattern recognition and automated scientific
discovery
  - Quality assessment and interestingness metrics of data mining
results
  - Process centric data mining and models of data mining process 
  - Security and social impact of data mining 
  - Emerging data mining applications, such as electronic commerce,
bioinformatics, Web mining and intelligent learning database systems

Conference Publications and ICDM Best Paper Awards
==

High quality papers in all data mining areas are solicited.  Papers
exploring new directions will receive a careful and supportive review.

There are two different types of paper submission for IEEE ICDM '02:
(1) main track submissions and (2) industry track submissions.

All submitted papers should be limited to a maximum of 6,000 words
(approximately 20 A4 pages), and will be reviewed on the basis of
technical quality, relevance to data mining, originality,
significance, and clarity.

Accepted papers will be published in the conference proceedings by the
IEEE Computer Society Press.  A selected number of IEEE ICDM '02
accepted papers will be expanded and revised for possible inclusion in
the Knowledge and Information Systems journal
(http://kais.mines.edu/~kais/) by Springer-Verlag.

IEEE ICDM Best Paper Awards will be conferred on the authors of the
best papers at the conference.

Important Dates
===

  June 5, 2002   Main track paper submissions 
 Industry track paper submissions 

  June 30, 2002  Tutorial submissions
 Panel submissions
 Workshop proposals

  August 9, 2002 Paper acceptance notices

  September 2, 2002  Final camera-readies

  November 26-29, 2002   Conference

All paper submissions will be handled electronically.  Detailed
instructions are provided on the conference home page at
http://kis.maebashi-it.ac.jp/icdm02 and http://www.wi-lab.com/icdm02 

Honorary Chair: 
===

Setsuo Ohsuga, Waseda University, Japan

Conference Chairs:
==

  Ning Zhong, Maebashi Institute of Technology, Japan
  ([EMAIL PROTECTED])

  Philip S. Yu, IBM T.J. Watson Research Center, USA
  ([EMAIL PROTECTED])

Program Committee Chairs:
=

  Vipin 

IEEE Data Mining 2001: Call for Participation

2001-10-04 Thread Ning Zhong

[Apologies if you receive this more than once]

IEEE Data Mining 2001: Call for Participation
=

The 2001 IEEE International Conference on Data Mining
Doubletree Hotel, San Jose, California, USA
November 29 - December 2, 2001

 * On-line registration (and other information) at
   http://www.cs.uvm.edu/~xwu/icdm-01.html
   (Register by November 6 to save $100!)

 * Be sure to book hotel rooms by November 7 for discounted rates!
   (http://www.cs.uvm.edu/~xwu/icdm/hotel-01.shtml)

The 2001 IEEE International Conference on Data Mining (ICDM '01)
provides a forum for the sharing of original research results and
practical development experiences among researchers and application
developers from different data mining related areas such as machine
learning, automated scientific discovery, statistics, pattern
recognition, knowledge acquisition, soft computing, databases and data
warehousing, data visualization, and knowledge-based systems. The
conference seeks solutions to challenging problems facing the
development of data mining systems, and shapes future directions of
research by promoting high quality, novel and daring research
findings. As an important part of the conference, the workshops
program will focus on new research challenges and initiatives.

With the support of both world-renowned experts and new researchers
from the international data mining community, ICDM '01 has received an
overwhelming response compared to any other data mining related
conference this year: 365 paper submissions, 8 workshop proposals, and
29 tutorial proposals.

* Invited Speakers: 

  - Jerome H. Friedman, Stanford University, USA
  - Jim Gray, Microsoft Research, USA
(The 1999 Turing Award Winner)
  - Pat Langley, Institute for the Study of Learning and Expertise, USA
  - Benjamin W. Wah, University of Illinois, Urbana-Champaign, USA
(President, IEEE Computer Society)

* ICDM '01 Tutorials (November 29, 2001):

  - Text and Data Mining for Bioinformatics, by Hinrich Schuetze
([EMAIL PROTECTED])
  - Mining Time Series Data, by Eamonn Keogh ([EMAIL PROTECTED])

* ICDM '01 Workshops (November 29, 2001):

  - Text Mining (TextDM '2001)
(http://www-ai.ijs.si/DunjaMladenic/TextDM01/)
  - Integrating Data Mining and Knowledge Management 
(http://cui.unige.ch/~hilario/icdm-01/cfp.html)

* Paper Presentations (November 30 - December 2, 2001): Out of 365
  paper submissions, the IEEE ICDM '01 Program Committee accepted 72
  papers for regular presentation, and an additional 37 papers for
  poster presentation.





IEEE Data Mining 2001: Call for Participation

2001-08-20 Thread Ning Zhong

[Apologies if you receive this more than once]


IEEE Data Mining 2001: Call for Participation
=

The 2001 IEEE International Conference on Data Mining
Doubletree Hotel, San Jose, California, USA
November 29 - December 2, 2001

On-line registration (and other information) at
http://www.cs.uvm.edu/~xwu/icdm-01.html

The 2001 IEEE International Conference on Data Mining (ICDM '01)
provides a forum for the sharing of original research results and
practical development experiences among researchers and application
developers from different data mining related areas such as machine
learning, automated scientific discovery, statistics, pattern
recognition, knowledge acquisition, soft computing, databases and data
warehousing, data visualization, and knowledge-based systems. The
conference seeks solutions to challenging problems facing the
development of data mining systems, and shapes future directions of
research by promoting high quality, novel and daring research
findings. As an important part of the conference, the workshops
program will focus on new research challenges and initiatives.

With the support of both world-renowned experts and new researchers
from the international data mining community, ICDM '01 has received an
overwhelming response compared to any other data mining related
conference this year: 365 paper submissions, 8 workshop proposals, and
29 tutorial proposals.

* Invited Speakers: 

  - Jerome H. Friedman, Stanford University, USA
  - Jim Gray, Microsoft Research, USA
(The 1999 Turing Award Winner)
  - Pat Langley, Institute for the Study of Learning and Expertise, USA
  - Benjamin W. Wah, University of Illinois, Urbana-Champaign, USA
(President, IEEE Computer Society)

* ICDM '01 Tutorials (November 29, 2001):

  - Text and Data Mining for Bioinformatics, by Hinrich Schuetze
([EMAIL PROTECTED])
  - Mining Time Series Data, by Eamonn Keogh ([EMAIL PROTECTED])

* ICDM '01 Workshops (November 29, 2001):

  - Text Mining (TextDM '2001)
(http://www-ai.ijs.si/DunjaMladenic/TextDM01/)
  - Integrating Data Mining and Knowledge Management 
(http://cui.unige.ch/~hilario/icdm-01/cfp.html)

* Paper Presentations (November 30 - December 2, 2001): Out of 365
  paper submissions, the IEEE ICDM '01 Program Committee accepted 72
  papers for regular presentation, and an additional 39 papers for
  poster presentation.







Re: Fundamental differences between Statistics and Data Mining?

2000-11-20 Thread P.G.Hamer

T.S. Lim wrote:

> I'm attempting to compile an online list of the fundamental differences
> between our field Statistics and Data Mining. Several online references
> that touch on the topic include
>
>http://www-stat.stanford.edu/~jhf/ftp/dm-stat.ps
>http://www.acm.org/sigkdd/explorations/issue1-1/contents.htm#Hand
>http://www.acm.org/sigkdd/explorations/issue1-2/contents.htm#mannila
>
> Let me know your point of view or opinion. Thanks much.
>

Can I add  the magnificent
Greater and Lesser Statistics: A Choice for Future Research
J. M. Chambers
from http://www.wavelet.org/who/jmc/pub.html

I find it almost unbearably sad.

It certainly suggests that, while the statistical community might have
the knowledge and skills to address `data mining' style problems, its
value-system makes it unwilling to do so -- or to value the work if
it is attempted.

There are of course many honourable exceptions, including Chambers
himself.

Peter






Re: Fundamental differences between Statistics and Data Mining?

2000-11-19 Thread Gaj Vidmar

Here are two other sources that may be relevant:

"Putting Data Mining in its Place" by D. Pyle
(used to be at http://www.vldb.com/articles/Pyle/pyle.html; can't access it
at the moment)

"Data Mining from a Statistical Perspective" by J. Maindonald
(http://wwwmaths.anu.edu.au/~johnm/dm/dmpaper.html)

As a user of statistical and/or other DM methods at best, rather than
providing an amateur opinion, I can only thank you for the references you
have provided.

Gaj Vidmar
Univ. of Ljubljana, Dept. of Psychology








Re: Fundamental differences between Statistics and Data Mining?

2000-11-18 Thread T.S. Lim

In article <8v087k$tm5$[EMAIL PROTECTED]>,
  T.S. Lim <[EMAIL PROTECTED]> wrote:
> I'm attempting to compile an online list of the fundamental differences
> between our field Statistics and Data Mining. Several online references
> that touch on the topic include
>
>http://www-stat.stanford.edu/~jhf/ftp/dm-stat.ps
>http://www.acm.org/sigkdd/explorations/issue1-1/contents.htm#Hand
>http://www.acm.org/sigkdd/explorations/issue1-2/contents.htm#mannila
>
> Let me know your point of view or opinion. Thanks much.


More references have been posted at

   http://www.recursive-partitioning.com/dcforum/DCForumID4/2.html

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
_
Get paid to write reviews! http://recursive-partitioning.epinions.com


Sent via Deja.com http://www.deja.com/
Before you buy.





Re: Fundamental differences between Statistics and Data Mining?

2000-11-17 Thread Robert Hamer

>> I'm attempting to compile an online list of the fundamental differences
>> between our field Statistics and Data Mining. Several online references
>> that touch on the topic include

It's very simple.  Data Mining is everything they
taught you _not_ to do when you took statistics.

-- 
--(Signature)  Robert M. Hamer 732 235 4218
  Use my last name @rci.rutgers.edu
  "Mit der Dummheit kaempfen Goetter selbst vergebens" -- Schiller





Re: Fundamental differences between Statistics and Data Mining?

2000-11-17 Thread Francois Bergeret


Hi,

my opinion is that datamining is just a marketing name, because datamining
techniques are a part of statistics. Neural networks may be an exception to
this, but I believe that good neural networks also use statistics.

Francois.

"T.S. Lim" wrote:

> I'm attempting to compile an online list of the fundamental differences
> between our field Statistics and Data Mining. Several online references
> that touch on the topic include
>
>http://www-stat.stanford.edu/~jhf/ftp/dm-stat.ps
>http://www.acm.org/sigkdd/explorations/issue1-1/contents.htm#Hand
>http://www.acm.org/sigkdd/explorations/issue1-2/contents.htm#mannila
>
> Let me know your point of view or opinion. Thanks much.
>
> --
> T.S. Lim
> [EMAIL PROTECTED]
> www.Recursive-Partitioning.com
> _
> Get paid to write reviews! http://recursive-partitioning.epinions.com
>
> Sent via Deja.com http://www.deja.com/
> Before you buy.







Fundamental differences between Statistics and Data Mining?

2000-11-16 Thread T.S. Lim

I'm attempting to compile an online list of the fundamental differences
between our field Statistics and Data Mining. Several online references
that touch on the topic include

   http://www-stat.stanford.edu/~jhf/ftp/dm-stat.ps
   http://www.acm.org/sigkdd/explorations/issue1-1/contents.htm#Hand
   http://www.acm.org/sigkdd/explorations/issue1-2/contents.htm#mannila

Let me know your point of view or opinion. Thanks much.

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com



Re: Data Mining

2000-11-08 Thread T.S. Lim

In article <[EMAIL PROTECTED]>,
  [EMAIL PROTECTED] (Kuldeep Kumar) wrote:
> Colleagues
> I am looking for some data base dealing with patient records for any
> disease preferably diabetes or cancer. This is basically for exercise in
> statistical modelling to see which factors are significant and to classify
> whether the patient has the disease or not. Any other data base where
> logistic model can be applied will be also useful. I am sure this kind of
> data will be available somewhere on the web. Any help will be appreciated.
> Thanks.
> Deep


Visit the "Data Sets" section of

   http://www.kdcentral.com

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com



Data Mining

2000-11-07 Thread Kuldeep Kumar

Colleagues
I am looking for a database of patient records for any
disease, preferably diabetes or cancer. This is basically for an exercise in
statistical modelling to see which factors are significant and to classify
whether the patient has the disease or not. Any other database where a
logistic model can be applied will also be useful. I am sure this kind of
data will be available somewhere on the web. Any help will be appreciated.
Thanks.
Deep






Re: data mining

2000-09-25 Thread T.S. Lim

In article <[EMAIL PROTECTED]>,
  [EMAIL PROTECTED] (Richard M. Barton) wrote:
> I have been trying to search for reviews of data mining software (e.g.,
> MineSet, Clementine) with little success.  In the past, some of you have had
> recommendations/advice about stat packages; I wonder if you might share your
> views on data mining:  Specifically,
>
> 1)  Any feelings (+ or -) on data mining in general?
> 2)  Any views (+ or -) on available software?
> 3) Any suggestions on where else I might look for info?
>
> Thanks for your help.
> rick
>
> Richard Barton, Statistical Consultant
> Dartmouth College
> Peter Kiewit Computing Services
> 6224 Baker/Berry
> Hanover, NH 03755
> (603)-646-0255


Visit

   http://www.kdcentral.com

and browse the Tutorials section. BTW, Data Mining is Statistics reborn
with a new name. :)

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com



Re: data mining

2000-09-25 Thread dennis roberts

a few links i spotted ... hope these help ... could have listed more but, 
here is enough to shake a stick at!!

http://www.spss.com/datamine/
http://www.spss.com/datamine/techniques.htm

http://www.dci.com/events/datamin1/
http://www.dci.com/events/datamin2/

http://www.cs.bham.ac.uk/~anp/TheDataMine.html

http://www.galaxy.gmu.edu/stats/syllabi/DMLIST.html

http://www3.shore.net/~kht/

http://www.almaden.ibm.com/cs/quest/

http://www.dmbenchmarking.com/

http://datamining.itsc.uah.edu/

http://www.ncdm.uic.edu/



At 02:39 PM 9/25/00 -0400, you wrote:
>I have been trying to search for reviews of data mining software (e.g., 
>MineSet, Clementine) with little success.  In the past, some of you have 
>had recommendations/advice about stat packages; I wonder if you might 
>share your views on data mining:  Specifically,
>
>1)  Any feelings (+ or -) on data mining in general?
>2)  Any views (+ or -) on available software?
>3) Any suggestions on where else I might look for info?






data mining

2000-09-25 Thread Richard M. Barton
I have been trying to search for reviews of data mining software (e.g., MineSet, Clementine) with little success.  In the past, some of you have had recommendations/advice about stat packages; I wonder if you might share your views on data mining:  Specifically,

1)  Any feelings (+ or -) on data mining in general?
2)  Any views (+ or -) on available software?
3) Any suggestions on where else I might look for info?

Thanks for your help.
rick



Richard Barton, Statistical Consultant
Dartmouth College
Peter Kiewit Computing Services
6224 Baker/Berry
Hanover, NH 03755

(603)-646-0255


IEEE Data Mining 2001: Call for Papers

2000-07-30 Thread Ning Zhong

[Apologies if you receive this more than once]

--
   ICDM '01: The 2001 IEEE International Conference on Data Mining
   Sponsored by the IEEE Computer Society
--
   Silicon Valley, California, USA
   November 29 - December 2, 2001
   Home Page: http://kais.mines.edu/~xwu/icdm/icdm-01.html

Call for Papers
***

The  2001  IEEE International Conference  on  Data  Mining  (ICDM '01)
provides a forum  for  the sharing  of  original research results  and
practical development experiences  among  researchers  and application
developers  from different data mining related areas  such as  machine
learning,   automated   scientific   discovery,  statistics,   pattern
recognition, knowledge acquisition, soft computing, databases and data
warehousing,  data visualization,  and  knowledge-based  systems.  The
conference   seeks  solutions  to  challenging   problems  facing  the
development of data mining systems,  and  shapes  future directions of
research   by  promoting  high  quality,  novel  and  daring  research
findings.  As  an important part  of  the  conference,  the  workshops
program will focus on new research challenges and initiatives.

Topics of Interest
==

Topics  related to  the design,  analysis  and  implementation of data
mining  theory,  systems  and  applications  are  of  interest.  These
include, but are not limited to the following areas:

  - Foundations and principles of data mining
  - Data mining algorithms and methods in traditional areas (such as
    classification, clustering, probabilistic modeling, and
    association analysis), and in new areas
  - Data and knowledge representation for data mining
  - Modeling of structured, textual, temporal, spatial, multimedia and
    Web data to support data mining
  - Complexity, efficiency, and scalability issues in data mining
  - Data pre-processing, data reduction, feature selection and feature
    transformation
  - Statistics and probability in large-scale data mining
  - Soft computing (including neural networks, fuzzy logic,
    evolutionary computation, and rough sets) and uncertainty
    management for data mining
  - Integration of data warehousing, OLAP and data mining
  - Man-machine interaction in data mining and visual data mining
  - Artificial intelligence contributions to data mining
  - High performance and distributed data mining
  - Machine learning, pattern recognition and automated scientific
    discovery
  - Quality assessment and interestingness metrics of data mining
    results
  - Process centric data mining and models of data mining process
  - Security and social impact of data mining
  - Emerging data mining applications, such as electronic commerce,
    Web mining and intelligent learning database systems

Conference Publications and ICDM Best Paper Awards
==

High quality papers  in all data mining areas  are  solicited.  Papers
exploring  new  directions  will  receive  a  careful  and  supportive
review.  All submitted papers should be limited to a maximum of  6,000
words (approximately 20 A4 pages),  and  will be reviewed on the basis
of   technical  quality,  relevance  to  data   mining,   originality,
significance,  and clarity.  Accepted papers  will be published in the
conference proceedings by the IEEE Computer Society Press.  A selected
number of ICDM '01 accepted papers  will be  expanded and revised  for
possible  inclusion  in  the Knowledge and Information Systems journal
(http://kais.mines.edu/~kais/) by Springer-Verlag.

ICDM Best Paper Awards  will be conferred  on the authors  of the best
papers at the conference.

Important Dates
===

 June 15, 2001         Paper submissions.
 July 31, 2001         Acceptance notices.
 August 31, 2001       Final camera-readies.
 Nov 29 - Dec 2, 2001  Conference.

Detailed instructions  for paper submissions  will be provided  on the
conference home page at http://kais.mines.edu/~xwu/icdm/icdm-01.html.

Conference Chair:
=

  Xindong Wu, Colorado School of Mines, USA
 ([EMAIL PROTECTED])
 
Program Committee Chairs:
=

  Nick Cercone,  University of Waterloo, Canada
 ([EMAIL PROTECTED])
  T.Y. Lin, San Jose State University, USA
 ([EMAIL PROTECTED])
 
ICDM '01 Workshops Chair:
=

  Johannes Gehrke, Cornell University, USA
 ([EMAIL PROTECTED])

ICDM '01 Tutorials Chair:
=

  Chris Clifton, MITRE, USA
 ([EMAIL PROTECTED])

ICDM '01 Panels Chair:
==

  Ramamohanarao Kot

Re: Data Mining blooper

2000-05-23 Thread Bill Watkins

I believe the author that Ellen quoted was referring to MCS and, if so, I agree
with that author.  IMHO, there will eventually be MCS software that will allow
high school students to run circles around what today's PhDs do with closed form
solutions.

William Chambers wrote:

> Ellen,
>
> It amazes me to read the self-righteous judgements of people on this
> thread.. a number of whom have made incompetent criticisms of corresponding
> correlations with the same arrogance and stupidity that they attribute to
> the data mining boys,  When the purpose becomes making money and not
> pursuing truth, then we see such arrogance,  lies and eventual evil. Real
> people get hurt.  Truth itself begins to appear to be an illusion, What ever
> sells appears to be the good, When people are not willing to learn and to
> discuss; to test and to discover, then we get the sort of ridiculous
> examples of the pot calling the kettle black making up this thread. I like
> your call for discernment.  Good luck.
>
> Bill Chambers
>
> Ellen Hertz wrote in message <[EMAIL PROTECTED]>...
> >I looked up one and copied it:
> >
> >  "For the first time, thanks to the increased power of computers, new
> >methods replace the skill of the statistical artisan with massive-computational
> >methods, obtaining equal or better results in far less time without requiring
> >any specialised knowledge."
> >
> >In all fairness, I haven't read the whole paper and if he is referring purely to
> >computations such as generating maximum likelihood estimates or inverting
> >matrices, he is quite right that computers beat pencils. If he means to just run
> >programs without knowing what they mean and generate GIGO, that  certainly is
> >dangerous.
> >
> >Ellen Hertz
> >
> >Zubin wrote:
> >
> >> Can you be more specific on what the misleading statements are?  And why you
> >> think they are misleading.
> >>
> >> T.S. Lim  wrote in message
> >> news:[EMAIL PROTECTED]...
> >> > While hunting for URLs for KDCentral.com, I encountered several
> >> > misleading statements about Statistics made by Data Mining people.
> >> > I've posted 3 of them to my bulletin board. If you encounter other
> >> > wrong remarks, I invite you to post them to the board too at
> >> >
> >> >http://www.recursive-partitioning.com/forums
> >> >
> >> > Thanks.
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > T.S. Lim
> >> > [EMAIL PROTECTED]
> >> > www.Recursive-Partitioning.com
> >> >
> >> >
> >> >
> >> > 
> >> > Get paid to write review! http://recursive-partitioning.epinions.com
> >



===
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===



Megaputer ships PolyAnalyst 4.1 - the first data mining tool supporting OLE DB for Data Mining

2000-05-15 Thread Sergei Ananyan

 Bloomington, IN, May 2, 2000 -- Megaputer Intelligence Inc. today
announced the release of PolyAnalyst 4.1, the next version of the leading
data mining system, featuring support for an innovative SQL-based
protocol, OLE DB for Data Mining. The new system also implements a powerful
Data Import Wizard and a special architecture for dealing with very large
databases. A non-traditional Decision Tree algorithm, a limited-time free
addition to PolyAnalyst 4.1 scheduled for release on May 20, further
enriches the suite of eleven unique machine learning techniques offered by
the system.

 PolyAnalyst 4.1 and PolyAnalyst Knowledge Server 4.1 from Megaputer are
the first commercial data mining applications shipping with built-in
support for all major functions of OLE DB for Data Mining. This new
standard simplifies communication and provides deep integration of data
mining applications with data storage and management tools. OLE DB for Data
Mining was introduced recently by Microsoft Corporation and is backed by a
dozen leading data mining vendors. The beta specification for OLE DB for
Data Mining is currently available at http://www.microsoft.com/data/oledb/
and will be open for public review until May 15, 2000.

  "The implementation for OLE DB for Data Mining support in PolyAnalyst
4.1 is the first commercial illustration of the deep integration of RDBMS
and Data Mining applications provided by this new protocol", says Steve
Murchie, SQL Server group product manager, Microsoft Corporation. "Megaputer
Intelligence delivered this exciting new version of PolyAnalyst in record
time."

  The support for OLE DB for Data Mining built into the new data mining
solution from Megaputer represents a crucial step toward making data mining
functionality available to any business analyst or developer, without the
need for specialized knowledge of a particular data mining tool. Integrating
this functionality with the broadest set of machine learning
algorithms offered by PolyAnalyst 4.1 allows users to readily address
any data mining task. An evaluation copy of PolyAnalyst 4.1 accompanied by a
tutorial illustrating the use of OLE DB for Data Mining is available for
downloading at http://www.megaputer.com.

  Founded in 1993, Megaputer is a leader in software for Business
Intelligence. The company provides a complete family of innovative solutions
that help customers make better business decisions. Megaputer offers
best-of-breed tools for data mining, semantic text analysis, and information
management.

  #

Megaputer and PolyAnalyst are registered trademarks of Megaputer
Intelligence Inc. in the United States and/or other countries. The names of
other companies and products mentioned herein may be the trademarks of their
respective owners.

For more information:

  Sergei Ananyan, Megaputer, (812) 330-0110, [EMAIL PROTECTED]
Note to editors: If you are interested in viewing additional information on
Megaputer, please visit the Megaputer Web page at http://www.megaputer.com/.

#
If you do not wish to receive future press releases from Megaputer, please
reply to this message with "remove" in the subject line. Our database will
be updated correspondingly. Thank you.
#






Re: Data Mining blooper

2000-05-13 Thread Will Dwinnell

William Chambers wrote:
"It amazes me to read the self-righteous judgements of people on this thread.. a
number of whom have made incompetent criticisms of corresponding correlations
with the same arrogance and stupidity that they attribute to the data mining
boys, ..."

As one of the 'data mining boys', I would assert that the data mining field is
littered with exactly the kind of mistaken thinking which has been described in
this thread.  I don't know that data miners (as a group) are a whole lot worse
than statisticians (as a group), but there are certainly problems.  I will leave
questions of self-righteousness to others, but I would point out that whether or
not the critics are incompetent, etc. has nothing to do with whether their
claims of data mining incompetence are true.

Will Dwinnell
[EMAIL PROTECTED]






Re: Data Mining blooper and Related Subjects

2000-05-04 Thread Konrad Freeman

"Data Mining" is a loosely and vaguely defined term that refers to things that
people do to understand and explore data.  It means different things when used
by different people.  It may mean one of the following:

1. Classical data analysis/statistical modeling such as linear regression.
2. AI stuff (neural networks, fuzzy logic, MARS, nearest neighbor, etc)
3. Database architecture and data warehousing.  More like computer science
than statistics. One good example is the "Data mining Review"
magazine.  Check out their web site and you will realize that it has little to
do with statistics:  http://www.dmreview.com/

K. Freeman

Debasmit Mohanty wrote:

> I think, now is the time when we have to decide "Do we accept DATA MINING as a
> part of statistics or do we keep neglecting this field as before".
>
> I am sure there would be few statistics students like me who feel that Data
> Mining is very much the part of statistics.
>
> Thanks
> Debasmit
>



   The contents of this message express only the sender's opinion.
   This message does not necessarily reflect the policy or views of
   my employer, Merck & Co., Inc.  All responsibility for the statements
   made in this Usenet posting resides solely and completely with the
   sender.





Re: Data Mining blooper

2000-05-01 Thread Richard M. Barton

An example of specialized knowledge:

Last Friday, a colleague showed me how he was using a data mining program to cluster 
over 1000 genes using 5 variables.   After clustering, he used the program to generate 
a pretty, spinnable 3-D plot of his data on 3 of the original variables.  It had 
color-coded clusters; and  one could also click on a plotted point and its id # and 
variable values would pop up. 

Some problems:
  
1)  Four of the variables were measured on a scale of 0-2, the 5th was on a scale of 
0-107.  He had no idea that, with the distance measure he was using (Euclidean), 
his clustering could be dominated by that 5th variable.

2) He chose a final cluster solution of two clusters simply because the program 
suggested that was the best solution (not indicating why).   But he was using k-means 
clustering, and was setting his initial estimate of number of clusters to 2.

3) He clearly had some outliers in his data set that were being masked.

4) He didn't realize that a different choice of 3 variables for plotting could result 
in a very different picture of his data.

5) He had chosen his plotting symbol to be large enough that, when points had similar 
coordinates, some points were hidden.
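
Problem 1 above can be illustrated with a minimal sketch. This is not the colleague's data or program: it assumes made-up uniform values (four variables on 0-2, a fifth on 0-107) and simply measures what fraction of the squared Euclidean distance between random pairs of rows comes from the fifth variable, before and after rescaling every column to unit standard deviation. The names fifth_variable_share and standardize are illustrative only.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: 1000 rows, four variables on 0-2 and a fifth on 0-107.
rows = [[random.uniform(0, 2) for _ in range(4)] + [random.uniform(0, 107)]
        for _ in range(1000)]

def fifth_variable_share(data, pairs=2000):
    """Average fraction of squared Euclidean distance due to the last column."""
    rng = random.Random(1)
    shares = []
    for _ in range(pairs):
        a, b = rng.sample(data, 2)
        sq = [(x - y) ** 2 for x, y in zip(a, b)]
        total = sum(sq)
        if total > 0:
            shares.append(sq[-1] / total)
    return statistics.mean(shares)

def standardize(data):
    """Rescale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*data))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in data]

raw_share = fifth_variable_share(rows)
std_share = fifth_variable_share(standardize(rows))
print(f"share from variable 5, raw scales:   {raw_share:.3f}")
print(f"share from variable 5, standardized: {std_share:.3f}")
```

On data like this, nearly all of the raw distance comes from the wide-range variable, so a distance-based clustering mostly sorts rows by that one column. After standardizing, each variable contributes roughly equally. Whether standardizing is the right rescaling is a judgment call that depends on what the variables mean.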


I pointed out some of these issues, we played with the data and the output, and it was 
a learning experience for both of us: he gained some knowledge of stats and I got to 
see some of the advantages/disadvantages of a data mining program.  I suppose the 
programs can be useful tools in the right hands; this comes from someone who, as a 
kid, didn't know that a hatchet was not the preferred tool for chopping ice off a roof.

rick



--- "Donald F. Burrill" wrote:
Thanks, Ellen.  Evocative quote, isn't it?  It's that "without requiring 
*any* (!) specialized knowledge" that will be the dangerous part, if read 
too literally by the naive.  
Interesting that you could get to Lim's URL at all.  When _I _ 
tried it, several days ago, the system seemed to be trying to tell me that 
the  /forums  part of the URL wasn't accessible.  But perhaps the problem 
was only temporary.
-- Don.

On Sun, 30 Apr 2000, Ellen Hertz wrote:

> I looked up one and copied it:
> 
>   "For the first time, thanks to the increased power of computers, 
> new methods replace the skill of the statistical artisan with  
> massive-computational methods, obtaining equal or better results in far 
> less time without requiring any specialised knowledge."
> 
> In all fairness, I haven't read the whole paper and if he is referring 
> purely to computations such as generating maximum likelihood estimates 
> or inverting matrices, he is quite right that computers beat pencils.  
> If he means to just run programs without knowing what they mean 

... "untouched by the human mind", as Heidi Kass used to put it ...

> and generate GIGO, that certainly is dangerous.
Ayuh.  -- DFB.
> Ellen Hertz

 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  



--- end of quote ---





Re: Data Mining blooper

2000-04-30 Thread William Chambers

Ellen,

It amazes me to read the self-righteous judgements of people on this
thread, a number of whom have made incompetent criticisms of corresponding
correlations with the same arrogance and stupidity that they attribute to
the data mining boys.  When the purpose becomes making money and not
pursuing truth, then we see such arrogance, lies and eventual evil. Real
people get hurt.  Truth itself begins to appear to be an illusion. Whatever
sells appears to be the good. When people are not willing to learn and to
discuss, to test and to discover, then we get the sort of ridiculous
examples of the pot calling the kettle black making up this thread. I like
your call for discernment.  Good luck.

Bill Chambers



Ellen Hertz wrote in message <[EMAIL PROTECTED]>...
>I looked up one and copied it:
>
>  "For the first time, thanks to the increased power of computers, new
>methods replace the skill of the statistical artisan with massive-computational
>methods, obtaining equal or better results in far less time without requiring
>any specialised knowledge."
>
>In all fairness, I haven't read the whole paper and if he is referring purely to
>computations such as generating maximum likelihood estimates or inverting
>matrices, he is quite right that computers beat pencils. If he means to just run
>programs without knowing what they mean and generate GIGO, that  certainly is
>dangerous.
>
>Ellen Hertz
>
>Zubin wrote:
>
>> Can you be more specific on what the misleading statements are?  And why you
>> think they are misleading.
>>
>> T.S. Lim  wrote in message
>> news:[EMAIL PROTECTED]...
>> > While hunting for URLs for KDCentral.com, I encountered several
>> > misleading statements about Statistics made by Data Mining people.
>> > I've posted 3 of them to my bulletin board. If you encounter other
>> > wrong remarks, I invite you to post them to the board too at
>> >
>> >http://www.recursive-partitioning.com/forums
>> >
>> > Thanks.
>> >
>> >
>> >
>> >
>> > --
>> > T.S. Lim
>> > [EMAIL PROTECTED]
>> > www.Recursive-Partitioning.com
>> >
>> >
>> >
>> > 
>> > Get paid to write review! http://recursive-partitioning.epinions.com
>







Re: Data Mining blooper

2000-04-30 Thread Donald F. Burrill

Thanks, Ellen.  Evocative quote, isn't it?  It's that "without requiring 
*any* (!) specialized knowledge" that will be the dangerous part, if read 
too literally by the naive.  
Interesting that you could get to Lim's URL at all.  When _I_ 
tried it, several days ago, the system seemed to be trying to tell me that 
the  /forums  part of the URL wasn't accessible.  But perhaps the problem 
was only temporary.
-- Don.

On Sun, 30 Apr 2000, Ellen Hertz wrote:

> I looked up one and copied it:
> 
>   "For the first time, thanks to the increased power of computers, 
> new methods replace the skill of the statistical artisan with  
> massive-computational methods, obtaining equal or better results in far 
> less time without requiring any specialised knowledge."
> 
> In all fairness, I haven't read the whole paper and if he is referring 
> purely to computations such as generating maximum likelihood estimates 
> or inverting matrices, he is quite right that computers beat pencils.  
> If he means to just run programs without knowing what they mean 

... "untouched by the human mind", as Heidi Kass used to put it ...

> and generate GIGO, that certainly is dangerous.
Ayuh.  -- DFB.
> Ellen Hertz

 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  






Re: Data Mining blooper

2000-04-29 Thread Ellen Hertz

I looked up one and copied it:

  "For the first time, thanks to the increased power of computers, new
methods replace the skill of the statistical artisan with  massive-computational
methods, obtaining equal or better results in far less time without requiring
any specialised knowledge."

In all fairness, I haven't read the whole paper and if he is referring purely to
computations such as generating maximum likelihood estimates or inverting
matrices, he is quite right that computers beat pencils. If he means to just run
programs without knowing what they mean and generate GIGO, that  certainly is
dangerous.

Ellen Hertz

Zubin wrote:

> Can you be more specific on what the misleading statements are?  And why you
> think they are misleading.
>
> T.S. Lim  wrote in message
> news:[EMAIL PROTECTED]...
> > While hunting for URLs for KDCentral.com, I encountered several
> > misleading statements about Statistics made by Data Mining people.
> > I've posted 3 of them to my bulletin board. If you encounter other
> > wrong remarks, I invite you to post them to the board too at
> >
> >http://www.recursive-partitioning.com/forums
> >
> > Thanks.
> >
> >
> >
> >
> > --
> > T.S. Lim
> > [EMAIL PROTECTED]
> > www.Recursive-Partitioning.com
> >
> >
> >
> > 
> > Get paid to write review! http://recursive-partitioning.epinions.com






Re: Data Mining blooper and Related Subjects

2000-04-29 Thread Bob Hayden

- Forwarded message from Frank E Harrell Jr -

I'd like to make a somewhat related point.  There are many educational
tools that I've found have a great effect on non-statisticians.  One of these
is to take one of their datasets, randomly permute the column of Y-values,
go through their data mining procedure, and see what it finds.  The more
that it finds, the more the client becomes properly afraid of the technique
and respectful of the statistician's careful approach.  -Frank Harrell

- End of forwarded message from Frank E Harrell Jr -

That's a nice example, though I would not have the confidence that
they would not see it as a wonderful way to discover even more
"relationships"!-)  You could also try sorting the X and Y columns
independently to boost R^2.  Some of my students just supplied another
example.  I was ill and emailed class cancellation well in advance.
Some of these folks seem to practice procrastination as a religion,
and put off the experimental design assignment due the day I was out
and did it with the time series assignment due at the next class.  As
a result, some of them introduced a "trend" variable into the
experimental design data, and got a "significant" p-value.  I have not
had the opportunity yet to ask them what "trend" measures in this
context, or what value I should plug in for it if I want to make a
prediction of sales when commission is 5% in Division C.
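
The sorting trick mentioned above is easy to try. The following is only a sketch with simulated data (two independently generated normal columns, an assumption made for illustration); r_squared is a hypothetical helper that computes the squared Pearson correlation by hand.

```python
import random
import statistics

random.seed(42)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]  # generated independently of x

def r_squared(a, b):
    """Squared Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab ** 2 / (saa * sbb)

# Honest pairing: the rows are kept together, so R^2 reflects the (absent) link.
r2_paired = r_squared(x, y)

# Sorting each column on its own destroys the pairing and lines up the ranks.
r2_sorted = r_squared(sorted(x), sorted(y))

print(f"R^2 with rows intact:            {r2_paired:.3f}")
print(f"R^2 after sorting independently: {r2_sorted:.3f}")
```

The second number is close to 1 even though the two columns have nothing to do with each other, which is the point of the joke: the "relationship" is manufactured entirely by the data handling.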

I used to have a sheet with four residual plots per side that I asked
students to interpret on exams.  One plot was a big smiley face.  The
answer I hoped for was, "There seems to be a pattern here, but I don't
think any of the techniques we have studied would be appropriate to
deal with it."  I wonder what the data mining software would do with
it?  Smile back maybe?
 

Robert W. Hayden
Department of Mathematics
Plymouth State College MSC#29
Plymouth, New Hampshire 03264  USA

Rural Route 1, Box 10
Ashland, NH 03217-9702
(603) 968-9914 (home)
[EMAIL PROTECTED]
fax (603) 535-2943 (work)





Re: Data Mining blooper and Related Subjects

2000-04-29 Thread Frank E Harrell Jr

I'd like to make a somewhat related point.  There are many educational
tools that I've found have a great effect on non-statisticians.  One of these
is to take one of their datasets, randomly permute the column of Y-values,
go through their data mining procedure, and see what it finds.  The more
that it finds, the more the client becomes properly afraid of the technique
and respectful of the statistician's careful approach.  -Frank Harrell
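
A minimal sketch of the permutation check described above.  Everything in it
is an assumption for illustration: the "data mining procedure" is a toy
stand-in (pick the predictor column with the largest absolute correlation),
the data are simulated noise, and the names best_predictor and permuted_run
are hypothetical.  With a real client dataset one would substitute their
actual columns and their actual procedure.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_predictor(columns, y):
    """Toy 'mining procedure': index and |r| of the column most correlated with y."""
    scores = [abs(pearson_r(col, y)) for col in columns]
    best = max(range(len(columns)), key=scores.__getitem__)
    return best, scores[best]


def permuted_run(columns, y, rng):
    """Shuffle a copy of y (breaking any real link to the predictors), then mine."""
    shuffled = y[:]
    rng.shuffle(shuffled)
    return best_predictor(columns, shuffled)


rng = random.Random(0)  # fixed seed so the toy run is repeatable
n_rows, n_cols = 30, 50
columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_cols)]
y = [rng.gauss(0, 1) for _ in range(n_rows)]

idx, score = permuted_run(columns, y, rng)
print(f"after permuting Y, best predictor is column {idx} with |r| = {score:.2f}")
```

Whatever the procedure reports on the permuted Y is, by construction, a
chance finding, which is what makes the exercise persuasive.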

"Silvert, Henry" wrote:

> I respectfully disagree with Michael Wyatt. I come from an academic
> background and now work outside of academia, except for the occasional
> course here or there. I too report to a manager or managers, depending on
> the circumstances. But my experiences have not been the same as his. I am
> constantly urged to use all my skills as a statistician and a research
> methodologist by "my managers." (Horrid!!!)
>
> Henry M. Silvert PHD
> Research Statistician
> The Conference Board
> 845 3rd. Avenue
> New York, NY 10022
> Tel. No.: (212) 339-0438
> Fax No.: (212) 836-3825

Re: Data Mining blooper and Related Subjects (fwd)

2000-04-28 Thread Bob Hayden

- Forwarded message from Debasmit Mohanty -

I think now is the time when we have to decide: "Do we accept DATA MINING as
a part of statistics, or do we keep neglecting this field as before?"

I am sure there are a few statistics students like me who feel that Data
Mining is very much a part of statistics.

- End of forwarded message from Debasmit Mohanty -

It may be a disagreement over words.  Much of the work Tukey et
al. did in the 60s, called exploratory data analysis, had to do with
looking at data and trying to detect patterns.  However, if you sift
through data you will find many "patterns" that are just flukes of
chance.  How do you avoid taking these seriously?  This was a
criticism directed at Tukey then, and even more so at what goes on
today under the name of "Data Mining".  But I have a sense that Tukey
had a much deeper awareness of the underlying statistical issues than
most of the miners have!-)
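
How many flukes turn up when sifting can be sketched with pure noise.  The
setup below is an assumption for illustration only: 20 simulated columns of
30 independent values each, and a rough rule-of-thumb cutoff of 2/sqrt(n) for
a "noteworthy" correlation (approximately a 5% two-sided threshold, not an
exact critical value).

```python
import random
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2)  # fixed seed so the toy run is repeatable
n_rows, n_cols = 30, 20
data = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_cols)]

threshold = 2 / n_rows ** 0.5  # rough rule of thumb, not an exact cutoff
pairs = list(combinations(range(n_cols), 2))
flukes = [(i, j) for i, j in pairs
          if abs(pearson_r(data[i], data[j])) > threshold]

print(f"{len(flukes)} of {len(pairs)} pairs of pure-noise columns look 'related'")
```

Every column here is independent noise, so each flagged pair is a fluke, and
the count grows with the number of pairs examined.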
 

Robert W. Hayden
Department of Mathematics
Plymouth State College MSC#29
Plymouth, New Hampshire 03264  USA

Rural Route 1, Box 10
Ashland, NH 03217-9702
(603) 968-9914 (home)
[EMAIL PROTECTED]
fax (603) 535-2943 (work)





Re: Data Mining blooper and Related Subjects

2000-04-28 Thread Debasmit Mohanty

I have been following the discussion on Data Mining blooper for a while. 
Being a first year graduate student in statistics, my comments on this issue 
might sound premature. Nevertheless, I would put forward my observations.

What I have learnt so far from my interaction with statisticians in
academia as well as in industry is the following:

1) Many statisticians still feel that "Data Mining" as a discipline
should be left to the people in computer science.
Of course, I don't agree with this view at all. If you read the paper
"Data Mining and Statistics" by Dr. J. Friedman, you will realize how
statisticians have neglected this emerging field over the last few years.

2) Few statistics graduate programs emphasize "Data Mining" research.
Of course, there are a few, like Carnegie Mellon.
But overall, we have yet to give the field the attention it needs.

I think now is the time when we have to decide: "Do we accept DATA MINING as
a part of statistics, or do we keep neglecting this field as before?"

I am sure there are a few statistics students like me who feel that Data
Mining is very much a part of statistics.

Thanks
Debasmit

--
Debasmit Mohanty
Graduate Student - Statistics
http://bama.ua.edu/~mohan001/
--


Date: Wed, 26 Apr 2000 11:38:28 -0400
From: dennis roberts <[EMAIL PROTECTED]>
Subject:

At 07:57 AM 4/26/00 -0500, Herman Rubin wrote:


>It does not surprise me one bit.  The typical statistics
>course teaches statistical methods and pronouncements, with
>no attempt to achieve understanding.   snip of more

this is something i happen to agree with herman about ... but, it is a much
broader problem than can be attributed to what happens in one course

it is an attitude about what higher education is all about ... and what the
goals are for it

'going to college' ... be it undergraduate level or graduate level ... has
become a much more hit and miss experience, residence has little meaning
... that is being tailored more and more to the convenience of students ...
and to what is 'user' friendly (or it won't SELL). studying principles in
disciplines is hard work ... NOT user friendly ... so, less and less is
being required in the way of diligent study.

take graduate school for example ... there was a time, was there not ...
where doctoral students were REALLY expected to be responsible for their
dissertations AND were expected to be the experts in that particular area
of inquiry ... AND to be competent enough to have done the work him/herself
... and to UNDERSTAND it .. ie, BE ABLE TO DEFEND ALL OF IT

but, what i have noticed over many years is that dissertations are becoming
more of a committee effort ... yes, the student MAY have had the idea
(though not necessarily) but, from there ... he/she gets help with the
design ... has someone else do the analysis (because he/she did not take
any/sufficient work in analytic methods to understand what is going on) ...
gets help in writing and editing .. and, even gets help in terms of what
their results MEAN ...

gives new meaning to the term: "cooperative learning"







RE: Data Mining blooper and Related Subjects

2000-04-28 Thread Silvert, Henry

I respectfully disagree with Michael Wyatt. I come from an academic
background and now work outside of academia, except for the occasional
course here or there. I too report to a manager or managers, depending on
the circumstances. But my experiences have not been the same as his. I am
constantly urged to use all my skills as a statistician and a research
methodologist by "my managers." (Horrid!!!) 

Henry M. Silvert PHD
Research Statistician
The Conference Board
845 3rd. Avenue
New York, NY 10022
Tel. No.: (212) 339-0438
Fax No.: (212) 836-3825

> -Original Message-
> From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
> Sent: Friday, April 28, 2000 7:52 AM
> To:   [EMAIL PROTECTED]
> Subject:  Re: Data Mining blooper and Related Subjects
> 
> ...And it extends even further. Many of us who toil in areas outside of
> academia have our work and productivity "supervised" by managers or
> directors who have little or no training in statistics, beyond a survey
> course. They receive the flashy brochures and read the ads that promise
> analytical software that will provide significant information, without
> the bother of formulating one of those fancy-shmancy hypotheses.
> 
> The higher-ups come to view data mining, decision support, outcomes
> analysis, & etc. as requiring no more skill than the ability to use a PC.
>  I call it "The Myth of the Statistical Meat Grinder".  The push of a
> button or two will generate the answer to all corporate questions, plus a
> few neat-o graphs for the board of directors packets.
> 
> Michael T. Wyatt, Ph.D.
> (Embittered) Healthcare Analyst
> Quality Improvement Dept.
> DCH Regional Medical Center
> Tuscaloosa, AL

Re: Data Mining blooper and Related Subjects

2000-04-28 Thread mtwyatt

...And it extends even further. Many of us who toil in areas outside of
academia have our work and productivity "supervised" by managers or
directors who have little or no training in statistics, beyond a survey
course. They receive the flashy brochures and read the ads that promise
analytical software that will provide significant information, without
the bother of formulating one of those fancy-shmancy hypotheses.

The higher-ups come to view data mining, decision support, outcomes
analysis, & etc. as requiring no more skill than the ability to use a PC.
 I call it "The Myth of the Statistical Meat Grinder".  The push of a
button or two will generate the answer to all corporate questions, plus a
few neat-o graphs for the board of directors packets.

Michael T. Wyatt, Ph.D.
(Embittered) Healthcare Analyst
Quality Improvement Dept.
DCH Regional Medical Center
Tuscaloosa, AL



On Wed, 26 Apr 2000 11:38:28 -0400 dennis roberts <[EMAIL PROTECTED]>
writes:
> At 07:57 AM 4/26/00 -0500, Herman Rubin wrote:
>
> >It does not surprise me one bit.  The typical statistics
> >course teaches statistical methods and pronouncements, with
> >no attempt to achieve understanding.   snip of more
>
> this is something i happen to agree with herman about ... but, it is a much
> broader problem than can be attributed to what happens in one course
>
> it is an attitude about what higher education is all about ... and what the
> goals are for it
>
> 'going to college' ... be it undergraduate level or graduate level ... has
> become a much more hit and miss experience, residence has little meaning
> ... that is being tailored more and more to the convenience of students ...
> and to what is 'user' friendly (or it won't SELL). studying principles in
> disciplines is hard work ... NOT user friendly ... so, less and less is
> being required in the way of diligent study.
>
> take graduate school for example ... there was a time, was there not ...
> where doctoral students were REALLY expected to be responsible for their
> dissertations AND were expected to be the experts in that particular area
> of inquiry ... AND to be competent enough to have done the work him/herself
> ... and to UNDERSTAND it .. ie, BE ABLE TO DEFEND ALL OF IT
>
> but, what i have noticed over many years is that dissertations are becoming
> more of a committee effort ... yes, the student MAY have had the idea
> (though not necessarily) but, from there ... he/she gets help with the
> design ... has someone else do the analysis (because he/she did not take
> any/sufficient work in analytic methods to understand what is going on) ...
> gets help in writing and editing .. and, even gets help in terms of what
> their results MEAN ...
>
> gives new meaning to the term: "cooperative learning"



Re: Data Mining blooper

2000-04-27 Thread Frank E Harrell Jr

They are amazingly misleading.  A basic stat book will explain why.  One
example is the statement that traditional statistical methods assume that
predictor variables are uncorrelated with each other - incredible!
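
One way to see why that statement is misleading: ordinary least squares does
not require uncorrelated predictors.  Correlation among predictors inflates
the standard errors, and only perfect collinearity makes the fit break down.
The sketch below is an illustration on simulated data, not anything from the
bulletin board; the true coefficients (2 and -1), the sample size, the
no-intercept model and the function name fit_two_predictors are all
assumptions.

```python
import random


def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b1*x1 + b2*x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12  # zero only under perfect collinearity
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


rng = random.Random(3)  # fixed seed so the toy run is repeatable
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
# x2 is built from x1 plus independent noise, so the two correlate at about 0.8
x2 = [0.8 * a + 0.6 * rng.gauss(0, 1) for a in x1]
y = [2.0 * a - 1.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Both predictors have mean near zero, so this uncentered ratio approximates r.
corr = sum(a * b for a, b in zip(x1, x2)) / (
    sum(a * a for a in x1) * sum(b * b for b in x2)) ** 0.5

b1, b2 = fit_two_predictors(x1, x2, y)
print(f"predictor correlation is about {corr:.2f}")
print(f"estimated coefficients: b1 = {b1:.2f}, b2 = {b2:.2f} (simulated truth: 2, -1)")
```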

Zubin wrote:

> Can you be more specific on what the misleading statements are?  And why you
> think they are misleading.
>
> T.S. Lim wrote in message
> news:[EMAIL PROTECTED]...
> > While hunting for URLs for KDCentral.com, I encountered several
> > misleading statements about Statistics made by Data Mining people.
> > I've posted 3 of them to my bulletin board. If you encounter other
> > wrong remarks, I invite you to post them to the board too at
> >
> >http://www.recursive-partitioning.com/forums
> >
> > Thanks.
> >
> >
> >
> >
> > --
> > T.S. Lim
> > [EMAIL PROTECTED]
> > www.Recursive-Partitioning.com
> >
> >
> >
> > 
> > Get paid to write review! http://recursive-partitioning.epinions.com

--
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat







Re: Data Mining blooper

2000-04-26 Thread Zubin

Can you be more specific on what the misleading statements are?  And why you
think they are misleading.


T.S. Lim wrote in message
news:[EMAIL PROTECTED]...
> While hunting for URLs for KDCentral.com, I encountered several
> misleading statements about Statistics made by Data Mining people.
> I've posted 3 of them to my bulletin board. If you encounter other
> wrong remarks, I invite you to post them to the board too at
>
>http://www.recursive-partitioning.com/forums
>
> Thanks.
>
>
>
>
> --
> T.S. Lim
> [EMAIL PROTECTED]
> www.Recursive-Partitioning.com
>
>
>
> 
> Get paid to write review! http://recursive-partitioning.epinions.com







Re: Data Mining blooper and Related Subjects

2000-04-26 Thread dennis roberts

At 07:57 AM 4/26/00 -0500, Herman Rubin wrote:


>It does not surprise me one bit.  The typical statistics
>course teaches statistical methods and pronouncements, with
>no attempt to achieve understanding.   snip of more

this is something i happen to agree with herman about ... but, it is a much 
broader problem than can be attributed to what happens in one course

it is an attitude about what higher education is all about ... and what the 
goals are for it

'going to college' ... be it undergraduate level or graduate level ... has 
become a much more hit and miss experience, residence has little meaning 
... that is being tailored more and more to the convenience of students ... 
and to what is 'user' friendly (or it won't SELL). studying principles in 
disciplines is hard work ... NOT user friendly ... so, less and less is 
being required in the way of diligent study.

take graduate school for example ... there was a time, was there not ... 
where doctoral students were REALLY expected to be responsible for their 
dissertations AND were expected to be the experts in that particular area 
of inquiry ... AND to be competent enough to have done the work him/herself 
... and to UNDERSTAND it .. ie, BE ABLE TO DEFEND ALL OF IT

but, what i have noticed over many years is that dissertations are becoming 
more of a committee effort ... yes, the student MAY have had the idea 
(though not necessarily) but, from there ... he/she gets help with the 
design ... has someone else do the analysis (because he/she did not take 
any/sufficient work in analytic methods to understand what is going on) ... 
gets help in writing and editing .. and, even gets help in terms of what 
their results MEAN ...

gives new meaning to the term: "cooperative learning"








Re: Data Mining blooper and Related Subjects

2000-04-26 Thread Herman Rubin

In article <002601bfaf29$cfbaa9a0$[EMAIL PROTECTED]>,
David A. Heiser <[EMAIL PROTECTED]> wrote:

>- Original Message -
>From: T.S. Lim 
>To: <[EMAIL PROTECTED]>
>Sent: Tuesday, April 25, 2000 10:49 AM
>Subject: Data Mining blooper


>> While hunting for URLs for KDCentral.com, I encountered several
>> misleading statements about Statistics made by Data Mining people.
>> I've posted 3 of them to my bulletin board. If you encounter other
>> wrong remarks, I invite you to post them to the board too at

>>http://www.recursive-partitioning.com/forums

>> Thanks.
>'''.
>...
>This essentially supports my argument over the last few years. The
>commercial selling of overpriced black boxes generates so much profit for
>these companies that they can make any claim whatsoever, and people will buy
>it, just like in politics.

>The basic selling line is, "you may be stupid, with absolutely no knowledge
>of anything, but if you buy my overpriced $20,000 software, you become a
>noted expert in anything. You don't have to know anything to use my software
>(or vote for me, or)". It amazes me that college graduates buy this
>hook, line and sinker. Then they ask questions on edstat about what does the
>output mean.

>DAH


It does not surprise me one bit.  The typical statistics
course teaches statistical methods and pronouncements, with
no attempt to achieve understanding.  How many coming out
of such a course are cognizant that a significance
statement is a statement about the probability BEFORE the
observations are taken that the null hypothesis will be
rejected?  How many understand what the likelihood function
means, and why one should even consider the likelihood
principle?
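
The point about significance can be illustrated by simulation.  The sketch
below rests on assumptions that are not in the post: the null hypothesis is
true, the test is a two-sided z-test with known standard deviation, the
sample size is 25 and the level is 0.05.  Under those conditions the level is
the long-run fraction of samples that lead to rejection, a property of the
procedure that is fixed before any observations are taken.

```python
import random
from statistics import NormalDist


def rejects_null(sample, mu0=0.0, sigma=1.0, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, with sigma assumed known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


rng = random.Random(42)  # fixed seed so the toy run is repeatable
trials = 20000
# Every sample is drawn with the null hypothesis true (mean 0, sd 1).
rejections = sum(
    rejects_null([rng.gauss(0.0, 1.0) for _ in range(25)])
    for _ in range(trials)
)
rate = rejections / trials
print(f"fraction of null-true samples rejected: {rate:.3f}")
```

The rate stays close to alpha whatever any single sample looks like; it is
not the probability that the null hypothesis is true given the data in hand.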

If students come out of a statistics course believing that 
statistics is a black box into which one puts the data, with
no assumptions, and it spews out the state of the universe,
or at least the "statistical conclusions", how could it be
expected that they NOT consider what is offered as just a
better black box?

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: Data Mining blooper and Related Subjects

2000-04-25 Thread David A. Heiser


- Original Message -
From: T.S. Lim 
To: <[EMAIL PROTECTED]>
Sent: Tuesday, April 25, 2000 10:49 AM
Subject: Data Mining blooper


> While hunting for URLs for KDCentral.com, I encountered several
> misleading statements about Statistics made by Data Mining people.
> I've posted 3 of them to my bulletin board. If you encounter other
> wrong remarks, I invite you to post them to the board too at
>
>http://www.recursive-partitioning.com/forums
>
> Thanks.
'''.
...
This essentially supports my argument over the last few years. The
commercial selling of overpriced black boxes generates so much profit for
these companies that they can make any claim whatsoever, and people will buy
it, just like in politics.

The basic selling line is, "you may be stupid, with absolutely no knowledge
of anything, but if you buy my overpriced $20,000 software, you become a
noted expert in anything. You don't have to know anything to use my software
(or vote for me, or)". It amazes me that college graduates buy this
hook, line and sinker. Then they ask questions on edstat about what does the
output mean.

DAH








Data Mining blooper

2000-04-25 Thread T.S. Lim

While hunting for URLs for KDCentral.com, I encountered several
misleading statements about Statistics made by Data Mining people.
I've posted 3 of them to my bulletin board. If you encounter other
wrong remarks, I invite you to post them to the board too at

   http://www.recursive-partitioning.com/forums

Thanks.




--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com




Get paid to write review! http://recursive-partitioning.epinions.com





Re: Data Mining

2000-04-17 Thread Rich Ulrich

 ( how did we get to HERE, from Data Mining?)

On 15 Apr 2000 17:50:05 GMT, [EMAIL PROTECTED] (Radford Neal)
wrote:

> In article <[EMAIL PROTECTED]>,
> Rich Ulrich  <[EMAIL PROTECTED]> wrote:
> 
> >One thing that remains true about stock investment schemes:  There may
> >be some overall growth, somewhere, but in a specific, narrow
> >perspective, the whole market makes up a zero-sum game.  If someone
> >wins, someone else has to lose.  
> 
> The above is internally contradictory, but the final statement is
> clearly false.  

Hey, the final statement is a DEFINITION of zero-sum game.
Where is YOUR mind wandering to?

I have no objection to wise investments, and that is
why I tried to specify a different context,
that is,  "schemes."   - Sorry that I 


> Of course, short-term "day trading" is largely a zero-sum game, as the
> return to be expected over such a short time period is very small.

 - much of it only becomes zero-sum, when the time period is LONG.
There are fortunes made on a soaring market.

 - actually, I expect there are a few Wise Guys who will extract most
of the profit,  so techno-stocks will be negative-sum for most
investors.  There is a LONG history like that:  In the 1830s and 1840s
investors poured money into building canals in the U.S. and England.
The countries benefitted from canals; a few manipulators got rich;
most of the companies went broke and most of the investors lost money.
Railroads followed the same pattern in the second half of that
century.  

In the 1910s, the "wireless telegraph" had the investors flocking --
the U.S. government got involved in prosecuting traders for fraudulent
offerings.  But I don't know if that was as big as Railroads, in terms
of dollars.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Data Mining

2000-04-15 Thread Radford Neal

In article <[EMAIL PROTECTED]>,
Rich Ulrich  <[EMAIL PROTECTED]> wrote:

>One thing that remains true about stock investment schemes:  There may
>be some overall growth, somewhere, but in a specific, narrow
>perspective, the whole market makes up a zero-sum game.  If someone
>wins, someone else has to lose.  

The above is internally contradictory, but the final statement is
clearly false.  

Consider a pharmaceutical company with a research program.  Suppose
the general, well-founded opinion is that this program is not likely
to produce much.  The company's stock is low.  But it turns out that they
get lucky, and discover a marvelous drug, that will save millions of
lives, and make them lots of money.  The company's stock goes up.  The
owners of the stock win.  And it may be that nobody else loses.  (It
could be that owners of stock in a competing company lose, but if the
drug is much better than previous drugs, they'll tend to lose less
than the winners win.  And perhaps there was NO drug for the disease
before, in which case there were no competitors.)

Aside from wins due to such surprises, there is indeed a general
positive rate of return, stemming from the fact that capital actually
is a useful factor in production, and there is also the possibility of
an overall gain or loss as a result of a shift in the general degree
of preference for consumption now over consumption later (expressed in
terms of interest rates).

Of course, short-term "day trading" is largely a zero-sum game, as the
return to be expected over such a short time period is very small.

   Radford Neal





Re: Data Mining

2000-04-14 Thread Rich Ulrich

On 12 Apr 2000 15:21:21 -0700, [EMAIL PROTECTED] (Paul Bernhardt)
wrote:

> I suspect in this forum, almost as bad as the F-word or N-word are the 
> DM-words... Data Mining... I agree, but wonder about criteria.

 - since IBM started touting a product by that name, it is hard to
ignore the new environment.  It is still possible that someone
will start with a small amount of information, and "torture the data
until it confesses."   But online data collection produces databases
with millions of sales events, organized by date, store, etc.  What
can be learned?

> Often in our various research domains we have no choice but to use 
> retrospective data. A classic example might be validating an investment 
> approach by examining historical data, which some call backtesting. 
> 
> What are the criteria, how can we know when we have chance findings?
> 
Try to look for "independence"  so that you have an N that gives you
increasing confidence;  use something more extreme than 5% -- though
you may be fooling yourself if you think that your reported level
below the 0.1% level is really accurate.
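
The point about using "something more extreme than 5%" can be illustrated with a small simulation. This is only a sketch under stated assumptions, not anything from the thread: the function name, the 1000 strategies, the 21 yearly observations, and the known standard deviation of 1 are all made up for illustration. Every simulated strategy is pure noise, so any "significant" result is a chance finding.

```python
import random
from statistics import NormalDist


def count_chance_findings(n_strategies=1000, n_years=21, alpha=0.05, seed=0):
    """Count pure-noise strategies that look 'significant' at level alpha."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    hits = 0
    for _ in range(n_strategies):
        # Assumption: yearly excess returns with true mean 0 and known sd 1.
        returns = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
        # With a known sd of 1, the scaled mean is exactly standard normal.
        z = (sum(returns) / n_years) * n_years ** 0.5
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            hits += 1
    return hits


hits_5pct = count_chance_findings(alpha=0.05)
hits_01pct = count_chance_findings(alpha=0.001)
print("significant at 5%:", hits_5pct)
print("significant at 0.1%:", hits_01pct)
```

Roughly one in twenty of the worthless strategies should pass at the 5% level, and far fewer at 0.1%. The stricter level only helps if the reported level is accurate, which is the caveat raised above.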


> I've argued that if the model is based on an a priori hypothesis, or can 
> be justified by previously established theories, the possibility of data 
> mining may be ignored. When the pre-existing theory is less substantial, 

 - How substantial is "less substantial" or how substantial was the
PRIOR?  If you are sure something is there, maybe you don't need much
more evidence, okay.  Right, more shoppers on a sunny day.  On a
payday.  

> one may ask if the discovered model fits data not included in the 
> original model (data which occurs after the model was discovered, or data 
> which precedes the data originally used to create the model).
> 
> I'd like to hear the views of people on this forum. 
> 
> The specific situation I'm referring to is an investment model called the 
> Foolish Four (http://www.fool.com/school/dowinvesting/dowinvesting.htm) 
> which was found to beat the S&P500 and Dow 30 over the period from 1973 
> through 1993. Since that date, and further backtested to 1961, it has not 
> similarly beaten those traditional benchmark indexes, but also has not 
> performed worse (both of which could be due to lack of power). The 
> Foolish Four is based on a reasonable hypothesis that the worse 
 < snip >
 
One thing that remains true about stock investment schemes:  There may
be some overall growth, somewhere, but in a specific, narrow
perspective, the whole market makes up a zero-sum game.  If someone
wins, someone else has to lose.  

IF there is an amount of regression-to-the-mean that you once were
able to count on, then AFTER it is publicized, it can't keep on
working for very long.  If too many people try to cash in at once,
strict application of the formula can suddenly become a big loser.
Okay, you can work around the edges, and try to figure what stocks
really *ought*  to have been the ones in that group, before eager
anticipation drove their prices up.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Data Mining

2000-04-13 Thread Thom Baguley

Paul Bernhardt wrote:
> I am not affiliated with the Motley Fool (where this investment strategy
> is touted) nor am I advertising for them. It is just an interesting
> practical problem which raises a question I think many statisticians face,
> how to explain when someone has conducted data mining and when they might
> have sussed out a valid truth.
> 
> Paul Bernhardt
> University of Utah
> Department of Educational Psychology

Looks like it tries to capitalize on regression to the mean. RttM only applies
where something is made up of a true score plus a random component. Focussing
on volatile stocks they seem to be attempting to choose stocks with relatively
high random components.
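
The "true score plus a random component" picture can be sketched in a few lines. This is an illustrative simulation only: the function name, the number of stocks, and the normal distributions are assumptions, not anything taken from the Foolish Four material.

```python
import random


def regression_to_mean(n_stocks=2000, noise_sd=1.0, bottom_k=100, seed=1):
    """Mean score of the year-1 worst performers, in year 1 and in year 2."""
    rng = random.Random(seed)
    # Assumption: each stock has a fixed true score plus fresh noise each year.
    true_scores = [rng.gauss(0.0, 1.0) for _ in range(n_stocks)]
    year1 = [t + rng.gauss(0.0, noise_sd) for t in true_scores]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in true_scores]
    # Select the stocks that looked worst in year 1.
    worst = sorted(range(n_stocks), key=lambda i: year1[i])[:bottom_k]
    mean_year1 = sum(year1[i] for i in worst) / bottom_k
    mean_year2 = sum(year2[i] for i in worst) / bottom_k
    return mean_year1, mean_year2


noisy_y1, noisy_y2 = regression_to_mean(noise_sd=1.0)
print("with noise:   ", round(noisy_y1, 2), "->", round(noisy_y2, 2))

flat_y1, flat_y2 = regression_to_mean(noise_sd=0.0)
print("without noise:", round(flat_y1, 2), "->", round(flat_y2, 2))
```

With a random component, the selected group moves back toward the overall mean in year 2 while staying below it. With the noise switched off, the group's mean does not move at all, which is the sense in which RttM needs a random component.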

Thom





Re: Data Mining

2000-04-13 Thread Frank E Harrell Jr



"T.S. Lim" wrote:

> Data Mining = Statistics reborn with a new name.
>
> You ask the wrong crowd. Go to
>
>http://www.kdcentral.com
>
> and subscribe to datamine-l mailing list.

That's debatable.  The poster's question has as much to do with regression
to the mean as with modeling, and anyway data mining has everything to
do with statistics.

-Frank Harrell






Re: Data Mining

2000-04-12 Thread T.S. Lim

Data Mining = Statistics reborn with a new name.

You ask the wrong crowd. Go to

   http://www.kdcentral.com

and subscribe to datamine-l mailing list.



In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] 
says...
>
>I suspect in this forum, almost as bad as the F-word or N-word are the 
>DM-words... Data Mining... I agree, but wonder about criteria.
>
>Often in our various research domains we have no choice but to use 
>retrospective data. A classic example might be validating an investment 
>approach by examining historical data, which some call backtesting. 
>
>What are the criteria, how can we know when we have chance findings?
>
>I've argued that if the model is based on an a priori hypothesis, or can 
>be justified by previously established theories, the possibility of data 
>mining may be ignored. When the pre-existing theory is less substantial, 
>one may ask if the discovered model fits data not included in the 
>original model (data which occurs after the model was discovered, or data 
>which precedes the data originally used to create the model).
>
>I'd like to hear the views of people on this forum. 
>
>The specific situation I'm referring to is an investment model called the 
>Foolish Four (http://www.fool.com/school/dowinvesting/dowinvesting.htm) 
>which was found to beat the S&P500 and Dow 30 over the period from 1973 
>through 1993. Since that date, and further backtested to 1961, it has not 
>similarly beaten those traditional benchmark indexes, but also has not 
>performed worse (both of which could be due to lack of power). The 
>Foolish Four is based on a reasonable hypothesis that the worse 
>performing Dow Jones Industrial Average companies are poised to turn 
>around because they are simply too great to fail over the long term. The 
>judgement on poor performance is based on the stock yield (a high 
>yielding stock has a relatively high dividend payment compared to price), 
>therefore a reasonable hypothesis is used to justify this approach. 
>Selection of 4 of the 5 worst performing Dow companies (the worst is 
>excluded because often these companies are in actual long term financial 
>trouble) is what makes up the Foolish Four.
>
>I am not affiliated with the Motley Fool (where this investment strategy 
>is touted) nor am I advertising for them. It is just an interesting 
>practical problem which raises a question I think many statisticians face, 
>how to explain when someone has conducted data mining and when they might 
>have sussed out a valid truth.
>
>Paul Bernhardt
>University of Utah
>Department of Educational Psychology

-- 
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com






Data Mining

2000-04-12 Thread Paul Bernhardt

I suspect in this forum, almost as bad as the F-word or N-word are the 
DM-words... Data Mining... I agree, but wonder about criteria.

Often in our various research domains we have no choice but to use 
retrospective data. A classic example might be validating an investment 
approach by examining historical data, which some call backtesting. 

What are the criteria, how can we know when we have chance findings?

I've argued that if the model is based on an a priori hypothesis, or can 
be justified by previously established theories, the possibility of data 
mining may be ignored. When the pre-existing theory is less substantial, 
one may ask if the discovered model fits data not included in the 
original model (data which occurs after the model was discovered, or data 
which precedes the data originally used to create the model).
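
The check on data not used to build the model can be sketched as a simulation. Everything here is an assumption for illustration (the function name, 100 candidate strategies, 21 years per period, pure-noise returns): the best-looking strategy is picked on one period and then scored on a second period it never saw.

```python
import random


def best_strategy_in_and_out(n_strategies=100, n_years=21, rng=None):
    """Pick the best in-sample strategy; return its in- and out-of-sample mean."""
    if rng is None:
        rng = random.Random()
    best_in, best_out = float("-inf"), 0.0
    for _ in range(n_strategies):
        # Assumption: every strategy is pure noise with true mean 0 and sd 1.
        in_sample = sum(rng.gauss(0.0, 1.0) for _ in range(n_years)) / n_years
        out_sample = sum(rng.gauss(0.0, 1.0) for _ in range(n_years)) / n_years
        if in_sample > best_in:
            best_in, best_out = in_sample, out_sample
    return best_in, best_out


rng = random.Random(2)
trials = [best_strategy_in_and_out(rng=rng) for _ in range(100)]
avg_in = sum(t[0] for t in trials) / len(trials)
avg_out = sum(t[1] for t in trials) / len(trials)
print("average in-sample mean of the winner:    ", round(avg_in, 2))
print("average out-of-sample mean of the winner:", round(avg_out, 2))
```

The winner looks clearly profitable on the data used to select it and close to zero on the held-out period, which is what a chance finding should look like under this test.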

I'd like to hear the views of people on this forum. 

The specific situation I'm referring to is an investment model called the 
Foolish Four (http://www.fool.com/school/dowinvesting/dowinvesting.htm) 
which was found to beat the S&P500 and Dow 30 over the period from 1973 
through 1993. Since that date, and further backtested to 1961, it has not 
similarly beaten those traditional benchmark indexes, but also has not 
performed worse (both of which could be due to lack of power). The 
Foolish Four is based on a reasonable hypothesis that the worse 
performing Dow Jones Industrial Average companies are poised to turn 
around because they are simply too great to fail over the long term. The 
judgement on poor performance is based on the stock yield (a high 
yielding stock has a relatively high dividend payment compared to price), 
therefore a reasonable hypothesis is used to justify this approach. 
Selection of 4 of the 5 worst performing Dow companies (the worst is 
excluded because often these companies are in actual long term financial 
trouble) is what makes up the Foolish Four.
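
As a rough sketch, the selection rule as described in this post (rank by yield, take the five highest, drop the single highest) could look like the following. The function name, tickers, and yields are hypothetical, and the formula actually published by the Motley Fool may differ from this description.

```python
def foolish_four(dividend_yields):
    """Return four tickers: the five highest yields minus the single highest."""
    if len(dividend_yields) < 5:
        raise ValueError("need at least five stocks to choose from")
    # Rank tickers from highest to lowest yield.
    ranked = sorted(dividend_yields, key=dividend_yields.get, reverse=True)
    # Skip the top-yielding stock, keep the next four.
    return ranked[1:5]


# Hypothetical tickers and yields, for illustration only.
example_yields = {
    "AAA": 0.061,
    "BBB": 0.048,
    "CCC": 0.045,
    "DDD": 0.041,
    "EEE": 0.039,
    "FFF": 0.020,
    "GGG": 0.012,
}
print(foolish_four(example_yields))  # ['BBB', 'CCC', 'DDD', 'EEE']
```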

I am not affiliated with the Motley Fool (where this investment strategy 
is touted) nor am I advertising for them. It is just an interesting 
practical problem which raises a question I think many statisticians face, 
how to explain when someone has conducted data mining and when they might 
have sussed out a valid truth.

Paul Bernhardt
University of Utah
Department of Educational Psychology





Re: data mining and spatial correlation

2000-02-27 Thread andy99potter

Hi. Can anyone tell me how to research
the size of the total advertising budgets in the
developed countries?
Thanks







data mining and spatial correlation

2000-02-26 Thread Sang-Sub Lee

Could somebody help me with references on
data mining and spatial correlation?

Thank you very much







[job] Positions in machine learning, statistics, and data mining

2000-02-08 Thread robertd

Athene Software, Inc.
Positions in Machine Learning, Statistics, and Data Mining

Athene Software, based in Boulder, Colorado, has immediate openings for
professionals in machine learning, statistics, and data mining.  We are
seeking qualified candidates to develop and enhance models of
subscriber behavior for telecommunications companies.

Responsibilities include: statistical investigation of large data sets,
building predictive and decision-making models using the latest
advances in machine learning techniques, developing and tuning data
representations, and presentation of results to internal and external
customers.

Candidates must hold a Ph.D. in Computer Science, Statistics,
Electrical Engineering, or related field.  The ideal candidate will
have experience in pattern recognition or mathematical modeling on real
world problems, familiarity with experimental design and data analysis,
and some background in relational database systems.  Strong
communication skills are extremely important.

Athene has a long-term commitment to cultivating a dynamic,
stimulating environment for its Ph.D. research staff.   The group is
slated to double over the next few years.  Athene encourages
publication of research results and active participation in the
research community.  And Athene has established a research advisory
board consisting of leaders in machine learning, including Dr. Satinder
Singh Baveja (AT&T Labs - Research), Prof.  Geoffrey Hinton (University
College London), Prof. John Moody (OGI), Prof. Andrew Moore (CMU), and
Prof. Michael Mozer (Boulder).


Send applications to:

Dr. Robert Dodier
Athene Software, Inc.
2060 Broadway, Suite 300
Boulder, CO  80302

email: [EMAIL PROTECTED]
company URL:   www.athenesoft.com




===
  This list is open to everyone. Occasionally, people lacking respect
  for other members of the list send messages that are inappropriate
  or unrelated to the list's discussion topics. Please just delete the
  offensive email.

  For information concerning the list, please see the following web page:
  http://jse.stat.ncsu.edu/
===



Re: Textbooks for a course in data mining for scientists and engineers

2000-01-26 Thread T.-S. Lim

In article <86nrjb$ljd$[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...
>
>Can anyone suggest a good textbook for a course in data mining?  The
>students would be graduate students in science and engineering with the
>typical background being one or two undergraduate courses in 
>probability and statistics.  
>
>-- 
>Brian Borchers  [EMAIL PROTECTED]
>Department of Mathematics   http://www.nmt.edu/~borchers/
>New Mexico Tech Phone: 505-835-5813
>Socorro, NM 87801   FAX: 505-835-5366


There's no perfect book, as usual. You may need to combine several books. Or, 
you may choose a book and supplement it with notes from other books. Go to

   www.recursive-partitioning.com/books.html

for some recent data mining and machine learning books. You also need to take 
into account software to try the various methods.

-- 
Tjen-Sien Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com






Re: Textbooks for a course in data mining for scientists and engineers

2000-01-26 Thread Frank E Harrell Jr

One very good book is "How to Find Noise in Data" by "I. Ben Fooled".  Sorry - I 
couldn't resist.

Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat


Brian Borchers wrote:

> Can anyone suggest a good textbook for a course in data mining?  The
> students would be graduate students in science and engineering with the
> typical background being one or two undergraduate courses in
> probability and statistics.
>
> --
> Brian Borchers  [EMAIL PROTECTED]
> Department of Mathematics   http://www.nmt.edu/~borchers/
> New Mexico Tech Phone: 505-835-5813
> Socorro, NM 87801   FAX: 505-835-5366

--







Textbooks for a course in data mining for scientists and engineers

2000-01-26 Thread Brian Borchers

Can anyone suggest a good textbook for a course in data mining?  The
students would be graduate students in science and engineering with the
typical background being one or two undergraduate courses in 
probability and statistics.  

-- 
Brian Borchers  [EMAIL PROTECTED]
Department of Mathematics   http://www.nmt.edu/~borchers/
New Mexico Tech Phone: 505-835-5813
Socorro, NM 87801   FAX: 505-835-5366





PolyAnalyst 4.0 - Final Release of the Leading Data Mining Solution

1999-12-10 Thread Sergei Ananyan

Megaputer Intelligence
www.megaputer.com

Megaputer announces the final release of PolyAnalyst 4.0, the newest version
of the leading data mining solution. The Megaputer development team extends
many thanks to numerous beta-testers who helped perfect the system. An
evaluation copy of PolyAnalyst 4.0 can be downloaded from
www.megaputer.com/html/webshop.html

Version 4.0 represents a major upgrade of the system, positioning
PolyAnalyst as the most comprehensive and versatile suite of data mining
algorithms available today. PolyAnalyst now utilizes Distributed Component
Object Model (DCOM) technology, features ten unique machine learning
algorithms, and furnishes versatile data manipulation, visualization, and
scoring capabilities. In addition to clustering, predicting, dependency
detecting, and yes/no classifying, PolyAnalyst 4.0 solves tasks of explicit
modeling,  detection of association rules in transactional data, and
classification to multiple categories. An open DCOM architecture makes
PolyAnalyst 4.0 easily extendable, upgradable and customizable. It provides
the user with an option to purchase only the necessary machine learning
algorithms as individual modules and utilize these modules as an integral
part of their data storage and management system. A DCOM-based PolyAnalyst
Knowledge Server can support several client stations on a local network.

New features of PolyAnalyst 4.0 include:

* Unique Market Basket Analysis algorithm for processing transactional data.
Groups of products that sell well together and the corresponding directed
association rules are identified an order of magnitude faster than by
traditional algorithms.
* New Memory Based Reasoning algorithm based on a combination of Nearest
Neighbor and Genetic Algorithms. The new method is used efficiently for
classification into multiple categories, as well as prediction of numerical
variables.
* Implementation of the DCOM architecture. Now individual PolyAnalyst
algorithms can be easily utilized in the form of ActiveX modules in external
decision support or data management applications. New PolyAnalyst machine
learning modules can be easily added and upgraded.
* Support for the analysis of large datasets. The maximum volume of data
accepted by the system has been significantly increased and new mechanisms
for dealing with large datasets have been implemented.
* Redesigned user interface. The project contents are organized in a
Windows-standard tree-like style, while preserving all the best features of
the traditional PolyAnalyst interface.
* Dynamic HTML reports for exploration engines. The new interactive reports
can be customized, saved, printed, or copied in a standard portable format,
and exchanged with external applications.
* Mouse-driven Rule Assistant for simple creation of user defined
transformation rules.
* Lift and Gain Charts for interpreting the results obtained by machine
learning algorithms are especially valuable for direct marketing tasks.
These features are very useful for measuring the extra profit reaped by a
marketer making decisions based on the knowledge discovered by PolyAnalyst.
* New Snake Chart for convenient visual comparison of different data sets.
* New Chart Designer for the development of custom charts and advanced data
visualization capabilities.
* New PA Scheduler enabling batch process data mining in PolyAnalyst. The
user can record a sequence of actions and schedule the created script to be
run by PolyAnalyst at a specified time and on the specified datasets.
* Sampling capabilities, allowing random selection of records from a
dataset.
* Direct data exchange with Oracle Express and IBM Visual Warehouse.
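
For readers unfamiliar with the association rules mentioned in the Market Basket Analysis item above, the standard textbook measures (support, confidence, lift) can be sketched as follows. This is not PolyAnalyst's algorithm, which the announcement does not describe; the function name and the example baskets are made up for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_both / n            # share of baskets containing both sides
    confidence = n_both / n_a       # share of antecedent baskets with the consequent
    lift = confidence / (n_c / n)   # confidence relative to the consequent's base rate
    return support, confidence, lift


# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(support, round(confidence, 3), round(lift, 3))
```

A lift above 1 means the two items appear together more often than they would if they were bought independently.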

“By offering these new features, Megaputer has clearly positioned
PolyAnalyst 4.0 as the number one modern business solution for data
analysis,” says Sergei Arseniev, CEO of Megaputer Intelligence. “Now
PolyAnalyst combines the broadest selection of versatile machine learning
algorithms with the convenience and flexibility of DCOM architecture. The
new cutting-edge technology helped Megaputer significantly increase its DM
market share during the last year.”

A number of Fortune 100 companies, such as Allstate Insurance, Boeing, and
DuPont Dow Elastomers, have already switched to PolyAnalyst 4.0. Our
customers are very enthusiastic about the system:
“Analytical engines do an excellent job of finding relations amongst many
fields without overfitting.”
-- Timothy E. Nagle, Nycor Group, Consulting Scientist to 3M

“We chose PolyAnalyst because it offered broad analytic functionality and
ease of use beyond any other product.”
-- Carl Cozine, Principal, CACTUS Strategies

“The software provides a unique and powerful set of tools for data mining
applications, including promotion response analysis, customer segmentation
and profiling, and cross-selling analysis.”
-- Raymond Burke, Chair of BA, Kelley School of Business, Indiana University

Platforms: Windows NT/95/98/2000