Re: Data Mining blooper and Related Subjects

2000-04-29 Thread Frank E Harrell Jr

I'd like to make a somewhat related point.  There are many educational
tools that I've found have a great effect on non-statisticians.  One of these
is to take one of their datasets, randomly permute the column of Y-values,
go through their data mining procedure, and see what it finds.  The more
that it finds, the more the client becomes properly afraid of the technique
and respectful of the statistician's careful approach.  -Frank Harrell
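
The permutation check described above can be sketched in a few lines. This
is only an illustration under stated assumptions: the data are simulated
noise columns, and mine_best_predictor is a hypothetical stand-in for a
client's mining procedure (it simply picks the column most correlated with
Y), not any particular product.

```python
import random

def pearson_r(x, y):
    """Plain Pearson correlation for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mine_best_predictor(columns, y):
    """Hypothetical 'mining' step: return (index, |r|) of the column
    with the largest absolute correlation with y."""
    best_r, best_index = max(
        (abs(pearson_r(col, y)), i) for i, col in enumerate(columns)
    )
    return best_index, best_r

rng = random.Random(0)
n_rows, n_cols = 50, 200

# Simulated dataset (an assumption): 200 noise columns, Y driven by column 0.
columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_cols)]
y = [2 * columns[0][k] + rng.gauss(0, 1) for k in range(n_rows)]

# Randomly permute the Y column: any real link to the predictors is destroyed.
y_permuted = y[:]
rng.shuffle(y_permuted)

real_index, real_r = mine_best_predictor(columns, y)
fake_index, fake_r = mine_best_predictor(columns, y_permuted)
print(f"original Y: column {real_index}, |r| = {real_r:.2f}")
print(f"permuted Y: column {fake_index}, |r| = {fake_r:.2f}")
```

Whatever the procedure reports for the permuted Y is, by construction, a
fluke; the more candidate columns it screens, the larger that fluke tends
to be.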

"Silvert, Henry" wrote:

 I respectfully disagree with Michael Wyatt. I come from an academic
 background and now work outside of academia, except for the occasional
 course here or there. I too report to a manager or managers, depending on
 the circumstances. But my experiences have not been the same as his. I am
 constantly urged to use all my skills as a statistician and a research
 methodologist by "my managers." (Horrid!!!)

 Henry M. Silvert PHD
 Research Statistician
 The Conference Board
 845 3rd. Avenue
 New York, NY 10022
 Tel. No.: (212) 339-0438
 Fax No.: (212) 836-3825

  -Original Message-
  From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
  Sent: Friday, April 28, 2000 7:52 AM
  To:   [EMAIL PROTECTED]
  Subject:  Re: Data Mining blooper and Related Subjects
 
  ...And it extends even further. Many of us who toil in areas outside of
  academia have our work and productivity "supervised" by managers or
  directors who have little or no training in statistics, beyond a survey
  course. They receive the flashy brochures and read the ads that promise
  analytical software that will provide significant information, without
  the bother of formulating one of those fancy-shmancy hypotheses.
 
  The higher-ups come to view data mining, decision support, outcomes
  analysis,  etc. as requiring no more skill than the ability to use a PC.
   I call it "The Myth of the Statistical Meat Grinder".  The push of a
  button or two will generate the answer to all corporate questions, plus a
  few neat-o graphs for the board of directors packets.
 
  Michael T. Wyatt, Ph.D.
  (Embittered) Healthcare Analyst
  Quality Improvement Dept.
  DCH Regional Medical Center
  Tuscaloosa, AL
 
 
 
  
  
  
  
  
  
  ===
  This list is open to everyone.  Occasionally, less thoughtful
  people send inappropriate messages.  Please DO NOT COMPLAIN TO
  THE POSTMASTER about these messages because the postmaster has no
  way of controlling them, and excessive complaints will result in
  termination of the list.

  For information about this list, including information about the
  problem of inappropriate messages and information about how to
  unsubscribe, please see the web page at
  http://jse.stat.ncsu.edu/
  ===
 
  

Re: Data Mining blooper and Related Subjects

2000-04-29 Thread Bob Hayden

- Forwarded message from Frank E Harrell Jr -

I'd like to make a somewhat related point.  There are many educational
tools that I've found have a great effect on non-statisticians.  One of these
is to take one of their datasets, randomly permute the column of Y-values,
go through their data mining procedure, and see what it finds.  The more
that it finds, the more the client becomes properly afraid of the technique
and respectful of the statistician's careful approach.  -Frank Harrell

- End of forwarded message from Frank E Harrell Jr -

That's a nice example, though I would not have the confidence that
they would not see it as a wonderful way to discover even more
"relationships"!-)  You could also try sorting the X and Y columns
independently to boost R^2.  Some of my students just supplied another
example.  I was ill and emailed class cancellation well in advance.
Some of these folks seem to practice procrastination as a religion,
and put off the experimental design assignment due the day I was out
and did it with the time series assignment due at the next class.  As
a result, some of them introduced a "trend" variable into the
experimental design data, and got a "significant" p-value.  I have not
had the opportunity yet to ask them what "trend" measures in this
context, or what value I should plug in for it if I want to make a
prediction of sales when commission is 5% in Division C.
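
The column-sorting trick is easy to try. A minimal sketch, assuming two
simulated columns that are independent by construction (the seed and the
sample size of 100 are arbitrary choices):

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

rng = random.Random(1)
n = 100
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]  # independent of x by construction

r2_paired = r_squared(x, y)                  # rows kept together
r2_sorted = r_squared(sorted(x), sorted(y))  # each column sorted on its own
print(f"R^2 with rows intact:           {r2_paired:.3f}")
print(f"R^2 after sorting independently: {r2_sorted:.3f}")
```

Sorting each column separately pairs the smallest X with the smallest Y,
and so on up, which manufactures a near-perfect relationship out of
nothing while destroying the actual observations.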

I used to have a sheet with four residual plots per side that I asked
students to interpret on exams.  One plot was a big smiley face.  The
answer I hoped for was, "There seems to be a pattern here, but I don't
think any of the techniques we have studied would be appropriate to
deal with it."  I wonder what the data mining software would do with
it?  Smile back maybe?
 

  _
 | |  Robert W. Hayden
 | |  Department of Mathematics
/  |  Plymouth State College MSC#29
   |   |  Plymouth, New Hampshire 03264  USA
   | * |  Rural Route 1, Box 10
  /|  Ashland, NH 03217-9702
 | )  (603) 968-9914 (home)
 L_/  [EMAIL PROTECTED]
  fax (603) 535-2943 (work)





RE: Data Mining blooper and Related Subjects

2000-04-28 Thread Silvert, Henry

I respectfully disagree with Michael Wyatt. I come from an academic
background and now work outside of academia, except for the occasional
course here or there. I too report to a manager or managers, depending on
the circumstances. But my experiences have not been the same as his. I am
constantly urged to use all my skills as a statistician and a research
methodologist by "my managers." (Horrid!!!) 

Henry M. Silvert PHD
Research Statistician
The Conference Board
845 3rd. Avenue
New York, NY 10022
Tel. No.: (212) 339-0438
Fax No.: (212) 836-3825

 -Original Message-
 From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
 Sent: Friday, April 28, 2000 7:52 AM
 To:   [EMAIL PROTECTED]
 Subject:  Re: Data Mining blooper and Related Subjects
 
 ...And it extends even further. Many of us who toil in areas outside of
 academia have our work and productivity "supervised" by managers or
 directors who have little or no training in statistics, beyond a survey
 course. They receive the flashy brochures and read the ads that promise
 analytical software that will provide significant information, without
 the bother of formulating one of those fancy-shmancy hypotheses.
 
 The higher-ups come to view data mining, decision support, outcomes
 analysis,  etc. as requiring no more skill than the ability to use a PC.
  I call it "The Myth of the Statistical Meat Grinder".  The push of a
 button or two will generate the answer to all corporate questions, plus a
 few neat-o graphs for the board of directors packets.
 
 Michael T. Wyatt, Ph.D.
 (Embittered) Healthcare Analyst
 Quality Improvement Dept.
 DCH Regional Medical Center
 Tuscaloosa, AL
 
 
 
  
  
  
  
  
 

Re: Data Mining blooper and Related Subjects

2000-04-28 Thread Debasmit Mohanty

I have been following the discussion on Data Mining blooper for a while. 
Being a first year graduate student in statistics, my comments on this issue 
might sound premature. Nevertheless, I would put forward my observations.

What I have learnt so far from my interaction with statisticians in
academia as well as in industry is the following:

1) Many statisticians still feel that "Data Mining" as a discipline
should be left to the people in computer science.
Of course, I don't agree with this at all. If you read the paper
"Data Mining and Statistics" by Dr. J. Friedman, you will see how
statisticians have neglected this emerging field over the last few years.

2) Few statistics graduate programs emphasize "Data Mining" research.
Of course, there are a few that do, like Carnegie Mellon.
But overall, we have yet to give the field the attention it needs.

I think now is the time when we have to decide: "Do we accept DATA MINING as
a part of statistics, or do we keep neglecting this field as before?"

I am sure there are a few statistics students like me who feel that Data
Mining is very much a part of statistics.

Thanks
Debasmit

--
Debasmit Mohanty
Graduate Student - Statistics
http://bama.ua.edu/~mohan001/
--






Re: Data Mining blooper and Related Subjects (fwd)

2000-04-28 Thread Bob Hayden

- Forwarded message from Debasmit Mohanty -

I think now is the time when we have to decide: "Do we accept DATA MINING as
a part of statistics, or do we keep neglecting this field as before?"

I am sure there are a few statistics students like me who feel that Data
Mining is very much a part of statistics.

- End of forwarded message from Debasmit Mohanty -

It may be a disagreement over words.  Much of the work Tukey et
al. did in the 60s, called exploratory data analysis, had to do with
looking at data and trying to detect patterns.  However, if you sift
through data you will find many "patterns" that are just flukes of
chance.  How do you avoid taking these seriously?  This was a
criticism directed at Tukey then, and even more so at what goes on
today under the name of "Data Mining".  But I have a sense that Tukey
had a much deeper awareness of the underlying statistical issues than
most of the miners have!-)
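
A back-of-the-envelope calculation shows how quickly flukes pile up when
sifting. This is a sketch only: it assumes the tests are independent and
each is run at the conventional 0.05 level, neither of which holds exactly
in real sifting.

```python
def prob_at_least_one_fluke(n_tests, alpha=0.05):
    """P(at least one 'significant' result) across n_tests independent
    tests when no real effect exists anywhere."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 10, 100):
    chance = prob_at_least_one_fluke(n_tests)
    expected = n_tests * 0.05  # expected number of false 'patterns'
    print(f"{n_tests:4d} tests: P(some fluke) = {chance:.3f}, "
          f"expected flukes = {expected:.1f}")
```

Under these assumptions, a hundred looks at pure noise are all but
guaranteed to turn up something that passes a 0.05 test.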
 

  _
 | |  Robert W. Hayden
 | |  Department of Mathematics
/  |  Plymouth State College MSC#29
   |   |  Plymouth, New Hampshire 03264  USA
   | * |  Rural Route 1, Box 10
  /|  Ashland, NH 03217-9702
 | )  (603) 968-9914 (home)
 L_/  [EMAIL PROTECTED]
  fax (603) 535-2943 (work)





Re: Data Mining blooper and Related Subjects

2000-04-26 Thread Herman Rubin

In article 002601bfaf29$cfbaa9a0$[EMAIL PROTECTED],
David A. Heiser [EMAIL PROTECTED] wrote:

- Original Message -
From: T.S. Lim Use-Author-Address-Header@[127.1]
To: [EMAIL PROTECTED]
Sent: Tuesday, April 25, 2000 10:49 AM
Subject: Data Mining blooper


 While hunting for URLs for KDCentral.com, I encountered several
 misleading statements about Statistics made by Data Mining people.
 I've posted 3 of them to my bulletin board. If you encounter other
 wrong remarks, I invite you to post them to the board too at

http://www.recursive-partitioning.com/forums

 Thanks.
...
This essentially supports my argument over the last few years. The
commercial selling of overpriced black boxes generates so much profit for
these companies that they can make any claim whatsoever, and people will buy
it, just like in politics.

The basic selling line is, "you may be stupid, with absolutely no knowledge
of anything, but if you buy my overpriced $20,000 software, you become a
noted expert in anything. You don't have to know anything to use my software
(or vote for me, or)". It amazes me that college graduates buy this
hook, line and sinker. Then they ask questions on edstat about what the
output means.

DAH


It does not surprise me one bit.  The typical statistics
course teaches statistical methods and pronouncements, with
no attempt to achieve understanding.  How many coming out
of such a course are cognizant that a significance
statement is a statement about the probability BEFORE the
observations are taken that the null hypothesis will be
rejected?  How many understand what the likelihood function
means, and why one should even consider the likelihood
principle?
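
One way to make the "BEFORE the observations are taken" reading concrete
is a small simulation. It is a sketch under assumptions that are not in
the original post: the null hypothesis is true, the data are normal with
known standard deviation 1, and the test is a two-sided z-test at the 0.05
level (critical value 1.96).

```python
import random

def rejects_null(sample, critical_z=1.96):
    """Two-sided z-test of 'mean = 0' with known standard deviation 1."""
    n = len(sample)
    z = (sum(sample) / n) * n ** 0.5
    return abs(z) > critical_z

rng = random.Random(2)
n_trials, n_obs = 20000, 25

# Repeat the whole experiment many times with the null hypothesis true.
rejections = sum(
    rejects_null([rng.gauss(0, 1) for _ in range(n_obs)])
    for _ in range(n_trials)
)
rejection_rate = rejections / n_trials
print(f"rejection rate under a true null: {rejection_rate:.3f}")
```

The 0.05 describes the procedure over repeated experiments, fixed before
any data are seen; it is not the probability that the null hypothesis is
true once a particular dataset is in hand.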

If students come out of a statistics course believing that 
statistics is a black box into which one puts the data, with
no assumptions, and it spews out the state of the universe,
or at least the "statistical conclusions", how could it be
expected that they NOT consider what is offered as just a
better black box.

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: Data Mining blooper and Related Subjects

2000-04-26 Thread dennis roberts

At 07:57 AM 4/26/00 -0500, Herman Rubin wrote:


It does not surprise me one bit.  The typical statistics
course teaches statistical methods and pronouncements, with
no attempt to achieve understanding.   snip of more

this is something i happen to agree with herman about ... but, it is a much 
broader problem than can be attributed to what happens in one course

it is an attitude about what higher education is all about ... and what the 
goals are for it

'going to college' ... be it undergraduate level or graduate level ... has 
become a much more hit and miss experience, residence has little meaning 
... that is being tailored more and more to the convenience of students ... 
and to what is 'user' friendly (or it won't SELL). studying principles in 
disciplines is hard work ... NOT user friendly ... so, less and less is 
being required in the way of diligent study.

take graduate school for example ... there was a time, was there not ... 
where doctoral students were REALLY expected to be responsible for their 
dissertations AND were expected to be the experts in that particular area 
of inquiry ... AND to be competent enough to have done the work him/herself 
... and to UNDERSTAND it .. ie, BE ABLE TO DEFEND ALL OF IT

but, what i have noticed over many years is that dissertations are becoming 
more of a committee effort ... yes, the student MAY have had the idea 
(though not necessarily) but, from there ... he/she gets help with the 
design ... has someone else do the analysis (because he/she did not take 
any/sufficient work in analytic methods to understand what is going on) ... 
gets help in writing and editing .. and, even gets help in terms of what 
their results MEAN ...

gives new meaning to the term: "cooperative learning"








Re: Data Mining blooper and Related Subjects

2000-04-25 Thread David A. Heiser


- Original Message -
From: T.S. Lim Use-Author-Address-Header@[127.1]
To: [EMAIL PROTECTED]
Sent: Tuesday, April 25, 2000 10:49 AM
Subject: Data Mining blooper


 While hunting for URLs for KDCentral.com, I encountered several
 misleading statements about Statistics made by Data Mining people.
 I've posted 3 of them to my bulletin board. If you encounter other
 wrong remarks, I invite you to post them to the board too at

http://www.recursive-partitioning.com/forums

 Thanks.
...
This essentially supports my argument over the last few years. The
commercial selling of overpriced black boxes generates so much profit for
these companies that they can make any claim whatsoever, and people will buy
it, just like in politics.

The basic selling line is, "you may be stupid, with absolutely no knowledge
of anything, but if you buy my overpriced $20,000 software, you become a
noted expert in anything. You don't have to know anything to use my software
(or vote for me, or)". It amazes me that college graduates buy this
hook, line and sinker. Then they ask questions on edstat about what the
output means.

DAH




