Re: Statistics Tool For Classification/Clustering

Jim Snow Wed, 13 Feb 2002 23:10:04 -0800

"Richard Wright" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> Genres are presumably groups. So linear combinations of variables that
> best separate the genres would be more effectively found by linear
> canonical variates analysis (aka discriminant analysis).
>
> Richard Wright
>
>
> On Thu, 14 Feb 2002 03:18:48 GMT, "Jim Snow" <[EMAIL PROTECTED]>
> wrote:
>
>
> snipped


> >        My inclination would be to start with an Andrews plot, possibly
> >using principal component scores for about 20 music files from several
> >genres. This will enable you to find linear combinations of variables
> >which best separate the genres. The technique and examples are set out in:
> snipped
>
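A minimal sketch of the principal component scores mentioned above, in Python with NumPy. It assumes the features sit in a files-by-variables array and standardizes each variable first (an assumption, sensible when the variables are in different units). The function name and the synthetic data are hypothetical.

```python
import numpy as np

def principal_component_scores(features, n_components=None):
    """Return (scores, variances) for a files x variables feature array.

    Each column is standardized first (assumption: variables are in
    different units); drop the scaling if they are already comparable.
    """
    features = np.asarray(features, dtype=float)
    z = (features - features.mean(axis=0)) / features.std(axis=0, ddof=1)
    # Rows of vt are the principal axes; projecting onto them gives the scores.
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt.T
    variances = singular_values ** 2 / (len(z) - 1)
    if n_components is not None:
        scores = scores[:, :n_components]
        variances = variances[:n_components]
    return scores, variances

# Hypothetical example: 20 files described by 6 synthetic variables.
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 6))
scores, variances = principal_component_scores(features, n_components=4)
```

The scores are uncorrelated and ordered by decreasing variance, so the first few can stand in for the full set of variables in a plot.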

     Andrews plots and similar techniques do not replace discriminant
analysis, which, as Richard Wright said, finds "linear combinations of
variables that best separate the genres". In the book by Gnanadesikan that
first popularised Andrews plots, he examines the variables in the
discriminant space, i.e. a space defined by discriminant functions rather
than by principal components or the original variables.
    The techniques do different things.
    Andrews plots let you examine multidimensional data in a
two-dimensional plot. Amongst other things, they may reveal several
directions in which, say, jazz and pop, or jazz and flamenco, differ
strongly, and these directions are not necessarily orthogonal.
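To make the idea concrete, here is a minimal sketch of the original Andrews function, f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., evaluated over -π ≤ t ≤ π. Each file (or its vector of principal component scores) becomes one curve, and drawing the curves for several genres on one set of axes gives the Andrews plot. The function name is illustrative.

```python
import numpy as np

def andrews_curve(x, t):
    """Evaluate the original Andrews function for one observation x at angles t."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    curve = np.full(t.shape, x[0] / np.sqrt(2.0))
    for k in range(1, len(x)):
        harmonic = (k + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if k % 2 == 1:
            curve += x[k] * np.sin(harmonic * t)  # x2, x4, ... take sines
        else:
            curve += x[k] * np.cos(harmonic * t)  # x3, x5, ... take cosines
    return curve

t = np.linspace(-np.pi, np.pi, 200)
curve = andrews_curve([1.0, 2.0, 3.0, -1.0], t)
```

With matplotlib, calling plt.plot(t, andrews_curve(x, t)) for each file, coloured by genre, produces the plot; pandas also provides pandas.plotting.andrews_curves, which draws such curves from a DataFrame with a class column.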
    Andrews plots are a data reduction technique which, in many dimensions,
is analogous to viewing a multidimensional cluster of points from many
viewpoints, chosen so that no possible viewpoint is far from one of those
used. Thus virtually all possible discriminant functions are tried and the
interesting ones noted. In the spirit of exploratory data analysis, this
seems useful.
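The "many viewpoints" reading can be sketched directly: for each fixed t the Andrews function is a linear combination a(t)·x, so sweeping t sweeps through a family of one-dimensional projections. The sketch below scores each projection by how well it separates two groups (difference of projected means over the pooled standard deviation) and reports the best t. The scoring rule, the function names and the synthetic data are assumptions for illustration, not part of the original technique, and only the directions traced out by a(t) are examined.

```python
import numpy as np

def andrews_weights(p, t):
    """Weights a(t) such that the Andrews function equals a(t) @ x for p variables."""
    a = np.empty(p)
    a[0] = 1.0 / np.sqrt(2.0)
    for k in range(1, p):
        harmonic = (k + 1) // 2
        a[k] = np.sin(harmonic * t) if k % 2 == 1 else np.cos(harmonic * t)
    return a

def separation(proj_a, proj_b):
    """|difference of means| / pooled standard deviation (unaffected by the length of a)."""
    n_a, n_b = len(proj_a), len(proj_b)
    pooled_var = ((n_a - 1) * proj_a.var(ddof=1)
                  + (n_b - 1) * proj_b.var(ddof=1)) / (n_a + n_b - 2)
    return abs(proj_a.mean() - proj_b.mean()) / np.sqrt(pooled_var)

def scan_viewpoints(group_a, group_b, n_angles=361):
    """Score the projection a(t) on a grid of t in [-pi, pi]; return the best t and score."""
    group_a = np.asarray(group_a, dtype=float)
    group_b = np.asarray(group_b, dtype=float)
    p = group_a.shape[1]
    angles = np.linspace(-np.pi, np.pi, n_angles)
    scores = np.array([
        separation(group_a @ andrews_weights(p, t), group_b @ andrews_weights(p, t))
        for t in angles
    ])
    best = int(np.argmax(scores))
    return angles[best], scores[best]

# Synthetic example: two "genres" differing only in the third variable,
# which the Andrews function weights by cos t, so the best t should be near 0 or ±pi.
rng = np.random.default_rng(0)
jazz = rng.normal(size=(500, 3))
pop = rng.normal(size=(500, 3)) + np.array([0.0, 0.0, 3.0])
best_t, best_score = scan_viewpoints(jazz, pop)
```

Because the score is a ratio, it does not matter that a(t) is not a unit vector; in practice one would look at the whole score curve, not just its maximum, in keeping with the exploratory spirit.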
    Rishabh Gupta wrote:
"The variables can be used to perform comparison between two files. A
variable from one music file can be compared to the equivalent variable in
the other music file. By comparing all the variables in one file with the
equivalent variable in the other file, an overall similarity measurement can
be obtained."
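One simple way to turn the variable-by-variable comparison quoted above into a single measure is sketched below: standardize each variable across the whole collection of files, then combine the per-variable differences as an (optionally weighted) Euclidean distance. The standardization, the weights, the function names and the feature table are all hypothetical; with unit weights this is the ordinary Euclidean distance between standardized feature vectors.

```python
import numpy as np

def standardize(features):
    """Z-score each variable (column) across a collection of files (rows)."""
    features = np.asarray(features, dtype=float)
    return (features - features.mean(axis=0)) / features.std(axis=0, ddof=1)

def overall_distance(x, y, weights=None):
    """Combine per-variable differences between two files into one distance."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    if weights is None:
        weights = np.ones_like(diff)  # every variable counts equally
    return float(np.sqrt(np.sum(np.asarray(weights, dtype=float) * diff ** 2)))

# Hypothetical feature table: 4 files x 3 variables in different units.
features = np.array([
    [120.0, 0.30, 5.0],
    [118.0, 0.35, 4.0],
    [ 90.0, 0.80, 9.0],
    [ 95.0, 0.75, 8.0],
])
z = standardize(features)
d_01 = overall_distance(z[0], z[1])  # two similar files
d_02 = overall_distance(z[0], z[2])  # two dissimilar files
```

Without the standardization step, the variable with the largest units (the first column here) would dominate the distance.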

Andrews plots reveal the directions in which the two files differ.
Incidentally, if the original Andrews weightings are used, the integral of
the squared difference between the two traces over -π < t < π equals π
times the squared Euclidean distance between the two files, so the plot
preserves distances in that sense. Tukey suggested weightings that examine
the multidimensional space more closely but do not give such a simple
interpretation of the difference between traces. I have not used any of
this for some time and do not have the relevant books to hand, but the
material I referred to on the web should be helpful.
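Stated precisely, for the original Andrews function the identity is ∫ from -π to π of (f_x(t) - f_y(t))² dt = π‖x - y‖², because the constant, sine and cosine terms are mutually orthogonal over a full period (the constant term's 1/√2 weight exactly compensates for its integral of 2π). The sketch below checks this numerically; it defines its own illustrative andrews_curve helper and integrates with a uniform rectangle rule over one period, which is exact up to rounding for trigonometric polynomials like these. The vectors are arbitrary.

```python
import numpy as np

def andrews_curve(x, t):
    """Original Andrews function f_x(t) for one observation x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    curve = np.full(t.shape, x[0] / np.sqrt(2.0))
    for k in range(1, len(x)):
        harmonic = (k + 1) // 2
        curve += x[k] * (np.sin(harmonic * t) if k % 2 == 1 else np.cos(harmonic * t))
    return curve

def integrated_squared_difference(x, y, n_points=1024):
    """Integrate (f_x(t) - f_y(t))**2 over one full period starting at -pi."""
    t = np.linspace(-np.pi, np.pi, n_points, endpoint=False)
    gap = andrews_curve(x, t) - andrews_curve(y, t)
    # A uniform rectangle rule over a whole period is exact (up to rounding)
    # for trigonometric polynomials of degree below n_points.
    return float(np.sum(gap ** 2) * (2.0 * np.pi / n_points))

x = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
y = np.array([0.0, 1.5, 1.0, 0.0, 2.0])
lhs = integrated_squared_difference(x, y)
rhs = float(np.pi * np.sum((x - y) ** 2))
print(lhs, rhs)  # the two values agree up to floating-point rounding
```

Note that it is the integrated squared gap, not the plain area between the traces, that corresponds to Euclidean distance.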

    Straightforward discriminant analysis will certainly find the best
linear discriminator in the least-squares sense, but stepwise elimination of
variables may discard a variable with intuitive appeal in favour of one or
several variables highly correlated with it, and the least-squares criterion
may not be the most appropriate one. For these and other reasons, the
exploratory approach Rishabh Gupta has begun seems appropriate.
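For comparison, a minimal sketch of two-group linear discriminant analysis: Fisher's direction, the inverse pooled within-group covariance times the difference of group means, alongside the least-squares version, in which a 0/1 genre indicator is regressed on the variables. For two groups the regression slopes are proportional to Fisher's direction, which is the sense in which the discriminator is "best in the least-squares sense". No stepwise variable selection is done here; the data and function names are illustrative.

```python
import numpy as np

def fisher_direction(group_a, group_b):
    """Fisher's two-group discriminant direction: S_pooled^{-1} (mean_a - mean_b)."""
    group_a = np.asarray(group_a, dtype=float)
    group_b = np.asarray(group_b, dtype=float)
    n_a, n_b = len(group_a), len(group_b)
    pooled_cov = ((n_a - 1) * np.cov(group_a, rowvar=False)
                  + (n_b - 1) * np.cov(group_b, rowvar=False)) / (n_a + n_b - 2)
    mean_diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    return np.linalg.solve(pooled_cov, mean_diff)

def least_squares_direction(group_a, group_b):
    """Slopes from regressing a 0/1 group indicator (1 = group_a) on the variables."""
    X = np.vstack([group_a, group_b])
    y = np.concatenate([np.ones(len(group_a)), np.zeros(len(group_b))])
    design = np.column_stack([np.ones(len(X)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

# Synthetic "genres" with correlated variables.
rng = np.random.default_rng(1)
mixing = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.8, 0.6, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0],
                   [0.0, 0.5, 0.0, 1.0]])
jazz = rng.normal(size=(40, 4)) @ mixing
pop = rng.normal(size=(40, 4)) @ mixing + np.array([1.0, 0.0, 0.5, 0.0])

w_fisher = fisher_direction(jazz, pop)
w_ls = least_squares_direction(jazz, pop)
cosine = w_fisher @ w_ls / (np.linalg.norm(w_fisher) * np.linalg.norm(w_ls))
```

The two directions coincide up to a positive scale factor. What the sketch cannot show is the stepwise-selection problem raised above, which appears only when variables are dropped one at a time.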

   I still hope this helps.

   Jim Snow






=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
                  http://jse.stat.ncsu.edu/
=================================================================
