"Richard Wright" <[EMAIL PROTECTED]> wrote in message [EMAIL PROTECTED]">news:[EMAIL PROTECTED]... > Genres are presumably groups. So linear combinations of variables that > best separate the genres would be more effectively found by linear > canonical variates analysis (aka discriminant analysis). > > Richard Wright > > > On Thu, 14 Feb 2002 03:18:48 GMT, "Jim Snow" <[EMAIL PROTECTED]> > wrote: > > > snipped
> > My inclination would be to start with an Andrews plot, possibly
> >using principal component scores for about 20 music files from several
> >genres. This will enable you to find linear combinations of variables which
> >best separate the genres. The technique and examples are set out in:
> snipped
>

Andrews plots and similar techniques do not replace discriminant analysis, which, as Richard Wright said, finds the "linear combinations of variables that best separate the genres". In the book by Gnanadesikan which first popularised the technique, he examines the variables in the discriminant space, i.e. a space defined by discriminant functions rather than by principal components or the original variables.

The techniques are doing different things. Andrews plots allow multidimensional data to be examined in a two-dimensional plot. Among other things, several directions in which, say, jazz differs strongly from pop, or jazz from flamenco, may be found, and these directions need not be orthogonal. Andrews plots are a data reduction technique which is, in many dimensions, analogous to examining a multidimensional cluster of points from many viewpoints, chosen so that no possible viewpoint is far from one of those used. Thus virtually all possible discriminant functions are tried and the interesting ones noted. In a spirit of exploratory data analysis, this seems useful.

Rishabh Gupta wrote:
"The variables can be used to perform comparison between two files. A variable from one music file can be compared to the equivalent variable in the other music file. By comparing all the variables in one file with the equivalent variable in the other file, an overall similarity measurement can be obtained."

Andrews plots reveal the directions in which the two files differ. Incidentally, if the original Andrews weightings are used, the integral over (-pi, pi) of the squared difference between the two traces is, I think, proportional to the squared Euclidean distance between the two files (pi times it). Note that it is the squared difference, not the plain area between the traces, that has this property.
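As a rough illustration of the remark about traces and distance, the sketch below evaluates the Andrews function with the original weightings, f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., and compares the integrated squared difference between two traces with pi times the squared Euclidean distance. The function names and the two five-variable "files" are made up for the example; they are not taken from Andrews, Gnanadesikan, or the data under discussion.

```python
import math


def andrews_curve(x, t):
    """Andrews function of observation x at angle t (original weightings)."""
    value = x[0] / math.sqrt(2.0)
    for i in range(1, len(x)):
        k = (i + 1) // 2  # harmonic number: 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += x[i] * math.sin(k * t)
        else:
            value += x[i] * math.cos(k * t)
    return value


def integrated_squared_difference(x, y, n=2000):
    """Approximate the integral over (-pi, pi) of (f_x(t) - f_y(t))^2.

    Uses an equally spaced sum over one full period, which is accurate
    for trigonometric polynomials once n exceeds the highest harmonic.
    """
    step = 2.0 * math.pi / n
    total = 0.0
    for j in range(n):
        t = -math.pi + j * step
        d = andrews_curve(x, t) - andrews_curve(y, t)
        total += d * d
    return total * step


def squared_euclidean(x, y):
    """Squared Euclidean distance between two observations."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Hypothetical feature vectors for two music files (illustration only).
file_a = [1.0, 2.0, 3.0, 4.0, 5.0]
file_b = [2.0, 0.0, 3.0, 1.0, 7.0]

area = integrated_squared_difference(file_a, file_b)
dist2 = squared_euclidean(file_a, file_b)

# The two printed numbers should agree up to rounding error.
print(area, math.pi * dist2)
```

The agreement follows from the orthogonality of the sine and cosine terms over a full period. With other weightings, such as those Tukey suggested, no such simple relation should be expected.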
Tukey suggested weightings which examine the multidimensional space more closely but do not give such a simple interpretation of the difference between traces. I have not used any of this for some time and I do not have the relevant books, but the material I referred to on the web should be helpful.

Straightforward discriminant analysis will certainly find the best linear discriminator in the least squares sense, but stepwise elimination of variables in this process may discard a variable with intuitive appeal in favour of one or several variables highly correlated with it, and the least squares metric may not be the best one. For this and other reasons an exploratory approach such as Rishabh Gupta has begun seems appropriate.

I still hope this helps
Jim Snow

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
                  http://jse.stat.ncsu.edu/
=================================================================