From: Larry Coffin <[EMAIL PROTECTED]>
Date: Mon, 30 Dec 2002 15:11:24 -0500
> Hey folks! I apologize that this is off topic, but I thought someone on
> here might know an answer or be able to point me in the direction of one.

No more so than anyone else.  ;-}

> I'm looking for an algorithm for sorting/grouping sets of items based on
> perception-based comparison.
>
> For example, take a set of songs. I'd like to be able to present them to
> people, have them listen to one song, then compare that song to two (or
> more) other songs, then select the song that it most closely matches.
> From this, I'd like to build up a list or a graph (or table or whatever
> you want to call it) that would have related songs "grouped" together.
> (Although the exact boundaries between groups are not important.)

Sounds like a graph to me, though that may not turn out to be the most
natural way to model it.

> I.e. in the end, ideally, you could identify regions in the graph that
> would correspond to "classical" songs, or "rock" or "blues", etc. At the
> same time, you could identify, say, male vocals vs. female vocals
> (perhaps represented as a spread between "male-sounding" and
> "female-sounding" songs). Likewise, songs with, say, flutes in them
> would be closer to each other than non-flute songs (whether they were
> "classical" or "rock" or male or female vocals).

This would seem to require some psychological research into the qualities
and categories people use to classify music before you could even start
to gather data for a computer model.

> Basically, I see this as a multi-dimensional problem where each
> dimension represents a particular variable property of the items in the
> set. But the property each dimension represents (and perhaps the exact
> number of dimensions) is not defined ahead of time; it is "worked out"
> by the algorithm based on the comparisons (made by a person who is not
> comparing against a stated set of properties).

Now you are describing a "feature-space classifier," which ought to be
covered by any decent introductory AI text (but I'm afraid I have none to
recommend).
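(For concreteness, here is a rough sketch of the feature-space idea in Python -- a hand-rolled k-means over made-up two-dimensional "song" points. The feature values and the deterministic initialization are illustrative assumptions, not part of any real system or library:)

```python
import math

def kmeans(points, k, iters=20):
    """Cluster points (lists of floats) into k groups by Euclidean distance.

    Deterministic initialization (the first k points) keeps this sketch
    reproducible; a real implementation would use random restarts.
    """
    centers = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            j = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[j].append(p)
        # Move each center to the mean of its cluster.
        for j, c in enumerate(clusters):
            if c:
                centers[j] = [sum(xs) / len(c) for xs in zip(*c)]
    return centers, clusters

# Invented two-dimensional "songs": axis 0 might be tempo, axis 1 vocal pitch.
points = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
          [0.90, 0.80], [0.85, 0.90], [0.95, 0.85]]
centers, clusters = kmeans(points, k=2)
# The two clumps separate cleanly: one center near [0.15, 0.18],
# the other near [0.90, 0.85].
```

The hard part, of course, is the one the code assumes away: computing the coordinates in the first place.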
Except that I'm not sure how one places each datapoint on a given axis if
the thing the axis is measuring is "not defined ahead of time"; every
axis must have a computable metric, even if it doesn't have an obvious
interpretation in terms of qualities perceptible to humans. As I
understand it, the normal procedure is to define many such measurable
properties, and then use the data to decide which are relevant.

In a nutshell, feature-space classification assigns a location to each
datapoint and then goes looking for clumpings in that Euclidean space.
By contrast, graph-based classification starts with only a distance
metric, used as the edge weight, and then tries to detect clumping based
on that. There is a fairly simple polynomial algorithm for finding graph
cliques, by the way -- O(N^3) space and perhaps O(N^4) time, I think.
But (to tie it back to the list topic ;-) I am not aware of any Perl
implementations.

> And this could apply to any set of items, whether it was songs, images,
> blocks of text, etc.

Quite true, though the trick is getting relevant numbers out of the raw
data. I once made some halting attempts at graph-based classification of
"similar" protein sequences, for which there is a well-defined distance
metric, though none of it ever worked well enough to be publishable (as
code or papers). The lesson is (probably) that a single metric isn't
robust enough, at least in that problem domain.

> Does anyone know of any algorithms or research that might apply to
> this? Since this is not specifically Perl related, does anyone know of
> a more appropriate mailing list I can send this to?

Sorry, nothing specific. Try searching for "classification" and "AI" and
related stuff. Also look for FAQs, such as
http://www.faqs.org/faqs/ai-faq/; it sounds like you are just starting
on this.

-- Bob Rogers
   http://rgrjr.dyndns.org/

_______________________________________________
Boston-pm mailing list
[EMAIL PROTECTED]
http://mail.pm.org/mailman/listinfo/boston-pm
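P.S. A toy sketch of the graph-based approach in Python, assuming nothing but a pairwise distance function and a similarity cutoff. The "songs" and the tempo-only metric are invented for illustration, and for simplicity this finds connected components of the similarity graph rather than cliques:

```python
def group_by_distance(items, dist, cutoff):
    """Group items using only a pairwise distance metric (no feature space).

    Items within `cutoff` of each other get an edge; connected components
    of the resulting graph are the groups.
    """
    # Build the adjacency lists from the metric alone.
    adj = {a: [] for a in items}
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if dist(a, b) <= cutoff:
                adj[a].append(b)
                adj[b].append(a)
    # Depth-first search for connected components.
    seen, groups = set(), []
    for a in items:
        if a in seen:
            continue
        stack, comp = [a], []
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            comp.append(x)
            stack.extend(adj[x])
        groups.append(sorted(comp))
    return groups

# Invented metric: songs as (name, tempo); distance is tempo difference.
songs = [("slow1", 60), ("slow2", 66), ("fast1", 120), ("fast2", 126)]
groups = group_by_distance(songs, lambda a, b: abs(a[1] - b[1]), cutoff=10)
# Two groups: the slow pair and the fast pair.
```

A single threshold on a single metric is exactly the kind of thing that turned out not to be robust enough for the protein-sequence attempts mentioned above.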