From: Larry Coffin <[EMAIL PROTECTED]>
Date: Mon, 30 Dec 2002 15:11:24 -0500
> Hey folks! I apologize that this is off topic, but I thought someone on
> here might know an answer or be able to point me in the direction of one.

No more so than anyone else.  ;-}

> I'm looking for an algorithm for sorting/grouping sets of items based on
> perception-based comparison.
>
> For example, take a set of songs. I'd like to be able to present them to
> people, have them listen to one song, then compare that song to two (or
> more) other songs, then select the song that it most closely matches.
> From this, I'd like to build up a list or a graph (or table or whatever
> you want to call it) that would have related songs "grouped" together.
> (Although the exact boundaries between groups are not important.)

Sounds like a graph to me, though that may not turn out to be the most
natural way to model it.

> I.e. in the end, ideally, you could identify regions in the graph that
> would correspond to "classical" songs, or "rock" or "blues", etc. At the
> same time, you could identify, say, male vocals vs. female vocals
> (perhaps represented as a spread between "male-sounding" and
> "female-sounding" songs). Likewise, songs with, say, flutes in them
> would be closer to each other than non-flute songs (whether they were
> "classical" or "rock" or male or female vocals).

This would seem to require some psychological research into the qualities
and categories people use to classify music before you could even start
to gather data for a computer model.

> Basically, I see this as a multi-dimensional problem where each
> dimension represents a particular variable property of the items in the
> set. But the property each dimension represents (and perhaps the exact
> number of dimensions) is not defined ahead of time; it is "worked out"
> by the algorithm based on the comparisons (made by a person who is not
> comparing against a stated set of properties).

Now you are describing a "feature-space classifier," which ought to be
covered by any decent introductory AI text (but I'm afraid I have none to
recommend).
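(For concreteness, here is a rough sketch of the feature-space idea in Python -- a hand-rolled k-means over made-up two-dimensional "song" points. The feature values and the deterministic initialization are illustrative assumptions, not part of any real system or library:)

```python
import math

def kmeans(points, k, iters=20):
    """Cluster points (lists of floats) into k groups by Euclidean distance.

    Deterministic initialization (the first k points) keeps this sketch
    reproducible; a real implementation would use random restarts.
    """
    centers = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            j = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[j].append(p)
        # Move each center to the mean of its cluster.
        for j, c in enumerate(clusters):
            if c:
                centers[j] = [sum(xs) / len(c) for xs in zip(*c)]
    return centers, clusters

# Invented two-dimensional "songs": axis 0 might be tempo, axis 1 vocal pitch.
points = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
          [0.90, 0.80], [0.85, 0.90], [0.95, 0.85]]
centers, clusters = kmeans(points, k=2)
# The two clumps separate cleanly: one center near [0.15, 0.18],
# the other near [0.90, 0.85].
```

The hard part, of course, is the one the code assumes away: computing the coordinates in the first place.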
Except that I'm not sure how one places each datapoint on a given axis if
the thing the axis is measuring is "not defined ahead of time"; every
axis must have a computable metric, even if it doesn't have an obvious
interpretation in terms of qualities perceptible to humans. As I
understand it, the normal procedure is to define many such measurable
properties, and then use the data to decide which are relevant.

In a nutshell, feature-space classification assigns a location to each
datapoint and then goes looking for clumpings in that Euclidean space.
By contrast, graph-based classification starts with only a distance
metric, used as the edge weight, and then tries to detect clumping based
on that. There is a fairly simple polynomial algorithm for finding graph
cliques, by the way -- O(N^3) space and perhaps O(N^4) time, I think.
But (to tie it back to the list topic ;-) I am not aware of any Perl
implementations.

> And this could apply to any set of items, whether it was songs, images,
> blocks of text, etc.

Quite true, though the trick is getting relevant numbers out of the raw
data. I once made some halting attempts at graph-based classification of
"similar" protein sequences, for which there is a well-defined distance
metric, though none of it ever worked well enough to be publishable (as
code or papers). The lesson is (probably) that a single metric isn't
robust enough, at least in that problem domain.

> Does anyone know of any algorithms or research that might apply to
> this? Since this is not specifically Perl related, does anyone know of
> a more appropriate mailing list I can send this to?

Sorry, nothing specific. Try searching for "classification" and "AI" and
related stuff. Also look for FAQs, such as
http://www.faqs.org/faqs/ai-faq/; it sounds like you are just starting
on this.

-- Bob Rogers
   http://rgrjr.dyndns.org/

_______________________________________________
Boston-pm mailing list
[EMAIL PROTECTED]
http://mail.pm.org/mailman/listinfo/boston-pm
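P.S. A toy sketch of the graph-based approach in Python, assuming nothing but a pairwise distance function and a similarity cutoff. The "songs" and the tempo-only metric are invented for illustration, and for simplicity this finds connected components of the similarity graph rather than cliques:

```python
def group_by_distance(items, dist, cutoff):
    """Group items using only a pairwise distance metric (no feature space).

    Items within `cutoff` of each other get an edge; connected components
    of the resulting graph are the groups.
    """
    # Build the adjacency lists from the metric alone.
    adj = {a: [] for a in items}
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if dist(a, b) <= cutoff:
                adj[a].append(b)
                adj[b].append(a)
    # Depth-first search for connected components.
    seen, groups = set(), []
    for a in items:
        if a in seen:
            continue
        stack, comp = [a], []
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            comp.append(x)
            stack.extend(adj[x])
        groups.append(sorted(comp))
    return groups

# Invented metric: songs as (name, tempo); distance is tempo difference.
songs = [("slow1", 60), ("slow2", 66), ("fast1", 120), ("fast2", 126)]
groups = group_by_distance(songs, lambda a, b: abs(a[1] - b[1]), cutoff=10)
# Two groups: the slow pair and the fast pair.
```

A single threshold on a single metric is exactly the kind of thing that turned out not to be robust enough for the protein-sequence attempts mentioned above.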