Re: [Boston.pm] OT: Algorithm for sorting/grouping sets?

Ted Zlatanov Wed, 01 Jan 2003 22:08:09 -0800

On Mon, 30 Dec 2002, [EMAIL PROTECTED] wrote:
>       For example, take a set of songs. I'd like to be able to
> present them to people, have them listen to one song, then compare
> that song to two (or more) other songs, then select the song that it
> most closely matches.
> 
>       From this, I'd like to build up a list or a graph (or table
> or whatever you want to call it) that would have related songs
> "grouped" together. (Although the exact boundaries between groups
> are not important.)
> 
>       I.e. In the end, ideally, you could identify regions in the
> graph that would correspond to "classical" songs, or "rock" or
> "blues", etc. At the same time, you could identify, say, male vocals
> vs. female vocals (perhaps represented as a spread between
> "male-sounding" to "female-sounding" songs). Likewise, songs with,
> say, flutes in them would be closer to each other than non-flute
> songs (whether they were "classical" or "rock" or male or female
> vocals).
> 
>       Basically, I see this as a multi-dimensional problem where
> each dimension represents a particular variable property of the
> items in the set. But the property each dimension represents (and
> perhaps the exact number of dimensions) is not defined ahead of time
> but is "worked out" by the algorithm based on the comparison (by a
> person who is not comparing based on a stated set of properties).


Graph structures are an obvious solution, but they can be slow once
the data set gets sizeable.  From your description, though, it doesn't
sound like you really need a graph.

If you can express each song's classification as a bit string (and
that doesn't mean the individual classifications have to be binary; a
property can occupy several bits), storage and searching are much
easier.  Perl bit strings (perldoc -f vec) are compact, and comparing
them is typically much faster than searching through nested data
structures for this particular application.  You can store the bit
string in a database, either decomposed into individual database
columns or as a single string.  If your list of fields is static
enough to map onto database columns, that's the better way to go IMO,
and something like Class::DBI can then do the legwork for you.
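
To make the encoding concrete, here is a rough sketch of packing a
few multi-bit properties into one bit string.  It is written in Python
purely for illustration (in Perl, vec would do the same packing); the
field names and widths (genre, vocals, flute) are made up for this
example and are not part of the original problem statement.

```python
# Hypothetical field layout: (name, width in bits). Illustration only.
FIELDS = [("genre", 3), ("vocals", 2), ("flute", 1)]


def encode(song):
    """Pack a dict of small non-negative integers into a '0'/'1' string,
    one fixed-width field after another, in FIELDS order."""
    bits = ""
    for name, width in FIELDS:
        value = song[name]
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        bits += format(value, f"0{width}b")
    return bits


def decode(bits):
    """Inverse of encode(): slice the string back into named fields."""
    song = {}
    pos = 0
    for name, width in FIELDS:
        song[name] = int(bits[pos:pos + width], 2)
        pos += width
    return song


song = {"genre": 5, "vocals": 2, "flute": 1}
packed = encode(song)  # "101" + "10" + "1" -> "101101"
```

The packed value can go into a single database column as-is, or
decode() can split it back out into one column per field.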

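The distance between two songs is then just the string distance
between their bit strings; for equal-length bit strings, counting the
positions that differ is the simplest such measure.  You can check
whether two particular bit strings are similar with String::Approx,
for example.  The "graph regions" are simply string distances
localized to intervals of the bit string, i.e. to the bits that encode
one particular property.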
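
As a sketch of the distance idea, the Python below uses plain Hamming
distance (count of differing positions), over the whole string and
over one interval.  That is a simplification chosen for this example:
String::Approx works with edit distance, which is a different measure,
and the two sample strings and the interval boundaries are made up.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def interval_distance(a, b, start, end):
    """Hamming distance restricted to positions start..end-1, i.e. to
    the bits that encode a single property."""
    return hamming(a[start:end], b[start:end])


# Two made-up 6-bit classifications; assume bits 0-2 hold one property,
# bits 3-4 another, and bit 5 a third.
song_a = "101101"
song_b = "100111"

overall = hamming(song_a, song_b)                  # 2
first_field = interval_distance(song_a, song_b, 0, 3)   # 1
last_field = interval_distance(song_a, song_b, 5, 6)    # 0
```

One caveat: with plain binary encoding of a multi-valued property,
adjacent values can differ in many bits (3 is 011, 4 is 100), so the
encoding needs some thought if bit distance is meant to track how
similar two values are.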

As an added bonus, you could use a genetic algorithm to breed better
songs :)

Ted

_______________________________________________
Boston-pm mailing list
[EMAIL PROTECTED]
http://mail.pm.org/mailman/listinfo/boston-pm
