On Apr 11, 2006, at 4:11 PM, Colleen Whitney wrote:

Jonathan Rochkind wrote:

not the right approach. And yet...I wish I could explain why it seems
as though the clustering can tell us something.


Well, what is it you think the clustering can tell you something
_about_?  This is an interesting topic to me.

I'm not sure the clustering can tell you anything about relevance to
the user. I'm not seeing it. I mean, the number of items that are
members of a FRBR work set really just indicates how many 'versions'
(to be imprecise) of that work exist. But the number of 'versions' of
a work that exist doesn't really predict how likely that work (or any
of its versions) is to be of interest to a user, does it?  But maybe
you're thinking of something I'm missing; I'm curious what you're
thinking about.

Yes, that's exactly what I'm stuck on.  If "more important" or "more
popular" works tend to have more manifestations, then there might be
some signal as to probability of relevance in there.  Which could be
factored in (in some *small* way).  But I'm not sure whether/how one
would test that "if".  At the moment you have me convinced that it's a
red herring.

Perhaps there is something useful about grouping and highlighting
works that have a large number of manifestations.  My gut tells me
that this would be more useful for a general audience than for
specialized researchers.  But you don't necessarily have to factor
this into your default search relevance algorithm to expose it.

Just speculating, but could one use the term "classics" to describe
works with an exceedingly large number of manifestations?  Maybe this
could be a useful post-search sort option.  Or maybe you can define a
high-manifestation threshold for your collection... if the user's
search term matches any of these items, they are highlighted on the
search results page in a separate bucket.  Perhaps some people would
appreciate such a filtering service.
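
To make the idea concrete, here is a rough Python sketch of both
options: a separate "classics" bucket and a post-search sort.  It
assumes each hit already carries the size of its FRBR work set and a
relevance score from the unchanged search engine.  The names (WorkHit,
split_classics, sort_by_manifestations), the sample hits, and the
threshold of 50 are all made up for illustration, not taken from any
real system.

```python
from dataclasses import dataclass


@dataclass
class WorkHit:
    title: str
    manifestation_count: int  # size of the FRBR work set (assumed available)
    relevance: float          # score from the existing, unmodified ranking


def split_classics(hits, threshold):
    """Partition hits into (classics, others), keeping the incoming order."""
    classics = [h for h in hits if h.manifestation_count >= threshold]
    others = [h for h in hits if h.manifestation_count < threshold]
    return classics, others


def sort_by_manifestations(hits):
    """Post-search sort: most manifestations first, ties broken by relevance."""
    return sorted(hits, key=lambda h: (-h.manifestation_count, -h.relevance))


# Hypothetical result list, already ordered by relevance.
hits = [
    WorkHit("Work A", 3, 0.91),
    WorkHit("Work B", 240, 0.85),
    WorkHit("Work C", 1, 0.80),
    WorkHit("Work D", 57, 0.42),
]

# The threshold is a per-collection choice; 50 is an arbitrary example.
classics, others = split_classics(hits, threshold=50)
resorted = sort_by_manifestations(hits)
```

Note that the relevance scores are never modified here; the
manifestation count only affects how the results are presented.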

This may also apply to other specialized search needs.  Rather than
complicate (dilute?) your relevance algorithm by adding in factors of
relevance only to a particular audience, why not develop targeted
discovery services that complement the search results?

Tito Sierra
