If that isn't LCSH, is the whole of LCSH available electronically in some
form, at least as a file or set of files that can be easily obtained and
processed?


>> The frequency of an LCSH term within the LC catalog could also be
>> useful for ranking, although I'm not sure if such data would be
>> readily available.
> Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a
> copy of the LC subject authority file. The entries in this file form the
> basis for subject headings, most of which add "facets" to the authority
> entry when forming the subject heading. One could do a left-anchored match
> against actual headings, and that might provide some interesting statistics.
> Edward Betts of the Open Library project did some casual data gathering for
> subjects, and posted his "top 1000" subject headings (not subject
> authorities):
> http://edwardbetts.com/ol/top_1000_subjects
> The OL has decided to break up the subject headings into their subfields,
> and somewhere there are some pages that show some subfields with the highest
> ranking subfields they appear with. (There must be a better way to say that!
> Sorry, too early, too few cups of tea.) One example is here:
> http://home.us.archive.org/~edward/related/Cheese.html
> I think that something like this will be incorporated into the next version
> of OL, which will be heavily navigation-oriented rather than
> search-oriented.
> kc
> p.s. Anyone who wants to play with a file can grab the OL data export:
> http://openlibrary.org/dev/docs/jsondump
> Unfortunately it includes both LC and non-LC subjects (mainly BISAC from
> Amazon)
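
For what it's worth, the left-anchored match against actual headings could
be sketched roughly as below. This is only an illustration under some
assumptions of mine: headings are plain strings with subdivisions separated
by "--", matching is case-insensitive, and each heading is credited to the
longest authority entry it begins with. The function names and sample data
are made up, and real headings would need more careful normalization.

```python
from collections import Counter


def normalize(heading, sep="--"):
    """Split on the subdivision separator and tidy whitespace and case."""
    return tuple(" ".join(part.split()).casefold() for part in heading.split(sep))


def left_anchored_counts(authorities, headings, sep="--"):
    """Credit each heading to the longest authority entry it starts with."""
    known = {normalize(a, sep): a for a in authorities}
    counts = Counter()
    for heading in headings:
        parts = normalize(heading, sep)
        # Try the full heading first, then drop trailing subdivisions.
        for end in range(len(parts), 0, -1):
            match = known.get(parts[:end])
            if match is not None:
                counts[match] += 1
                break
    return counts


# Hypothetical sample data, not taken from id.loc.gov or the OL dump.
authorities = ["Cheese", "Cheese--Varieties", "Cookery"]
headings = [
    "Cheese",
    "Cheese -- Varieties -- France",
    "Cheese--History",
    "Cheesemaking",
    "Cookery (Cheese)",
]
counts = left_anchored_counts(authorities, headings)
```

Matching on whole subdivisions rather than raw string prefixes keeps
"Cheesemaking" from being counted under "Cheese".
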
>> Another possibility would be a simple count of broader terms +
>> narrower terms + related terms or something like that.  Although
>> PageRank would probably be better, since even some "important" terms
>> might have a relatively small number of immediately-adjacent links.
>> Keith
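
To make the two ranking ideas concrete, here is a small sketch that computes
both the simple neighbour count and a basic PageRank by power iteration. The
assumptions are mine: broader, narrower and related links are all treated as
plain undirected links, the damping factor is the conventional 0.85, and the
terms and links are invented sample data rather than real authority records.

```python
def build_links(edges):
    """Build an undirected adjacency map from (term, term) pairs."""
    links = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def degree_scores(links):
    """Number of directly linked terms (broader + narrower + related)."""
    return {term: len(neighbours) for term, neighbours in links.items()}


def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an adjacency map."""
    nodes = list(links)
    n = len(nodes)
    rank = {term: 1.0 / n for term in nodes}
    for _ in range(iterations):
        new_rank = {term: (1.0 - damping) / n for term in nodes}
        for term, neighbours in links.items():
            if neighbours:
                share = damping * rank[term] / len(neighbours)
                for other in neighbours:
                    new_rank[other] += share
            else:
                # A term with no links spreads its rank evenly.
                for other in nodes:
                    new_rank[other] += damping * rank[term] / n
        rank = new_rank
    return rank


# Hypothetical links between terms, for illustration only.
edges = [
    ("Food", "Dairy products"),
    ("Food", "Cookery"),
    ("Food", "Beverages"),
    ("Dairy products", "Cheese"),
    ("Cheese", "Cheddar cheese"),
]
links = build_links(edges)
degrees = degree_scores(links)
ranks = pagerank(links)
```

With links treated as undirected, the two scores tend to track each other
fairly closely; the difference should show up more if broader/narrower links
are treated as directed, so that rank flows toward the broader terms.
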
