Howdy Lucites,

I'm starting work on an index for PGXN. Not heard of PGXN? Think of it as CPAN 
for PostgreSQL.

  http://pgxn.org/

Anyway, the things I want to index are:

* Distributions. Includes name, version, tags, abstract, description, user, and 
some other stuff.

* Extensions (modules in CPAN-speak). Mainly documentation in HTML.

* Tags. Each contains a list of the distributions associated with it.

* Users. Includes name, email, URL, Twitter nick, and a list of distributions.

* Documentation. Random docs associated with a distribution but not a specific 
extension.

By default, a user will be able to search all these things at once. So I was 
thinking that I'd have just one schema/index, and use categories to separate 
the different objects. Given that, I was thinking of a schema with:

Title:     Name of a distribution/extension/tag/user
Abstract:  For distributions and extensions
Content:   Description and random docs for distributions,
           documentation body for extensions, distribution names
           for users and tags
Tags:      Tags associated with a distribution
Metadata:  Additional metadata: email addresses, URLs, dates,
           and other stuff associated with a distribution.

So for those fields that don't apply to a thing, like "tags" for a tag object, 
I'd just provide no value. Otherwise, I'd like to do a full-text search on all 
these fields.
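
To make that concrete, here's a rough sketch in plain Python of what I have in 
mind. It doesn't use any search library's API. The make_doc() and search() 
helpers, the "category" key, and all of the sample values are made up for 
illustration, and the substring match is just a stand-in for real full-text 
search.

```python
# Hypothetical sketch of the proposed single-schema layout; not a real search API.
FIELDS = ("title", "abstract", "content", "tags", "metadata")


def make_doc(category, **values):
    """Build a document carrying every schema field; unset fields become ''."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    doc = {name: values.get(name, "") for name in FIELDS}
    doc["category"] = category  # distribution, extension, tag, user, or doc
    return doc


def search(docs, term, category=None):
    """Naive case-insensitive substring match across all full-text fields."""
    term = term.lower()
    hits = []
    for doc in docs:
        if category is not None and doc["category"] != category:
            continue
        if any(term in doc[name].lower() for name in FIELDS):
            hits.append(doc)
    return hits


# Invented sample records, one per object type shown.
docs = [
    make_doc(
        "distribution",
        title="example_dist",
        abstract="An example key/value type",
        content="Longer description of the distribution",
        tags="key value example",
        metadata="2010-11-01 http://example.com/",
    ),
    # A tag has no abstract, tags, or metadata, so those stay empty.
    make_doc("tag", title="key value", content="example_dist"),
    make_doc(
        "user",
        title="Example User",
        content="example_dist",
        metadata="user@example.com",
    ),
]

everything = search(docs, "example_dist")             # searches all categories
only_tags = search(docs, "example_dist", category="tag")
```

The default search hits every category at once, and passing a category narrows 
it to one kind of object.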

So, does this seem like a reasonable search schema? I would appreciate any 
feedback and suggestions.

Thanks!

David
