Howdy Lucites,

I'm starting work on an index for PGXN. Not heard of PGXN? Think of it as CPAN for PostgreSQL.

    http://pgxn.org/

Anyway, the things I want to index are:

* Distributions. Includes name, version, tags, abstract, description, user, and some other stuff.
* Extensions (modules in CPAN-speak). Mainly documentation in HTML.
* Tags. Contains a list of distributions associated with each tag.
* Users. Includes name, email, URL, Twitter nick, and a list of distributions.
* Documentation. Random docs associated with a distribution but not a specific extension.

By default, a user will be able to search all these things at once. So I was thinking that I'd have just one schema/index, and use categories to separate the different objects. Given that, I was thinking of a schema with:

* Title: Name of a distribution/extension/tag/user
* Abstract: For distributions and extensions
* Content: Description and random docs for distributions, documentation body for extensions, distribution names for users and tags
* Tags: Tags associated with a distribution
* Metadata: Additional metadata: email addresses, URLs, dates, and other stuff associated with a distribution

So for those fields that don't apply to a thing, like "tags" for a tag object, I'd just provide no value. Otherwise, I'd like to do a full-text search on all these fields.

So, does this seem like a reasonable search schema? I would appreciate any feedback and suggestions.

Thanks!

David
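
P.S. To make the one-schema-plus-categories idea concrete, here is a rough sketch in plain Python. It is not Lucy code and uses no search library: the SearchDoc class, the helper functions, the naive substring search(), and the sample data are all made up for illustration. It only shows how each kind of object might map onto the same five fields plus a category, with fields that don't apply left empty.

```python
from dataclasses import dataclass, asdict

# Hypothetical category names, one per kind of object to be indexed.
CATEGORIES = {"distribution", "extension", "tag", "user", "doc"}


@dataclass
class SearchDoc:
    """One record in the single shared schema; unused fields stay empty."""
    category: str
    title: str
    abstract: str = ""
    content: str = ""
    tags: str = ""
    metadata: str = ""

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def distribution_doc(name, abstract, description, tags, metadata):
    # Distributions are the only objects that fill every field.
    return SearchDoc("distribution", name, abstract, description,
                     " ".join(tags), " ".join(metadata))


def tag_doc(name, distributions):
    # A tag's content is the list of distribution names; "tags" stays empty.
    return SearchDoc("tag", name, content=" ".join(distributions))


def user_doc(nick, distributions, metadata):
    # A user's content is the list of distribution names they own.
    return SearchDoc("user", nick, content=" ".join(distributions),
                     metadata=" ".join(metadata))


def search(docs, term, category=None):
    """Naive stand-in for full-text search: substring match on all text fields."""
    term = term.lower()
    hits = []
    for doc in docs:
        if category is not None and doc.category != category:
            continue
        fields = asdict(doc)
        fields.pop("category")  # the category filters results, it is not searched
        if any(term in value.lower() for value in fields.values()):
            hits.append(doc)
    return hits


# Made-up sample data, purely for illustration.
docs = [
    distribution_doc("example_dist", "A key/value pair type",
                     "Stores two text values together.",
                     ["key value", "types"], ["alice@example.com"]),
    tag_doc("types", ["example_dist"]),
    user_doc("alice", ["example_dist"], ["alice@example.com"]),
]

# Search everything at once, then narrow to a single category.
print([d.title for d in search(docs, "example_dist")])
print([d.title for d in search(docs, "example_dist", category="tag")])
```

The point of the sketch is that the default search ignores the category, while a narrower search just adds a category filter on top of the same fields.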
