On Fri, Mar 18, 2011 at 01:29:10PM -0700, David E. Wheeler wrote:
> Title: Name of a distribution/extension/tag/user
> Abstract: For distributions and extensions
> Content: Description and random docs for distributions,
> documentation body for extensions, distribution names
> for users and tags
> Tags: Tags associated with a distribution
> Metadata: Additional metadata: email addresses, URLs, dates,
> and other stuff associated with a distribution.
Here's how I would express your schema in code:
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Analysis::PolyAnalyzer;
use Lucy::Analysis::RegexTokenizer;
my $schema = Lucy::Plan::Schema->new;
# Prose fields: English tokenizing, case folding, and stemming.
my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(language => 'en');
my $fulltext_type = Lucy::Plan::FullTextType->new(
    analyzer      => $polyanalyzer,
    highlightable => 1,    # maybe
);
$schema->spec_field(name => 'Title',    type => $fulltext_type);
$schema->spec_field(name => 'Abstract', type => $fulltext_type);
$schema->spec_field(name => 'Content',  type => $fulltext_type);
# Pipe-delimited fields: each run of non-pipe characters is one token.
my $pipe_toker = Lucy::Analysis::RegexTokenizer->new(pattern => '[^|]+');
my $pipe_type  = Lucy::Plan::FullTextType->new(analyzer => $pipe_toker);
$schema->spec_field(name => 'Tags',     type => $pipe_type);
$schema->spec_field(name => 'Metadata', type => $pipe_type);
I think that's the most straightforward way to start out. From there, you can
tweak and try other options as necessary.
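To get a feel for what the pipe tokenizer will produce, here's a rough sketch in plain Perl (no Lucy required). It assumes RegexTokenizer treats each match of its pattern as one token; the pipe_tokens() helper is hypothetical, just for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: mimics RegexTokenizer with pattern '[^|]+', where
# every maximal run of non-pipe characters becomes one token.
sub pipe_tokens {
    my ($text) = @_;
    return $text =~ /[^|]+/g;    # list of all matches
}

my @tags = pipe_tokens('PostgreSQL|full text|pgTAP');
print join( "\n", @tags ), "\n";    # one tag per line, case preserved
```

Note that RegexTokenizer on its own doesn't lowercase, and whitespace around the pipes survives ('perl | pgxn' yields 'perl ' and ' pgxn'). If that matters, one option is to put a CaseFolder in front of the tokenizer via Lucy::Analysis::PolyAnalyzer->new(analyzers => [Lucy::Analysis::CaseFolder->new, $pipe_toker]), and/or trim the values before indexing.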
> So for those fields that don't apply to a thing, like "tags" for a tag
> object, I'd just provide no value. Otherwise, I'd like to do a full-text
> search on all these fields.
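Leaving a field out of the document hash is all it takes. Here's a minimal sketch, assuming a throwaway index in a temp directory and made-up sample documents, with a cut-down version of the schema so it runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Analysis::PolyAnalyzer;
use Lucy::Analysis::RegexTokenizer;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;

# Cut-down schema: one prose field, one pipe-delimited field.
my $schema        = Lucy::Plan::Schema->new;
my $fulltext_type = Lucy::Plan::FullTextType->new(
    analyzer => Lucy::Analysis::PolyAnalyzer->new( language => 'en' ),
);
my $pipe_type = Lucy::Plan::FullTextType->new(
    analyzer => Lucy::Analysis::RegexTokenizer->new( pattern => '[^|]+' ),
);
$schema->spec_field( name => 'Title', type => $fulltext_type );
$schema->spec_field( name => 'Tags',  type => $pipe_type );

my $path    = tempdir( CLEANUP => 1 );    # throwaway index location
my $indexer = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $path,
    create => 1,
);

# A distribution supplies Tags...
$indexer->add_doc( { Title => 'pgTAP', Tags => 'testing|unit tests|tap' } );
# ...while a tag object simply omits the field.
$indexer->add_doc( { Title => 'testing' } );
$indexer->commit;

# A plain query string goes through a QueryParser over all indexed fields.
my $searcher = Lucy::Search::IndexSearcher->new( index => $path );
my $hits     = $searcher->hits( query => 'testing' );
printf "%d hits\n", $hits->total_hits;
while ( my $hit = $hits->next ) {
    print "$hit->{Title}\n";
}
```

Both documents should come back: the tag object via Title (where the English analyzer stems "testing" to "test"), and the distribution via its Tags field.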
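The default behavior of Lucy's QueryParser is to search all indexed fields.
The weighting's going to get a little weird with the Tags and Metadata fields
because of length normalization: each pipe-delimited value is a single token,
so those fields tend to be short, and a match in a short field scores higher
than the same match in a long one. That's something to wrestle with later.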
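To put rough numbers on that, here's a sketch assuming a Lucene-style length norm of 1/sqrt(number of tokens). length_norm() and count_pipe_tokens() are hypothetical helpers written for illustration, not Lucy calls, and the 400-token Content length is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: Lucene-style length normalization (an assumption,
# not a call into Lucy). Fewer tokens means a larger multiplier.
sub length_norm {
    my ($num_tokens) = @_;
    return 0 if $num_tokens == 0;    # guard against division by zero
    return 1 / sqrt($num_tokens);
}

# Hypothetical helper: count tokens the way the '[^|]+' pattern would.
sub count_pipe_tokens {
    my ($text) = @_;
    my @tokens = $text =~ /[^|]+/g;
    return scalar @tokens;
}

my $tags_norm    = length_norm( count_pipe_tokens('perl|postgres|pgxn') );
my $content_norm = length_norm(400);    # e.g. a 400-token README body

printf "Tags norm:    %.3f\n", $tags_norm;
printf "Content norm: %.3f\n", $content_norm;
printf "Ratio:        %.1f\n", $tags_norm / $content_norm;
```

Under that assumption a single tag match gets roughly 11.5 times the length-norm multiplier of a single match in a 400-token Content field, before term frequency and IDF come into play.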
Marvin Humphrey