Re: [lucy-dev] A Schema for PGXN

Marvin Humphrey Fri, 18 Mar 2011 14:39:10 -0700

On Fri, Mar 18, 2011 at 01:29:10PM -0700, David E. Wheeler wrote:
> Title:     Name of a distribution/extension/tag/user
> Abstract:  For distributions and extensions
> Content:   Description and random docs for distributions,
>            documentation body for extensions, distribution names
>            for users and tags
> Tags:      Tags associated with a distribution
> Metadata:  Additional metadata: email addresses, URLs, dates,
>            and other stuff associated with a distribution.


Here's how I would express your schema in code:

    use Lucy::Plan::Schema;
    use Lucy::Plan::FullTextType;
    use Lucy::Analysis::PolyAnalyzer;
    use Lucy::Analysis::RegexTokenizer;

    my $schema = Lucy::Plan::Schema->new;

    # English analysis chain for the free-text fields.
    my $polyanalyzer  = Lucy::Analysis::PolyAnalyzer->new(language => 'en');
    my $fulltext_type = Lucy::Plan::FullTextType->new(
        analyzer      => $polyanalyzer,
        highlightable => 1,             # maybe
    );
    $schema->spec_field(name => 'Title',    type => $fulltext_type);
    $schema->spec_field(name => 'Abstract', type => $fulltext_type);
    $schema->spec_field(name => 'Content',  type => $fulltext_type);

    # Pipe-delimited fields: each run of non-pipe characters is one token.
    my $pipe_toker = Lucy::Analysis::RegexTokenizer->new(pattern => '[^|]+');
    my $pipe_type  = Lucy::Plan::FullTextType->new(analyzer => $pipe_toker);
    $schema->spec_field(name => 'Tags',     type => $pipe_type);
    $schema->spec_field(name => 'Metadata', type => $pipe_type);

I think that's the most straightforward way to start out.  From there, you can
tweak and try other options as necessary.
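
To make the pipe-delimited fields concrete, here is a rough sketch in plain Perl (no Lucy required) of what the '[^|]+' pattern above would pull out of a field value. The tag values and the idea of joining them with '|' before indexing are illustrative assumptions, not something PGXN is known to do.

```perl
use strict;
use warnings;

# Hypothetical field value: tags joined with '|' before indexing.
my @tags  = ('full text search', 'postgres', 'key value');
my $field = join '|', @tags;

# Same pattern as the one handed to RegexTokenizer: each run of
# non-pipe characters becomes one token, so multi-word tags stay whole.
my @tokens = $field =~ /([^|]+)/g;

print "$_\n" for @tokens;
```

Because each tag becomes a single token, a query would presumably have to match the whole tag rather than one word inside it. If that turns out to be too strict, switching Tags to the PolyAnalyzer type is one of the options to try.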

> So for those fields that don't apply to a thing, like "tags" for a tag
> object, I'd just provide no value. Otherwise, I'd like to do a full-text
> search on all these fields.

The default behavior of Lucy's QueryParser is to search all indexed fields.
The weighting's going to get a little weird with the Tags and Metadata fields
because of length normalization, but that's something to wrestle with later.
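
As a back-of-the-envelope illustration of the length normalization effect, the sketch below assumes the Lucene-style formula of 1/sqrt(number of tokens), which I believe matches the default Similarity but should be checked against the source. The length_norm helper and the token counts are made up for the example.

```perl
use strict;
use warnings;

# Assumed Lucene-style length normalization: 1/sqrt(number of tokens).
# Hypothetical helper, not a Lucy API call.
sub length_norm {
    my ($num_tokens) = @_;
    return $num_tokens ? 1 / sqrt($num_tokens) : 0;
}

# Hypothetical token counts for a single document.
my %token_counts = (Tags => 4, Content => 400);

for my $field (sort keys %token_counts) {
    printf "%-8s %.3f\n", $field, length_norm($token_counts{$field});
}
```

Under that assumption, a match in the four-token Tags field gets ten times the length factor of a match in the 400-token Content field, which is the kind of skew to watch for when all fields are searched together.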

Marvin Humphrey
