Re: Configuring the search index for each publication really necessary?

solprovider Mon, 24 Mar 2008 11:01:43 -0700

On 3/24/08, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
>  atm the index fields are configured for each publication:
>    <index id="default-live" analyzer="stopword_en"
>  directory="lenya/pubs/default/work/lucene/index/live/index">
>      <structure>
>        <field id="url" type="keyword" />
>        <field id="title" type="text" storetext="true"/>
>        <field id="description" type="text" storetext="true"/>
>        <field id="subject" type="keyword" storetext="true" />
>        <field id="body" type="text" storetext="true"/>
>      </structure>
>    </index>
>  IMO this is an inappropriate place for this configuration. Furthermore,
>  it has to match the index XSLTs of all resource types.
>
>  Wouldn't it be better to
>  - index all meta data fields
>  - configure the indexable fields for each resource type (have to
>    conform to the corresponding index XSLTs)
>
>  The index structure would be automatically derived from this
>  configuration (basically the union of all fields). Changing the meta
>  data or resource type configuration would certainly require to re-index
>  the whole content of the web application, but IMO this is not a big issue.
>
>  WDYT?
>  -- Andreas


I agree that one configuration for all publications is a worthy goal.
My version was an add-on to a Lenya 1.2.2 Publication, so it was not
concerned with integration into core Lenya.  The current
implementation may be derived from it.

Extracting all data from every document is great for the search terms.
All text should be included, unless we have field-level security?
Should all properties be included?  Should the properties be
associated with the field (element) name?  IIRC, the search index
concatenates all text and ignores all tags and attributes.
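
A rough sketch of that behaviour as I remember it.  This is not
Lenya's actual indexing code; extract_text is a hypothetical helper
and Python is used only for illustration:

```python
import xml.etree.ElementTree as ET


def extract_text(xml_source: str) -> str:
    """Concatenate every text node; element names and attributes are dropped.

    Hypothetical helper for illustration, not part of Lenya.
    """
    root = ET.fromstring(xml_source)
    # itertext() yields the text and tail strings in document order.
    joined = " ".join(root.itertext())
    # Collapse runs of whitespace left over from the markup.
    return " ".join(joined.split())
```

With this approach a search term matches wherever it appears in the
document, but the index cannot tell which element it came from.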

The configuration specifies which fields are available for results.
The results I built include:
- the URI,
- the Title or the Subject, and
- the Description or the beginning of the Body.
Using whichever of the two fields is available added some flexibility.
These fields are needed to build useful results.  Standardizing on
these field names would allow the current configuration to be used for
all Resource Types.
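
The fallback between fields could look roughly like this.  It is a
sketch only: build_result, the result keys, the order of preference
(Title before Subject, Description before Body) and the 200-character
cut-off are my assumptions; the field ids are the ones from the
configuration quoted above.

```python
SUMMARY_LENGTH = 200  # assumed cut-off for "the beginning of the Body"


def build_result(fields: dict) -> dict:
    """Build one search result from the stored index fields of a hit."""

    def first_of(*names: str) -> str:
        # Return the first non-empty field among the given names.
        for name in names:
            value = (fields.get(name) or "").strip()
            if value:
                return value
        return ""

    body = (fields.get("body") or "").strip()
    return {
        "uri": fields.get("url", ""),
        "heading": first_of("title", "subject"),
        "summary": first_of("description") or body[:SUMMARY_LENGTH],
    }
```

Any Resource Type that supplies at least url, one of title/subject and
one of description/body would produce a usable result.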

So:
1. The index is already based on the union of all fields.
2. Standardizing the configuration requires minimal standardization of
the Resource Types.
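
Deriving the structure as the union of the per-Resource-Type fields,
as Andreas proposes, might be sketched like this.  The function name,
the dictionary layout and the resource type names in the usage example
are illustrative assumptions, not existing Lenya configuration.

```python
def derive_index_structure(resource_types: dict) -> dict:
    """Return the union of all indexable fields as {field id: field type}.

    resource_types maps a resource type name to its {field id: field type}
    declaration.  A field declared with two different types is an error,
    because the index can hold only one definition per field id.
    """
    structure = {}
    for type_name, fields in resource_types.items():
        for field_id, field_type in fields.items():
            existing = structure.setdefault(field_id, field_type)
            if existing != field_type:
                raise ValueError(
                    f"field '{field_id}' in '{type_name}' is '{field_type}' "
                    f"but was already declared as '{existing}'"
                )
    return structure


# Usage with made-up declarations:
structure = derive_index_structure({
    "xhtml": {"url": "keyword", "title": "text", "body": "text"},
    "news": {"url": "keyword", "title": "text", "subject": "keyword"},
})
```

Here structure holds url, title, body and subject.  The conflict check
is where "minimal standardization" matters: the Resource Types must
agree on the type of any field id they share.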

solprovider
