On 3/24/08, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
> atm the index fields are configured for each publication:
>
> <index id="default-live" analyzer="stopword_en"
>     directory="lenya/pubs/default/work/lucene/index/live/index">
>   <structure>
>     <field id="url" type="keyword" />
>     <field id="title" type="text" storetext="true"/>
>     <field id="description" type="text" storetext="true"/>
>     <field id="subject" type="keyword" storetext="true" />
>     <field id="body" type="text" storetext="true"/>
>   </structure>
> </index>
>
> IMO this is an inappropriate place for this configuration. Furthermore,
> it has to match the index XSLTs of all resource types.
>
> Wouldn't it be better to
> - index all meta data fields
> - configure the indexable fields for each resource type (have to
>   conform to the corresponding index XSLTs)
>
> The index structure would be automatically derived from this
> configuration (basically the union of all fields). Changing the meta
> data or resource type configuration would certainly require to re-index
> the whole content of the web application, but IMO this is not a big issue.
>
> WDYT?
> -- Andreas

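To make the proposal concrete, here is a minimal sketch in Python (not Lenya's Java code) of deriving the index structure as the union of the fields configured per resource type, then rendering it as a <structure> element like the one quoted. The Field class, the function names, the resource type names, and the rule that two conflicting declarations of the same field are an error are all assumptions for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One indexable field; mirrors the attributes of <field> in the index config."""
    id: str
    type: str              # "keyword" or "text", as in the quoted config
    storetext: bool = False


def derive_structure(resource_types: dict[str, list[Field]]) -> list[Field]:
    """Union of the fields declared by all resource types, sorted by id.

    Assumption: a field shared by several resource types must be declared
    identically, so a conflicting declaration is reported as an error.
    """
    merged: dict[str, Field] = {}
    for rt_name, fields in resource_types.items():
        for field in fields:
            existing = merged.get(field.id)
            if existing is None:
                merged[field.id] = field
            elif existing != field:
                raise ValueError(
                    f"resource type {rt_name!r} redeclares field {field.id!r} "
                    f"as {field}, conflicting with {existing}"
                )
    return sorted(merged.values(), key=lambda f: f.id)


def to_structure_xml(fields: list[Field]) -> str:
    """Render the derived fields as a <structure> element."""
    structure = ET.Element("structure")
    for field in fields:
        attrs = {"id": field.id, "type": field.type}
        if field.storetext:
            attrs["storetext"] = "true"
        ET.SubElement(structure, "field", attrs)
    return ET.tostring(structure, encoding="unicode")


# Hypothetical per-resource-type declarations (resource type names are illustrative).
resource_types = {
    "xhtml": [
        Field("url", "keyword"),
        Field("title", "text", storetext=True),
        Field("body", "text", storetext=True),
    ],
    "links": [
        Field("url", "keyword"),
        Field("description", "text", storetext=True),
    ],
}

print(to_structure_xml(derive_structure(resource_types)))
```

Rejecting conflicting declarations is one possible choice; since all resource types share one index, a field such as "url" cannot sensibly be a keyword for one type and tokenized text for another.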
I agree that one configuration for all publications is a worthy goal. My version was an add-on to a Lenya 1.2.2 publication, so it was not concerned with integration into core Lenya. The current implementation may be derived from it.

Extracting all data from every document is great for the search terms. All text should be included, or do we have field-level security? Should all properties be included? Should the properties be associated with the field (element) name?

IIRC, the search index concatenates all text and ignores all tags and attributes. The configuration only specifies which fields are available for results. The results I built include:
- the URI,
- the Title or the Subject, and
- the Description or the beginning of the Body.
Some flexibility was added by using whichever of the two fields is available. These fields are needed to build useful results. Standardizing on these field names would allow the current configuration to be used for all Resource Types.

So:
1. The index is already based on the union of all fields.
2. Standardizing the configuration requires only minimal standardization of the Resource Types.

solprovider
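A minimal sketch of the result building described above, where one of two fields is used if available. It assumes that a hit's stored fields arrive as a plain mapping keyed by the field ids from the quoted config (url, title, subject, description, body), that the first-listed field wins when both exist, and that the body snippet is cut at a hypothetical 200 characters. build_result and first_present are illustrative names, not Lenya or Lucene API.

```python
from __future__ import annotations

from typing import Mapping, Optional


def first_present(doc: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the first non-empty stored value among the given field names."""
    for name in names:
        value = (doc.get(name) or "").strip()
        if value:
            return value
    return None


def build_result(doc: Mapping[str, str], snippet_chars: int = 200) -> dict[str, str]:
    """Build one search result from a hit's stored fields.

    Assumption: the first field listed is preferred when both are present.
    """
    title = first_present(doc, "title", "subject") or ""
    summary = first_present(doc, "description")
    if summary is None:
        body = first_present(doc, "body") or ""
        # Fall back to the beginning of the body, cut at snippet_chars.
        if len(body) <= snippet_chars:
            summary = body
        else:
            summary = body[:snippet_chars].rstrip() + "..."
    return {"uri": doc.get("url", ""), "title": title, "summary": summary}


# Hypothetical hit with no title and no description.
hit = {"url": "/default/live/index.html", "subject": "Search", "body": "Lenya search results"}
print(build_result(hit))
```

Because the fallback only needs these five field names, any Resource Type whose index XSLT emits them could share the same result-building code.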
