On 3/24/08, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > On 3/24/08, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
> >> atm the index fields are configured for each publication:
> >> <index id="default-live" analyzer="stopword_en"
> >> directory="lenya/pubs/default/work/lucene/index/live/index">
> >> <structure>
> >> <field id="url" type="keyword" />
> >> <field id="title" type="text" storetext="true"/>
> >> <field id="description" type="text" storetext="true"/>
> >> <field id="subject" type="keyword" storetext="true" />
> >> <field id="body" type="text" storetext="true"/>
> >> </structure>
> >> </index>
> >> IMO this is an inappropriate place for this configuration. Furthermore,
> >> it has to match the index XSLTs of all resource types.
> >>
> >> Wouldn't it be better to
> >> - index all meta data fields
> >> - configure the indexable fields for each resource type (have to
> >> conform to the corresponding index XSLTs)
> >>
> >> The index structure would be automatically derived from this
> >> configuration (basically the union of all fields). Changing the meta
> >> data or resource type configuration would certainly require re-indexing
> >> the whole content of the web application, but IMO this is not a big
> >> issue.
> >>
> >> WDYT?
> >> -- Andreas
> >
> > I agree one configuration for all publications is a worthy goal. My
> > version was an add-on to a Lenya 1.2.2 Publication and so was not
> > concerned with integration into core Lenya. The current
> > implementation may be derivative.
> >
> > Extracting all data from any document is great for the search terms.
> > All text should be included, or do we have field-level security?
> No, we only have document-level security.
The question was meant to provoke thought about future enhancements. (I
have not planned to include field-level security in Lenya-1.3.0, but
my planning includes not adding obstacles to recognized possible
improvements.) Security requires three functions:
1. Hide unauthorized information, handled by the display system.
2. Hide unauthorized pages from menus, handled by the navigation system.
3. Prevent search from using unauthorized information. This must be
handled by the search system (our current topic).
The most difficult aspect of developing field-level (or any) security is
preventing search from creating security holes, so mentioning possible
enhancements seemed useful to this discussion.
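
To make the third function concrete, here is a minimal sketch of
filtering search hits against document-level permissions before anything
is displayed. It is not Lenya code: the Hit class, filter_hits(), and the
can_read callback are hypothetical names standing in for whatever the
access control system provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result (hypothetical shape, not a Lucene or Lenya class)."""
    url: str
    title: str
    score: float


def filter_hits(hits, can_read):
    """Keep only the hits whose document the current user may read.

    can_read is an assumed callback: it takes a document URL and
    returns True when the user is authorized to view that document.
    """
    return [hit for hit in hits if can_read(hit.url)]


# Illustrative data only.
hits = [
    Hit("/index.html", "Home", 0.9),
    Hit("/internal/plan.html", "Plan", 0.8),
    Hit("/about.html", "About", 0.4),
]
readable = {"/index.html", "/about.html"}

visible = filter_hits(hits, lambda url: url in readable)
print([hit.url for hit in visible])
```

The filter has to run before titles or stored excerpts are rendered,
because fields indexed with storetext="true" could otherwise leak text
from a page the user cannot open. Field-level security would need the
same kind of check per field rather than per document.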
> > Should all properties be included? Should the properties be
> > associated with the field (element) name?
> ATM this is up to the resource type (done using a
> {resourceType}2index.xsl stylesheet), and IMO we can leave it like this,
> e.g. map
> <person>
> <name>Henry Hamster</name>
> </person>
> to field
> <lucene:document>
> <lucene:field name="personName">Henry Hamster</lucene:field>
> </lucene:document>
>
> It would be nice to have namespaced field names, though, to avoid
> clashes (see my other mail).
> -- Andreas
No special work is needed, since search indexes all text and "Henry
Hamster" is text.
Search indexing issues arise when the data is stored in an attribute:
<author name="Gabby Gerbil"/>
rather than as element text:
<author>Gabby Gerbil</author>
Or are you concerned with searches based on particular fields, such as
searching by author? Many search systems have stopped providing those
options, apparently from lack of use; people dislike organizing search
terms into multiple fields.
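
The attribute issue can be shown with a small sketch using Python's
standard xml.etree.ElementTree. The function names and the <doc> wrapper
element are made up for illustration; this is not how the Lenya index
XSLTs work, only a demonstration that text-only extraction misses
attribute values.

```python
import xml.etree.ElementTree as ET


def element_text(xml_source):
    """Return the words found in element text only (attributes ignored)."""
    root = ET.fromstring(xml_source)
    return " ".join(" ".join(root.itertext()).split())


def text_and_attributes(xml_source):
    """Return the words found in element text plus all attribute values."""
    root = ET.fromstring(xml_source)
    words = " ".join(root.itertext()).split()
    for element in root.iter():
        for value in element.attrib.values():
            words.extend(value.split())
    return " ".join(words)


as_attribute = '<doc><author name="Gabby Gerbil"/></doc>'
as_element = '<doc><author>Gabby Gerbil</author></doc>'

print(repr(element_text(as_attribute)))
print(repr(element_text(as_element)))
print(repr(text_and_attributes(as_attribute)))
```

With text-only extraction the attribute form contributes nothing to the
index, so a search for "Gerbil" would not find that document unless the
indexing step also collects attribute values.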
solprovider