[jira] [Comment Edited] (OAK-3580) Make it possible to use indexes for providing excerpts

Tommaso Teofili (JIRA) Fri, 06 Nov 2015 02:30:37 -0800

    [ https://issues.apache.org/jira/browse/OAK-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993470#comment-14993470 ]


Tommaso Teofili edited comment on OAK-3580 at 11/6/15 10:29 AM:
----------------------------------------------------------------

Attached a first patch that relies on the indexes to generate _rep:excerpt_
whenever possible.

When retrieving the row value, if _rep:excerpt(.)_ is requested, the
index-generated value is returned if available. If it is not available, or if
a property-level excerpt is requested (e.g. _rep:excerpt(text)_), the existing
{{SimpleExcerptProvider}} is used as a fallback.
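The fallback rule above can be sketched in plain Java. This is a hypothetical illustration, not the code in the patch: the class {{ExcerptFallback}}, the {{excerpt}} method and the truncating {{simpleExcerpt}} stand-in for {{SimpleExcerptProvider}} are all assumptions. Only the column syntax (_rep:excerpt(.)_ versus _rep:excerpt(text)_) and the "index value first, fallback otherwise" rule come from the comment.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the excerpt fallback; names are illustrative, not Oak APIs. */
public class ExcerptFallback {

    // Hypothetical stand-in for SimpleExcerptProvider: it simply truncates the text.
    public static String simpleExcerpt(String text, int maxLength) {
        if (text.length() <= maxLength) {
            return text;
        }
        return text.substring(0, maxLength) + "...";
    }

    /**
     * @param column       requested column, e.g. "rep:excerpt(.)" or "rep:excerpt(text)"
     * @param indexExcerpt excerpt produced by the index highlighter, if it produced one
     * @param properties   property name to value for the current row's node
     */
    public static String excerpt(String column, Optional<String> indexExcerpt,
                                 Map<String, String> properties) {
        String prefix = "rep:excerpt(";
        if (!column.startsWith(prefix) || !column.endsWith(")")) {
            throw new IllegalArgumentException("not an excerpt column: " + column);
        }
        String target = column.substring(prefix.length(), column.length() - 1);

        // Node-level excerpt: prefer the value generated by the index.
        if (target.equals(".") && indexExcerpt.isPresent()) {
            return indexExcerpt.get();
        }
        // Fallback: property-level excerpt, or no index-generated value available.
        String text = target.equals(".")
                ? String.join(" ", properties.values())
                : properties.getOrDefault(target, "");
        return simpleExcerpt(text, 20);
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("text", "Oak indexes can highlight matching terms");
        // Index value available for rep:excerpt(.): it is returned as-is.
        System.out.println(excerpt("rep:excerpt(.)", Optional.of("<strong>Oak</strong> indexes"), props));
        // Property-level request: the fallback is used even if the index had a value.
        System.out.println(excerpt("rep:excerpt(text)", Optional.of("ignored"), props));
        // No index value: the fallback is used.
        System.out.println(excerpt("rep:excerpt(.)", Optional.empty(), props));
    }
}
```

The property-level case deliberately ignores the index value, mirroring the comment: the index-generated excerpt is only used for _rep:excerpt(.)_.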

Implementation-wise, the Lucene index uses Lucene's default {{Highlighter}}
implementation, which relies on field values being stored. It may be good to
switch to {{PostingsHighlighter}}, which is supposed to be faster and doesn't
require stored values, only offsets and positions to be indexed for the terms;
see this
[blogpost|http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html].
In Solr, the test configuration uses the default highlighter, but that can be
changed in solrconfig.xml to use the fast vector or postings highlighter.
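To illustrate why offsets matter for a postings-based highlighter, here is a hypothetical plain-Java sketch of the marking step. Given the field text and the start/end character offsets of matched terms (as an index that records offsets would report them), it wraps each match in tags without re-analyzing the text. This is not the Lucene {{PostingsHighlighter}} API: {{OffsetHighlighter}}, {{Hit}} and {{highlight}} are illustrative names, and fragment selection/scoring is omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: mark matched terms using precomputed offsets (not the Lucene API). */
public class OffsetHighlighter {

    /** Start (inclusive) and end (exclusive) character offsets of one matched term. */
    public static final class Hit {
        final int start;
        final int end;

        public Hit(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    public static String highlight(String text, List<Hit> hits, String preTag, String postTag) {
        // Offsets may arrive in any order (e.g. grouped per term), so sort by start.
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt(h -> h.start));

        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Hit hit : sorted) {
            if (hit.start < pos) {
                continue; // skip a hit that overlaps one already marked
            }
            out.append(text, pos, hit.start);        // text before the match
            out.append(preTag)
               .append(text, hit.start, hit.end)     // the matched term itself
               .append(postTag);
            pos = hit.end;
        }
        out.append(text, pos, text.length());        // remaining text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Oak indexes can highlight matching terms";
        // Offsets of "terms" and "indexes", listed out of order on purpose.
        List<Hit> hits = List.of(new Hit(35, 40), new Hit(4, 11));
        System.out.println(highlight(text, hits, "<strong>", "</strong>"));
        // prints: Oak <strong>indexes</strong> can highlight matching <strong>terms</strong>
    }
}
```

The sketch still takes the field text as input; it only shows that, once offsets are available, marking matches needs no second analysis pass over the text.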


> Make it possible to use indexes for providing excerpts
> ------------------------------------------------------
>
>                 Key: OAK-3580
>                 URL: https://issues.apache.org/jira/browse/OAK-3580
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, query, solr
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>             Fix For: 1.3.10
>
>         Attachments: OAK-3580.1.patch
>
>
> Currently {{SimpleExcerptProvider}} always provides the excerpt, regardless
> of the underlying index used for the query, which has the limitation of not
> working with binaries.
> It would therefore be good to leverage the existing indexes' capabilities and
> use their highlighter implementations to provide excerpt support, also
> because the Lucene and Solr Oak indexes already perform full-text extraction
> from binaries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)