[jira] [Commented] (OAK-4368) Excerpt extraction from the Lucene index should be more selective

Tommaso Teofili (JIRA) Tue, 17 May 2016 01:12:40 -0700

    [ 
https://issues.apache.org/jira/browse/OAK-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286204#comment-15286204
 ]


Tommaso Teofili commented on OAK-4368:
--------------------------------------

The fallback would apply to:
- previous versions of the index that don't have offsets
- full text Lucene indexes whose properties are not analyzed and whose 
Lucene fields are therefore of type {{TextField}}

> Excerpt extraction from the Lucene index should be more selective
> -----------------------------------------------------------------
>
>                 Key: OAK-4368
>                 URL: https://issues.apache.org/jira/browse/OAK-4368
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>    Affects Versions: 1.0.30, 1.2.14, 1.4.2, 1.5.2
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>             Fix For: 1.5.3
>
>         Attachments: OAK-4368.0.patch
>
>
> The Lucene index can be used to extract _rep:excerpt_ using 
> {{Highlighter}}.
> The current implementation may suffer from performance issues when the result 
> set of the original query contains many results, each of them possibly 
> containing many (stored) properties that get passed to the highlighter in 
> order to try to extract the excerpt. The process doesn't stop as soon as 
> the first excerpt is found, so the excerpt is composed using text from all 
> stored properties in all results (if there's a match on the query).
> While we can accept some cost of extracting the excerpt at query time (whereas 
> it was generated at excerpt retrieval time before OAK-3580, e.g. via 
> _row.getValue("rep:excerpt")_), that cost should be bounded and mitigated as 
> much as possible.
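
The bounded, stop-at-first-match behaviour asked for in the description could
be sketched roughly as follows. This is only an illustration under stated
assumptions, not the attached patch: the names ExcerptSketch, highlight,
firstExcerpt and maxProperties are hypothetical, and the plain string match in
highlight merely stands in for Lucene's {{Highlighter}}.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these names come from Oak or Lucene.
final class ExcerptSketch {

    // Stand-in for a highlighter: wraps the first occurrence of the term
    // in <strong> tags, or returns empty if the text does not contain it.
    static Optional<String> highlight(String text, String term) {
        int idx = text.indexOf(term);
        if (idx < 0) {
            return Optional.empty();
        }
        int end = idx + term.length();
        return Optional.of(text.substring(0, idx)
                + "<strong>" + text.substring(idx, end) + "</strong>"
                + text.substring(end));
    }

    // Walks the stored properties of one result in iteration order,
    // inspects at most maxProperties of them, and returns as soon as
    // the first excerpt is found instead of concatenating all matches.
    static Optional<String> firstExcerpt(Map<String, String> storedProperties,
                                         String term,
                                         int maxProperties) {
        int inspected = 0;
        for (Map.Entry<String, String> property : storedProperties.entrySet()) {
            if (inspected++ >= maxProperties) {
                break; // bound the per-result cost
            }
            Optional<String> excerpt = highlight(property.getValue(), term);
            if (excerpt.isPresent()) {
                return excerpt; // stop at the first match
            }
        }
        return Optional.empty();
    }
}
```

With such an approach the per-result cost is capped by maxProperties rather
than by the total number of stored properties, at the price of possibly
missing a match in a property that is never inspected.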



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
