[ 
https://issues.apache.org/jira/browse/SOLR-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seunghan Jung resolved SOLR-17474.
----------------------------------
    Resolution: Duplicate

This should have been posted to the “Lucene” project, but it was mistakenly 
posted to the “Solr” project. We are working on it in the following PR: 
https://github.com/apache/lucene/pull/13832.

> The snippet formatting does not work as intended when PassageSort is not 
> startOffset.
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-17474
>                 URL: https://issues.apache.org/jira/browse/SOLR-17474
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: highlighter
>    Affects Versions: main (10.0)
>            Reporter: Seunghan Jung
>            Priority: Critical
>         Attachments: image-2024-10-03-04-14-50-717.png
>
>
> Suppose the DefaultPassageFormatter.format method is given passages that are 
> sorted by score rather than by startOffset, as shown below.
> !image-2024-10-03-04-14-50-717.png!
> Here the content is "When indexing data in Solr, each document is composed of 
> various fields. A document essentially represents a single record, and each 
> document typically contains a unique ID field.", so the passages are as 
> follows:
>  * Passages[0] -> "A document essentially represents a single record, and 
> each document typically contains a unique ID field."
>  * Passages[1] -> "When indexing data in Solr, each document is composed of 
> various fields. "
>  
> The intended formatting result is:
> "A <b>document</b> essentially represents a single record, and each 
> <b>document</b> typically contains a unique ID field.\{{ellipsis}}When 
> indexing data in Solr, each <b>document</b> is composed of various fields."
>  
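> However, the condition that decides whether two passages are contiguous was 
> written on the assumption that the passages are sorted by startOffset. As a 
> result, the two passages are not separated by an ellipsis and are instead 
> joined into a single snippet: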
>  
> "A <b>document</b> essentially represents a single record, and each 
> <b>document</b> typically contains a unique ID field.When indexing data in 
> Solr, each <b>document</b> is composed of various fields."
>  
> This change therefore fixes that condition.
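
The behaviour described above can be reproduced with a small standalone
sketch. It does not use Lucene: Span is a hypothetical stand-in for Passage,
the "... " ellipsis string is an assumption, and highlighting with <b> tags is
left out. joinAssumingSorted models an adjacency check that relies on
startOffset ordering (assumed here to resemble the current condition), and
joinAnyOrder is one possible order-independent check, not necessarily the
change made in the PR.

```java
class PassageJoinSketch {
    // Hypothetical stand-in for a passage: only the offsets matter here.
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Assumed separator; the real formatter takes it as a constructor argument.
    static final String ELLIPSIS = "... ";

    // Adjacency check that only works when spans are sorted by start offset:
    // a span that begins before the previous end is treated as connected.
    static String joinAssumingSorted(Span[] spans, String content) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (Span s : spans) {
            if (s.start > pos && pos > 0) {
                sb.append(ELLIPSIS);
            }
            sb.append(content, s.start, s.end);
            pos = s.end;
        }
        return sb.toString();
    }

    // Order-independent check: two spans are connected only if the next one
    // starts exactly where the previous one ended.
    static String joinAnyOrder(Span[] spans, String content) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (Span s : spans) {
            if (sb.length() > 0 && s.start != pos) {
                sb.append(ELLIPSIS);
            }
            sb.append(content, s.start, s.end);
            pos = s.end;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String content = "When indexing data in Solr, each document is composed of "
                + "various fields. A document essentially represents a single record, "
                + "and each document typically contains a unique ID field.";
        int split = content.indexOf("A document");
        // Score order from the report: the second sentence comes first.
        Span[] byScore = { new Span(split, content.length()), new Span(0, split) };
        System.out.println(joinAssumingSorted(byScore, content));
        System.out.println(joinAnyOrder(byScore, content));
    }
}
```

With the score-ordered input, the first method joins the two sentences with no
separator, while the second inserts the ellipsis between them. For input sorted
by startOffset, both methods treat the two passages as contiguous.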



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
