[jira] [Issue Comment Edited] (SOLR-1954) Highlighter component should expose snippet character offsets and the score.

Jamie Johnson (JIRA) Fri, 15 Jul 2011 14:13:24 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066207#comment-13066207
 ]


Jamie Johnson edited comment on SOLR-1954 at 7/15/11 9:12 PM:
--------------------------------------------------------------

I know it has been a while, but I had a need for something like this. Although 
I implemented my own version (and added it to SOLR-1397), I figured I'd try this 
patch out as well.  After applying the patch I got the following response:

<lst name="highlighting">
    <lst name="1">
        <arr name="subject_phonetic">
            <str><em>Test</em> subject message</str>
        </arr>
        <arr name="subject_phonetic_startPos"><int>0</int></arr>
        <arr name="subject_phonetic_endPos"><int>29</int></arr>
    </lst>
</lst>

It seems that startPos is always 0 and endPos is the length of the field 
including the highlighting start/end tags.  Is this expected?
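
As a quick sanity check of that reading, here is a minimal Python sketch (not 
part of the patch; the variable names are only illustrative) that measures the 
snippet from the response above with and without the <em> tags:

```python
import re

# Highlighted fragment copied from the response above.
snippet = "<em>Test</em> subject message"

# The same fragment with the <em>/</em> highlight tags stripped.
plain = re.sub(r"</?em>", "", snippet)

print(len(snippet))  # 29, matches the reported endPos
print(len(plain))    # 20, length of the text without the tags
```

If the offsets referred to the original field content, I would expect an endPos 
of at most 20 here, not 29.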

      was (Author: jej2003):
    I know it has been a while, but I had a need for something like this. 
Although I implemented my own version (and added it to SOLR-1397), I figured I'd 
try this patch out as well.  After applying the patch I got the following response:

<lst name="highlighting">
    <lst name="1">
        <arr name="subject_phonetic">
            <str><em>Test</em> subject message</str>
        </arr>
        <arr name="subject_phonetic_startPos"><int>0</int></arr>
        <arr name="subject_phonetic_endPos"><int>29</int></arr>
    </lst>
</lst>

It seems that the endPos is the length of the field including the highlighting 
start/end tags.  Is this expected?
  
> Highlighter component should expose snippet character offsets and the score.
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-1954
>                 URL: https://issues.apache.org/jira/browse/SOLR-1954
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>            Reporter: David Smiley
>            Priority: Minor
>         Attachments: SOLR-1954_start_and_end_offsets.patch
>
>
> The Highlighter Component does not currently expose the snippet character 
> offsets or the score.  There is a TODO in DefaultSolrHighlighter indicating 
> the intention to add this eventually.  This information is needed when doing 
> highlighting on external content.  The data is there, so it's pretty easy to 
> output it in some way.  The challenge is deciding on the output and its 
> ramifications for backwards compatibility.  The current highlighter component 
> response structure doesn't lend itself to adding any new data, unfortunately. 
> I wish the original implementer had had some foresight.  Unfortunately, all the 
> highlighting tests assume this structure.  Here is a snippet of the current 
> response structure in Solr's sample data searching for "sdram" for reference:
> {code:xml}
> <lst name="highlighting">
>  <lst name="VS1GB400C3">
>   <arr name="text">
>       <str>CORSAIR ValueSelect 1GB 184-Pin DDR &lt;em&gt;SDRAM&lt;/em&gt; 
> Unbuffered DDR 400 (PC 3200) System Memory - Retail</str>
>   </arr>
>  </lst>
> </lst>
> {code}
> Perhaps as a little hack, we introduce a pseudo field called 
> text_startCharOffset, whose name is the concatenation of the matching field 
> name and "_startCharOffset".  This would be an array of ints.  Likewise, there 
> would be further arrays for endCharOffset and score.
> Thoughts?
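
To make the proposed layout concrete, here is a rough Python sketch of how a 
client might read such parallel arrays. It is only an illustration under 
assumptions: the "_endCharOffset" and "_score" suffixes are guessed by analogy 
with "_startCharOffset", the helper name is hypothetical, and the sample values 
are made up, not taken from a real Solr response.

```python
def snippets_with_offsets(doc_highlighting, field):
    """Pair each snippet of `field` with the entries of its parallel arrays."""
    texts = doc_highlighting.get(field, [])
    # Assumed pseudo-field names: <field> + suffix, as suggested in the issue.
    starts = doc_highlighting.get(field + "_startCharOffset", [])
    ends = doc_highlighting.get(field + "_endCharOffset", [])
    scores = doc_highlighting.get(field + "_score", [])
    return [
        {"text": t, "start": s, "end": e, "score": sc}
        for t, s, e, sc in zip(texts, starts, ends, scores)
    ]


# Made-up values for illustration only.
example = {
    "text": ["DDR <em>SDRAM</em> Unbuffered"],
    "text_startCharOffset": [32],
    "text_endCharOffset": [54],
    "text_score": [1.0],
}

print(snippets_with_offsets(example, "text"))
```

Existing clients that only read the "text" array would keep working, since the 
extra arrays sit beside it under separate names; the cost is that the client has 
to match entries up by position.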

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

