[jira] [Commented] (SOLR-3110) Search result comes up with truncated words at the start of highlighted fragment

Koji Sekiguchi (Commented) (JIRA) Mon, 20 Feb 2012 00:01:05 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211712#comment-13211712
 ]


Koji Sekiguchi commented on SOLR-3110:
--------------------------------------

Hi Shyam,

From the mail thread:

{quote}
Thanks for the response. When I use hl.bs.chars=".!?" and hl.bs.maxScan=200, I 
see improvements. Below is the highlighted value:

"The synthesis tool only supports the resolution functions for 
<em>std_logic</em> and std_logic_vector."


But in other cases I also see that some of the words break in between as shown 
below

Original text: " How Are Clock Gating Checks Inferred"

When searching for the term "clock", the highlighted text is displayed as shown 
below

"w Are <em>Clock</em> Gating Checks Inferred"

As you can see only w is displayed from the word How.
{quote}

I couldn't reproduce your problem. I'm using trunk, and I got the following 
snippet, which is the one I expected:

{code}
<lst name="highlighting">
  <lst name="2">
    <arr name="includes">
      <str> How Are <em>Clock</em> Gating Checks Inferred</str>
    </arr>
  </lst>
</lst>
{code}

My BoundaryScanner setting is:

{code}
<boundaryScanner name="default"
                 default="true"
                 class="solr.highlight.SimpleBoundaryScanner">
  <lst name="defaults">
    <str name="hl.bs.maxScan">100</str>
    <str name="hl.bs.chars">.!?</str>
  </lst>
</boundaryScanner>
{code}

My request was:

http://localhost:8983/solr/select?q=clock&hl=on&hl.fl=includes&hl.useFastVectorHighlighter=true

I'm using the following sample data that's been provided by Shyam in the mail 
thread:

{code}
<add>
  <doc>
    <field name="id">1</field>
    <field name="includes">User-defined resolution functions. The synthesis tool only supports the</field>
    <field name="includes">resolution functions for std_logic and std_logic_vector.</field>
    <field name="includes"></field>
    <field name="includes">Slices with range indices that do not evaluate to constants</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="includes"> How Are Clock Gating Checks Inferred</field>
  </doc>
</add>
{code}

For the includes field, I changed the definition to multiValued in the example schema.xml.
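
A sketch of what that schema.xml change might look like. The type name and the 
term vector attributes are assumptions based on the stock example schema, not 
copied from the setup above; FastVectorHighlighter does need term vectors with 
positions and offsets on the highlighted field.

{code}
<!-- Assumed field definition; multiValued="true" is the only change described above. -->
<field name="includes" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"
       multiValued="true"/>
{code}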

Can you verify it?


                
> Search result comes up with truncated words at the start of highlighted 
> fragment
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-3110
>                 URL: https://issues.apache.org/jira/browse/SOLR-3110
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 4.0
>         Environment: java Tomcat Solaris
>            Reporter: Shyam Bhaskaran
>              Labels: FastVectorHighlighter, boundaryScanner, highlighting, 
> solr
>
> Words are being truncated at the start of the displayed highlighter 
> fragment. 
> The following boundary scanner settings were introduced in the 
> solrconfig.xml file:
> <str name="hl.bs.chars">.,!?  &#9;&#10;&#13;</str>  
> If I change the settings to 
> <str name="hl.bs.chars">.,!?</str>
> then this issue goes away, but another issue comes up where 
> the highlighted search fragment does not start from the beginning of the 
> sentence.
> Below is the complete list of settings we are using for the boundary scanner.
>    <boundaryScanner name="simple" 
> class="solr.highlight.SimpleBoundaryScanner" default="true">
>      <lst name="defaults">
>        <str name="hl.bs.maxScan">200</str>
>        <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
>      </lst>
>    </boundaryScanner>
>    <boundaryScanner name="breakIterator" 
> class="solr.highlight.BreakIteratorBoundaryScanner">
>      <lst name="defaults">
>        <str name="hl.bs.type">SENTENCE</str>
>        <str name="hl.bs.language">en</str>
>        <str name="hl.bs.country">US</str>
>      </lst>
>    </boundaryScanner>

