[
https://issues.apache.org/jira/browse/SOLR-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211712#comment-13211712
]
Koji Sekiguchi commented on SOLR-3110:
--------------------------------------
Hi Shyam,
From the mail thread:
{quote}
Thanks for the response. When I use hl.bs.chars=".!?" and hl.bs.maxScan=200, I
see improvements; below is the highlighted value:
"The synthesis tool only supports the resolution functions for
<em>std_logic</em> and std_logic_vector."
But in other cases I also see words broken in the middle, as shown
below.
Original text: " How Are Clock Gating Checks Inferred"
When searching for the term "clock", the highlighted text is displayed as shown
below:
"w Are <em>Clock</em> Gating Checks Inferred"
As you can see, only "w" is displayed from the word "How".
{quote}
I couldn't reproduce your problem. I'm using trunk, and I got the following
snippet, which is what I expected:
{code}
<lst name="highlighting">
<lst name="2">
<arr name="includes">
<str> How Are <em>Clock</em> Gating Checks Inferred</str>
</arr>
</lst>
</lst>
{code}
My BoundaryScanner setting is:
{code}
<boundaryScanner name="default"
default="true"
class="solr.highlight.SimpleBoundaryScanner">
<lst name="defaults">
<str name="hl.bs.maxScan">100</str>
<str name="hl.bs.chars">.!?</str>
</lst>
</boundaryScanner>
{code}
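To make the interaction between hl.bs.chars and hl.bs.maxScan concrete, here is a simplified Java sketch of a character-based backward scan for a fragment start. It is an illustration written for this comment, not the source of solr.highlight.SimpleBoundaryScanner: the class and method names are hypothetical, treating the beginning of the text as a boundary is an assumption, and the proposed start offset 3 (the "w" in "How") is likewise made up for the example.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of a character-based boundary scan (NOT Lucene's code).
 * Starting from a proposed fragment start, walk backwards for at most
 * maxScan characters looking for one of the boundary characters (hl.bs.chars).
 */
class BoundaryScanSketch {

    static int findStartOffset(String text, int start, int maxScan, String boundaryChars) {
        Set<Character> boundaries = new HashSet<>();
        for (char c : boundaryChars.toCharArray()) {
            boundaries.add(c);
        }
        // Out-of-range start: leave it unchanged.
        if (start < 1 || start > text.length()) {
            return start;
        }
        int offset = start;
        for (int scanned = 0; offset > 0 && scanned < maxScan; scanned++) {
            // Boundary char found: the fragment begins right after it.
            if (boundaries.contains(text.charAt(offset - 1))) {
                return offset;
            }
            offset--;
        }
        // Assumption: reaching the beginning of the text counts as a boundary.
        if (offset == 0) {
            return 0;
        }
        // No boundary within maxScan chars: keep the original (possibly mid-word) start.
        return start;
    }

    public static void main(String[] args) {
        String text = " How Are Clock Gating Checks Inferred";
        int proposed = 3; // hypothetical start pointing at the 'w' of "How"
        System.out.println(findStartOffset(text, proposed, 100, ".!?"));
        System.out.println(findStartOffset(text, proposed, 100, ".,!? \t\n\r"));
        System.out.println(findStartOffset(text, proposed, 1, ".!?"));
    }
}
```

With these inputs the program prints 0, 1 and 3: with ".!?" the scan runs back to the start of the text, adding the space lets it stop just before "How", and with a one-character scan budget and no boundary in reach the start stays in the middle of the word.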
My request was:
http://localhost:8983/solr/select?q=clock&hl=on&hl.fl=includes&hl.useFastVectorHighlighter=true
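To compare per-request overrides with the solrconfig.xml defaults, the same request can be built programmatically. This hypothetical helper uses only the JDK (not SolrJ) and URL-encodes each parameter; the hl.bs.chars and hl.bs.maxScan values are the ones quoted from the mail thread and are included only as an example of overriding the defaults.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds a Solr select URL from ordered parameters. */
class HighlightRequestSketch {

    static String buildUrl(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Encode both name and value so characters like '!' and '?' are safe.
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "clock");
        params.put("hl", "on");
        params.put("hl.fl", "includes");
        params.put("hl.useFastVectorHighlighter", "true");
        params.put("hl.bs.chars", ".!?");   // example per-request override
        params.put("hl.bs.maxScan", "200"); // example per-request override
        System.out.println(buildUrl("http://localhost:8983/solr/select", params));
    }
}
```

URLEncoder turns ".!?" into ".%21%3F"; the encode(String, Charset) overload requires Java 10 or later.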
I'm using the following sample data, which you provided in the mail
thread:
{code}
<add>
<doc>
<field name="id">1</field>
<field name="includes">User-defined resolution functions. The synthesis tool only supports the</field>
<field name="includes">resolution functions for std_logic and std_logic_vector.</field>
<field name="includes"></field>
<field name="includes">Slices with range indices that do not evaluate to constants</field>
</doc>
<doc>
<field name="id">2</field>
<field name="includes"> How Are Clock Gating Checks Inferred</field>
</doc>
</add>
{code}
For the includes field, I changed the field definition to multiValued in the example schema.xml.
Can you verify it?
> Search result comes up with truncated words at the start of highlighted fragment
> --------------------------------------------------------------------------------
>
> Key: SOLR-3110
> URL: https://issues.apache.org/jira/browse/SOLR-3110
> Project: Solr
> Issue Type: Bug
> Components: highlighter
> Affects Versions: 4.0
> Environment: java Tomcat Solaris
> Reporter: Shyam Bhaskaran
> Labels: FastVectorHighlighter, boundaryScanner, highlighting, solr
>
> Words are being truncated at the start of the displayed highlighter
> fragment.
> The following boundary scanner setting was added to the
> solrconfig.xml file:
> <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
> If I change the settings to
> <str name="hl.bs.chars">.,!?</str>
> then this issue goes away, but another issue appears: the highlighted
> fragment does not start at the beginning of the sentence.
> Below is the complete list of settings we are using for the boundary scanners.
> <boundaryScanner name="simple"
> class="solr.highlight.SimpleBoundaryScanner" default="true">
> <lst name="defaults">
> <str name="hl.bs.maxScan">200</str>
> <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
> </lst>
> </boundaryScanner>
> <boundaryScanner name="breakIterator"
> class="solr.highlight.BreakIteratorBoundaryScanner">
> <lst name="defaults">
> <str name="hl.bs.type">SENTENCE</str>
> <str name="hl.bs.language">en</str>
> <str name="hl.bs.country">US</str>
> </lst>
> </boundaryScanner>
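The breakIterator scanner above asks for SENTENCE boundaries in locale en_US. The JDK's java.text.BreakIterator offers that kind of segmentation, so the sketch below uses it directly to find where the sentence containing a hit begins, using text from the sample data. It is an illustration only: the class and method names are hypothetical, and this is not Solr's BreakIteratorBoundaryScanner code.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Hypothetical sketch: locate the start of the sentence containing an offset. */
class SentenceStartSketch {

    /** Requires 0 <= offset <= text.length(). */
    static int sentenceStart(String text, int offset, Locale locale) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        sentences.setText(text);
        // An offset already sitting on a sentence boundary is its own start.
        if (sentences.isBoundary(offset)) {
            return offset;
        }
        // Otherwise use the closest sentence boundary before the offset.
        return sentences.preceding(offset);
    }

    public static void main(String[] args) {
        String text = "User-defined resolution functions. "
                + "The synthesis tool only supports the resolution functions for "
                + "std_logic and std_logic_vector.";
        int hit = text.indexOf("std_logic"); // first occurrence of the term
        int start = sentenceStart(text, hit, Locale.US);
        System.out.println(text.substring(start));
    }
}
```

Unlike a fixed character list with a scan limit, sentence segmentation decides where a fragment starts from the grammar of the text, which is why the SENTENCE type is a natural candidate when fragments should begin at a sentence start.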