[ 
https://issues.apache.org/jira/browse/LUCY-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155701#comment-13155701
 ] 

Marvin Humphrey commented on LUCY-182:
--------------------------------------

Thanks for providing a patch to re-solve this bug, Nick. :) I test drove it with
some previously problematic queries using the sample app and couldn't make the
highlighting misbehave.  +1 to commit the Highlighter.c fragment of the patch.

With regards to the HeatMap.c portion of the patch, I like to provide an
argument against committing that and see what you think.  

It was perceptive of you to spot the redundant sort and the redundant array
copy.  However, in the grand scheme of things, Highlighter does a lot of
string concatenation and heap memory allocation/deallocation; I don't think
there is a performance rationale for making a change, as I'll bet that the
cost of that extra sorted array gets lost in the noise.  I've pushed pretty
hard in LUCY-191 on the subject of Normalizer's performance, but that's on an
inner loop which experience has demonstrated has a very significant
performance impact; this is different.

What does removing the sort buy us?  It makes HeatMap's constructor a little
shorter.  However, from the standpoint of external code (including
Highlighter), Heatmap now has a hidden gotcha.  HeatMap's constructor
documentation does not advertise that the array of Spans must be sorted --
that's only documented as a comment.  Furthermore, there's no parameter
verification in place ensuring that the array is sorted; if at some point
someone passes in an unsorted array, HeatMap will malfunction silently and we
must hope that the error gets caught by a unit test.

HeatMap is not a public class (yet), so perhaps it seems a little fussy to
insist on parameter checking for its constructor when it is only called from a
small, bounded set of internal locations.  However, all of Lucy is built this
way: our components are loosely coupled so that to the extent possible, making
a change to one is less likely to affect others, increasing the overall
reliability of the library as a whole.

What do you think, Nick?  Do you see where I'm coming from?

Hopefully it's apparent that I'm going on and on about this not because of the
one particular spot in this patch was a big deal, but because I think strong
modularization and loose coupling are very important design principles that
our software products will benefit from if we uphold, and because I want to 
provide you with feedback worthy of the high quality patches you have been
supplying.  Thanks for reading.  :)

                
> highlighter bug when searching for duplicate terms [wordX wordX]
> ----------------------------------------------------------------
>
>                 Key: LUCY-182
>                 URL: https://issues.apache.org/jira/browse/LUCY-182
>             Project: Lucy
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.1.0 (incubating), 0.2.0 (incubating), 0.2.1 
> (incubating)
>            Reporter: gk
>            Assignee: Marvin Humphrey
>             Fix For: 0.2.2 (incubating), 0.3.0 (incubating)
>
>         Attachments: LUCY-182-nwellnhof.patch, LUCY-182-test.pl, 
> LUCY-182.patch
>
>
> I stumbled onto this one when searching for [business to business].
> Source <TITLE>: ...Companies, Products, Trade Leads, Business Marketplace
> 'to' is a stopword which is ignored - no problem.
> So the query then becomes [business business].  The highlighter then produces:
> ...Companies, Products, Trade Leads, <strong>Business</strong>
> <strong>Marketp</strong>lace
> I then spent some time chasing my tail trying to reduce things down to
> a small reproducible unit, and finally decided to try searching for
> any duplicate [wordX wordX], and sure enough it's reproducible with
> all my indexes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to