Hi

I think that would be good. Probably a silly thing to ask, but I guess there is a performance implication in setting it to the max value.

Is there a general setting that other developers use?

Cheers

Amin



On 12 Mar 2009, at 22:03, Michael McCandless <luc...@mikemccandless.com> wrote:


IndexWriter has such behavior too, and because it was such a common trap (developers could not understand why their content was being truncated), we
made that setting explicit, up front so you were aware of it.

I think this in general is a reasonable approach for settings that "lose" stuff (content,
highlighted terms, etc.).

Maybe we should do the same for highlighter?

Mike
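
For reference, a minimal sketch of the explicit IndexWriter limit described above, assuming the Lucene 2.4-era API (RAMDirectory and StandardAnalyzer are only placeholders here):

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class MaxFieldLengthSketch {
    public static void main(String[] args) throws IOException {
        // The maximum field length is an explicit constructor argument, so the
        // caller has to decide up front whether long documents may be truncated.
        IndexWriter writer = new IndexWriter(new RAMDirectory(), new StandardAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED); // or MaxFieldLength.LIMITED
        writer.close();
    }
}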

Amin Mohammed-Coleman wrote:

I did the following:

highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);


which works.
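
For reference, a minimal sketch of where that call fits relative to the rest of the highlighting code (variable names are taken from the snippet quoted further down; the default limit is assumed from the contrib Highlighter sources):

Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
// By default the Highlighter only analyzes a limited prefix of the text
// (50 * 1024 chars), so a term that appears later in a large document yields
// no fragments; raising the limit makes the whole body eligible.
highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);
String result = highlighter.getBestFragments(tokenStream, text, 3, "...");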

On Thu, Mar 12, 2009 at 6:41 PM, Amin Mohammed-Coleman <ami...@gmail.com> wrote:

JIRA updated. Includes a new test case which shows the highlighter not working as expected.


On Thu, Mar 12, 2009 at 5:56 PM, Amin Mohammed-Coleman <ami...@gmail.com> wrote:

Hi

I have found that it is not an issue with POI. I extracted the text using POI, but differently, and the term is extracted properly. When I store the text and retrieve it, the term exists. However, running the text through the highlighter doesn't work.

I will post a test case with a plain text file on JIRA. Currently on a cramped train!

Cheers



On 11 Mar 2009, at 18:11, markharw00d <markharw...@yahoo.co.uk> wrote:

If you can supply a JUnit test that recreates the problem, I think we can start to make progress on this.



Amin Mohammed-Coleman wrote:

Hi

Apologies for re-sending this mail. Just wondering if anyone has experienced the below. I'm not sure if this could happen due to the nature of the document. It does seem strange that a search for one term returns a summary while another does not, even though the same document is being returned.

I'm asking this so I can code around it if this is normal.


Apologies again for re-sending this mail.

Cheers

Amin

Sent from my iPhone

On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman <ami...@gmail.com>
wrote:

Hi

I am seeing some strange behaviour with the highlighter and I'm wondering if anyone else is experiencing this. In certain instances I don't get a summary being generated. I perform the search and the search returns the correct document. I can see that the Lucene document contains the text in the field. However, after doing:

SimpleHTMLFormatter simpleHTMLFormatter =
        new SimpleHTMLFormatter("<span class=\"highlight\"><b>", "</b></span>");
// required for highlighting
Query query2 = multiSearcher.rewrite(query);
Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
...

String text = doc.get(FieldNameEnum.BODY.getDescription());
TokenStream tokenStream = analyzer.tokenStream(FieldNameEnum.BODY.getDescription(),
        new StringReader(text));
String result = highlighter.getBestFragments(tokenStream, text, 3, "...");


the string result is empty. This is very strange: if I try a different term that exists in the document then I do get a summary. For example, I have a Word document that contains the terms "document" and "aspectj". If I search for "document" I get the correct document but no highlighted summary. However, if I search using "aspectj" I get the same document with a highlighted summary.

Just to mention, I do rewrite the original query before performing the highlighting.

I'm not sure what I'm missing here. Any help would be appreciated.

Cheers
Amin

On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman <
ami...@gmail.com> wrote:
Hi

Got it working!  Thanks again for your help!


Amin


On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman <
ami...@gmail.com> wrote:
Thanks!  The final piece that I needed to do for the project!

Cheers

Amin

On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler <u...@thetaphi.de>
wrote:
Cool. I will use compression and store in the index. Is there anything special I need to do for decompressing the text? I presume I can just do doc.get("content")?
Thanks for your advice all!

No, just use Field.Store.COMPRESS when adding to the index and Document.get() when fetching. The decompression is done automatically.

You may think: why not enable compression for all fields? The thing is, compression is an overhead for very small and short fields, so you should only use it for large contents (it's the same as compressing very small files with ZIP/GZIP: those files mostly end up larger than without compression).

Uwe
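
For reference, a minimal sketch of the indexing and retrieval side of this, assuming the Lucene 2.4-era Field API; bodyText, writer, searcher and docId stand in for the application's own objects:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Indexing: store the large body field compressed.
Document doc = new Document();
doc.add(new Field("content", bodyText, Field.Store.COMPRESS, Field.Index.ANALYZED));
writer.addDocument(doc);

// Searching: the stored value is decompressed transparently on retrieval.
String body = searcher.doc(docId).get("content");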







---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

