Hi Mark,
I have solved it in another way now. I've created my own implementation
of StandardAnalyzer (which I've called AdvancedAnalyzer). This analyzer
keeps the word "zone-indeling" together, so users can simply search for
this term and it will be highlighted exactly as is. These compound words
occur in Dutch only I presume.
The problem was my highlighter was for a PDF document (through PDFBox).
This would highlight 1,2 and "indeling". Resulting in unwanted behaviour
when the word was split into zone and indeling.
Thanks for your help though.
Cheers,
Matthijs
mark harwood wrote:
It would be useful to have more details about the query input and the expected
highlights you want.
So given your 'zone-indeling' example document and the index-time tokenisation
you described, which of the following queries would you expect to match and
what would you want highlighted in each case?
1) zone
2) zone-indeling
3) "zone indeling"
4) zone-somethingElse
My assumption here is that you are using the standard Lucene Query parser and that query 3 will therefore be a phrase query.
Cheers
Mark
----- Original Message ----
From: Matthijs Bierman <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 14 November, 2007 11:51:07 AM
Subject: Re: get original term for synonym
Hi Mark,
Your solution would be correct if the synonym would be a true 2-way
synonym. Unfortunately this is not the case. My analyzer takes care of
decomposition of specific Dutch words (where a "-" is used to create
compound words). For example: 'zone-indeling' would create synonyms for
'zone'-> 'zone-indeling' and 'indeling'->'zone-indeling'.
When analyzing 'zone' it will therefore not point back to
'zone-indeling' (this information is simply not available). Putting all
the results from the indexing process into a file or lucene document
(thus creating a 'lookup' index) would probably make the lookup process
rather slow, or make application startup too long (due to HashMap
generation).
Maybe you can do something with offsets?
Thanks,
Matthijs
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]