> > It mentions the ff fe bytes (to indicate little-endian
> > order) I see at the beginning of my document.
> >
> > The xml files contain the heading <?xml version="1.0"
> > encoding="utf-16"?> specifying the encoding.
> >
> > When I manually overwrite a document (leaving out the two bytes
> > and also the encoding), the index is 'repaired' and only
> > one hit is found with a search. It looks like the leading
> > bytes and the encoding are causing the unexpected search results.
>
> Wow, I must admit I learned something new today :-) Great research Æde, I
> would not have guessed this off the top of my head. I also know Lucene trunk
> has some parts which make use of \uffff-style special chars, so I am
> wondering whether this might cause collisions like the one you encountered.

Don't mention it ;) I also learn from you guys, and with your help I got this far. I'm just happy that I could return the favour.

> Is it possible for you to store the documents as utf-8?

That is definitely an option I am going to explore. Unfortunately the application is developed by an 'external' party. I don't know whether they are able to change the xml (encoding) though, since I believe they are using 'standard' Windows components.

--Æde

********************************************
Hippocms-dev: Hippo CMS development public mailinglist

Searchable archives can be found at:
MarkMail: http://hippocms-dev.markmail.org
Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
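
For anyone who wants to try the utf-8 route discussed in this thread, here is a minimal sketch of how a document could be transcoded before it is stored. It is only an illustration under stated assumptions: the function name utf16_xml_to_utf8 is made up for this example, it is plain Python rather than anything from Hippo CMS or Lucene, and it assumes the file starts with a UTF-16 byte order mark (ff fe or fe ff) and carries an encoding="utf-16" declaration.

```python
import codecs
import re


def utf16_xml_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: transcode a UTF-16 XML document to UTF-8."""
    # ff fe marks little-endian UTF-16, fe ff marks big-endian.
    if not data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        raise ValueError("no UTF-16 byte order mark found")

    # The "utf-16" codec reads the BOM, picks the byte order and drops the BOM.
    text = data.decode("utf-16")

    # Keep the XML declaration consistent with the new byte encoding.
    text = re.sub(
        r'encoding=(["\'])utf-16\1',
        r'encoding=\1utf-8\1',
        text,
        count=1,
        flags=re.IGNORECASE,
    )

    # Plain "utf-8" does not write a byte order mark.
    return text.encode("utf-8")


# Build a small little-endian sample like the files described above.
sample = codecs.BOM_UTF16_LE + (
    '<?xml version="1.0" encoding="utf-16"?><doc>\u00c6de</doc>'
).encode("utf-16-le")

converted = utf16_xml_to_utf8(sample)
print(converted)
```

The result has neither the two leading bytes nor the utf-16 declaration, which matches the manually overwritten document that searched correctly. Whether the external application can read or produce such files is a separate question.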
