Re: bibindex on fulltext ending with error

Tibor Simko Wed, 13 Aug 2008 14:52:18 +0200

Hi Dominic:

On Thu, 07 Aug 2008, Dominic Lukas Wyler wrote:
> The edits in search_engine for accent stripping involved adding
> support for iso-8859-2 and iso-8859-15 characters.


Your changes look good.

> Removing that character from the list fixed the issue. Thank you very
> much for your help.

Good then.

> But now, if I want to keep this character as a separator (many of our
> submitted documents contain such quotes), I assume I have to proceed
> as was done with the accent stripping: have the current phrase in
> bibindex_engine.get_words_from_phrase() in unicode, as well as all the
> regexps ?

Yes, that would be the safest approach.
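
As a rough illustration of that approach, here is a minimal sketch in
present-day Python 3. The function name and the separator list are
hypothetical; the real list in bibindex_engine is not shown in this
thread. The point is that a multi-byte separator such as a typographic
quote is matched as one character only if both the phrase and the regexp
are Unicode. In a byte-level character class it would be treated as three
independent bytes, which can cut other UTF-8 characters in half.

```python
import re

# Hypothetical separator set: whitespace, some ASCII punctuation, and
# typographic single/double quotes (U+2018, U+2019, U+201C, U+201D).
SEPARATORS = re.compile("[\\s.,;:!?'\"\u2018\u2019\u201c\u201d]+")

def words_from_phrase(phrase_utf8):
    """Split a UTF-8 encoded phrase into UTF-8 encoded words.

    The phrase is decoded once, split with a Unicode regexp, and the
    resulting words are encoded back to UTF-8.
    """
    text = phrase_utf8.decode("utf-8")
    words = [w for w in SEPARATORS.split(text) if w]
    return [w.encode("utf-8") for w in words]
```

For example, a phrase containing typographic apostrophes is split at them
while the accented letters stay intact.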

P.S. Instead of doing several "Binary/UTF-8 -> Unicode -> Binary/UTF-8"
     transformations, we should probably move one day towards using
     Unicode strings internally everywhere, right from the run_sql()
     output, and convert to UTF-8 only just before sending the results
     back to the browser...
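
The idea in the P.S. could look roughly like the sketch below, again in
Python 3 terms. All three function names are hypothetical, and
fetch_rows() is only a stand-in for a database call; it is not the real
run_sql() interface. Bytes are decoded once where they enter the program,
all processing happens on text, and encoding happens once at the output.

```python
def fetch_rows():
    """Hypothetical stand-in for a database call returning UTF-8 bytes."""
    return [("Schr\u00f6dinger".encode("utf-8"),),
            ("\u00e9t\u00e9".encode("utf-8"),)]

def decode_rows(rows, encoding="utf-8"):
    """Decode every bytes field once, right at the input boundary."""
    return [tuple(f.decode(encoding) if isinstance(f, bytes) else f
                  for f in row)
            for row in rows]

def render(rows):
    """Process text internally; encode once just before output."""
    body = "\n".join(field.upper() for (field,) in rows)
    return body.encode("utf-8")
```

With this layout no function in between has to know about encodings at
all, so the repeated round trips disappear.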

Best regards
-- 
Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
