On Mon, 29 Dec 2003, Lachlan Andrew wrote:

> Greetings Chris,
>
> A grep for  strlen  gives 124 instances.  For each of these, we need
> to work out whether it is the number of bytes or number of characters
> (or both) which is important.  That would be a useful start.
>
> I've never used UTF-8 before.  What other things do we need to look
> for?
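
To make the bytes-versus-characters question concrete: strlen() counts bytes, so on UTF-8 text it matches the character count only when the string is pure ASCII. Below is a minimal C++ sketch that assumes the input is already valid UTF-8. The name utf8_length is hypothetical, not an existing ht://Dig function. It counts every byte that is not a UTF-8 continuation byte (binary 10xxxxxx):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: count UTF-8 code points in a NUL-terminated string.
// Assumes valid UTF-8: every byte that is NOT a continuation byte
// (10xxxxxx) starts a new character.
std::size_t utf8_length(const char *s)
{
    std::size_t count = 0;
    for (const unsigned char *p = reinterpret_cast<const unsigned char *>(s); *p; ++p)
        if ((*p & 0xC0) != 0x80)   // lead byte or plain ASCII
            ++count;
    return count;
}
```

For "caf\xC3\xA9" ("café"), strlen() returns 5 while utf8_length() returns 4. Each strlen call therefore has to be checked to see which of the two numbers it actually needs.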

  This is going to be a major effort.

  Here's a list of issues:
  1) Convert non-display code to use a string class
  2) Convert all calls to "strxxx()" functions to use that string class
  3) Adopt a string class that supports UTF-8 (there are some open-source
        ones)... ours is not good
  4) Purge all code of char* arithmetic
  5) Examine the BDB interface code for issues
  6) ALL interactions with strings must use string-class methods
  7) Must properly escape special sequences in regex inputs so that the
        regexes still work.
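
As an illustration of why item 4 matters: cutting a buffer at a fixed byte offset (strncpy(), substr(), pointer arithmetic) can split a multi-byte UTF-8 character and leave invalid UTF-8 behind. Here is a minimal sketch of a boundary-safe alternative. It assumes valid UTF-8 input, and utf8_prefix is a hypothetical helper, not part of the existing string class:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: return the longest prefix of `s` that is at most
// `max_bytes` long and does not end in the middle of a UTF-8 character.
// Assumes `s` holds valid UTF-8.
std::string utf8_prefix(const std::string &s, std::size_t max_bytes)
{
    if (s.size() <= max_bytes)
        return s;
    std::size_t cut = max_bytes;
    // s[cut] is the first byte that would be dropped. If it is a
    // continuation byte (10xxxxxx), its character started earlier, so
    // move the cut back to that character's lead byte.
    while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80)
        --cut;
    return s.substr(0, cut);
}
```

utf8_prefix("caf\xC3\xA9!", 4) backs off to "caf" instead of returning a dangling 0xC3 lead byte. A UTF-8-aware string class would hide this kind of detail behind its methods (items 3 and 6).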

  There are probably a few other issues...
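
For item 7, one way to make user input safe to embed in a pattern is to backslash-escape every metacharacter. The sketch below targets std::regex with its default ECMAScript grammar, which is an assumption made for illustration. The escaping rules differ for POSIX regcomp(): in basic REs, for instance, \( starts a group rather than matching a literal parenthesis. regex_escape is a hypothetical name:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: escape ECMAScript regex metacharacters so that
// `text` is matched literally. Bytes >= 0x80 (parts of multi-byte UTF-8
// characters) are copied unchanged.
std::string regex_escape(const std::string &text)
{
    static const std::string meta = "\\^$.|?*+()[]{}";
    std::string out;
    out.reserve(text.size() * 2);
    for (char c : text) {
        if (meta.find(c) != std::string::npos)
            out += '\\';   // prefix metacharacters with a backslash
        out += c;
    }
    return out;
}
```

With this, regex_escape("a.b") produces a\.b, which matches "a.b" but not "axb". Multi-byte UTF-8 characters pass through untouched, so they end up as literal byte sequences in the pattern.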

  I work with a guy who did this to a large code-base... It's on my list
to have a long talk with him ASAP.

  There is also the largely undefined issue of Asian word-breaking.  Many
Asian languages do not use spaces to 'break' words in text, which makes it
very difficult to index by word.
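
One common fallback for scripts without spaces is to index overlapping character bigrams, so that every two-character sequence in the text becomes an indexable term. This was not proposed above, and it is much cruder than a dictionary-based segmenter. Below is a minimal C++ sketch. It assumes valid UTF-8 input and assumes the caller has already isolated a run of CJK characters. utf8_chars and cjk_bigrams are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a valid UTF-8 string into one substring per
// code point.
std::vector<std::string> utf8_chars(const std::string &s)
{
    std::vector<std::string> chars;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80 || chars.empty())
            chars.emplace_back(1, s[i]);   // lead byte: start a new character
        else
            chars.back() += s[i];          // continuation byte: extend it
    }
    return chars;
}

// Hypothetical helper: overlapping bigrams, "ABCD" -> "AB", "BC", "CD".
// A run of a single character is returned as a unigram.
std::vector<std::string> cjk_bigrams(const std::string &run)
{
    std::vector<std::string> chars = utf8_chars(run);
    std::vector<std::string> grams;
    if (chars.size() == 1)
        grams.push_back(chars[0]);
    for (std::size_t i = 0; i + 1 < chars.size(); ++i)
        grams.push_back(chars[i] + chars[i + 1]);
    return grams;
}
```

Assuming a UTF-8 source file, cjk_bigrams("東京都") yields "東京" and "京都". The second term (Kyoto) spans the real word boundary in 東京|都 (Tokyo Metropolis). That is the price of this approach: extra index entries and some spurious matches.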

  Thanks.

> On Thu, 11 Dec 2003 14:34, Christopher Murtagh wrote:
> >  How far away is it? I've just switched my entire DB (Postgres) and
> > file system to UTF-8 as it was becoming a necessity. I *might* be
> > able to hire a CS student to help with the code if that could help
> > at all. Any idea to the scope of the problem?
>
>
> --
> [EMAIL PROTECTED]
> ht://Dig developer DownUnder  (http://www.htdig.org)
>

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485




_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
