Hi Georgina! If I understood correctly, the solution to your problem is here: http://www.htdig.org/FAQ.html#q5.22
In my case the UTF-8 character is not given by code point; it is entered directly from the keyboard. htdig probably cannot handle it correctly and treats it as two characters. So the source of your pages is probably not UTF-8, only the appearance is. The difference is that in my case the source is UTF-8 too. Hope the link helps.

Regards,
Levi

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Georgina Allbrook
Sent: Tuesday, June 20, 2006 3:36 AM
To: [email protected]
Subject: Re: [htdig] UTF again (Kintzel Levente)

Hi,

I'm interested in this too. We are using 3.2b6 and indexing sites that are mostly in English. However, we also have web pages with Maori, French, German and Spanish words within the content, using UTF-8 for the encoding. Like Levi, it's fine for us not to be able to search using one of these characters, but we would like the characters to be displayed correctly in the search results.

A common character we have is the letter a with a macron, marked up using its HTML entity (which renders as ā). It seems that htdig is escaping the &, so the entity shows up as literal text in the search results instead of ā.

Is it possible to make this work?

Thanks
Georgina.

--- original message ---
Date: Mon, 19 Jun 2006 15:14:54 +0300
From: "Kintzel Levente" <[EMAIL PROTECTED]>
Subject: [htdig] UTF again
To: <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="us-ascii"

Hi!

I know that htdig doesn't support UTF-8 characters (only 8-bit characters). My question is: what exactly does "doesn't support" mean? Does it mean that the search doesn't work well for accented or special characters? Or that htdig cannot return the indexed pages with correct content if they contain UTF-8 characters?

More precisely, my web pages contain UTF-8 characters, and I want to use htdig for search. Suppose it is OK if it doesn't search for accented characters, only for plain characters; even so, the returned pages contain bad characters.
Where a UTF-8 character was before, now there are two characters. Is this a consequence of the fact that htdig cannot handle UTF-8 characters, or is it a configuration problem on my side? My locale is hu_HU.utf8, but it still doesn't work for hu_HU.

Thank you in advance.

Regards,
Levi

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Georgina Allbrook
WebTeam : ITS Division : University of Waikato
http://phonebook.waikato.ac.nz/dept/WWTE.shtml
Ph 64 7 856 2889 x 6086
"Those who will play with cats must expect to be scratched." -- Cervantes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
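
For anyone reading this thread later, here is a rough sketch of the two symptoms discussed above. It is only an illustration in Python, not htdig code. It assumes the 8-bit charset in play is Latin-1 and that the entity in Georgina's pages is the decimal reference &#257; (the thread does not state either), and the variable names are made up for the example.

```python
import html

# Symptom 1: one UTF-8 character showing up as two characters.
a_macron = "\u0101"                     # the letter a with a macron
utf8_bytes = a_macron.encode("utf-8")   # UTF-8 stores it as two bytes
# An 8-bit-only program reads each byte as a separate character.
# Latin-1 is an assumed stand-in for whatever 8-bit charset is in use.
misread = utf8_bytes.decode("latin-1")

print(len(a_macron), len(utf8_bytes), len(misread))

# Symptom 2: an entity whose ampersand gets escaped a second time.
entity = "&#257;"                       # assumed form of the entity
double_escaped = html.escape(entity)    # what a browser then shows literally
decoded = html.unescape(entity)         # what the page author intended

print(double_escaped, decoded)
```

The first part shows why a page that is UTF-8 in its source comes back with two junk characters per accented letter, and the second shows why an entity-encoded page comes back with the entity text visible instead of the character.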

