Hi Georgina!

If I understood correctly, the solution to your problem is here:
http://www.htdig.org/FAQ.html#q5.22

In my case the UTF character is not given by its code point; it is just entered
from the keyboard. htdig probably cannot handle it correctly and treats it as
two characters. So the source of your pages is probably not UTF, only the
appearance is. The difference is that in my case the source is UTF too.
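
A minimal sketch of the "two characters" effect, in Python rather than htdig itself. It assumes, purely for illustration, that the consumer reads the bytes as ISO-8859-1 (one byte per character); that charset choice is an assumption, not something confirmed about htdig.

```python
# Illustration only: this is not htdig code. It assumes the consumer
# decodes bytes as ISO-8859-1, i.e. one byte per character.
text = "\u0101"  # LATIN SMALL LETTER A WITH MACRON

utf8_bytes = text.encode("utf-8")          # two bytes: 0xC4 0x81
misread = utf8_bytes.decode("iso-8859-1")  # each byte becomes its own character

print(len(text), len(utf8_bytes), len(misread))
print(utf8_bytes.decode("utf-8") == text)  # decoding as UTF-8 restores the character
```

One character in the source becomes two after the 8-bit reading, which matches the symptom of two characters appearing where one was before.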

Hope the link helps.

Regards,
   Levi



-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Georgina
Allbrook
Sent: Tuesday, June 20, 2006 3:36 AM
To: [email protected]
Subject: Re: [htdig] UTF again (Kintzel Levente)

Hi,

I'm interested in this too.  We are using 3.2b6 and indexing sites that
are mostly in English.

However, we also have web pages with Maori, French, German and
Spanish words within the content, using UTF-8 for the encoding.

Like Levi, it's fine for us not to be able to search using one of these
characters, but we would like the characters to be displayed correctly
in the search results. 

A common character we have is the letter a with a macron (ā), marked up
using an HTML entity. It seems that htdig is escaping the & of the entity,
so the search results display the literal entity text instead of ā.
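
The effect can be sketched as follows. This is Python, not htdig's code, and the numeric form used for the entity is only an assumed example of how ā might be marked up.

```python
import html

# Assumed example: a numeric character reference for a with macron (U+0101).
entity = "&" + "#257;"

# Escaping the ampersand a second time turns the entity into plain text.
escaped = html.escape(entity)

print(escaped)                 # what ends up in the result page source
print(html.unescape(entity))   # a browser resolves the original to the character
print(html.unescape(escaped))  # a browser shows the escaped form as literal entity text
```

A browser resolves one level of escaping only, so the doubly escaped form is rendered as the entity text itself rather than as the character.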

Is it possible to make this work?

Thanks Georgina.


--- original message ---
Date: Mon, 19 Jun 2006 15:14:54 +0300
From: "Kintzel Levente" <[EMAIL PROTECTED]>
Subject: [htdig] UTF again
To: <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="us-ascii"

Hi!

 

I know that htdig doesn't support UTF-8 characters (only 8-bit
characters).
My question is: what exactly does "doesn't support" mean?

Does it mean that the search doesn't work well for characters with accents or
special characters? Or that htdig cannot return the indexed pages with correct
content if they contain UTF characters?

More precisely, my web pages contain UTF-8 characters, and I want to use htdig
for search. Let's suppose it is OK if it doesn't search for accented
characters, only for simple characters; the problem is that the returned pages
contain bad characters. Where there was one UTF character before, there are now
two characters. Is this a consequence of the fact that htdig cannot handle UTF
characters, or is it a configuration mistake on my part?

My locale is hu_HU.utf8, but it doesn't work with hu_HU either.

Thank you in advance.

 

Regards,

     Levi
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Georgina Allbrook    
WebTeam : ITS Division : University of Waikato
http://phonebook.waikato.ac.nz/dept/WWTE.shtml
Ph 64 7 856 2889 x 6086

"Those who will play with cats must expect to be scratched." 
-- Cervantes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general



