According to Gregory Szeszko:
> I set up htdig to index a primarily Polish (ISO-8859-2) web site.  Things 
> appear to be working well for the most part.  I can search for 
> words/phrases as long as I type in the search keywords with the accented 
> characters.  But if I replace the Polish characters with their ASCII 
> "equivalents" then the search comes up empty even though my rundig 
> script runs the "htfuzzy accents" command.  My understanding of htfuzzy 
> accents is that it is supposed to enter into htdig's database words with 
> accented characters replaced by the unaccented equivalents.  But it 
> would appear that it doesn't happen exactly like this.
> 
> To try to debug the problem I ran "htfuzzy -vvv accents".  This spits 
> out a long list of word pairs.  Each pair appears to contain an 
> "unaccented word" along with the original word.  But after glancing at 
> that list it appears to me that not all the original accented words are 
> in there.  That is, I know of accented words on the site's pages that 
> are not displayed in the list.  I am certain that ALL of the pages are 
> dug through, because I specify every single one of them in the 
> start_url (to avoid the fact that htdig doesn't follow JavaScript linked 
> pages).  So how come I don't see all of the accented words in that list?  
> Am I overlooking something?

The documentation doesn't mention this (yet), but the accents algorithm
currently supports only the ISO-8859-1 (Latin 1) character set.
The conversion from accented to unaccented characters is hard-coded in
the table "MinusculeISOLAT1" in htfuzzy/Accents.cc.  The only way to
configure this for ISO-8859-2 or other character sets right now is to
edit this table for the specific character set you need, and recompile.
If someone can suggest a better way of doing this, using the locale
information, it would be a big help.
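
As a rough illustration of the kind of mapping such an edit would need
for Polish, here is a minimal standalone sketch.  It is not the actual
layout of the MinusculeISOLAT1 table, and the names
makePolishLatin2Table and unaccentLatin2 are hypothetical, not part of
ht://Dig.  It assumes single-byte ISO-8859-2 input and maps both upper
and lower case Polish letters to lower case ASCII, leaving every other
byte unchanged.

```cpp
#include <array>
#include <string>

// Hypothetical helper (not ht://Dig code): build a 256-entry lookup
// table that maps the 18 Polish ISO-8859-2 letters to lower case ASCII.
static std::array<unsigned char, 256> makePolishLatin2Table()
{
    std::array<unsigned char, 256> t{};
    for (int i = 0; i < 256; ++i)
        t[i] = static_cast<unsigned char>(i);   // identity by default

    t[0xA1] = 'a'; t[0xB1] = 'a';   // A-ogonek, a-ogonek
    t[0xC6] = 'c'; t[0xE6] = 'c';   // C-acute,  c-acute
    t[0xCA] = 'e'; t[0xEA] = 'e';   // E-ogonek, e-ogonek
    t[0xA3] = 'l'; t[0xB3] = 'l';   // L-stroke, l-stroke
    t[0xD1] = 'n'; t[0xF1] = 'n';   // N-acute,  n-acute
    t[0xD3] = 'o'; t[0xF3] = 'o';   // O-acute,  o-acute
    t[0xA6] = 's'; t[0xB6] = 's';   // S-acute,  s-acute
    t[0xAC] = 'z'; t[0xBC] = 'z';   // Z-acute,  z-acute
    t[0xAF] = 'z'; t[0xBF] = 'z';   // Z-dot,    z-dot
    return t;
}

// Hypothetical helper: return a copy of an ISO-8859-2 encoded word with
// the Polish letters replaced by their unaccented equivalents.
std::string unaccentLatin2(const std::string &word)
{
    static const std::array<unsigned char, 256> table =
        makePolishLatin2Table();

    std::string out;
    out.reserve(word.size());
    for (unsigned char c : word)
        out.push_back(static_cast<char>(table[c]));
    return out;
}
```

With this table, the ISO-8859-2 bytes B3 F3 64 BC (the city name Lodz
with its Polish letters) come out as plain "lodz".  Any real change
would still have to follow whatever format the table in Accents.cc
actually uses.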

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

