I set up htdig to index a primarily Polish (ISO-8859-2) web site. Things
appear to be working well for the most part. I can search for
words/phrases as long as I type in the search keywords with the accented
characters. But if I replace the Polish characters with their ASCII
"equivalents", the search comes up empty, even though my rundig
script runs the "htfuzzy accents" command. My understanding of htfuzzy
accents is that it is supposed to enter into htdig's database words with
accented characters replaced by the unaccented equivalents. But it
would appear that this is not exactly what happens.
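
To make that expectation concrete, here is a minimal Python sketch of the
kind of mapping I have in mind. It is only an illustration of the idea, not
htdig's actual code: the table POLISH_TO_ASCII and the functions
fold_accents and accent_pairs are names I made up, and the pair order
(unaccented word first, then the original) simply mirrors what I describe
seeing in the verbose output.

```python
# Hypothetical illustration only: this is NOT how htfuzzy is implemented.
# Explicit table of Polish letters and the ASCII letters I would type instead.
POLISH_TO_ASCII = str.maketrans(
    "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
    "acelnoszzACELNOSZZ",
)


def fold_accents(word: str) -> str:
    """Return the word with Polish diacritics replaced by ASCII letters."""
    return word.translate(POLISH_TO_ASCII)


def accent_pairs(words):
    """Return (unaccented, original) pairs for words that change when folded."""
    pairs = []
    for word in words:
        folded = fold_accents(word)
        if folded != word:
            pairs.append((folded, word))
    return pairs


if __name__ == "__main__":
    # Simulate text as it would be stored on an ISO-8859-2 page.
    raw = "żółw łódź dom".encode("iso-8859-2")
    words = raw.decode("iso-8859-2").split()
    for folded, original in accent_pairs(words):
        print(folded, original)
```

Under this assumption, every word that contains at least one accented
character should produce a pair, and a word such as "dom" should not.
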
To try to debug the problem I ran "htfuzzy -vvv accents". This spits
out a long list of word pairs. Each pair appears to contain an
"unaccented word" along with the original word. But after glancing at
that list it appears to me that not all the original accented words are
in there. That is, I know of accented words on the site's pages that
are not displayed in the list. I am certain that ALL of the pages are
indexed, because I specify every single one of them in
start_url (to work around the fact that htdig doesn't follow links made
with JavaScript). So why don't I see all of the accented words in that list?
Am I overlooking something?
Thanks for any help/information.
Greg Szeszko