Thanks for your response, Gilles. I think my biggest mistake was not
reading *all* the documentation. I only read part of it and didn't realize
I'd missed some of the instructions in other parts of the FAQ. If I can
get everything working I'll write you an English version of what I did.

I've been working (initially) from:
Question 4.10

The detailed instructions are at:
http://www.quartier-rural.org/dl/elucu/htdig-vf/lisezmoi.html
I'm using the "kit de francisation" 1.05 (the ht://dig web site has 1.03).
I have ht://dig 3.1.6 installed with the cookies patch. I have two
different "language" config files that include the main config file with
my header sizes and pagination options and other generic config things. I
run htdig using either en.conf or fr.conf. Each of those then includes
ccfdp.conf which is the generic conf file.

> Are you sure you haven't applied a patch to ht://Dig that would force it
> to strip out accented characters?

I think I am. I would like people to be able to enter a word either with
or without the accent and find the right word. I thought that's what this
paragraph meant, but now that I see the output I understand it's not what
I wanted:

If you're running version 3.1.6 of ht://Dig, you may also be interested in
the accents fuzzy match algorithm in the search_algorithm attribute, which
lets you treat accented and unaccented letters as equivalent in words.
Note that if you use the accents algorithm, you need to rebuild the
accents database each time you update your word database, using "htfuzzy
accents". This command isn't in the default rundig script, so you may want
to add it there. The accents fuzzy match algorithm is also in the 3.2 beta
releases. There are also the boolean_keywords and boolean_syntax_errors
attributes in 3.1.6 for changing other language-specific messages in
htsearch.

> 3.1.6 and 3.2 betas, and the other was a hack that mapped all ISO-8859-1
> (Latin 1) accented letters to their unaccented counterparts.  If you've
> applied that latter patch,
> ftp://ftp.ccsf.org/htdig-patches/3.1.5/accents.zip, to any ht://Dig
> version, then that would explain the problem.

I'm confused about the difference between these two patches. I have 3.1.6
installed and am pretty sure I ran htfuzzy accents. Which of the two
patches allows users to search for "francais" OR "français" and get
matches for both? (i.e. one word, two matches not the boolean OR)
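
To make the behaviour I'm after concrete, here is a minimal Python sketch of accent-insensitive matching. It is only an illustration of the idea, not how ht://Dig or either patch implements it; the function names fold_accents and matches are hypothetical.

```python
import unicodedata


def fold_accents(word: str) -> str:
    """Return the word with accent marks removed (e.g. 'é' -> 'e', 'ç' -> 'c')."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, indexed_word: str) -> bool:
    """Compare two words case-insensitively after folding accents on both."""
    return fold_accents(query).lower() == fold_accents(indexed_word).lower()


print(matches("francais", "français"))  # True
print(matches("français", "francais"))  # True
print(matches("france", "français"))    # False
```

The point is that folding is applied only when comparing; the indexed word itself keeps its accent, so it can still be displayed correctly in the results.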

> In an earlier e-mail, you mentioned that you "also grabbed the language
> pack (for lack of a better term) from the web site."  What pack are you
> referring to specifically?
http://www.quartier-rural.org/dl/elucu/htdig-vf/htdig-fr-1.0.5.tar.gz

> It would be helpful to know the actual file
> name and the location from which you got it.  Also, which version of
> ht://Dig are you running, and what patches, if any, have been applied.
> See http://www.htdig.org/FAQ.html#q5.33

I used the instructions in 5.33. I am running 3.1.6 and have the cookies
patch installed. The patch is from here:
ftp://ftp.ccsf.org/htdig-patches/3.1.6/cookies.gz.0

>> Did you work from the language package on the web site? I will go
>> through it again and try to understand what steps I missed. I've just
>> found one mistake. I was using fr_FR but I only have fr in
>> /usr/share/locale. Found that tidbit in:
>> http://htdig.org/FAQ.html#q4.13
>> I also noticed that I don't have an LC_CTYPE in my
>> /usr/share/locale/fr folder. I'll bug the list again once I've figured
>> out how to fix this situation. I believe it involves "installing"
>> fr_CA instead of the generic fr.
>
> Also have a look in /usr/lib/locale, as some systems (namely Linux
> systems using more recent versions of glibc) put locale definitions
> there. Any French locale that has the LC_CTYPE file should do, as any
> national variations of a language shouldn't affect the character set
> used.

The locale is listed when I do locale -a. However I cannot find an
LC_CTYPE anywhere on the system (for any language). I am running the Debian
woody distro (stable). (My home machine is woody unstable and does not
have LC_CTYPE files either. I've emailed the debian-user email list for
help on this.) The woody stable server has fr_CA as the system language
but I still can't find the LC_CTYPE file.

> Based on what you've reported, it doesn't sound like a locale problem to
> me.  Usually, if the locale you pick doesn't support accented letters,
> these letters are treated as punctuation, causing words to be split up
> wherever an accented character appears in a word, but the accented
> characters still show up in the excerpts.  You just can't search for
> accented words because these words aren't put in the database.  However,
> you reported that the accents are stripped from the results page.  Am I
> misunderstanding you in interpreting this as meaning that accented
> letters are replaced with their unaccented counterparts, for example,
> that "é" appears as "e"?  Or do you mean they disappear altogether?

Correct, I meant the accent is stripped leaving an unaccented equivalent.
"é" becomes "e" in my search results page and I cannot find words if I
spell it with the "é." I think you've identified my problem as being the
htfuzzy accents.
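
For contrast, the locale symptom described above (accented letters treated as punctuation, so words get split) can be sketched like this. This is a rough Python illustration under my own assumptions, not ht://Dig's actual word parser; the names ASCII_LETTERS, LATIN1_LETTERS and split_words are hypothetical.

```python
import re

# Letters recognised when the locale knows nothing about accented characters.
ASCII_LETTERS = "A-Za-z"
# Letters recognised when the ISO-8859-1 (Latin 1) accented letters are included.
LATIN1_LETTERS = "A-Za-zÀ-ÖØ-öø-ÿ"


def split_words(text: str, letters: str) -> list:
    """Split text into words, treating anything outside `letters` as a separator."""
    return re.findall(f"[{letters}]+", text)


text = "le français élémentaire"

# Accented letters act as separators, so the words are broken apart.
print(split_words(text, ASCII_LETTERS))   # ['le', 'fran', 'ais', 'l', 'mentaire']

# Accented letters count as letters, so the words stay whole.
print(split_words(text, LATIN1_LETTERS))  # ['le', 'français', 'élémentaire']
```

That is not what I'm seeing: my words stay whole but lose their accents, which fits the accent-stripping explanation rather than a locale problem.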

My current problem is that the search engine is spiralling out of control,
i.e. it won't stop crawling. I have to read the log file tonight to see if
it's a problem with the cookie patch, or a problem with our URL
parameters.

Thanks again for your help, it's much appreciated!

--
Emma Jane Hogbin
Xtrinsic



