According to Emma Jane Hogbin:
> Thanks for your response, Gilles. I think my biggest mistake was not
> reading *all* the documentation. I only read part of it and didn't realize
> I'd missed some of the instructions in other parts of the FAQ. If I can
> get everything working I'll write you an English version of what I did.
> 
> I've been working (initially) from:
> Question 4.10
> 
> The detailed instructions are at:
> http://www.quartier-rural.org/dl/elucu/htdig-vf/lisezmoi.html
> I'm using the "kit de francisation" (French localization kit) version 1.05; the ht://dig web site has 1.03.

I've just updated that.

> I have ht://dig 3.1.6 installed with the cookies patch. I have two
> different "language" config files that include the main config file with
> my header sizes and pagination options and other generic config things. I
> run htdig using either en.conf or fr.conf. Each of those then includes
> ccfdp.conf, which is the generic conf file.
> 
> > Are you sure you haven't applied a patch to ht://Dig that would force it
> > to strip out accented characters?
> 
> I think I am. I would like people to be able to enter a word either with
> or without the accent and find the right word. I thought that's what this
> paragraph meant, but now that I see the output I understand it's not what
> I wanted:
> 
> If you're running version 3.1.6 of ht://Dig, you may also be interested in
> the accents fuzzy match algorithm in the search_algorithm attribute, which
> lets you treat accented and unaccented letters as equivalent in words.
> Note that if you use the accents algorithm, you need to rebuild the
> accents database each time you update your word database, using "htfuzzy
> accents". This command isn't in the default rundig script, so you may want
> to add it there. The accents fuzzy match algorithm is also in the 3.2 beta
> releases. There are also the boolean_keywords and boolean_syntax_errors
> attributes in 3.1.6 for changing other language-specific messages in
> htsearch.

Yes, this paragraph means what you seem to be aiming for, and the accents
fuzzy algorithm, which is built into 3.1.6, should let you search for
accented words by typing a word with or without accents.

> > 3.1.6 and 3.2 betas, and the other was a hack that mapped all ISO-8859-1
> > (Latin 1) accented letters to their unaccented counterparts.  If you've
> > applied that latter patch,
> > ftp://ftp.ccsf.org/htdig-patches/3.1.5/accents.zip, to any ht://Dig
> > version, then that would explain the problem.
> 
> I'm confused about the difference between these two patches. I have 3.1.6
> installed and am pretty sure I ran htfuzzy accents. Which of the two
> patches allows users to search for "francais" OR "français" and get
> matches for both? (i.e., one word, two matches, not the boolean OR)

With 3.1.6, you shouldn't need any of these patches.  They were designed
for 3.1.5, and worked with varying degrees of success.  Robert Marchand's
patch was the best of the lot, and it's included in the 3.1.6 code.
It lets you do just what you ask above.

Did you add the accents algorithm to your search_algorithm attribute?

> > In an earlier e-mail, you mentioned that you "also grabbed the language
> > pack (for lack of a better term) from the web site."  What pack are you
> > referring to specifically?
> http://www.quartier-rural.org/dl/elucu/htdig-vf/htdig-fr-1.0.5.tar.gz
> 
> > It would be helpful to know the actual file
> > name and the location from which you got it.  Also, which version of
> > ht://Dig are you running, and what patches, if any, have been applied.
> > See http://www.htdig.org/FAQ.html#q5.33
> 
> I used the instructions in 5.33. I am running 3.1.6 and have the cookies
> patch installed. The patch is from here:
> ftp://ftp.ccsf.org/htdig-patches/3.1.6/cookies.gz.0

None of this should pose a problem that I can see.

> >> Did you work from the language package on the web site? I will go
> >> through it again and try to understand what steps I missed. I've just
> >> found one mistake. I was using fr_FR but I only have fr in
> >> /usr/share/locale. Found that tidbit in:
> >> http://htdig.org/FAQ.html#q4.13
> >> I also noticed that I don't have an LC_CTYPE in my
> >> /usr/share/locale/fr folder. I'll bug the list again once I've figured
> >> out how to fix this situation. I believe it involves "installing"
> >> fr_CA instead of the generic fr.
> >
> > Also have a look in /usr/lib/locale, as some systems (namely Linux
> > systems using more recent versions of glibc) put locale definitions
> > there. Any French locale that has the LC_CTYPE file should do, as any
> > national variations of a language shouldn't affect the character set
> > used.
> 
> The locale is listed when I do locale -a. However I cannot find an
> LC_CTYPE anywhere on the system (for any language). I am running Debian
> woody distro (stable). (My home machine is woody unstable and does not
> have LC_CTYPE files either. I've emailed the debian-user email list for
> help on this.) The woody stable server has fr_CA as the system language
> but I still can't find the LC_CTYPE file.

Hmm.  I'm not very familiar with Debian systems, but it may be that they
encode locale definitions differently.  It could also be that you don't
have everything installed that you need.  It might not hurt to try the
testlocale.c program in http://www.htdig.org/FAQ.html#q5.8 to rule out
locale problems.

> > Based on what you've reported, it doesn't sound like a locale problem to
> > me.  Usually, if the locale you pick doesn't support accented letters,
> > these letters are treated as punctuation, causing words to be split up
> > wherever an accented character appears in a word, but the accented
> > characters still show up in the excerpts.  You just can't search for
> > accented words because these words aren't put in the database.  However,
> > you reported that the accents are stripped from the results page.  Am I
> > misunderstanding you in interpreting this as meaning that accented
> > letters are replaced with their unaccented counterparts, for example,
> > that "é" appears as "e"?  Or do you mean they disappear altogether?
> 
> Correct, I meant the accent is stripped leaving an unaccented equivalent.
> "é" becomes "e" in my search results page and I cannot find words if I
> spell them with the "é". I think you've identified my problem as being the
> htfuzzy accents.

That's very odd.  It sure sounds like the work of that old accents.zip
hack for 3.1.5.  I can't see anything in the locale support or in the
stock htdig code that would account for this.  Certainly, any htfuzzy
algorithm won't have that sort of impact on the words database or the
excerpts.  I assume you mean the unaccented letters are appearing in
htsearch's search result excerpts.  Correct?  What about in db.wordlist?
Do you see any complete words with accents in there?

Is it possible you still have an old htdig or htsearch binary lying
around that was built with the accents.zip hack or some other type of
accent stripping code?  A lot of these seemingly inexplicable problems
have boiled down to someone running a different version of the code than
they thought they had installed.

> My current problem is that the search engine is spiralling out of control,
> i.e., it won't stop crawling. I have to read the log file tonight to see if
> it's a problem with the cookie patch, or a problem with our URL
> parameters.

See http://www.htdig.org/FAQ.html#q5.29

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
