On Fri, Jan 24, 2003 at 04:07:02PM -0600, Gilles Detillieux wrote:
> According to Emma Jane Hogbin:
> > I think I am. I would like people to be able to enter a word either with
> > or without the accent and find the right word. I thought that's what this
> > paragraph meant, but now that I see the output I understand it's not what
> > I wanted:
> >
> > If you're running version 3.1.6 of ht://Dig, you may also be interested in
> > the accents fuzzy match algorithm in the search_algorithm attribute, which
> > lets you treat accented and unaccented letters as equivalent in words.
> > Note that if you use the accents algorithm, you need to rebuild the
> > accents database each time you update your word database, using "htfuzzy
> > accents". This command isn't in the default rundig script, so you may want
> > to add it there. The accents fuzzy match algorithm is also in the 3.2 beta
> > releases. There are also the boolean_keywords and boolean_syntax_errors
> > attributes in 3.1.6 for changing other language-specific messages in
> > htsearch.
>
> Yes, this paragraph means what you seem to be aiming for, and the accents
> fuzzy algorithm, which is built-in to 3.1.6, should let you search for
> accented words by typing a word with or without accents.
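
For readers following along, the idea of treating accented and unaccented letters as equivalent can be sketched as below. This is only an illustration in Python, not ht://Dig's actual code: the function names (fold_accents, build_accent_index, lookup) and the sample words are made up for the example. It also hints at why the accents database has to be rebuilt after the word database changes: the lookup table is derived from the indexed words.

```python
import unicodedata


def fold_accents(word: str) -> str:
    """Return the word with combining accent marks removed (e.g. é -> e)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_accent_index(words):
    """Map each unaccented, lowercased form to the indexed words that fold to it.

    Hypothetical stand-in for an accents lookup table; it must be rebuilt
    whenever the list of indexed words changes.
    """
    index = {}
    for word in words:
        index.setdefault(fold_accents(word.lower()), set()).add(word)
    return index


def lookup(index, query):
    """Return every indexed word matching the query, ignoring accents."""
    return sorted(index.get(fold_accents(query.lower()), set()))


if __name__ == "__main__":
    index = build_accent_index(["école", "ecole", "été", "marche", "marché"])
    print(lookup(index, "ecole"))   # both spellings of the word are returned
    print(lookup(index, "marché"))
```

With a table like this, a query typed with or without accents lands on the same key, which is the behaviour described in the quoted paragraph.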
Good. :)

> With 3.1.6, you shouldn't need any of these patches. They were designed
> for 3.1.5, and worked with varying degrees of success. Robert Marchand's
> patch was the best of the lot, and it's included in the 3.1.6 code.
> It lets you do just what you ask above.
>
> Did you add the accents algorithm to your search_algorithm attribute?

I had a typo in the config file: it was spelled "accent" by mistake. Fixed
now.

> > > In an earlier e-mail, you mentioned that you "also grabbed the language
> > > pack (for lack of a better term) from the web site." What pack are you
> > > referring to specifically?
> >
> > http://www.quartier-rural.org/dl/elucu/htdig-vf/htdig-fr-1.0.5.tar.gz
> >
> > > It would be helpful to know the actual file
> > > name and the location from which you got it. Also, which version of
> > > ht://Dig are you running, and what patches, if any, have been applied.
> > > See http://www.htdig.org/FAQ.html#q5.33
> >
> > I used the instructions in 5.33. I am running 3.1.6 and have the cookies
> > patch installed. The patch is from here:
> > ftp://ftp.ccsf.org/htdig-patches/3.1.6/cookies.gz.0
>
> None of this should pose a problem that I can see.

I've just realized a potential problem. The server that isn't working the
way I expected was installed from a Debian package (woody stable with the
cookies patch). I've got the Debian package locally (woody unstable, no
cookies patch) running the crawl to see if it works with no cookies patch.
The third server is running Debian (woody stable with cookies patch). I
think it was installed from source though. I'm waiting for confirmation
from the system administrator.

> > The locale is listed when I do locale -a. However I cannot find an
> > LC_CTYPE anywhere on the system (for any language). I am running Debian
> > woody distro (stable). (My home machine is woody unstable and does not
> > have LC_CTYPE files either. I've emailed the debian-user email list for
> > help on this.)
> > The woody stable server has fr_CA as the system language
> > but I still can't find the LC_CTYPE file.
>
> Hmm. I'm not very familiar with Debian systems, but it may be that they
> encode locale definitions differently. It could also be that you don't
> have everything installed that you need. It might not hurt to try the
> testlocale.c program in http://www.htdig.org/FAQ.html#q5.8 to rule out
> locale problems.

I've got a question out to debian-user about the LC_CTYPE file (no response
yet). The weird thing is that I get the file on the live server (good) but
I can't get it locally. I used stable packages on the live server and got
the file; I'm using unstable packages on my laptop. Both were installed
using Debian packages on Thursday night.

> > > Based on what you've reported, it doesn't sound like a locale problem to
> > > me. Usually, if the locale you pick doesn't support accented letters,
> > > these letters are treated as punctuation, causing words to be split up
> > > wherever an accented character appears in a word, but the accented
> > > characters still show up in the excerpts. You just can't search for
> > > accented words because these words aren't put in the database. However,
> > > you reported that the accents are stripped from the results page. Am I
> > > misunderstanding you in interpreting this as meaning that accented
> > > letters are replaced with their unaccented counterparts, for example,
> > > that "é" appears as "e"? Or do you mean they disappear altogether?
> >
> > Correct, I meant the accent is stripped leaving an unaccented equivalent.
> > "é" becomes "e" in my search results page and I cannot find words if I
> > spell it with the "é." I think you've identified my problem as being the
> > htfuzzy accents.
>
> That's very odd. It sure sounds like the work of that old accents.zip
> hack for 3.1.5. I can't see anything in the locale support or in the
> stock htdig code that would account for this.
> Certainly, any htfuzzy
> algorithm won't have that sort of impact on the words database or the
> excerpts. I assume you mean the unaccented letters are appearing in
> htsearch's search result excerpts. Correct? What about in db.wordlist?
> Do you see any complete words with accents in there?

Yes, the unaccented chars are appearing in the search result excerpts. I
just double checked the version and it *says* it's 3.1.6 when I do this:

htdig@openkosmos:~$ htdig -/?

Ok, on the development system (where I first had the problem) there are
partial words and the break exists where the accented character would have
been. This system is missing LC_CTYPE.

BUT I HAVE GREAT NEWS! The new server which has locales installed properly
(i.e. there is an LC_CTYPE) has accented words in the db.wordlist! YAY!!

> Is it possible you still have an old htdig or htsearch binary lying
> around that was built with the accents.zip hack or some other type of
> accent stripping code? A lot of these seemingly inexplicable problems
> have boiled down to someone running a different version of the code than
> they thought they had installed.

Unfortunately I have to take the word of the system administrator. I've
done a check on all the systems using htdig -/? and they all say htdig
3.1.6. I believe my current problem has to do with locales.

> > My current problem is that the search engine is spiralling out of control.
> > I.e. won't stop crawling. I have to read the log file tonight to see if
> > it's a problem with the cookie patch, or a problem with our URL
> > parameters.
>
> See http://www.htdig.org/FAQ.html#q5.29

Thanks. :) I've spent most of my time with ht://dig dealing with this
problem. I'm pretty good with the log files and I'm sure I'll be able to
diagnose the problem. It's just a matter of spending some quality time
reading output. :)

Thanks again for all your help! Once I've got everything working I'm going
to write myself a language checklist for the next time I need to do this.
I'll send it along to the list for feedback when it's ready.

Also: for anyone who's interested, I've written a little Perl script to
pull templates off the live site and put them onto the ht://dig server. It
means that I don't have to sync the templates manually. The script is at:
http://xtrinsic.com/scripting/template.txt

Please feel free to grab a copy for yourself. Let me know if you have any
questions about it.

emma

--
Emma Jane Hogbin
[[ 416 417 2868 ][ www.xtrinsic.com ]]

_______________________________________________
htdig-general mailing list
FAQ: http://htdig.sourceforge.net/FAQ.html
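
As a companion to the db.wordlist check discussed in this message, here is a rough Python sketch for counting word entries that contain accented letters. It rests on assumptions not confirmed in the thread: that the word list is a text file with the word as the first whitespace-separated field on each line, and that it is encoded as ISO-8859-1. The function name split_words_by_accent is made up for the example.

```python
import sys


def split_words_by_accent(lines):
    """Split word-list entries into (accented, plain) lists.

    Assumption: the word is the first whitespace-separated field of each
    line; blank lines are skipped.
    """
    accented, plain = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        word = fields[0]
        # Any character outside 7-bit ASCII is treated as an accented letter.
        if any(ord(ch) > 127 for ch in word):
            accented.append(word)
        else:
            plain.append(word)
    return accented, plain


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the file is ISO-8859-1 text; pass its path as the argument.
    with open(sys.argv[1], encoding="latin-1") as handle:
        accented, plain = split_words_by_accent(handle)
    print(f"{len(accented)} entries with accented letters, {len(plain)} without")
    for word in sorted(set(accented))[:20]:
        print(word)
```

If a French-language index reports no accented entries at all, that would be consistent with the symptom described above, where words are split at each accented character because the locale does not treat those characters as letters.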

