I'm using htdig 3.1.6 on Debian Sarge and cannot get the synonyms fuzzy
search algorithm to work. In fact, I'm not sure whether fuzzy searching
works at all.

- I indexed the following pages:

index.html:
<html>
<head>
<title></title>
</head>
<body>
<a href="page1.html" title="Page One" />Page 1</a>
<a href="page2.html" title="Page Two" />Page 2</a>
</body>
</html>

page1.html:
<html>
<head>
<title>Page 1</title>
</head>
<body>
test
</body>
</html>

page2.html:
<html>
<head>
<title>Page 2</title>
</head>
<body>
toets
</body>
</html>

- The synonym file contains a single entry for testing (in Dutch):
test onderzoek proef toets

- Now when I run:
#  /usr/lib/cgi-bin/htsearch keywords=test
I get only Page 1 as a result.

- When I run:
#  /usr/lib/cgi-bin/htsearch keywords=toets
I get only Page 2 as a result.

- In both cases I expect to get both Page 1 and Page 2 as results,
because the synonym file lists "test" and "toets" as synonyms of each
other.
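
To make the expectation above concrete, here is a minimal shell sketch.
It only models what I expect to happen (a query word expanding to every
word on the same line of the plain-text synonym file); it is not how
htsearch or htfuzzy work internally. The function name expand_synonyms
and the temporary file are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every word that shares a synonym line with $1.
# Models the expected behaviour only; not ht://Dig's actual implementation.
expand_synonyms() {
    word=$1
    file=$2
    awk -v w="$word" '
        {
            hit = 0
            # Does this line contain the query word?
            for (i = 1; i <= NF; i++) if ($i == w) hit = 1
            # If so, emit every word on the line, including the query word.
            if (hit) for (i = 1; i <= NF; i++) print $i
        }
    ' "$file" | sort -u
}

# Recreate the one-line synonym file from this post in a temporary location.
tmp=$(mktemp)
printf 'test onderzoek proef toets\n' > "$tmp"
expand_synonyms toets "$tmp"
rm -f "$tmp"
```

With that expansion, a search for either "test" or "toets" would cover
both words, which is why I expect both pages back.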

- I indexed the pages using the following commands in order:
# htdig -v -i
# htmerge
# htfuzzy synonyms
# htfuzzy endings
# htfuzzy accents

- These commands created the following files in the right directories:
synonyms.db (I verified it contains the synonyms)
word2root.db
root2word.db
db.accents.db
db.docdb
db.docs.index
db.wordlist (this one contains the words page, test and toets)
db.words.db
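
For anyone who wants to repeat the check on the file list above, a small
shell sketch follows. The helper name check_files is made up, and the
split of files between the two directories is my assumption based on
lang_dir and database_dir in the configuration file quoted further down.

```shell
#!/bin/sh
# Hypothetical helper: report which of the named files are missing from a directory.
# Usage: check_files DIR FILE...
check_files() {
    dir=$1
    shift
    missing=0
    for f in "$@"; do
        if [ -f "$dir/$f" ]; then
            echo "ok       $dir/$f"
        else
            echo "MISSING  $dir/$f"
            missing=1
        fi
    done
    # Non-zero exit status if at least one file was not found.
    return "$missing"
}

# Assumed layout: fuzzy databases under lang_dir, index files under database_dir.
check_files /usr/lib/htdig/dutch synonyms.db word2root.db root2word.db \
    || echo "some files are missing from lang_dir"
check_files /var/lib/htdig/httest db.accents.db db.docdb db.docs.index \
    db.wordlist db.words.db \
    || echo "some files are missing from database_dir"
```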

- I have already verified the locale settings and tried different
"search_algorithm" values, for example using a comma instead of a dot
as the decimal symbol in the weights.

- Here follows my htdig configuration file:

maintainer:             [EMAIL PROTECTED]
bad_extensions:         .wav .gz .z .bz2 .sit .au .zip .tar .hqx .exe .com \
   .gif .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css
minimum_word_length:    2

iso_8601:       true
date_format:    %Y-%m-%d
search_results_contenttype:     text/xml
search_results_wrapper: /etc/htdig/wrapper.xml
nothing_found_file:     /etc/htdig/nomatch.xml
syntax_error_file:      /etc/htdig/syntax.xml
template_map:           xml xml /etc/htdig/result.xml
template_name:          xml

locale: nl_NL
search_algorithm:       exact:1 synonyms:0.5 accents:0.2 endings:0.1
lang_dir:               /usr/lib/htdig/dutch
bad_word_list:          ${lang_dir}/bad_words
endings_affix_file:     ${lang_dir}/dutch.aff
synonym_db:             ${lang_dir}/synonyms.db
synonym_dictionary:     ${lang_dir}/synonyms
endings_root2word_db:   ${lang_dir}/root2word.db
endings_word2root_db:   ${lang_dir}/word2root.db
endings_dictionary:     ${lang_dir}/dutch.0

database_dir:           /var/lib/htdig/httest
start_url:              http://localhost/httest
exclude_urls:           /cgi-bin/ .cgi
limit_urls_to:          ${start_url}
max_head_length:        10000
max_doc_size:           200000
no_excerpt_show_top:    true
matches_per_page:       999
maximum_pages:          1

- Does anyone know what's wrong?