I'm using htdig 3.1.6 on Debian Sarge and cannot get the synonyms fuzzy
search algorithm to work. In fact, I'm not sure that fuzzy search works
at all.
- I indexed the following pages:
index.html:
<html>
<head>
<title></title>
</head>
<body>
<a href="page1.html" title="Page One">Page 1</a>
<a href="page2.html" title="Page Two">Page 2</a>
</body>
</html>
page1.html:
<html>
<head>
<title>Page 1</title>
</head>
<body>
test
</body>
</html>
page2.html:
<html>
<head>
<title>Page 2</title>
</head>
<body>
toets
</body>
</html>
- The synonym file contains just one entry for testing (in Dutch):
test onderzoek proef toets
- Now, when I run:
# /usr/lib/cgi-bin/htsearch keywords=test
I get only Page 1 as a result.
- When I run:
# /usr/lib/cgi-bin/htsearch keywords=toets
I get only Page 2 as a result.
- In both cases I expect to get both Page 1 and Page 2, because the
synonym file declares "test" and "toets" to be synonyms of each other.
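To make the expected behaviour concrete, here is a minimal Python sketch of what I assume synonym expansion should do. It assumes every word on a synonym-dictionary line is treated as a synonym of every other word on that line. That is my expectation, not something I have confirmed in the htdig source. `load_synonyms`, `expand` and `search` are hypothetical helpers, not htdig functions.

```python
from collections import defaultdict

# The single entry from my synonym dictionary (Dutch test words).
SYNONYM_LINES = ["test onderzoek proef toets"]

# Words found on each indexed test page.
INDEX = {
    "Page 1": {"test"},
    "Page 2": {"toets"},
}


def load_synonyms(lines):
    """Map each word to the other words on its line (assumed symmetric)."""
    synonyms = defaultdict(set)
    for line in lines:
        words = line.lower().split()
        if len(words) < 2:
            continue  # a line needs at least two words to define synonyms
        for word in words:
            synonyms[word].update(w for w in words if w != word)
    return synonyms


def expand(keyword, synonyms):
    """Return the keyword plus all of its synonyms."""
    keyword = keyword.lower()
    return {keyword} | synonyms.get(keyword, set())


def search(keyword, index, synonyms):
    """Return sorted titles of pages containing any of the expanded terms."""
    terms = expand(keyword, synonyms)
    return sorted(title for title, words in index.items() if words & terms)


if __name__ == "__main__":
    syn = load_synonyms(SYNONYM_LINES)
    print(search("test", INDEX, syn))   # ['Page 1', 'Page 2']
    print(search("toets", INDEX, syn))  # ['Page 1', 'Page 2']
```

The sketch ignores the weights in search_algorithm. As I understand it, htsearch should rank the exact match above the synonym match, but both pages should still be listed.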
- I indexed the pages using the following commands in order:
# htdig -v -i
# htmerge
# htfuzzy synonyms
# htfuzzy endings
# htfuzzy accents
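For reproducibility, the same sequence can be scripted. This is a sketch that assumes each of these tools accepts `-c <configfile>`, so that every step explicitly reads the same configuration. `CONFIG` is a hypothetical path. With `dry_run=True` (the default) the script only returns the command lines and runs nothing.

```python
import subprocess

CONFIG = "/etc/htdig/htdig.conf"  # hypothetical path; adjust to the real config

# Same order as above: dig, merge, then build the fuzzy databases.
STEPS = [
    ["htdig", "-v", "-i"],
    ["htmerge"],
    ["htfuzzy", "synonyms"],
    ["htfuzzy", "endings"],
    ["htfuzzy", "accents"],
]


def build_commands(config):
    """Insert '-c config' right after each program name (assumed option)."""
    return [[step[0], "-c", config, *step[1:]] for step in STEPS]


def run_all(config=CONFIG, dry_run=True):
    """Return the commands; unless dry_run, run them in order.

    check=True raises CalledProcessError, so execution stops at the
    first step that fails.
    """
    commands = build_commands(config)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_all():
        print(" ".join(cmd))
```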
- These commands created the following files in the right directories:
synonyms.db (I verified it contains the synonyms)
word2root.db
root2word.db
db.accents.db
db.docdb
db.docs.index
db.wordlist (this one contains the words page, test and toets)
db.words.db
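One way to double-check where each file ended up is to compare the paths the configuration points to against what is on disk. The directory mapping below is my reading of the config: the synonym and endings databases go under lang_dir, and the rest go under database_dir. `missing_files` is a hypothetical helper.

```python
from pathlib import Path

LANG_DIR = "/usr/lib/htdig/dutch"
DATABASE_DIR = "/var/lib/htdig/httest"

# Assumed location of each generated file, taken from the config attributes.
EXPECTED = {
    LANG_DIR: ["synonyms.db", "word2root.db", "root2word.db"],
    DATABASE_DIR: ["db.accents.db", "db.docdb", "db.docs.index",
                   "db.wordlist", "db.words.db"],
}


def missing_files(expected):
    """Return full paths of expected files that are not regular files."""
    return [
        str(Path(directory) / name)
        for directory, names in expected.items()
        for name in names
        if not (Path(directory) / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_files(EXPECTED)
    print("missing:", missing if missing else "none")
```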
- I have already verified the locale settings and tried different
"search_algorithm" variants, such as using a comma instead of a dot
as the decimal separator.
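For reference, search_algorithm is a list of `name:weight` pairs. The sketch below parses such a string and accepts either `0.5` or `0,5`, independent of locale (nl_NL uses a comma as decimal separator). This only illustrates the format. `parse_search_algorithm` is a hypothetical helper, and I have not checked how htsearch itself converts the weights.

```python
def parse_search_algorithm(value):
    """Parse 'exact:1 synonyms:0.5 ...' into a {name: weight} dict.

    Pairs are separated by whitespace; a comma or a dot is accepted
    as decimal separator regardless of the current locale.
    """
    weights = {}
    for item in value.split():
        name, sep, weight = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {item!r}")
        weights[name] = float(weight.replace(",", "."))
    return weights


if __name__ == "__main__":
    print(parse_search_algorithm("exact:1 synonyms:0.5 accents:0.2 endings:0.1"))
    # {'exact': 1.0, 'synonyms': 0.5, 'accents': 0.2, 'endings': 0.1}
```

The concern behind trying the comma is that a locale-dependent conversion under nl_NL might read `0.5` as 0 and so effectively switch the synonyms algorithm off.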
- Here is my htdig configuration file:
maintainer: [EMAIL PROTECTED]
bad_extensions: .wav .gz .z .bz2 .sit .au .zip .tar .hqx .exe .com \
.gif .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css
minimum_word_length: 2
iso_8601: true
date_format: %Y-%m-%d
search_results_contenttype: text/xml
search_results_wrapper: /etc/htdig/wrapper.xml
nothing_found_file: /etc/htdig/nomatch.xml
syntax_error_file: /etc/htdig/syntax.xml
template_map: xml xml /etc/htdig/result.xml
template_name: xml
locale: nl_NL
search_algorithm: exact:1 synonyms:0.5 accents:0.2 endings:0.1
lang_dir: /usr/lib/htdig/dutch
bad_word_list: ${lang_dir}/bad_words
endings_affix_file: ${lang_dir}/dutch.aff
synonym_db: ${lang_dir}/synonyms.db
synonym_dictionary: ${lang_dir}/synonyms
endings_root2word_db: ${lang_dir}/root2word.db
endings_word2root_db: ${lang_dir}/word2root.db
endings_dictionary: ${lang_dir}/dutch.0
database_dir: /var/lib/htdig/httest
start_url: http://localhost/httest
exclude_urls: /cgi-bin/ .cgi
limit_urls_to: ${start_url}
max_head_length: 10000
max_doc_size: 200000
no_excerpt_show_top: true
matches_per_page: 999
maximum_pages: 1
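To rule out a path mix-up, here is a minimal sketch that reads an htdig-style config and shows where the synonym files resolve to. It handles three rules: `name: value` lines, trailing `\` continuations and `${name}` references. It is my approximation of the format, not htdig's own parser, and it ignores anything beyond those three rules. `read_config` and `resolve` are hypothetical helpers.

```python
import re

# A few lines from the config above (the continuation is shortened).
SAMPLE = """\
lang_dir: /usr/lib/htdig/dutch
synonym_db: ${lang_dir}/synonyms.db
synonym_dictionary: ${lang_dir}/synonyms
bad_extensions: .wav .gz \\
 .gif .jpg
"""

REF = re.compile(r"\$\{(\w+)\}")


def read_config(text):
    """Parse 'name: value' lines, joining lines that end with a backslash."""
    config = {}
    pending = ""
    for raw in text.splitlines():
        line = pending + raw
        if line.endswith("\\"):
            pending = line[:-1]  # drop the backslash and keep accumulating
            continue
        pending = ""
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URLs like http://... survive.
        name, value = line.split(":", 1)
        config[name.strip()] = " ".join(value.split())
    return config


def resolve(config, name, depth=0):
    """Expand ${other} references recursively, with a simple loop guard."""
    if depth > 10:
        raise ValueError(f"reference loop at {name!r}")
    value = config.get(name, "")
    return REF.sub(lambda m: resolve(config, m.group(1), depth + 1), value)


if __name__ == "__main__":
    cfg = read_config(SAMPLE)
    print(resolve(cfg, "synonym_db"))          # /usr/lib/htdig/dutch/synonyms.db
    print(resolve(cfg, "synonym_dictionary"))  # /usr/lib/htdig/dutch/synonyms
```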
- Does anyone know what's wrong?
_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general