Hello,
I have a problem with boolean search and numbers in Word documents.
I have indexed a folder with Word, Excel and PDF documents, and
everything worked fine. However, a boolean search with a search term
like "testword not 301" returns TWO Word documents that contain both
terms, the testword and the number, although they should be excluded.
In total, the indexed files include 9 documents that contain both the
testword and the number.
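To make the expectation concrete, here is a minimal Python sketch of how I understand the boolean "not" operator: as a set difference over the documents matching each term. The tiny index, the document names and the function boolean_not are made up for illustration; this is not how htdig is implemented internally.

```python
# Toy illustration of the expected semantics of "testword not 301".
# The index contents and document names below are hypothetical.

def boolean_not(index, include_term, exclude_term):
    """Return documents containing include_term but not exclude_term."""
    included = index.get(include_term, set())
    excluded = index.get(exclude_term, set())
    return included - excluded


# Hypothetical inverted index: term -> set of documents containing it.
index = {
    "testword": {"a.doc", "b.doc", "c.doc", "d.xls"},
    "301": {"a.doc", "b.doc", "e.pdf"},
}

result = boolean_not(index, "testword", "301")
print(sorted(result))  # ['c.doc', 'd.xls']
```

A document such as a.doc, which contains both terms, should never show up in the result. In my case two such Word documents do.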
I use htdig 3.1.6-3 on Debian woody with doc2html.pl and catdoc.
Maybe somebody has an idea what happened.
Ahoj Joerg xxx
Here is a part of my config-file:
####CUT####
method_names: or Any and All boolean Boolean
template_map: builtin-long builtin-long builtin-long builtin-short builtin-short builtin-short
sort_names: score Score time Time revscore Revscore revtime Revtime title A-Z revtitle Z-A
minimum_word_length: 2
maximum_word_length: 30
allow_numbers: true
max_head_length: 10000
max_doc_size: 20000000
no_excerpt_show_top: true
locale: de_DE
#search_algorithm: exact:1 synonyms:0.5 endings:0.1
search_algorithm: exact:1 endings:0.1 substring:0.1
#search_algorithm: exact:1 endings:0.5
lang_dir: ${common_dir}/german
bad_word_list: ${lang_dir}/bad_words
endings_affix_file: ${lang_dir}/german.aff
endings_dictionary: ${lang_dir}/german.0
endings_root2word_db: ${lang_dir}/root2word.db
endings_word2root_db: ${lang_dir}/word2root.db
external_parsers: application/msword->text/html /opt/htdig/bin/doc2html.pl \
                  application/vnd.ms-excel->text/html /opt/htdig/bin/doc2html.pl \
                  application/pdf->text/html /opt/htdig/bin/doc2html.pl
debian_pdf_parser: xpdf
####CUT#####
_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general