According to Mohler, Jeff:
> So..a workaround would be to search using the AND function.
> 
> this.that.other.thing would be searched using 'AND', and entering "this that
> other thing".
> 
> Correct?

Yes, this is correct, though it's not at all clear to me how you reached
that conclusion from what I was saying in my last reply.  What you're
describing works because when htdig indexes a word containing punctuation
listed in valid_punctuation, it not only indexes the whole word with the
punctuation stripped out, but also all the parts that make up the whole
word, e.g. thisthatother, thatotherthing, thisthat, thatother, otherthing,
this, that, other and thing.  So, when you search for ALL the individual
words it will find a match if the document also contains all these words,
whether separated by space, punctuation, or even other words.

However, the main point I was trying to make before is YOU DON'T
NEED a workaround!  A search for this.that.other.thing will find any
documents that contain this.that.other.thing, because htdig will put
thisthatotherthing in the word database, and htsearch will look for, and
find, thisthatotherthing in the word database.  If you don't believe me,
give it a try with a valid-punctuation-separated compound word that you
know is in your documents somewhere, and htsearch should find a match.

All that your "workaround" will accomplish is to increase the chance of
finding false positives, because it will match any document that contains
all the words this, that, other and thing anywhere and in any order.
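
The difference can be seen in a toy model. The Python sketch below is
an assumption-laden illustration, not htsearch: the helper names
(indexed_words, search_exact, search_and), the two sample documents, and
the choice of "." as the only valid punctuation are all invented for the
example.

```python
def indexed_words(text):
    """Toy index for one document: each whitespace-separated token
    contributes every contiguous run of its period-separated parts."""
    terms = set()
    for token in text.lower().split():
        parts = [p for p in token.split(".") if p]
        for i in range(len(parts)):
            for j in range(i + 1, len(parts) + 1):
                terms.add("".join(parts[i:j]))
    return terms

def search_exact(query, docs):
    # The query gets the same punctuation stripping as the indexed words.
    key = query.lower().replace(".", "")
    return [name for name, text in docs.items() if key in indexed_words(text)]

def search_and(query, docs):
    # "AND" of the individual words: all must appear, anywhere, in any order.
    wanted = set(query.lower().split())
    return [name for name, text in docs.items() if wanted <= indexed_words(text)]

docs = {  # hypothetical sample documents
    "a.html": "Set this.that.other.thing in the config",
    "b.html": "That other thing is not this",
}

direct = search_exact("this.that.other.thing", docs)   # only a.html
workaround = search_and("this that other thing", docs)  # a.html and b.html
```

Here the direct search returns only the document that really contains the
compound word, while the AND "workaround" also returns b.html, which merely
happens to contain the four words.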

> Is it worthwhile to ask for an EXACT search method in addition to AND, OR, and
> BOOLEAN.?

Probably not.  First of all, you have to make a distinction between search
methods and search algorithms.  At least in htsearch's terminology,
exact is one of several search algorithms already available (in
addition to a number of "fuzzy" search algorithms like endings, accents,
soundex and others).  So, htsearch already does exact matches of words.
The only fuzziness involved in the "exact" algorithm is the stripping of
punctuation, which is configurable as I already explained, and treating
upper and lowercase letters as equal (which is not configurable).
The search methods like and, or and boolean are methods of combining
the results of searches for multiple words.
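
To make the distinction concrete, here is a minimal Python sketch under
stated assumptions. It is not htsearch's implementation: the function names
exact and combine, the toy index, and the use of "." as the only stripped
punctuation are hypothetical, and the boolean method is left out. The
algorithm decides which index entry a single query word maps to; the
method decides how the per-word results are combined.

```python
def exact(word):
    # Search *algorithm*: how one query word is matched against the index.
    # Modeled on the "exact" description: strip punctuation, ignore case.
    return word.lower().replace(".", "")

def combine(method, per_word_hits):
    # Search *method*: how the result sets of several words are combined.
    if not per_word_hits:
        return set()
    if method == "and":
        return set.intersection(*per_word_hits)
    if method == "or":
        return set.union(*per_word_hits)
    raise ValueError("unsupported method: " + method)

# Hypothetical toy index: word -> set of documents containing it.
index = {
    "blablabla": {"a.html"},
    "config": {"a.html", "b.html"},
}

hits = [index.get(exact(w), set()) for w in ["Bla.Bla.Bla", "config"]]
and_result = combine("and", hits)  # documents matching both words
or_result = combine("or", hits)    # documents matching either word
```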

So, htsearch's exact algorithm is in all likelihood exact enough for
the purposes you describe.  When your user complained that a search
for bla.bla.bla didn't work because htsearch was actually searching for
blablabla, I think you just jumped to the conclusion that there was indeed
a document that did contain bla.bla.bla and htsearch missed it because
it was looking for something different than what went in the index.
That is not the case!  If there are documents that were indexed which
contained bla.bla.bla, then blablabla would be in the index.  If it's not,
then those documents weren't indexed, so you'd need to figure out why.
(See FAQ 4.1.)

> -----Original Message-----
...
> According to Mohler, Jeff:
> > > When I use:
> > >  http://gso-sparky.hq.netapp.com/form.htm
> > > to search for 'bla.bla.bla' I get error:
> > >  "No matches were found for 'blablabla'"
> > > Note the missing dots.
> > 
> > There are configuration details in our lists that need searched on
> > from time to time, what do I need to change to get htdig to use that
> > specific string and not wipe out the dots?
> 
> Well, first of all, are you sure that stripping out the punctuation
> is a problem?  Note that the same process is done during the indexing
> phase, so that if there was a document that was indexed that contained
> bla.bla.bla, a search for blablabla would find it!  This is only a problem
> if you get a lot of false positives, i.e. if you MUST treat bla.bla.bla
> and blabl.abla as different words, and you can't allow a search for one
> of these to match another with different punctuation in it.
> 
> If you really need to treat the period as a significant character, i.e.
> just like a letter, then you can remove it from valid_punctuation (set
> valid_punctuation in your htdig.conf to something other than the built-in
> default) and add the period to extra_word_characters.  If you do this,
> though, the period will be treated as a letter in all contexts.  That
> means that searching for the word "context" in the previous sentence would
> fail because it would be indexed as "context." rather than "context".
> 
> See http://www.htdig.org/attrs.html#valid_punctuation
> and http://www.htdig.org/attrs.html#extra_word_characters


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
