On Mon, 12 Aug 2002, Gilles Detillieux wrote:

Hello Gilles,

I downloaded and installed htdig-3.2.0b4-20020811 and still have
a problem with bad_words when they appear inside a phrase.

For example:

I have a document that contains the following phrase:

"ABOUT THIS SEARCH ENGINE - for a basic search do not use"

Of the words above, my bad_words file contains "this", "for" and "not".

If the whole phrase is searched as listed above, I get no matches.

If "THIS SEARCH ENGINE - for" is searched, I get hits for documents
that contain "search engine". So it appears to strip the leading and
trailing bad words.

If "THIS SEARCH ENGINE - for a" is searched, I get no hits. Since I
index single characters and do not have "a" in the bad words list,
this should find something. The presence of "a" makes "for" an
internal bad word.

Likewise, if "for a basic search do not" is searched, it strips the
leading and trailing bad words and finds the document.

But if "for a basic search do not use" is searched, the fact that
"not" is an internal bad word causes no hits to be found.
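
The pattern in these examples can be modelled with a short Python
sketch. This is not htdig code. The tokenizer, the function names, and
the assumption that bad words are left out of the index while the other
words keep their original positions are all hypothetical, chosen only
to reproduce the results described above. The skip_internal option
shows one possible way a matcher could tolerate internal bad words; it
is an illustration, not a description of any existing htdig feature.

```python
import re

# Bad words taken from the example above.
BAD_WORDS = {"this", "for", "not"}

DOC = "ABOUT THIS SEARCH ENGINE - for a basic search do not use"


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(text, bad_words=BAD_WORDS):
    """Map word position -> word, leaving bad words out of the index.

    Assumption: positions of the remaining words are preserved.
    """
    return {pos: word
            for pos, word in enumerate(tokenize(text))
            if word not in bad_words}


def strip_edges(words, bad_words=BAD_WORDS):
    """Drop bad words from the start and end of a phrase only."""
    start, end = 0, len(words)
    while start < end and words[start] in bad_words:
        start += 1
    while end > start and words[end - 1] in bad_words:
        end -= 1
    return words[start:end]


def phrase_matches(index, phrase, skip_internal=False, bad_words=BAD_WORDS):
    """Return True if the phrase occurs at consecutive positions.

    With skip_internal=False an internal bad word can never match,
    because it is not in the index. With skip_internal=True an internal
    bad word is treated as a placeholder that matches any single word.
    """
    words = strip_edges(tokenize(phrase), bad_words)
    if not words:
        return False

    def matches_at(start):
        for offset, word in enumerate(words):
            if skip_internal and word in bad_words:
                continue  # placeholder: accept whatever sits here
            if index.get(start + offset) != word:
                return False
        return True

    return any(matches_at(pos)
               for pos, word in index.items() if word == words[0])


INDEX = build_index(DOC)

if __name__ == "__main__":
    for query in ("THIS SEARCH ENGINE - for",
                  "THIS SEARCH ENGINE - for a",
                  "for a basic search do not",
                  "for a basic search do not use",
                  DOC):
        print(query, "->",
              phrase_matches(INDEX, query),
              phrase_matches(INDEX, query, skip_internal=True))
```

With skip_internal left off, the model matches only the two queries
whose bad words sit at the edges of the phrase, which is the behaviour
reported above. With skip_internal turned on, all five queries match.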

Let me know if there is anything that can be done.

Dave

> Date: Mon, 12 Aug 2002 21:46:28 -0500 (CDT)
> From: Gilles Detillieux <[EMAIL PROTECTED]>
> To: Dave Hoover <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig-dev] htdigb4 - phrase search bad_words
>
> According to Dave Hoover:
> > I am using htdig-3.2.0b4-20020210 so I can get phrase searching to
> > work.
> >
> > I noticed that if I type a word that is in the bad words file in a
> > search string surrounded by quotes, for a phrase I know should
> > match, I get no result.
> >
> > So a document contains "After crossing the Atlantic", but if this
> > search is entered in htdig with quotes, there are no hits found. I
> > assume it is because the bad_word "the" is not being stripped, or
> > the index wasn't built properly.
> >
> > Is there a way around this?
>
> I know there were problems with "bad words" in phrases in the past,
> but I thought those had been fixed by Feb. 10.  Perhaps not, though.
> Do you get the same results with the latest 3.2.0b4 snapshot?  I know
> there's a new query parser under development, which should solve some of
> the problems with phrase searches, but I don't think it's been integrated
> into the current CVS development tree yet.  There may be some incremental
> fixes right now, though.
>
> --
> Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
> Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
>
>

Dave Hoover
Systems Programmer
Rutgers University Libraries
[EMAIL PROTECTED]

Crippled but free, I was blind all the time I was learning to see.




_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
