: I'm about to embark on implementing the full-text search feature of XQuery:
Good luck with that.
Here are some quick suggestions on how i'd try to tackle the things you
asked about, w/o putting much thought into them...
: title ftcontains "usability" occurs at least 2 times
assuming this is just term based (and not complex subclauses) i would
write a custom subclass of TermQuery that enforces a minimum term frequency.
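
To make the rule concrete, here's a rough plain-Java sketch of the check such a query would have to enforce. It is not Lucene code: MinTermFreq, termFreq and matches are made-up names, and it assumes the field has already been analyzed into a list of tokens. A real TermQuery subclass would get the frequency from the index rather than counting tokens like this.

```java
import java.util.List;

// Plain-Java illustration only; none of this is Lucene API.
final class MinTermFreq {
    /** Counts exact occurrences of term among the analyzed tokens of one field. */
    static int termFreq(List<String> tokens, String term) {
        int freq = 0;
        for (String token : tokens) {
            if (token.equals(term)) {
                freq++;
            }
        }
        return freq;
    }

    /** True when the field contains the term at least minFreq times. */
    static boolean matches(List<String> tokens, String term, int minFreq) {
        return termFreq(tokens, term) >= minFreq;
    }
}
```

So "usability occurs at least 2 times" boils down to matches(titleTokens, "usability", 2).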
: title ftcontains "improve" with stemming
index two versions of every field - one with stemming and one w/o - and
pick which one to search at query time.
: This allows you to specify -- at query-time -- one of "case
: insensitive", "case sensitive", "lowercase", "uppercase".
I have no idea what it would mean to match something "uppercase" or
"lowercase" -- unless that's just syntactic sugar for "uppercase the input,
and then look for a case sensitive match" -- but again: two fields for case
sensitive/insensitive
: This is similar to the Case Option except it's "diacritics insensitive"
: or "diacritics sensitive". How about implementing this?
two fields, again.
...at this point, if you need to support all permutations of these options
you are looking at 2*2*2 index fields per source field ... so you start
getting into the realm where i might consider keeping them all in one
field, using Payloads to note the various attributes that each Term has.
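
To show where the 2*2*2 comes from, here's a small plain-Java sketch of the "one field per permutation" approach. Everything in it is an assumption for illustration: the FieldVariants class, the _stem/_raw, _ci/_cs, _di/_ds suffix scheme, and the fold helper are made up, and stemming is only reflected in the field name since it needs a real stemmer.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; the naming scheme is hypothetical.
final class FieldVariants {
    /** Builds a field name such as title_stem_ci_di or title_raw_cs_ds. */
    static String fieldName(String base, boolean stemmed,
                            boolean caseInsensitive, boolean diacriticsInsensitive) {
        return base
            + (stemmed ? "_stem" : "_raw")
            + (caseInsensitive ? "_ci" : "_cs")
            + (diacriticsInsensitive ? "_di" : "_ds");
    }

    /** Enumerates all 2*2*2 = 8 index fields needed for one source field. */
    static List<String> allFieldNames(String base) {
        List<String> names = new ArrayList<>();
        boolean[] flags = {false, true};
        for (boolean stemmed : flags) {
            for (boolean ci : flags) {
                for (boolean di : flags) {
                    names.add(fieldName(base, stemmed, ci, di));
                }
            }
        }
        return names;
    }

    /** Applies case and/or diacritics folding to one token before indexing. */
    static String fold(String token, boolean caseInsensitive, boolean diacriticsInsensitive) {
        String out = token;
        if (diacriticsInsensitive) {
            // Decompose accented characters, then drop the combining marks.
            out = Normalizer.normalize(out, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        }
        if (caseInsensitive) {
            out = out.toLowerCase(Locale.ROOT);
        }
        return out;
    }
}
```

At query time the options in the query pick the field name, and the query terms get folded the same way as the tokens in that field.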
: abstract ftcontains "propagating of errors"
: with stop words ("a", "the", "of")
:
: would match a document with an abstract that contains "propagating few
: errors". It seems odd, I know. It's as if the stop words become
: wildcards, i.e.:
are you serious? ... so if i query for "A of the B" with stop words ("of",
"the") then that has to match "A totally ridiculous B" ? ... that makes
no sense whatsoever. why require so much verbosity just to get a "gap"
that matches anything?
that seems like a straight query parsing problem ... if you see one of the
terms in the stop word list, strip it out, and increase the phrase slop on
the PhraseQuery you are building.
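
A rough sketch of that parsing step in plain Java, just to show the idea. StopWordPhrase and parse are made-up names, splitting on whitespace stands in for a real analyzer, and the result is only the (terms, slop) pair you would then hand to whatever builds the PhraseQuery. It only counts stop words that sit between two kept terms, since leading or trailing ones don't create a gap. Note that slop is an upper bound, so a slop of 1 would also match the two terms sitting right next to each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; not Lucene API.
final class StopWordPhrase {
    final List<String> terms; // phrase terms that survive stop word removal
    final int slop;           // base slop plus one per stripped interior stop word

    StopWordPhrase(List<String> terms, int slop) {
        this.terms = terms;
        this.slop = slop;
    }

    static StopWordPhrase parse(String phrase, Set<String> stopWords, int baseSlop) {
        List<String> kept = new ArrayList<>();
        int slop = baseSlop;
        int pending = 0; // stop words seen since the last kept term
        for (String word : phrase.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (stopWords.contains(word)) {
                pending++;
            } else {
                // Only gaps between two kept terms need extra slop.
                if (!kept.isEmpty()) {
                    slop += pending;
                }
                pending = 0;
                kept.add(word);
            }
        }
        return new StopWordPhrase(kept, slop);
    }
}
```

With stop words ("a", "the", "of"), parsing "propagating of errors" leaves the terms "propagating" and "errors" with the slop bumped by 1.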
: body ftcontains "Mexico" not in "New Mexico"
SpanNotQuery
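
In case the semantics aren't obvious, here's a plain-Java sketch of what "include spans that don't overlap an exclude span" means for this example. It is not how Lucene implements SpanNotQuery, just the same idea over a token list; SpanNotSketch, phraseStarts and termNotInPhrase are made-up names, and the tokens are assumed to be already lowercased.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; not Lucene API.
final class SpanNotSketch {
    /** Start positions at which phrase occurs contiguously in tokens. */
    static List<Integer> phraseStarts(List<String> tokens, List<String> phrase) {
        List<Integer> starts = new ArrayList<>();
        if (phrase.isEmpty()) {
            return starts;
        }
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                starts.add(i);
            }
        }
        return starts;
    }

    /** Positions of term that are not covered by any occurrence of excludedPhrase. */
    static List<Integer> termNotInPhrase(List<String> tokens, String term,
                                         List<String> excludedPhrase) {
        List<Integer> excludedStarts = phraseStarts(tokens, excludedPhrase);
        List<Integer> hits = new ArrayList<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            if (!tokens.get(pos).equals(term)) {
                continue;
            }
            boolean covered = false;
            for (int start : excludedStarts) {
                if (pos >= start && pos < start + excludedPhrase.size()) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                hits.add(pos);
            }
        }
        return hits;
    }
}
```

A body with no surviving positions for "mexico" doesn't match; one with at least one occurrence outside "new mexico" does.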
: title ftcontains ("web site" ftand "usability") ordered
SpanNearQuery
: abstract ftcontains "usability" ftand "web site" same sentence
:
: You can also do any combination of {same|different}
: {sentence|paragraph}. My guess for this would also be to keep track of
: sentence/paragraph data in a payload. Yes?
sounds right.
: book ftcontains "Web Usability" without content $x//annotation
depends on how you plan on indexing all of the context stuff ... if the
tags are Terms then a SpanNotQuery would work ... if they are Payloads you
just need some sort of SpanTermNotMatchingPayload query.
-Hoss