Yes, they are more complex than the "exactish" match. Basically, more fields
are involved: clauses are combined sometimes with AND and sometimes with OR,
some field values are negated, some clauses are grouped, and so on. These other
field values are all single words (no spaces), and a search might involve a
wildcard on them. Hope that helps.
Thanks.
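
For reference, the normalization quoted further down (MS-WORD, ms_WORd, or
"MS Word" --> ms_word) might be sketched in plain Java roughly as follows.
This is only a sketch under assumptions: the class name ExactishNormalizer and
the normalize helper are hypothetical and not part of Lucene, and it covers
only the string mapping, not the single-token Analyzer that would wrap it.

```java
import java.util.Locale;

// Hypothetical helper (not a Lucene class): maps spaced, hyphenated, and
// underscored variants of a value to one normalized form.
final class ExactishNormalizer {

    // Lowercase the input and collapse each run of whitespace, hyphens,
    // or underscores into a single underscore.
    public static String normalize(String input) {
        return input.trim()
                    .toLowerCase(Locale.ROOT)
                    .replaceAll("[\\s_-]+", "_");
    }

    public static void main(String[] args) {
        String[] samples = { "MS-WORD", "ms_WORd", "MS Word" };
        for (String s : samples) {
            System.out.println(s + " --> " + normalize(s));
        }
    }
}
```

The same function would have to run both at index time and at search time, so
that the indexed token and the query term come out identical.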
Chris Hostetter wrote:
>
>
> : Thanks for your input. I'm sure I could do as you suggest (and maybe that
> : will end up being my best option), but I had hoped to use a string for
> : creating the query object, particularly as some of my queries are a bit
> : complex.
>
> you have to clarify what you mean by "use a string for creating the query
> object" ... there's nothing in what i suggested that implies you can't do
> that, that's exactly what i'm suggesting you do...
>
> String input = ...;
> Analyzer a = new YourCustomAnalyzer();
> // because you know your analyzer always produces exactly one token...
> Token t = a.tokenStream("yourField", new StringReader(input)).next();
> // TermQuery takes a Term (field name + text), not two Strings
> Query yourQuery = new TermQuery(new Term("yourField", t.termText()));
>
> ...if your queries are more complex than just the "exactish" matching you
> described before, then that's a separate issue -- what you described
> didn't sound like it required any special input processing -- you said you
> had a "string" and you wanted to find exact matches on that string (with
> some normalization) ... but that you didn't want your input split on
> whitespace, or hyphens, or any of the "special" characters QueryParser
> uses.
>
> If you want other things then that certainly makes things more
> complicated, but the basic idea is still the same ... so what exactly do
> you mean when you say it's more complicated?
>
>
> : > I haven't really been following this thread, but it's gotten so long
> : > i got interested.
> : >
> : > from what i can tell skimming the discussion so far, it seems like the
> : > biggest confusion is about the definition of a "phrase" and what analyzers
> : > do with "quote" characters and what the QueryParser does with "quote"
> : > characters -- when ultimately you don't seem to really care about "phrases"
> : > in a textual searching sense; nor do you seem to care about any of the
> : > "features" of the QueryParser.
> : >
> : > it seems that what you care about is:
> : >
> : > 1) making documents, and adding a list of "text chunks" to those
> : >    documents (what you've been calling phrases)
> : > 2) you then want to be able to search for "almost-exact" matches on those
> : >    "text chunks" ... these matches should be "exactish" because you don't
> : >    want partial matches based on white spaces, or splitting on hyphens,
> : >    but they shouldn't be truly exact because you want some simple
> : >    normalization...
> : >
> : > : actually would like to "normalize" a phrase (spaces) or a hyphenated
> : > : word or an underscored word to the same value -- e.g. MS-WORD or
> : > : ms_WORd or "MS Word" --> ms_word.
> : >
> : > ...in which case, you should:
> : > a) write yourself an analyzer which does no "tokenizing" (ie: each input
> : >    Field value generates a single token) but does the normalization you
> : >    want.
> : > b) use this Analyzer when you add the fields to your documents; even
> : >    though you don't want *real* tokenization, make the field type
> : >    TOKENIZED so your analyzer gets used.
> : > c) when you get some text input to search on, pass it to the same
> : >    Analyzer, take the Token you get back and manually construct a
> : >    TermQuery out of it for the necessary field.
> : >
> : > ...that's it. that's all she wrote -- don't even look in QueryParser's
> : > general direction, at all.
> : >
> : >
> : >
> : > -Hoss
> : >
> : >
> : > ---------------------------------------------------------------------
> : > To unsubscribe, e-mail: [EMAIL PROTECTED]
> : > For additional commands, e-mail: [EMAIL PROTECTED]
> : >
> : >
> : >
> :
> : --
> : View this message in context:
> : http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6128827
> : Sent from the Lucene - Java Users forum at Nabble.com.
> :
> :
> :
>
>
>
> -Hoss
>
>
>
>
>
--
View this message in context:
http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6134864
Sent from the Lucene - Java Users forum at Nabble.com.