No, I've never used Luke. Is there an easy way to examine my RAMDirectory index?

Here is what I see:

If I create the index with no quoted keywords and search for a keyword, I get back the expected results (I just can't search for a phrase that has whitespace in it).

If I create the index with phrases in quotes, then when I search for anything in double quotes, I get back nothing.

If I create the index with everything in quotes, then when I search for anything by the keyword field, I get nothing, regardless of whether I use quotes in the query string or not. (I can get results back by searching on other fields.)

What do you think?
Philip


Erick Erickson wrote:
>
> OK, I've gotta ask. Have you examined your index with Luke to see if what
> you *think* is in the index actually *is*???
>
> Erick
>
> On 9/1/06, Philip Brown <[EMAIL PROTECTED]> wrote:
>>
>> Interesting...just ran a test where I put double quotes around everything
>> (including single keywords) of source text and then ran searches for a
>> known keyword with and without double quotes -- doesn't find either time.
>>
>>
>> Mark Miller-5 wrote:
>> >
>> > Sorry to hear you're having trouble. You indeed need the double quotes
>> > in the source text. You will also need them in the query string. Make
>> > sure they are in both places. My machine is hosed right now or I would
>> > do it for you real quick. My guess is that I forgot to mention...not
>> > only do you need to add the <QUOTED> definition to the TOKEN section,
>> > but below that you will find the grammar...you need to add <QUOTED> to
>> > the grammar. If you look at how <NUM> and <APOSTROPHE> are done you
>> > will prob see what you should do. If not, my machine should be back up
>> > tomorrow...
>> >
>> > - Mark
>> >
>> > On 9/1/06, Philip Brown <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Well, I tried that, and it doesn't seem to work still. I would be
>> >> happy to zip up the new files, so you can see what I'm using -- maybe
>> >> you can get it to work. The first time, I tried building the
>> >> documents without quotes surrounding each phrase. Then, I retried by
>> >> enclosing every phrase within double quotes. Neither seemed to work.
>> >> When constructing the query string for the search, I always added the
>> >> double quotes (otherwise, it'd think it was multiple terms). (I
>> >> didn't even test the underscore and hyphenated terms.) I thought
>> >> Lucene was (sort of by default) set up to search quoted phrases.
>> >> From http://lucene.apache.org/java/docs/api/index.html --> A Phrase
>> >> is a group of words surrounded by double quotes such as "hello
>> >> dolly". So, this should be easy, right? I must be missing something
>> >> stupid.
>> >>
>> >> Thanks,
>> >>
>> >> Philip
>> >>
>> >>
>> >> Mark Miller-5 wrote:
>> >> >
>> >> > So this will recognize anything in quotes as a single token and '_'
>> >> > and '-' will not break up words. There may be some repercussions
>> >> > for the NUM token but nothing I'd worry about. Maybe you want to
>> >> > use Unicode for '-' and '_' as well...I wouldn't worry about it
>> >> > myself.
>> >> >
>> >> > - Mark
>> >> >
>> >> >
>> >> > TOKEN : {                      // token patterns
>> >> >
>> >> >   // basic word: a sequence of digits & letters
>> >> >   <ALPHANUM: (<LETTER>|<DIGIT>|<KOREAN>)+ >
>> >> >
>> >> > | <QUOTED: "\"" (~["\""])+ "\"">
>> >> >
>> >> >   // internal apostrophes: O'Reilly, you're, O'Reilly's
>> >> >   // use a post-filter to remove possessives
>> >> > | <APOSTROPHE: <ALPHA> ("'" <ALPHA>)+ >
>> >> >
>> >> >   // acronyms: U.S.A., I.B.M., etc.
>> >> >   // use a post-filter to remove dots
>> >> > | <ACRONYM: <ALPHA> "." (<ALPHA> ".")+ >
>> >> >
>> >> >   // company names like AT&T and [EMAIL PROTECTED]
>> >> > | <COMPANY: <ALPHA> ("&"|"@") <ALPHA> >
>> >> >
>> >> >   // email addresses
>> >> > | <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM>
>> >> >           (("."|"-") <ALPHANUM>)+ >
>> >> >
>> >> >   // hostname
>> >> > | <HOST: <ALPHANUM> ("." <ALPHANUM>)+ >
>> >> >
>> >> >   // floating point, serial, model numbers, ip addresses, etc.
>> >> >   // every other segment must have at least one digit
>> >> > | <NUM: (<ALPHANUM> <P> <HAS_DIGIT>
>> >> >        | <HAS_DIGIT> <P> <ALPHANUM>
>> >> >        | <ALPHANUM> (<P> <HAS_DIGIT> <P> <ALPHANUM>)+
>> >> >        | <HAS_DIGIT> (<P> <ALPHANUM> <P> <HAS_DIGIT>)+
>> >> >        | <ALPHANUM> <P> <HAS_DIGIT> (<P> <ALPHANUM> <P> <HAS_DIGIT>)+
>> >> >        | <HAS_DIGIT> <P> <ALPHANUM> (<P> <HAS_DIGIT> <P> <ALPHANUM>)+
>> >> >         )
>> >> >   >
>> >> > | <#P: ("_"|"-"|"/"|"."|",") >
>> >> > | <#HAS_DIGIT:                     // at least one digit
>> >> >     (<LETTER>|<DIGIT>)*
>> >> >     <DIGIT>
>> >> >     (<LETTER>|<DIGIT>)*
>> >> >   >
>> >> >
>> >> > | < #ALPHA: (<LETTER>)+>
>> >> > | < #LETTER:                       // unicode letters
>> >> >       [
>> >> >        "\u0041"-"\u005a",
>> >> >        "\u0061"-"\u007a",
>> >> >        "\u00c0"-"\u00d6",
>> >> >        "\u00d8"-"\u00f6",
>> >> >        "\u00f8"-"\u00ff",
>> >> >        "\u0100"-"\u1fff",
>> >> >        "-", "_"
>> >> >       ]
>> >> >   >
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: [EMAIL PROTECTED]
>> >> > For additional commands, e-mail: [EMAIL PROTECTED]
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >> http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6106920
>> >> Sent from the Lucene - Java Users forum at Nabble.com.
>> >>
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6107649
>> Sent from the Lucene - Java Users forum at Nabble.com.
>>
>

--
View this message in context: http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6109067
Sent from the Lucene - Java Users forum at Nabble.com.
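
To make the <QUOTED> rule from the thread a bit more concrete, here is a minimal plain-Java sketch. It is not Lucene or JavaCC code: the class QuotedTokenSketch and its tokenize method are hypothetical names, and java.util.regex stands in for the generated tokenizer. It only imitates two of the quoted rules, <QUOTED> and a rough ALPHANUM with '-' and '_' counted as letters. The point it illustrates is that the matched token text keeps its surrounding double quotes, so the term that ends up in the index would be "hello dolly" with the quotes included. If the query side removes the quotes before the text reaches the analyzer (an assumption about the setup, not something shown in the thread), the query term and the indexed term would differ, which might be one reason for the empty results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the modified tokenizer; illustration only.
public class QuotedTokenSketch {

    // Group 1 mirrors <QUOTED: "\"" (~["\""])+ "\"">: a quote, one or more
    // non-quote characters, and a closing quote.
    // Group 2 is a rough approximation of ALPHANUM once '-' and '_' have
    // been added to LETTER, as in the grammar quoted above.
    private static final Pattern TOKEN =
        Pattern.compile("(\"[^\"]+\")|([\\p{L}\\p{Nd}_-]+)");

    // Returns the token texts in order of appearance. Quoted tokens keep
    // their surrounding double quotes, like a JavaCC token image would.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Source text with a quoted phrase: the phrase is one token.
        System.out.println(tokenize("\"hello dolly\" well_known e-mail"));
        // prints ["hello dolly", well_known, e-mail]

        // The same words without quotes: two separate tokens.
        System.out.println(tokenize("hello dolly"));
        // prints [hello, dolly]
    }
}
```

If this matches what the real tokenizer does, then listing the terms actually stored for the keyword field (with Luke or otherwise) should show whether they carry the quote characters, and the query term can be compared against them directly.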