Re: index U.K. U.S. U.N. U.V.

crspan Mon, 16 Jul 2007 20:17:05 -0700

Are we sure about KeywordAnalyzer here? It is supposed to tokenize the entire stream as a single token (useful for data like zip codes, ids, and some product names).

In the scenario we are discussing, U.S. is just one token within the text, and we would still like to leverage StandardAnalyzer for all the other goodies. I am sorry for the incomplete setup in my previous message.

More or less, I expect that somewhere we can instruct StandardTokenizer.jj that U.S. is a special token (even though it is indeed an ACRONYM) and that we prefer to index it as U.S., as is. Can we do that?
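
To make the desired behaviour concrete, here is a minimal standalone sketch of it. It does not use any Lucene API and does not modify StandardTokenizer.jj; the class name AcronymAwareTokenizer, the PROTECTED set, and the crude whitespace/punctuation rules are all assumptions made for illustration. It only shows the idea: tokens on a protected list are kept verbatim, everything else is lowercased and stripped of punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: not a Lucene Analyzer, just the intended token output.
final class AcronymAwareTokenizer {

    // Assumed list of acronyms that should be indexed exactly as written.
    static final Set<String> PROTECTED = Set.of("U.S.", "U.K.", "U.N.", "U.V.");

    private AcronymAwareTokenizer() {}

    static List<String> tokenize(String text, Set<String> protectedTokens) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            // Drop leading punctuation, and trailing punctuation other than dots,
            // so that "(U.S.)," is reduced to "U.S." before the lookup.
            String candidate = raw.replaceAll("^[^\\p{Alnum}]+|[^\\p{Alnum}.]+$", "");
            if (protectedTokens.contains(candidate)) {
                tokens.add(candidate); // keep the acronym as-is, dots and case included
                continue;
            }
            // Everything else: lowercase and remove remaining non-alphanumerics
            // (ASCII only; a deliberate simplification).
            String normalized = candidate.toLowerCase(Locale.ROOT)
                                         .replaceAll("[^\\p{Alnum}]", "");
            if (!normalized.isEmpty()) {
                tokens.add(normalized);
            }
        }
        return tokens;
    }
}
```

With the protected set, tokenize("We live in the U.S. and it suits us.", PROTECTED) yields [we, live, in, the, U.S., and, it, suits, us]; with an empty set the acronym collapses to "us" and collides with the pronoun. Whatever mechanism is used inside Lucene, the same rule would have to apply at query time as well, otherwise a search for U.S. would not match the indexed term.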


Charlie



Otis Gospodnetic wrote:

Use KeywordAnalyzer to leave "U.S." as-is and index it as-is.

Otis
--
Lucene Consulting -- http://lucene-consulting.com/

----- Original Message ----
From: crspan <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, July 14, 2007 5:18:59 PM
Subject: index U.K. U.S. U.N. U.V.

Would you please advise on the best practice for indexing:

  U.S.

The standard analyzer will transform it to "us", which collides with "us" (we).


Thanks,

Charlie



