Are we sure about KeywordAnalyzer here? It is supposed to tokenize the entire stream as a single token (useful for data like zip codes, IDs, and some product names).

In the scenario we are discussing, U.S. is just one token within the text, and we would still like to use StandardAnalyzer for all its other goodies. I am sorry for the incomplete setup in my previous message.

More or less, I expect that somewhere we can tell StandardTokenizer.jj that U.S. is a special token (even though it is indeed an ACRONYM) and that we prefer to index it as-is, as U.S. Can we do that?
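
To make the desired behavior concrete, here is a rough plain-Java sketch. It is not Lucene code and does not touch StandardTokenizer.jj; the class name AcronymAwareTokens and both regular expressions are made up for illustration. It only shows the rule being asked for: ordinary words are lowercased, while dotted uppercase acronyms such as U.S. are kept exactly as written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: this is NOT part of Lucene.
public class AcronymAwareTokens {
    // Dotted acronym: two or more (uppercase letter + dot) pairs, e.g. U.S. or U.K.
    private static final Pattern ACRONYM = Pattern.compile("(?:\\p{Lu}\\.){2,}");
    // A token is either a dotted acronym or a plain run of letters/digits.
    private static final Pattern TOKEN =
        Pattern.compile("(?:\\p{Lu}\\.){2,}|[\\p{L}\\p{N}]+");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String t = m.group();
            if (ACRONYM.matcher(t).matches()) {
                tokens.add(t);                          // keep "U.S." untouched
            } else {
                tokens.add(t.toLowerCase(Locale.ROOT)); // normal words are lowercased
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [the, U.S., helps, us]
        System.out.println(tokenize("The U.S. helps us"));
    }
}
```

With this rule, U.S. and "us" end up as two distinct terms, which is the outcome I am after. Note that the query side would need the same treatment, otherwise a search for U.S. would still be reduced to "us" and miss the indexed term.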

Charlie



Otis Gospodnetic wrote:
Use KeywordAnalyzer to leave "U.S." as-is and index it as-is.

Otis
--
Lucene Consulting -- http://lucene-consulting.com/


----- Original Message ----
From: crspan <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Saturday, July 14, 2007 5:18:59 PM
Subject: index U.K. U.S. U.N. U.V.

Would you please advise on the best practice for indexing:

  U.S.

The standard analyzer will transform it to "us", which collides with "us" (we).

Thanks,

Charlie


