Are we sure about KeywordAnalyzer here? It is supposed to "tokenize
the entire stream as a single token" (useful for data like zip codes,
ids, and some product names).
In the scenario we are discussing, U.S. is just one token within the
text, and we would still like to use StandardAnalyzer for all the
other goodies. I am sorry for the incomplete setup in my previous
message.
More or less, I expect there is somewhere we can instruct
StandardTokenizer.jj that U.S. is a special token (even though it is
indeed an ACRONYM) and that we prefer to index it as-is, as U.S.
Can we do that?
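To make the request concrete, here is a rough plain-Java sketch of the
behavior I am after. It does not use any Lucene classes: AcronymSketch,
PROTECTED, normalize and analyze are names I made up for illustration,
and the whitespace split only stands in for the real tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this is NOT Lucene API.
class AcronymSketch {
    // Hypothetical list of tokens that should be indexed verbatim.
    static final Set<String> PROTECTED =
        new HashSet<>(Arrays.asList("U.S.", "U.K.", "U.N.", "U.V."));

    // Protected tokens pass through untouched. Everything else gets the
    // treatment described in this thread: dots are dropped from
    // acronym-shaped tokens, then the token is lowercased.
    static String normalize(String token) {
        if (PROTECTED.contains(token)) {
            return token;
        }
        if (token.matches("(?:[A-Za-z]\\.)+")) {
            token = token.replace(".", "");
        }
        return token.toLowerCase(Locale.ROOT);
    }

    // Whitespace split as a stand-in for real tokenization (assumption).
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(normalize(token));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [the, U.S., ibm, told, us]
        System.out.println(analyze("The U.S. and I.B.M. told us"));
    }
}
```

With something like this, "U.S." and "us" stay distinct terms while other
acronyms are still folded as before. The same protected list would have
to be applied at query time too, otherwise searches for U.S. would not
match the indexed term.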
Charlie
Otis Gospodnetic wrote:
Use KeywordAnalyzer to leave "U.S." as-is and index it as-is.
Otis
--
Lucene Consulting -- http://lucene-consulting.com/
----- Original Message ----
From: crspan <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Saturday, July 14, 2007 5:18:59 PM
Subject: index U.K. U.S. U.N. U.V.
Would you please advise on the best practice for indexing:
U.S.
The standard analyzer will transform it to "us", which collides with
"us" (we).
Thanks,
Charlie