
Just realized that the various fields I have are part of the same document. But in order to leverage the KeywordAnalyzer, I would now have to have two sets of documents:

One document with the fields: title, content <--- analyzed by custom analyzer
Other document with the fields: categoryNames <--- analyzed by keyword analyzer

Is there a way to have a single document object in which some fields are analyzed by my custom analyzer and the one field, "categoryNames", is analyzed by the keyword analyzer?


Mufaddal Khumri wrote:

Hi Steve,

If I understand you right, I could use something like the KeywordAnalyzer to tokenize the entire stream as a single token and store that in the index. I could definitely use the KeywordAnalyzer while indexing this particular field, "categoryNames".

Now my question is how to search and boost this field, since it is part of a bigger boolean query in my case.

My typical query actually looks like:

+(+content:digit +content:camera) +entity:product +(title:"digit camera"~2^40.0 ((title:digit title:camera)^10.0) content:"digit camera"~2^20.0 (content:digit content:camera) categoryNames:"digit camera"^80.0)

As you can see, I was trying to do a phrase query on the categoryNames field and boost it by 80.0. I am also using the Porter stemming filter to stem while searching (I do this while indexing as well). If I go with the KeywordAnalyzer approach, I can index the categoryNames field using this analyzer.

Would I use the QueryParser to create my query and pass it the KeywordAnalyzer when searching on categoryNames (and then make that query part of my global boolean query)?


Steven Rowe wrote:

Mufaddal Khumri wrote:

Let's say I do this while indexing:

doc.add(Field.Text("categoryNames", categoryNames));

Now, while searching categoryNames, I search for "digital cameras". I only want to match documents that have exactly the phrase "digital cameras" in the categoryNames field. I do not want documents containing "digital camera batteries" in the results.

What's the best way to accomplish this?

Hi Mufaddal,

One way to do this is to use the KeywordAnalyzer (in the Lucene Subversion trunk, but not in v1.4.3; will be in forthcoming v1.9) for the "categoryNames" field. This analyzer does not tokenize field contents, so "digital cameras" would be a single token, and the only thing that would match it would be the exact same single token. Be careful when you search to construct the search tokens similarly.
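
To make the difference concrete, here is a minimal sketch in plain Java that models the two analysis strategies. It is only a toy model and does not use Lucene's API: the class and method names are hypothetical, the "tokenizing" analyzer is assumed to lowercase and split on whitespace, and there is no stemming, so the query used is the already-singular "digital camera".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of keyword vs. tokenizing analysis; hypothetical names, NOT Lucene's API.
class KeywordVsTokenizedSketch {

    // Keyword-style analysis: the whole field value becomes one single token.
    static List<String> keywordTokens(String value) {
        return Collections.singletonList(value);
    }

    // Tokenizing analysis (assumption: lowercase, then split on whitespace).
    static List<String> whitespaceTokens(String value) {
        return Arrays.asList(value.toLowerCase().trim().split("\\s+"));
    }

    // A keyword field matches only if the query is the identical single token.
    static boolean keywordMatch(String fieldValue, String query) {
        return keywordTokens(fieldValue).equals(keywordTokens(query));
    }

    // A tokenized field matches a phrase if the query tokens occur consecutively.
    static boolean phraseMatch(String fieldValue, String query) {
        List<String> field = whitespaceTokens(fieldValue);
        List<String> phrase = whitespaceTokens(query);
        return Collections.indexOfSubList(field, phrase) >= 0;
    }

    public static void main(String[] args) {
        // Tokenized field: the phrase also hits the longer category name.
        System.out.println(phraseMatch("digital camera batteries", "digital camera"));  // true
        // Keyword field: only the exact, whole value matches.
        System.out.println(keywordMatch("digital camera batteries", "digital camera")); // false
        System.out.println(keywordMatch("digital camera", "digital camera"));           // true
    }
}
```

Note that the keyword comparison is exact, including case and spacing, which is why the search side has to produce the same single token as the indexing side.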

If you have other fields you want to search, and you want to tokenize their contents when you index them, you could use the PerFieldAnalyzerWrapper, so that the KeywordAnalyzer is only used for the "categoryNames" field.
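
The per-field idea can be sketched in plain Java as a default analyzer plus a map of per-field overrides. This is only an illustration of the dispatch logic with hypothetical names, not Lucene's PerFieldAnalyzerWrapper itself; the real class's constructor and method signatures differ between Lucene versions, so check the Javadoc for the release you use.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model of per-field analyzer selection; hypothetical names, NOT Lucene's API.
class PerFieldAnalysisSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> overrides = new HashMap<>();

    PerFieldAnalysisSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a special analyzer for one field; all other fields use the default.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        overrides.put(field, analyzer);
    }

    // Pick the analyzer by field name and apply it to the field value.
    List<String> analyze(String field, String value) {
        return overrides.getOrDefault(field, defaultAnalyzer).apply(value);
    }

    // Keyword-style analysis: the whole value is one token.
    static List<String> keyword(String value) {
        return Collections.singletonList(value);
    }

    // Stand-in for a custom analyzer (assumption: lowercase, split on whitespace).
    static List<String> tokenized(String value) {
        return Arrays.asList(value.toLowerCase().trim().split("\\s+"));
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch wrapper =
                new PerFieldAnalysisSketch(PerFieldAnalysisSketch::tokenized);
        wrapper.addAnalyzer("categoryNames", PerFieldAnalysisSketch::keyword);

        System.out.println(wrapper.analyze("title", "Digital Cameras"));         // [digital, cameras]
        System.out.println(wrapper.analyze("categoryNames", "digital cameras")); // [digital cameras]
    }
}
```

With this arrangement a single document keeps all of its fields, and applying the same per-field mapping at search time keeps the query tokens consistent with the indexed ones.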


To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
