Would using Field.Index.UN_TOKENIZED be the same as tokenizing a field
into one token?

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 17, 2007 12:53 PM
To: java-user@lucene.apache.org
Subject: Re: thoughts/suggestions for analyzing/tokenizing class names

Either index them as a series of tokens:

org
org.apache
org.apache.lucene
org.apache.lucene.document
org.apache.lucene.document.Document

or index them as a single token, and use prefix queries (this is what  
I do for reverse domain names):

classname:(org.apache org.apache.*)

Note that "classname:org.apache*" would probably be wrong--you might  
not want to match

org.apache-fake.lucene.document

regards,
-Mike

On 17-Dec-07, at 9:39 AM, Beyer,Nathan wrote:

> Good point.
>
> I don't want the sub-package names on their own to match.
>
> Text (class name)
>  - "org.apache.lucene.document.Document"
> Queries that would match
>  - "org.apache", "org.apache.lucene.document"
> Queries that DO NOT match
>  - "apache", "lucene", "document"
>
> -Nathan
>
> -----Original Message-----
> From: Mike Klaas [mailto:[EMAIL PROTECTED]
> Sent: Monday, December 17, 2007 11:29 AM
> To: java-user@lucene.apache.org
> Subject: Re: thoughts/suggestions for analyzing/tokenizing class names
>
> On 15-Dec-07, at 3:14 PM, Beyer,Nathan wrote:
>
>> I have a few fields that use package names and class names and I've
>> been
>> looking for some suggestions for analyzing these fields.
>>
>> A few examples -
>>
>> Text (class name)
>> - "org.apache.lucene.document.Document"
>> Queries that would match
>> - "org.apache" , "org.apache.lucene.document"
>>
>> Text (class name + method signature)
>> -- "org.apache.lucene.document.Document#add(Fieldable)"
>> Queries that would match
>> -- "org.apache.lucene", "org.apache.lucene.document.Document#add"
>>
>> Any thoughts on how to approach tokenizing these types of texts?
>
> Perhaps it would help to include some examples of queries you _don't_
> want to match.  For all the examples above, simply tokenizing
> alphanumeric components would suffice.
>
> -Mike
>
> ----------------------------------------------------------------------
> CONFIDENTIALITY NOTICE This message and any included attachments  
> are from Cerner Corporation and are intended only for the  
> addressee. The information contained in this message is  
> confidential and may constitute inside or non-public information  
> under international, federal, or state securities laws.  
> Unauthorized forwarding, printing, copying, distribution, or use of  
> such information is strictly prohibited and may be unlawful. If you  
> are not the addressee, please promptly delete this message and  
> notify the sender of the delivery error by e-mail or you may call  
> Cerner's corporate offices in Kansas City, Missouri, U.S.A at (+1)  
> (816)221-1024.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

----------------------------------------------------------------------
CONFIDENTIALITY NOTICE This message and any included attachments are from 
Cerner Corporation and are intended only for the addressee. The information 
contained in this message is confidential and may constitute inside or 
non-public information under international, federal, or state securities laws. 
Unauthorized forwarding, printing, copying, distribution, or use of such 
information is strictly prohibited and may be unlawful. If you are not the 
addressee, please promptly delete this message and notify the sender of the 
delivery error by e-mail or you may call Cerner's corporate offices in Kansas 
City, Missouri, U.S.A at (+1) (816)221-1024.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to