> -----Original Message-----
> From: Oshima, Scott [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, January 08, 2002 11:53 PM
> To: 'Lucene Users List'
> Subject: character filter issue
>
>
> Suppose we have one field with one string abc-xxx.com
>
> When I query for abc-xxx.com it returns 0 hits.
>
> BUT when i query for something like xxx.com it returns results fine.
>
> not sure what lucene is doing with the dashes. i am using the default
> standardfilter, lowercasefilter, stopfilter and porterstemfilter.
>
> Does anyone know how to get around this?
>
> thanks.
>
> -scott
>

I don't, but I would like to suggest that you should not analyze (tokenize)
host names. Sorry if you didn't intend to do that and it was only an example.
If you analyze host names with the standard filters, abc.xxx.com and
abc-xxx.com are indexed with the same terms.

I think the best thing you can do is use three fields:
1. site (host) name: jakarta.apache.org
2. domain name (maybe you don't need it): apache.org
3. url: jakarta.apache.org/lucene/docs/index.html

The site and domain name fields are not tokenized (and you can use a hash
function to save space), but the url field is tokenized.

Why? We are running a search engine (not Lucene based ;) and we have only a
tokenized url field. We can't search for documents from the index.hu domain
alone (a Hungarian portal), because we also get back a lot of documents from
URLs such as:
www.something.com/locales/hu/index.html
or index_hu.html
or www.hu.com/index

On Google you can filter by site name.
On Northern Light you can filter by words in the URL.

peter
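
To make the three-field idea above a bit more concrete, here is a rough sketch in plain Python rather than Lucene code. The names `split_fields` and `naive_tokens` are made up for illustration. The tokenizer is only a stand-in that splits on anything that is not a letter or digit; it is not meant to reproduce what Lucene's standard filters actually do. The "last two labels" rule for the domain is also just an assumption and is wrong for suffixes like co.uk.

```python
import re
from urllib.parse import urlsplit


def split_fields(url):
    """Derive the site, domain and url field values for one document URL."""
    # urlsplit only recognises the host when a scheme or "//" is present.
    parts = urlsplit(url if "//" in url else "//" + url)
    site = (parts.hostname or "").lower()
    # Assumption: the domain is the last two labels (wrong for e.g. co.uk).
    domain = ".".join(site.split(".")[-2:])
    return {"site": site, "domain": domain, "url": site + parts.path}


def naive_tokens(text):
    """Stand-in tokenizer: lowercase, split on any non-alphanumeric run."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# A document that is NOT from index.hu but has "hu" and "index" in its path.
doc = split_fields("www.something.com/locales/hu/index.html")

# Searching the tokenized url field for the terms of "index.hu" matches it...
matches_tokens = set(naive_tokens("index.hu")) <= set(naive_tokens(doc["url"]))

# ...while an exact comparison on the untokenized domain field does not.
matches_domain = doc["domain"] == "index.hu"
```

With this stand-in tokenizer, abc-xxx.com and abc.xxx.com also produce the same token list, which is the reason for keeping the site and domain fields untokenized and matching them as whole strings.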