<<<I am using the StandardAnalyzer as most of the other fields being indexed are free form text. >>>
If you try Ahmet's suggestion, PerFieldAnalyzerWrapper is your friend. The snippet above makes me wonder if you've seen this class...... Best Erick On Sun, Nov 8, 2009 at 5:54 AM, AHMET ARSLAN <iori...@yahoo.com> wrote: > > Hi, > > > > How do I go about indexing domain names? I currently index > > the domain, but > > it only works if I put the exact full domain in. For > > example: > > > > site:www.youtube.com (this works) > > site:youtube.com (this doesn't work) > > > > I am using the StandardAnalyzer as most of the other fields > > being indexed > > are free form text. Currently the "site" field is stored > > and tokenized. > > StandardTokenizer recognizes www.youtube.com and youtube.com as singe > token. Therefore they do not match. You can use SimpleAnalyzer which uses > LetterTokenizer. So > > www.youtube.com will be broken into three tokens: www youtube com > youtube.com will be boreken into two tokens : youtube com > > By doing so site:youtube.com will bring you www.youtube.com > > But query site:youtube.com will also match a document like > www.foo.com/youtube.com > > Note that LetterTokenizer uses Character.isLetter() method to break text. > If your input has numbers like www.645cafe.com it will cause you problems. > > In your case it is better to extend CharTonizer and override protected > boolean isTokenChar(char c) method according to your needs. > > > As an additional improvement it would be even better if > > something like this > > worked: > > > > site:youtube.com/foo > > To accomplish this, you can pre-process your queries to strip from first > '/' char to the end. You need to convert youtube.com/foo/bla/bla to > youtube.com. > You can do it in a TokenFilter along with KeywordTokenizer with writing > custom code. > > Hope this helps. > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >