Thank you Phil and Shai. I will write a different Analyzer.
On Sun, Aug 2, 2009 at 11:50 PM, Shai Erera <[email protected]> wrote: > You can always create your own Analyzer which creates a TokenStream just > like StandardAnalyzer, but instead of using StandardFilter, write another > TokenFilter which receives the HOST token type, and breaks it further to > its > components (e.g., extract "en", "wikipedia" and "org"). You can also return > the original HOST token and its components. > > I hope this helps. > > Shai > > On Sun, Aug 2, 2009 at 8:58 PM, prashant ullegaddi < > [email protected] > > wrote: > > > Hi Phil, > > > > The query you gave did work. Well, that proves StandardAnalyzer has a > > different way > > of tokenizing URLs. > > > > Thanks, > > Prashant. > > > > On Sun, Aug 2, 2009 at 11:22 PM, Phil Whelan <[email protected]> wrote: > > > > > Hi Prashant, > > > > > > I agree with Shai, that using Luke and printing out what the Document > > > looks like before it goes into the index, are going to be your best > > > bet for debugging this problem. > > > > > > The problem you're having is that StandardAnalyzer does not break-up > > > the hostname into separate terms, as it has a special case for > > > hostnames and acronyms. > > > > > > This should work... > > > +title:"rahul dravid" +url:"en.wikipedia.org" > > > > > > Thanks, > > > Phil > > > > > > On Sun, Aug 2, 2009 at 10:14 AM, prashant > > > ullegaddi<[email protected]> wrote: > > > > Yes, I'm sure that title:"Rahul Dravid" is extracted properly, and > > there > > > is > > > > a document relevant to this query as well. > > > > The following query and its results proves it: > > > > > > > > Enter query: > > > > Searching for: +title:"rahul dravid" +url:wiki > > > > 4 total matching documents > > > > trec-id: clueweb09-enwp02-13-14368, URL: > > > > http://en.wikipedia.org/wiki/Rahul_Dravid > > > > trec-id: clueweb09-enwp01-83-11378, URL: > > > > http://en.wikipedia.org/wiki/Rahul_S_Dravid > > > > trec-id: clueweb09-en0011-08-22737, URL: > > > > http://www.reference.com/browse/wiki/Rahul_Dravid > > > > trec-id: clueweb09-enwp01-69-13556, URL: > > > > http://en.wikipedia.org/wiki/Rahul_Sharad_Dravid > > > > Press (q)uit or enter number to jump to a page. > > > > > > > > But see following query: > > > > > > > > Enter query: > > > > +title:"rahul dravid" +url:"wikipedia" > > > > Searching for: +title:"rahul dravid" +url:wikipedia > > > > 0 total matching documents > > > > Press (q)uit or enter number to jump to a page. > > > > > > > > Isn't it weird? > > > > > > > > -- Prashant. > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [email protected] > > > For additional commands, e-mail: [email protected] > > > > > > > > >
