I think I have had problems querying for numbers and words with digits in them. Now that I think of it, could it have something to do with the stemming, in either the query filter or the indexing? Either way, I would print out the text that is being indexed and the phrases added to the query. You could also use Luke to inspect your index and see whether 20060801 shows up anywhere.
Howie

>I tried looking for a page that had the date 20060801 and the text "test" in
>the page. I tried the following:
>
>date: 20060801 test
>
>and
>
>date 20060721-20060803 test
>
>Neither worked, any ideas??
>
>Matt
>
>Matthew Holt wrote:
>>Thanks Jake,
>> However, it seems to me that it makes most sense that a query should
>>return all pages that match the query, instead of acting as a content
>>filter. However, I know it's something easy to suggest when you're not
>>having to implement it, so just a suggestion.
>>
>>Matt
>>
>>Vanderdray, Jacob wrote:
>>>Try querying with both the date and something you'd expect to find in the
>>>content. The field query filter is just a filter. It only restricts
>>>your results to things that match the basic query and have the contents
>>>you require in the field. So if you query for "date:2006080 text" you'll
>>>be searching for documents that contain "text" in one of the default
>>>query fields and have the value 2006080 in the date field. Leaving out
>>>text in that example would essentially be asking for nothing in the
>>>default fields and 2006080 in the date field, which is why it doesn't
>>>return any results.
>>>
>>>Hope that helps,
>>>Jake.
>>>
>>>
>>>-----Original Message-----
>>>From: Matthew Holt [mailto:[EMAIL PROTECTED]
>>>Sent: Wed 8/2/2006 4:58 PM
>>>To: [email protected]
>>>Subject: Querying Fields
>>> I am unable to query fields in my index in the method that has been
>>>suggested. I used Luke to examine my index and the following field types
>>>exist:
>>>anchor, boost, content, contentLength, date, digest, host, lastModified,
>>>primaryType, segment, site, subType, title, type, url
>>>
>>>However, when I do a search using one of the fields, followed by a colon,
>>>an incorrect result is returned. I used Luke to find the top term in the
>>>date field which is '20060801'. I then searched using the following
>>>query:
>>>date: 20060801
>>>
>>>Unfortunately, nothing was returned.
>>>The correct plugins are enabled,
>>>here is an excerpt from my nutch-site.xml:
>>>
>>><property>
>>> <name>plugin.includes</name>
>>>
>>><value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|oo|pdf|msword|mspowerpoint|rtf|zip)|index-(basic|more)|query-(more|site|stemmer|url)|summary-basic|scoring-opic</value>
>>>
>>> <description>Regular expression naming plugin directory names to
>>> include. Any plugin not matching this expression is excluded.
>>> In any case you need at least include the nutch-extensionpoints
>>> plugin. By default Nutch includes crawling just HTML and plain text
>>> via HTTP, and basic indexing and search plugins.
>>> </description>
>>></property>
>>>
>>>
>>>Any ideas? I'm not the only one having the same problem, I saw an earlier
>>>mailing list post but couldn't find any resolution... Thanks,
>>>
>>> Matt

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
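
For anyone reading this thread later, here is a rough toy model of the behaviour Jake describes, where a field clause only narrows the results of the default-field query. It is a sketch in plain Python, not Nutch or Lucene code: the `search` function, the document dicts and the sample data are all made up for illustration, and it does not attempt to model stemming or how Nutch actually parses "date: 20060801".

```python
# Toy model only: hypothetical names and data, not Nutch or Lucene code.

def search(docs, default_terms, field_filters):
    """Return docs that match every default term, narrowed by field filters."""
    if not default_terms:
        # As Jake describes it: with no default-field clause there is
        # nothing for the field filter to restrict, so nothing comes back.
        return []
    # Step 1: the basic query against the default field (here: "content").
    hits = [d for d in docs
            if all(t in d["content"].lower().split() for t in default_terms)]
    # Step 2: the field clauses only restrict the hits from step 1.
    return [d for d in hits
            if all(d.get(f) == v for f, v in field_filters.items())]


docs = [
    {"url": "a", "content": "this is a test page", "date": "20060801"},
    {"url": "b", "content": "another test", "date": "20060715"},
]

# Field clause alone: no results in this model.
print([d["url"] for d in search(docs, [], {"date": "20060801"})])        # []
# Field clause plus a content term: the filter narrows the content hits.
print([d["url"] for d in search(docs, ["test"], {"date": "20060801"})])  # ['a']
```

If the real index behaves differently from this model even when a content term is present, that would point back at the stemming or indexing questions raised at the top of this message.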
