Alan Tanaman wrote:
> Arnaud,
>
> Absolutely.  Out of the box, Nutch makes the url field searchable (and
> tokenized). You restrict a search to a specific field with a colon, for
> example by typing
>
> url:motherboard or url:"unix shell"
>
> The default search field (when no predicate is specified) is content.
>
> Generally, the Lucene search syntax is supported (although I believe there
> are Nutch-specific issues):
> http://lucene.apache.org/java/docs/queryparsersyntax.html
>
> Best regards,
> Alan
> _________________________
> Alan Tanaman
> iDNA Solutions
> Tel: +44 (20) 7257 6125
> Mobile: +44 (7796) 932 362
> http://blog.idna-solutions.com
>
> -----Original Message-----
> From: Arnaud Goupil [mailto:[EMAIL PROTECTED] 
> Sent: 10 January 2007 10:04
> To: [email protected]
> Subject: How to index and return files names ?
>
> Hi,
>
> I would like Nutch to return results when search terms
> are found in the name of files known by the index.
>
> For example, my http location indexed by nutch
> contains various files, named :
>
>
> computer security.pdf
> unix shell.pdf
> motherboard specifications.pdf
>
>
> If I search "motherboard", I want Nutch to return a
> result pointing to my third document, even if this
> document does not contain the word "motherboard", only
> because it's in the name of the file.
>
> Is there a way to do this ?
>
> Thanks
>
As Alan suggested, you should search the url field. To search it explicitly
with the url: prefix, you need the query-url plugin enabled in your
plugin.includes setting. The query-basic plugin also searches the url field,
so a plain query with no url: prefix will match it as well. I also suggest
using the URLTokenizer from http://issues.apache.org/jira/browse/NUTCH-389,
which tokenizes URLs better.
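
To show why a tokenized url field lets "motherboard" match a file name, here
is a minimal sketch of the idea. It is not Nutch's own tokenizer and not the
URLTokenizer from NUTCH-389; the class name UrlTokenSketch, its methods, and
the example.com URL are made up for illustration. It assumes Java 10 or later
and simply decodes percent-escapes, lowercases, and splits on anything that is
not a letter or digit.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: NOT Nutch's tokenizer and NOT the NUTCH-389 URLTokenizer.
public class UrlTokenSketch {

    // Decode percent-escapes (e.g. %20), lowercase the result, then split on
    // every run of characters that are neither letters nor digits.
    public static List<String> tokenize(String url) {
        String decoded = URLDecoder.decode(url, StandardCharsets.UTF_8);
        List<String> tokens = new ArrayList<>();
        for (String part : decoded.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // A single-term query matches when the term equals one of the URL tokens.
    public static boolean matches(String url, String term) {
        return tokenize(url).contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Hypothetical location of one of the files from the original question.
        String url = "http://example.com/docs/motherboard%20specifications.pdf";
        System.out.println(tokenize(url));
        System.out.println(matches(url, "motherboard"));
    }
}
```

With a split like this, the file name contributes the tokens "motherboard" and
"specifications", so a one-word query can hit the document even when its body
never mentions the word. How well this works in a real index depends on how
the tokenizer treats escapes and punctuation in the URL.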



