Hi Guys,

The problem I've found with the url: field is that if you search for a Word
document with url:doc, it returns not only foo.doc but also things like
/doc/text.html.
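
To show what I mean, here is a rough Python sketch of the effect. It is not
Nutch's actual analyzer, just a simplified stand-in that assumes the url field
is split into tokens on every non-alphanumeric character. The function names
and the example.com URLs are made up for illustration.

```python
import re

def tokenize_url(url):
    """Split a URL into lowercase alphanumeric tokens.

    Simplified stand-in for a tokenized url field; the real analyzer
    may split differently.
    """
    return [t for t in re.split(r"[^0-9A-Za-z]+", url.lower()) if t]

def matches_url_term(url, term):
    """Return True if the term equals any token of the URL."""
    return term.lower() in tokenize_url(url)

# Hypothetical URLs: one Word document, one HTML page under a /doc/ directory.
urls = [
    "http://example.com/files/foo.doc",
    "http://example.com/doc/text.html",
]

# Both URLs contain the token "doc", so both match a url:doc style search.
hits = [u for u in urls if matches_url_term(u, "doc")]
```

Under that assumption "doc" is just one token among many, so the search cannot
tell an extension from a directory name.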

So is there an easy way to search on file type?  I don't believe it's
indexed out of the box, but that way Arnaud could do searches such as:

filetype:pdf motherboard
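
For what it's worth, here is a minimal sketch of the value I would expect such
a field to hold, assuming the file type is simply taken from the extension of
the URL path. The file_type helper is hypothetical and is not part of Nutch;
a real solution would still need to add this value to the index at indexing
time.

```python
import posixpath
from urllib.parse import urlparse

def file_type(url):
    """Return the lowercase extension of the URL path, or "" if there is none.

    Hypothetical helper: only the path component is inspected, so query
    strings and directory names are ignored.
    """
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1]  # e.g. ".doc", or "" when absent
    return ext[1:].lower()

# The Word document yields "doc"; the page under /doc/ yields "html".
doc_type = file_type("http://example.com/files/foo.doc")
html_type = file_type("http://example.com/doc/text.html")
```

With a separate value like this, a search on file type would match foo.doc
without also picking up /doc/text.html.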

Regards,
Karl.



Enis Soztutar wrote:
> 
> Alan Tanaman wrote:
>> Arnaud,
>>
>> Absolutely.  As Nutch comes, the url field is searchable (and tokenized).
>> You predicate the search to a specific field using a colon, for example
>> by typing
>>
>> url:motherboard or url:"unix shell"
>>
>> The default search field (when no predicate is specified) is content.
>>
>> Generally the Lucene search syntax is supported (although I believe there
>> are Nutch specific issues):
>> http://lucene.apache.org/java/docs/queryparsersyntax.html
>>
>> Best regards,
>> Alan
> 
> 
> 


