In conf/nutch-site.xml you have to list the parser plugins that are
needed to index these files. For example, here is the relevant
property from my file:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|js|pdf|mp3|oo|msexcel|mspowerpoint|msword|rss|swf|zip)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin bla bla bla...</description>
</property>

As you can see, I have added parse-mp3 and parse-pdf to the
'plugin.includes' property in conf/nutch-site.xml.
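
To show how such a value selects plugins, here is a small Python sketch.
It is only an illustration: Nutch itself is written in Java, the
expression is a shortened version of the one above, and the helper name
is_included is made up. It assumes that a plugin is loaded only when its
whole id matches the expression.

```python
import re

# Shortened plugin.includes value, for illustration only.
PLUGIN_INCLUDES = "protocol-http|urlfilter-regex|parse-(text|html|pdf|mp3)|index-basic"


def is_included(plugin_id: str) -> bool:
    """Return True if the whole plugin id matches the expression.

    Assumption: matching is done against the complete id, so
    "parse-pdfx" would not be picked up by "parse-(...|pdf|...)".
    """
    return re.fullmatch(PLUGIN_INCLUDES, plugin_id) is not None


if __name__ == "__main__":
    for plugin in ("parse-pdf", "parse-mp3", "parse-zip"):
        print(plugin, is_included(plugin))
```

With this shortened expression, parse-pdf and parse-mp3 are selected and
parse-zip is not, because "zip" is not in the parse-(...) group.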

If you want your search to be limited to these types of files, then you
also have to configure the urlfilter.
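
As a rough idea of what such urlfilter rules could look like, here is a
Python sketch. The two rules are hypothetical and written in the
"+regex" / "-regex" style used by conf/regex-urlfilter.txt. The sketch
assumes that rules are tried top to bottom, that the first rule whose
pattern is found in the URL decides, and that a URL matching no rule is
rejected.

```python
import re

# Hypothetical rules, not copied from a real configuration.
RULES = [
    r"+\.(pdf|mp3)$",  # accept URLs ending in .pdf or .mp3
    r"-.",             # reject everything else
]


def accept(url: str) -> bool:
    """Apply the rules in order; the first matching rule decides."""
    for rule in RULES:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # assumption: no matching rule means the URL is dropped


if __name__ == "__main__":
    for url in ("http://example.com/song.mp3", "http://example.com/index.html"):
        print(url, accept(url))
```

Rules this strict would presumably also stop the crawler from fetching
the HTML pages that link to those files, so in practice the filter
probably has to let those pages through as well.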

I am not sure whether I have understood your question properly but I
hope this information helps you.

Regards,
Susam Pal
http://susam.in/

On 7/27/07, Dmitry <[EMAIL PROTECTED]> wrote:
>
> What configuration is needed to search just specific mp3 files or pdf
> files? Is it only done with plugins? How should the crawler be set up
> in this case?
>
> thanks,
> DT,
> www.ejinz.com
> Search news
>
>

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
