To use these plugins, edit your conf/nutch-site.xml configuration file
to include something like this:

<property>
 <name>plugin.includes</name>
 <value>nutch-extensionpoints|protocol-http|language-identifier|urlfilter-regex|parse-(text|html|pdf|msword)|index-(basic|more)|query-(basic|site|url|more)</value>
 <description>Regular expression naming plugin directory names to include.
Any plugin not matching this expression is excluded.</description>
</property>

This has to be done in both the backend application and the web
application. When you crawl a site, Nutch will parse the file types listed
in parse-(text|html|pdf|msword). The index-more plugin adds more fields to
your Lucene index (date, file type, etc.), and the query-more plugin makes
them searchable in your web application.
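
If you want to check which plugin directories a plugin.includes value picks
up, here is a rough sketch in Python. It is not Nutch code (Nutch itself is
Java); it only assumes that a plugin is included when the whole directory
name matches the expression. The included_plugins helper and the candidate
directory listing are made up for illustration.

```python
import re

# The plugin.includes value from the configuration above, copied verbatim.
PLUGIN_INCLUDES = (
    "nutch-extensionpoints|protocol-http|language-identifier|urlfilter-regex|"
    "parse-(text|html|pdf|msword)|index-(basic|more)|query-(basic|site|url|more)"
)


def included_plugins(plugin_dirs, includes=PLUGIN_INCLUDES):
    """Return the directory names that the expression matches in full.

    Assumption: a name is included only when the entire name matches,
    so "parse-pdf" passes but "parse-pdf-extra" would not.
    """
    pattern = re.compile(includes)
    return [name for name in plugin_dirs if pattern.fullmatch(name)]


# Hypothetical listing of a plugins/ directory, for illustration only.
candidates = [
    "parse-pdf",
    "parse-msword",
    "parse-msexcel",
    "index-more",
    "query-more",
    "query-basic",
    "parse-js",
]

print(included_plugins(candidates))
```

Under that assumption, any name in the listing that is not spelled out in the
expression is left out, so a parser for another file type would have to be
added to the parse-(...) group before its documents get parsed.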

That's all for now. I hope this helps :) . For more explanations
about Nutch, see:

http://lucene.apache.org/nutch/
http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
http://wiki.media-style.com/display/nutchDocu/Home
http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine

Regards,

Lourival Júnior


On 4/24/07, ekoje ekoje <[EMAIL PROTECTED]> wrote:

I'm not sure I understand everything. I'm still a novice.

How can I use index-more and query-more?
Would you mind helping me?

Thanks
E
> You can use the index-more and query-more plugins to create a field in
> your index indicating the file type of the document. Then, in your search,
> you can use "type:pdf" or "type:msword" to filter these files. I used
> Nutch 0.7.2 to make it work...
>
> Regards,
>
> Lourival Júnior
>
> On 4/24/07, ekoje ekoje <[EMAIL PROTECTED]> wrote:
>>
>> Hi Guys,
>>
>> I would like to add a new button on my webpage to run an advanced search
>> using the keywords.
>> Once the user clicks on it, it will search for the keywords only in the
>> different PDF/WORD or Excel documents indexed.
>>
>> Do you know how I can filter/limit my search to PDF/WORD/EXCEL documents?
>>
>> Thanks for your help.
>> E
>>
>
>
>
> --
> Lourival Junior
> Universidade Federal do Pará
> Curso de Bacharelado em Sistemas de Informação
> http://www.ufpa.br/cbsi
> Msn: [EMAIL PROTECTED]
>




--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]