prashant_nutch wrote:
> Hi,
> Thanks for your early response.
> Finally I got search results using subcollection, but there are still some issues:
> 1. Can we search on two or more subcollections at the same time?
>    For example, with a command like
>    subcollection:<subcollection name1> <term for search> .......
>
>  can we extend this to subcollection:<subcollection name1> <term for search>
> <subcollection name2> <term for search2>,
>  or how else can we achieve this?
>
>   
Actually you can, but it requires a little work. Nutch parses the query
according to a predefined syntax, using classes generated by JavaCC from
the NutchAnalysis.jj grammar (the generated parser is NutchAnalysis.java;
also see Query.parse()). Unfortunately, the query syntax does not allow
multiple terms for a single field, and it does not include a boolean OR
operator either. So a query like

<query_term> <field1> : <term1>, <term2>

is not possible, and neither is a query like
<query_term> (<field1> :<term1> OR <field1>:<term2>)

So for your case, you could add this functionality to NutchAnalysis and
share it with the community, so that Nutch gains this much-wanted feature.
Alternatively, you can add the clauses to the Query object
programmatically if you know the field a priori.
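If patching the grammar is not an option, one workaround that needs no parser changes is to issue one query per (subcollection, term) pair and merge the hits yourself. This emulates OR on the client side. The sketch below is a rough illustration under stated assumptions. It only builds the query strings and takes the union of the result URLs. The class name, method names and the `search` hook are all hypothetical: `search` stands in for however your application actually runs a Nutch query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of Nutch.
public class SubcollectionQueries {

    // Build one query string per (subcollection, term) pair. Each string uses
    // only the single-field form that the stock query parser accepts.
    static List<String> buildQueries(Map<String, String> termBySubcollection) {
        List<String> queries = new ArrayList<>();
        for (Map.Entry<String, String> e : termBySubcollection.entrySet()) {
            queries.add("subcollection:" + e.getKey() + " " + e.getValue());
        }
        return queries;
    }

    // Emulate OR on the client: run each query and keep the union of the hit
    // URLs in first-seen order. `search` is a hypothetical hook for the real call.
    static Set<String> unionOfHits(List<String> queries,
                                   Function<String, List<String>> search) {
        Set<String> urls = new LinkedHashSet<>();
        for (String q : queries) {
            urls.addAll(search.apply(q));
        }
        return urls;
    }

    public static void main(String[] args) {
        Map<String, String> wanted = new LinkedHashMap<>();
        wanted.put("name1", "apache");
        wanted.put("name2", "lucene");

        List<String> queries = buildQueries(wanted);
        System.out.println(queries);
        // [subcollection:name1 apache, subcollection:name2 lucene]

        // Fake results standing in for a real Nutch search.
        Function<String, List<String>> fakeSearch = q -> q.contains("name1")
                ? Arrays.asList("http://x.example/1", "http://x.example/2")
                : Arrays.asList("http://x.example/2", "http://y.example/1");
        System.out.println(unionOfHits(queries, fakeSearch));
        // [http://x.example/1, http://x.example/2, http://y.example/1]
    }
}
```

A plain union discards the relevance ordering across the separate queries. If ranking matters, you would need to merge the hits by score instead.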

> 2. If we want to add URLs to a subcollection after crawling, remove them from a
> subcollection, or
>    merge two subcollections, do we have to do a new crawl each time?
>
>   Can we manage our subcollections according to our requirements without
> recrawling? (For example, with subcollections A and B, we now want to add some
> URLs from A into B.)
As above, this is not really an issue of subcollection but of Lucene
itself. All the subcollection indexing extension does is add a
subcollection field to each document, whose values are the names of the
subcollections the document belongs to. So you can perform all of these
operations on the index as you like. I suggest you learn more about
Lucene by reading its wiki or one of the books. You can also check out
Solr, which manages the index more dynamically.
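Here is a concrete illustration. Membership is just the set of values in a multi-valued subcollection field. Adding URLs to a subcollection, removing them, or merging two subcollections therefore amounts to editing those values. The toy model below keeps the values in a map keyed by URL. It is a hypothetical illustration with made-up class and method names, not Nutch or Lucene code. In a real Lucene index a document cannot be edited in place, so each changed document would have to be deleted and re-added with its new field values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model: URL -> values of the "subcollection" field.
public class SubcollectionField {
    private final Map<String, Set<String>> subcollectionsByUrl = new LinkedHashMap<>();

    void index(String url, String... subcollections) {
        subcollectionsByUrl.put(url, new TreeSet<>(Arrays.asList(subcollections)));
    }

    // Copy URLs that are in `from` into `to` as well (they stay in `from`).
    void copy(Collection<String> urls, String from, String to) {
        for (String url : urls) {
            Set<String> values = subcollectionsByUrl.get(url);
            if (values != null && values.contains(from)) {
                values.add(to);
            }
        }
    }

    // Remove a URL from one subcollection without touching its other ones.
    void remove(String url, String subcollection) {
        Set<String> values = subcollectionsByUrl.get(url);
        if (values != null) {
            values.remove(subcollection);
        }
    }

    // Merge: every document in `other` also becomes a member of `target`.
    void merge(String other, String target) {
        for (Set<String> values : subcollectionsByUrl.values()) {
            if (values.contains(other)) {
                values.add(target);
            }
        }
    }

    // What a "subcollection:<name>" clause would match in this model.
    List<String> members(String subcollection) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : subcollectionsByUrl.entrySet()) {
            if (e.getValue().contains(subcollection)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SubcollectionField idx = new SubcollectionField();
        idx.index("http://a.example/1", "A");
        idx.index("http://a.example/2", "A");
        idx.index("http://b.example/1", "B");
        // Add one URL from A into B; no recrawl, only the field values change.
        idx.copy(Arrays.asList("http://a.example/1"), "A", "B");
        System.out.println(idx.members("B"));
        // [http://a.example/1, http://b.example/1]
    }
}
```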




_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
