prashant_nutch wrote:
> Hi,
> Thanks for your early response.
> Finally I got search results using subcollection, but there are still some issues.
> 1. Can we search on more than 2 subcollections at the same time?
> The current command is
> subcollection:<subcollection name1> <term for search> .......
>
> Can we extend this to
> subcollection:<subcollection name1> <term for search> <subcollection name2> <term for search2>
> or how can we achieve this?
>

Actually you can, but it requires a little work. Nutch parses the query according to a predefined syntax using JavaCC-generated classes, namely NutchAnalysis.java, which is generated from the grammar file NutchAnalysis.jj (also see Query.parse()). Unfortunately the query syntax does not allow multiple terms for a single field, and it does not include a boolean OR operation either. So a query like
<query_term> <field1>:<term1>, <term2>
is not possible, and neither is a query like
<query_term> (<field1>:<term1> OR <field1>:<term2>)

So for your case, you can add this functionality to NutchAnalysis and share it with the community, so that Nutch gains this much-wanted feature. Alternatively, you can add the clauses to the Query object programmatically if you know the field a priori.

> 2. In subcollection, if we want to add URLs after crawling, remove them from a
> subcollection, or merge two subcollections, do we have to do a new crawl each time?
>
> Can we manage our subcollections according to our requirements without
> recrawling? (For example, with subcollections A and B, we now want to add some
> URLs from A into B.)
>

As above, this is not an issue with subcollection but with Lucene itself. All the subcollection indexing extension does is add a subcollection field to the document, whose possible values are the subcollection names. Thus you can perform whatever operations you like on the index. I suggest you learn more about Lucene by reading its wiki or one of the books. You can also check out Solr, which manages the index more dynamically.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
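For question 1, a workaround that avoids touching the grammar is to issue one single-subcollection query per name and merge the hits in application code. This is not the approach described in the reply (extending NutchAnalysis or adding clauses to the Query object); it is only a minimal sketch of an alternative. The class name, the method names and the `search` function are hypothetical stand-ins and not part of Nutch, and the merge simply unions URLs without combining scores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class MultiSubcollectionSearch {

    // Builds one query string per subcollection, using the single-field
    // form quoted in the question: subcollection:<name> <terms>
    static List<String> buildQueries(List<String> subcollections, String terms) {
        List<String> queries = new ArrayList<>();
        for (String name : subcollections) {
            queries.add("subcollection:" + name + " " + terms);
        }
        return queries;
    }

    // Runs every query through the supplied search function and returns the
    // union of the hits, keeping first-seen order and dropping duplicates.
    // `search` is a hypothetical placeholder for the real search call.
    static List<String> searchAll(List<String> subcollections, String terms,
                                  Function<String, List<String>> search) {
        Set<String> merged = new LinkedHashSet<>();
        for (String query : buildQueries(subcollections, terms)) {
            merged.addAll(search.apply(query));
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Fake backend for demonstration only: returns canned URLs.
        Function<String, List<String>> fakeSearch = query ->
            query.startsWith("subcollection:news ")
                ? Arrays.asList("http://example.org/a", "http://example.org/b")
                : Arrays.asList("http://example.org/b", "http://example.org/c");

        System.out.println(
            searchAll(Arrays.asList("news", "blogs"), "lucene", fakeSearch));
    }
}
```

Because each query is run separately, the merged list is not ranked across subcollections; if ranking matters, the hits would need to be re-sorted by whatever score the real search call exposes.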

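For question 2, since a subcollection is just a field value on each indexed document, adding, removing or merging subcollections comes down to computing a new value for that field and re-indexing the affected documents, with no new crawl. The sketch below covers only the first part. It assumes the field value is a whitespace-separated list of subcollection names, which is an assumption and not confirmed by the reply, and the class and method names are hypothetical. Writing the new value back is left out, since in Lucene that generally means deleting the document and adding it again with the changed field.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SubcollectionField {

    // Splits a field value such as "A B" into an ordered set of names.
    static Set<String> parse(String fieldValue) {
        Set<String> names = new LinkedHashSet<>();
        for (String name : fieldValue.trim().split("\\s+")) {
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Returns the field value with `name` added (no duplicates).
    static String add(String fieldValue, String name) {
        Set<String> names = parse(fieldValue);
        names.add(name);
        return String.join(" ", names);
    }

    // Returns the field value with `name` removed, if it was present.
    static String remove(String fieldValue, String name) {
        Set<String> names = parse(fieldValue);
        names.remove(name);
        return String.join(" ", names);
    }

    // Merges subcollection `from` into `into`: documents that belonged
    // to `from` end up belonging to `into` instead.
    static String merge(String fieldValue, String from, String into) {
        Set<String> names = parse(fieldValue);
        if (names.remove(from)) {
            names.add(into);
        }
        return String.join(" ", names);
    }

    public static void main(String[] args) {
        // A document in subcollection A is also placed into B.
        System.out.println(add("A", "B"));
        // Subcollection A is merged into B.
        System.out.println(merge("A C", "A", "B"));
    }
}
```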