Nutch 0.8 has a subcollection plugin. It can assign a subcollection id to a set of URLs, and you can then limit searches to particular subcollections. Is that what you're after?
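
To illustrate the general idea of tagging URLs by prefix, here is a rough standalone sketch. It is not the subcollection plugin's actual code or configuration format: the class name `PrefixTagger`, the "prefix, whitespace, tag" line format, and the longest-prefix-wins rule are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Nutch: maps URL prefixes to a tag.
class PrefixTagger {
    private final Map<String, String> prefixToTag = new LinkedHashMap<>();

    // Parses lines of the assumed form "<url-prefix> <tag>".
    // Blank lines and lines starting with '#' are skipped.
    static PrefixTagger fromLines(List<String> lines) {
        PrefixTagger tagger = new PrefixTagger();
        for (String line : lines) {
            String s = line.trim();
            if (s.isEmpty() || s.startsWith("#")) {
                continue;
            }
            String[] parts = s.split("\\s+", 2);
            if (parts.length == 2) {
                tagger.prefixToTag.put(parts[0], parts[1]);
            }
        }
        return tagger;
    }

    // Returns the tag of the longest prefix that the URL starts with, if any.
    Optional<String> tagFor(String url) {
        String best = null;
        for (String prefix : prefixToTag.keySet()) {
            if (url.startsWith(prefix)
                    && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return Optional.ofNullable(best).map(prefixToTag::get);
    }

    public static void main(String[] args) {
        PrefixTagger tagger = PrefixTagger.fromLines(List.of(
                "http://www.example1.com/something1/ companybranch1",
                "http://www.example2.com/something2/ companybranch2"));
        // A URL below a known prefix gets that prefix's tag.
        System.out.println(tagger.tagFor("http://www.example1.com/something1/page.html"));
        // A URL outside every prefix gets no tag.
        System.out.println(tagger.tagFor("http://www.unrelated.example/"));
    }
}
```

In an indexing plugin, the returned tag would be stored as an extra field on the document so that searches can be restricted to it; how that field is added depends on the Nutch version and is not shown here.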

--
 Sami Siren

Stefan Neufeind (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_12422226 ]
>
> Stefan Neufeind commented on NUTCH-271:
> ---------------------------------------
>
> Does somebody have an existing demo-plugin for that, that would catch
> URL-prefixes from a file and in case matches are found certain tags are then
> added? I don't yet fully get it how to do it "the elegant way" :-)
>
>> Meta-data per URL/site/section
>> ------------------------------
>>
>>          Key: NUTCH-271
>>          URL: http://issues.apache.org/jira/browse/NUTCH-271
>>      Project: Nutch
>>   Issue Type: New Feature
>> Affects Versions: 0.7.2
>>     Reporter: Stefan Neufeind
>>
>> We have the need to index sites and attach additional meta-data-tags to them.
>> Afaik this is not yet possible, or is there a "workaround" I don't see? What
>> I think of is using meta-tags per start-url, only indexing content below that
>> URL, and have the ability to limit searches upon those meta-tags. E.g.
>> http://www.example1.com/something1/ -> meta-tag "companybranch1"
>> http://www.example2.com/something2/ -> meta-tag "companybranch2"
>> http://www.example3.com/something3/ -> meta-tag "companybranch1"
>> http://www.example4.com/something4/ -> meta-tag "companybranch3"
>> search for everything in companybranch1 or across 1 and 3 or similar

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
