Hi Sami,

1) That sounds quite interesting. Is there any basic information on how to work with that? Might be useful for something else I'm trying :-)
2) The original intent of my question was that I need certain meta-data by which I can group (reduce? what's the correct word?) search results.

I have:
- Websites 1 to 4 with 50 pages each
- A summary-website which holds one URL for each of the websites, with a short profile etc.

When doing a search I'd like to display only two matches per website, but:
- either show all matches from the summary-website plus two matches for each of websites 1 to 4 that has matches (e.g. 2 matches from website 1 and no matching pages for websites 2 to 4, but the profiles for websites 2 to 4 might match and would thus need to be displayed),
- or display matches grouped by website, including the appropriate pages from the summary-website: if there are matches from website 2 and the profile for website 2 also matches, the 2 best matches would be shown for website 2, which could be the profile from the summary-website plus one match from website 2. A profile for website 3 might still be shown as well, since it counts towards website 3 although its URL (site value) is actually part of the summary-website.

What I currently have is that at most 2 matches are shown per website, but this also means that only 2 matches are shown from the summary-website. Either I'd need to be able to show only 2 matches per website but _all_ matches from the summary-website (would be okay in this case), or give websites 1 to 4 individual "IDs per website" and also assign each URL from the summary-website the ID of the website it belongs to. (Note: I know all URLs of the summary-website beforehand, and I know which website/website-ID each URL belongs to.)

Sorry for the long explanation, but I hope I made it clear. How would that be doable?

Regards,
 Stefan

Sami Siren wrote:
> 0.8 has subcollection plugin. It can add subcollection id for set of urls
> and then you can limit searching to subcollections. Is that what you're
> after?
>
> --
> Sami Siren
>
> Stefan Neufeind (JIRA) wrote:
>
>> [
>> http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_12422226
>> ] Stefan Neufeind commented on NUTCH-271:
>> ---------------------------------------
>>
>> Does somebody have an existing demo-plugin for that, that would catch
>> URL-prefixes from a file and in case matches are found certain tags
>> are then added? I don't yet fully get it how to do it "the elegant
>> way" :-)
>>
>>
>>
>>> Meta-data per URL/site/section
>>> ------------------------------
>>>
>>> Key: NUTCH-271
>>> URL: http://issues.apache.org/jira/browse/NUTCH-271
>>> Project: Nutch
>>> Issue Type: New Feature
>>> Affects Versions: 0.7.2
>>> Reporter: Stefan Neufeind
>>>
>>> We have the need to index sites and attach additional meta-data-tags
>>> to them. Afaik this is not yet possible, or is there a "workaround" I
>>> don't see? What I think of is using meta-tags per start-url, only
>>> indexing content below that URL, and have the ability to limit
>>> searches upon those meta-tags. E.g.
>>> http://www.example1.com/something1/ -> meta-tag "companybranch1"
>>> http://www.example2.com/something2/ -> meta-tag "companybranch2"
>>> http://www.example3.com/something3/ -> meta-tag "companybranch1"
>>> http://www.example4.com/something4/ -> meta-tag "companybranch3"
>>> search for everything in companybranch1 or across 1 and 3 or similar

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
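
The "IDs per website" idea from the message above (map known URL prefixes to a website ID, count each summary-website profile towards the website it describes, then keep at most two matches per ID) could be sketched roughly as follows. This is only an illustration in plain Python and does not use any Nutch API: the names PREFIX_TO_SITE, site_id_for and group_hits, the summary.example.org URLs, and the (url, score) hit format are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical prefix table: every known URL prefix is assigned a website ID.
# Profile pages on the (assumed) summary site carry the ID of the website
# they describe, not an ID of their own.
PREFIX_TO_SITE = {
    "http://www.example1.com/": "site1",
    "http://www.example2.com/": "site2",
    "http://www.example3.com/": "site3",
    "http://www.example4.com/": "site4",
    "http://summary.example.org/profile1": "site1",
    "http://summary.example.org/profile2": "site2",
    "http://summary.example.org/profile3": "site3",
    "http://summary.example.org/profile4": "site4",
}


def site_id_for(url, prefix_map):
    """Return the ID of the longest prefix matching url, or None."""
    best = None
    for prefix in prefix_map:
        if url.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return prefix_map[best] if best is not None else None


def group_hits(hits, prefix_map, max_per_site=2):
    """Group (url, score) hits by website ID, keeping the best few per ID.

    Hits whose URL matches no known prefix are skipped.
    """
    grouped = defaultdict(list)
    # Visit hits from best to worst so each group fills with its top matches.
    for url, score in sorted(hits, key=lambda hit: hit[1], reverse=True):
        site_id = site_id_for(url, prefix_map)
        if site_id is None:
            continue
        if len(grouped[site_id]) < max_per_site:
            grouped[site_id].append((url, score))
    return dict(grouped)
```

With this grouping, a matching profile for website 3 appears under website 3 even when none of website 3's own pages match, and a profile for website 2 competes with website 2's pages for the two available slots.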
