ok thx, looks great.

2009/8/26 Fuad Efendi <[email protected]>
> Probably this is suitable:
>
> <property>
>   <name>generate.max.per.host</name>
>   <value>-1</value>
>   <description>The maximum number of urls per host in a single
>   fetchlist. -1 if unlimited.</description>
> </property>
>
> [-topN N] - Number of top URLs to be selected
>
>
> -----Original Message-----
> From: MilleBii [mailto:[email protected]]
> Sent: August-26-09 5:39 AM
> To: [email protected]
> Subject: Re: Limiting number of URL from the same site in a fetch cycle
>
> Won't db.max.outlinks.per.page result in missing links? I don't want that.
> I just want to balance them onto the next fetch cycle.
>
>
> 2009/8/26 Fuad Efendi <[email protected]>
>
> > You can filter some unnecessary "tail" using UrlFilter; for instance,
> > some sites may have long forums which you don't need, or shopping cart /
> > checkout-process pages which they forgot to restrict via robots.txt...
> >
> > Check regex-urlfilter.txt.template in /conf
> >
> >
> > Another parameter which equalizes 'per-site' URLs is
> > db.max.outlinks.per.page=100 (some sites may have 10 links per page,
> > others 1000...)
> >
> >
> > -Fuad
> > http://www.linkedin.com/in/liferay
> > http://www.tokenizer.org
> >
> >
> > -----Original Message-----
> > From: MilleBii [mailto:[email protected]]
> > Sent: August-25-09 5:48 PM
> > To: [email protected]
> > Subject: Limiting number of URL from the same site in a fetch cycle
> >
> > I'm wondering if there is a setting by which you can limit the number of
> > URLs per site in a single fetch list, rather than for the site as a whole.
> > That way I could avoid long tails in a fetch list all from the same site,
> > which takes far too long (5s per URL); I'd rather fetch those on the next
> > cycle.
> >
> > --
> > -MilleBii-
>
>
> --
> -MilleBii-


--
-MilleBii-
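
For reference, the two settings discussed above are ordinary nutch-site.xml overrides. A minimal sketch follows; the property names come straight from the thread (generate.max.per.host, db.max.outlinks.per.page), but the concrete values 50 and 100 are only illustrative and defaults vary between Nutch releases.

<!-- Sketch of nutch-site.xml overrides; the values 50 and 100 are examples only -->
<configuration>

  <!-- Take at most 50 URLs from any single host into one generated fetchlist;
       the remaining URLs for that host stay in the crawldb for later cycles -->
  <property>
    <name>generate.max.per.host</name>
    <value>50</value>
  </property>

  <!-- Keep at most 100 outlinks per parsed page; outlinks beyond this cap are
       discarded rather than deferred, hence the "missing links" concern above -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>100</value>
  </property>

</configuration>

With generate.max.per.host set, the overall size of each cycle can still be capped with the generator's -topN option (for example, bin/nutch generate crawl/crawldb crawl/segments -topN 1000; the paths here are just the usual tutorial layout), so one slow host cannot dominate a fetchlist.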

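Fuad's UrlFilter suggestion refers to the regex-urlfilter.txt file in conf/ (the .template file he mentions ships alongside it). With the default urlfilter-regex plugin, rules are applied top to bottom and the first matching rule decides: a leading '-' rejects the URL, '+' accepts it. The lines below are only a sketch of the kind of "tail" exclusions he describes; the path fragments are hypothetical and would need adapting to the sites actually being crawled.

# hypothetical exclusions for shopping-cart / checkout pages
-/(cart|basket|checkout)(/|\?|$)
# hypothetical exclusion for deep paginated forum views
-(viewtopic|showthread)\.php.*[&?](start|page)=
# accept anything else (keep this as the last rule, as in the template)
+.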